Nerd Babel

What Did I Buy?

In upgrading from copper to fiber, I’ve been exploring the different options and learning as I go. Some learning curves have been steep, others have been “relearning” what I already knew.

One of the biggest things I needed to learn is that there are “switches” that are actually “routers”. That was mind-bending.

The other is that the network dudes talk about VLAN and Tagged VLAN. They are different things. In the environments I’ve been working in, there are only tagged VLANs which are called “VLAN”. Same name, different meaning.

The starting place when moving from copper to fiber is to understand what a Small Form-Factor Pluggable is. This is the magic that makes it all happen. This is standardized into SFP and SFP+. The SFP standard only supports 1G and lower speeds.

SFP+ supports higher-speed modules at 10G. The same family of form factors continues upward: SFP28 handles 25G, and the QSFP variants handle 40G and 100G.

I’m only working with 10G modules, at this time.

There are RJ45 copper modules that will run at slower speeds or up to 10G. The only issue is that they draw more power and run hot, hot enough that you can’t touch them.

The fix for this is to purchase a switch or router that has RJ45 Ethernet ports and at least one SFP+ port.

I found a small, six-port switch. It comes with 4 RJ45 ports rated at 2.5G each and 2 SFP+ ports rated at 10G each. Cool.

This allows me to daisy-chain them if I wanted.

In reality, it meant that I had one host connected at 10G while the others were at 2.5G.

I also found an L2/L3 “switch” that looks much like the switch above.

Having done the upgrades, I started looking into upgrading the router between the outside world and the DMZ. The routers I’ve been getting do not support any crypto, so they don’t have good VPN capability, something I want.

So I went looking. Looking for a “motherboard with SFP”. Something interesting popped up. A mini ITX motherboard with 4 SFP+ ports and 4 RJ45 ports along with HDMI, VGA and the standard USB ports. It also provided space for two M.2 SSD modules, 2 DDR4 slots, and two 6 Gb/s SATA ports.

It might not be the fastest computer on the block, but it looks like a good starting point.

This leads me to other motherboards of the same ilk. And what I found was a bunch of these motherboards. And the port layouts all look the same. The specifications all look the same.

What we have is a “standard” motherboard which is put in a “standard” case along with a wall wart, an HDMI cable, and a mounting bracket. Only the branding differs from one to the next.

I have an L2 switch that I’m going to take apart in a bit. It has a limit of 1550 byte packets, making it useless for my new network. I wonder if I will find an M.2 module in that box or something else that allows me to change the software.

Meanwhile, that motherboard is on my wish list. I’ll load pfSense on it along with FRR and replace my current router. Giving me a considerable boost in capabilities and letting me dispense with the VyOS configuration language. Which I really don’t like.


What Time Is It?

I own a pocket watch. It is beautiful, but I don’t use it very often.

I know that I own a couple of watches. One of them is a battery powered solar recharging thing.

My standard “watch” today is my cell phone.

When I was in high school, I was very interested in accurate time keeping. As was my father.

This meant that we would call the “time” phone number to set our watches, at least once a week.

My grandfather had a “railroad watch”, a wristwatch approved by the SooLine railroad for timekeeping. Amazingly, until that model of watch was approved, the railroad required the use of pocket watches.

This was because a level of accuracy was required that only pocket watches or well regulated wristwatches could maintain.

The big thing in my youth was the “quartz” watch. Instead of using a tuning fork or a mechanical balance/regulator, they used the vibrations of a quartz crystal to keep track of the time.

What this meant was that you had devices that were now able to maintain the same wrong time over an extended period of time.

The user had to set them correctly.

As an example, for years, maybe even to today, my wife would set her car clock (and many other clocks) 10 minutes fast. “So she would be on time for appointments.”

I set my car to my phone’s reported time.

One of the fun things that I did as a kid was to call up the Naval Observatory to get the current time. This was reported from their atomic clock. One of the most accurate time keeping devices in the world.

Accurate Time

Many protocols require accurate time. It is wonderful that you have a time piece that is accurate to within 1 second per year, but if it is reporting the wrong time, it is not particularly useful to the protocol.

What we want is to know what time it is right now, and then to set our time to that.

We get the current time from a known, accurate time source. Today, that is often GPS satellites.

If you have ever wondered how GPS works, it works because your device knows where each satellite is at any instant of time. Each satellite transmits its ID and the current time. Over and over again.

That is all they do.

And here is the magic: if your device knows what time it is, and it knows where the satellite should be at this time, it can calculate the distance by comparing the difference in time.

If you are directly under a GPS satellite, it takes about 67ms for the signal to reach your device. From this, we can use the speed of light to figure out the distance traveled. Then some simple math and we know the location of your device.
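
To put a number on that, here is a back-of-the-envelope check done with bc. The roughly 20,200 km orbit altitude is my assumed figure, not something from the post:

# Signal travel time from a GPS satellite directly overhead.
# 20200 km altitude is an assumed round number; c = 299,792,458 m/s.
echo "20200 * 1000 / 299792458" | bc -l     # ~0.0674 seconds, i.e. about 67 ms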

We can also get accurate time by listening for the atomic clocks via radio. If you know where you are, and you know where the clock is, you can calculate the delay between the atomic clock and your device, then match your device to the atomic clock.

Today, when people want to use that type of process, they use a GPS device and get the time ticks from the device.

How long did it take?

This is where it starts to get complicated.

The standard for communications with a GPS device is 4800 or 9600 baud across a 3 wire serial connection. The protocol, the text being transmitted, specifies the time when the last character is transmitted.

That data has to be received, and then your device takes a certain amount of time to process the record it just received. All of that is latency.

If you do not know the latency in your device, you do not know what time it is. For grins, just think of that serial link being 300,000,000 meters long. That would put a 1-second latency by itself.
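
For a rough feel of just the serial-link part of that latency, here is a quick bc estimate. The ~70-character sentence length is my assumption; NMEA sentences top out at 82 characters:

echo "70 * 10 / 9600" | bc -l    # 8N1 framing costs 10 bit-times per character: ~0.073 s
echo "70 * 10 / 4800" | bc -l    # ~0.146 s at 4800 baud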

There are ways of calculating the latency, but I do not remember what they are.

Latency is the important piece of information.

Calculating Latency

Many network people have run ping. It is a tool for testing reachability and latency between your device and some other device on the Internet.


ping -c 5 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=116 time=11.6 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=116 time=11.1 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=116 time=11.0 ms

--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 11.022/11.179/11.616/0.220 ms

This is a test from one of my faster machines to a Google DNS server. It tells me that the round trip to that DNS server averages 11.179 ms. Testing against one of my network timeservers, the average is 78.094 ms.

This means that the time reported by the timeserver will be off by some amount. In a simple world, we would guess that it is 1/2 of 78.094.

But, I use NTP. NTP does many transmissions to multiple timeservers to discover the actual time. It is reporting that the latency is 78.163512 ms. A little more accurate. It tells me that the dispersion is 0.000092 ms.

How does it know this? Because of many samples and because of four different time stamps.

When my device sends an NTP request packet, it puts the current time in it. When the server receives the packet, it puts the current time in it. When the server transmits the response, it adds the current time again. Finally, when the reply is received, the current time is added to the packet. This gives us four different time stamps from two different sources.

We compute the total round trip as mine(R) - mine(S). We know the server’s processing time as server(S) - server(R). Comparing server(R) - mine(S) against mine(R) - server(S) tells us about the symmetry of the two paths the request and response traveled.

From these values, we can calculate the network distance, in seconds, between us and them.

Assume we transmit at time 0(M), it is received at 100(S), the response is transmitted at 105(S) and we receive it at 78(M).

How can we receive our reply before the server sent it? Easy, we have two different views of what time it is.

The round trip took 78 of our clock units. The server spent 5 of those processing the request, so each leg took roughly (78 - 5) / 2 ≈ 36.5. The server stamped the reply at 105; if our clock agreed with the server’s, we would have received it at about 105 + 36.5 = 141.5. We received it at 78, so, doing the simple math, our clock is roughly 63.5 behind theirs.
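
For the curious, here is the textbook NTP arithmetic applied to those made-up numbers. The formulas are the standard ones; the variable names are mine:

# t1 = mine(S), t2 = server(R), t3 = server(S), t4 = mine(R), from the example above.
t1=0; t2=100; t3=105; t4=78
echo "delay  = $(echo "($t4 - $t1) - ($t3 - $t2)" | bc)"          # 73: time actually spent on the wire
echo "offset = $(echo "(($t2 - $t1) + ($t3 - $t4)) / 2" | bc -l)" # 63.5: how far my clock trails the server's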

But we can do better. By looking at the reported latency between the two legs, we can actually calculate how long it took for us to receive the reply.

NTP uses multiple timeservers to get a consensus as to the time. It monitors each timeserver to determine which one jitters the least.
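
On a box running ntpd you can watch this selection process directly; chrony users get the same picture from chronyc. Both commands below are standard, though your packages may differ:

ntpq -p             # one line per timeserver: reach, delay, offset, and jitter columns
chronyc sources -v  # the chrony equivalent, with a legend explaining each column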

All of this means that we can have very accurate times.

And having accurate measurements of the time, NTP will calculate how much the computer’s clock drifts over time. It will then modify the clock rate in parts per million to get the drift as close to zero as possible.

This means, that the longer your device runs NTP, the more accurate it becomes.


How do you get there from here?

The Internet is a fantastic creature. I’m not speaking of the information you can find on the internet. Nor am I speaking of the entertainment that is available on the Internet.

The mere fact that you can ask for information at your desk or on your phone and somehow that request gets there, and the response gets back, is mind bogglingly complex.

Here is the dirty little secret about computers. It is all zeros and ones. There are no pictures, there are no videos, there are no songs nor even text, it is all zeros and ones.

We group these zeros and ones into units of different sizes. The three primary sizes are 8, 32, and 64, with a smattering of 16. At the lowest level, we think about these in groups of 8, called octets.

You might know them as “Bytes”.

Now, zeros and ones are a bit difficult to read and write. So we use base 16 to read and write bytes.

Base 16 has 16 digits, just like base 10 has 10 digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 are the digits of base 10.

For base 16, we add A, B, C, D, E, and F as the six extra digits.

So we have a 32-bit number that looks like this: 4C4F5645 in hex (base 16) and 1280267845 in base 10, and “LOVE” as ASCII.

It is all zeros and ones. It takes meaning when we decide how those bits will be interpreted.
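
You can poke at that exact example from a shell; the commands below just re-express the same 32 bits using stock bash, xxd, and od:

printf '%d\n' 0x4C4F5645       # the same bits read as a base-10 integer: 1280267845
printf 'LOVE' | xxd -p         # the same bits read as hex: 4c4f5645
printf 'LOVE' | od -An -tu1    # the four bytes, one decimal number per octet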

When you ask Google to search for “The Vine of Liberty”, your browser starts with a name, which it needs to convert to an address. The name is “www.google.com”. Depending on where you are, one of the addresses will be 142.250.69.68.

This is a different representation of a 32-bit word. In this “dotted quad”, each number represents the decimal value of an 8-bit byte.
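
The same trick works for the dotted quad; this is plain shell arithmetic, nothing Google-specific about it:

# Reassemble 142.250.69.68 into one 32-bit number, one byte per position.
printf '%d\n' $(( (142 << 24) | (250 << 16) | (69 << 8) | 68 ))   # 2398766404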

For you, the simple household, your device asks, “How can I get this message to 142.250.69.68?”

Your device looks up the address in the “routing table”. Your device likely only has a single entry in the routing table. The route of last resort, or default route.

When no other table entries match, send the request to the default router.

A router has a single job, to move packets (requests and responses) from one network to another. When your default router receives your device’s request, it looks up the IP address (142.250.69.68) in its routing table. Again, it is likely that there is only a single entry in that table, the default route.
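
On a Linux host you can watch your device’s half of this decision. The routes described in the comments are typical for a simple home network, not copied from any particular machine:

ip route show                # usually two lines: "default via <router>" and the local subnet
ip route get 142.250.69.68   # asks the kernel which entry it would use for this destination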

This is the simple way that things work in simple networks. It continues to work until the moment when a router has to make a choice. Does it send the packet from network H (your home network) to network A or to network B?

That router will have a routing table. It will find a match for 142.250.69.68 in that table, which will tell that router which network to forward your request to.

If nothing about the Internet ever changed, that would be all that was needed. Every router would know how to get to every address and that would be it.

But it isn’t that easy. The Internet changes, constantly. This means that we need to be able to change those routing tables quickly and easily.

The answer to that issue is a routing protocol. The oldest was RIP. It doesn’t work well today as it sends too much data too often. Back in slower networking times, RIP was taking up nearly 70% of my bandwidth. We stopped that.

There are two major types of routing protocols, external and internal. The primary external protocol, today, is the Border Gateway Protocol, or BGP. I don’t have to worry about that.

What I do need to worry about is internal routing. For internal routing, I use a combination of static routes and OSPF.

And this is where it gets complex. The data center has two physical networks. A management network and a production network.

The management network runs on a single subnet, with each host having a unique address on that subnet.

The production network runs on multiple subnets, each subnet serving to isolate problems. In addition, traffic on the production network needs to be able to reach the Internet.

The management network requires zero routing. One network space. No connection to the outside world.

On top of the physical network are layered multiple other networks. There is the OVN NAS network. This is how each of the hypervisors gets access to block storage (and shared file systems). There is the OVN NAS data network. There is the OVN VM network, the container network.

In addition, there are other networks used inside the container environment.

Some of these networks exist in isolation. Others are used as transport networks. No traffic originates nor terminates in these transport networks.

But other networks need to be able to speak to each other.

That means that every device needs to know how to reach every address. This means that OSPF is doing magic all the time to make things work.

Why? Redundancy. Every device has at least two paths to the next hop. If the primary link fails, the secondary link takes over.

This is done by rebuilding the routing table.

OVN links don’t fail (unless the idiot driving the keyboard does something stupid). The physical network can fail. When this happens, OVN just routes the tunnels in different directions.

So why this rant?

Because I can’t get parts of this to work!

My need is to move the containers into the OVN.

And I can’t get routing to work consistently. ARGH!

Oh well. Filler done.


You Get What You Pay For

My first fiber switch turned out to be an L3 managed “switch”. Way cool. But I purchased a cheap switch and found it completely undocumented.

It has taken me a while to figure things out.

The configuration GUI is a What You See Is All You Get affair. There is enough there to get the switch up and running, but not enough to fully configure the L3 switch.

To accomplish that, you need to use the CLI. Not a problem, I like CLI’s.

Of course, there is no documentation beyond tab completion and some very limited help screens.

I get it mostly working.

After playing with the Free Range Routing Suite (FRR) for a while and getting OSPF working on all of my hosts and the primary router, I was feeling pretty confident.

It seems that FRR took their configuration model almost directly from Cisco’s CLI. The number of times I used a Cisco help page to determine how to configure an OSPF setting is remarkable.

The new L3 switch turns out to have a Cisco-like configuration language. And what isn’t Cisco-like is FRR-like. Neither Cisco nor FRR, but close.
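
For reference, here is the shape of a minimal FRR OSPF setup typed into vtysh. The interface name, subnet, and area are placeholders; `ip ospf mtu-ignore` is the knob FRR offers when the two ends of a link disagree about MTU, which foreshadows the problem below:

vtysh
configure terminal
router ospf
 network 10.1.0.0/24 area 0   # advertise and form adjacencies on this subnet
exit
interface eth0
 ip ospf mtu-ignore            # only if you truly cannot make both ends agree on MTU
end
write memory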

Today I had a tremendous success, I moved a ceph host from the physical network to the OVN network.

This included moving that segment of the network to a new subnet. And everything sort of worked.

The issue turned out to be a routing issue.

The correct answer is to turn on OSPF within the new physical router. It does support it, after all.

Having played with the damn thing for a few hours, breaking my network multiple times, I was about to give up when I happened to notice a strange value for a setting.

That setting? MTU, of course.

Even though every interface shows an MTU of 9000. Even though jumbo frames are turned on and using a 9000 byte frame.

Even though an MTU of 9000 is very much supported, the MTU of the “VLAN” was set to 1500.

Now, a Cisco-style VLAN is not the same thing as a tagged VLAN. A tagged VLAN acts like a separate physical network. The Cisco-style VLANs are where you place interface settings; these VLANs can then be assigned to a physical port.

The physical port’s MTU overrides the VLAN MTU. This means my jumbo packets from host to host work.

The problem is that the VLAN MTU is maxed out at 2000 bytes. This seems to only affect the OSPF traffic and not the physical interface. So I’m either dead in the water, or I need to figure out how to do this differently.

Still, I didn’t pay an arm and a leg for this physical router. I’ll get it to work.


Maximum Transmission Unit (MTU)

In 1983, CCITT and ISO merged their network definition to create The Basic Reference Model for Open Systems Interconnection.

This is the “famous” seven layer model. Which works for ISO standards but is a poor match for the Internet.

The three layers we are interested in are:

  1. Physical layer
  2. Data link layer
  3. Network layer

1 Physical Layer

The physical layer defines the electrical, mechanical, and procedural interface to the transmission medium. WTF?

Ok, let’s look at this in terms of some real examples. If you have a computer that is more than a few years old, it will have a network connection in it or a port that a network connection can be attached to.

The most common mechanical connection, the socket and connector, is the RJ-45. This is the thing that looks like a big telephone connector. Oh yeah, many of the youngsters don’t remember ever plugging a phone into the wall.

This connector carries 8 contacts. The location and form of these contacts define part of the mechanical system.

The other part is that those 8 contacts are attached to four pairs of wires. The pairs of wire are twisted and bundled into a single cable. Each of the 8 wires is numbered, and the mechanical definition of the RJ-45 defines which wires are attached to which contact, at both ends.

When I say “numbered”, the physical reality is that the wires are color coded.

The electrical definition defines which wires are used for transmitting and which are used for receiving. It defines if the signals are ground referenced or differences between two wires.

Everything about how to connect the physical devices and how to transmit a signal are specified at Layer 1, the physical layer.

2 Data Link Layer

This layer defines how data is transmitted over the L1 physical network; it defines how to use the physical layer.

For example, Frame Relay is a data link protocol for connecting distant devices. Each Protocol Data Unit (PDU), consists of a flag field, an address field, an information field, and a frame check sequence, or checksum field.

The information field contains the actual data (information) that is being transmitted.

The Frame Relay standard states that the information field must be at least 262 octets (bytes) and recommends that it support at least 1600 octets.

It is important to note that a length of 262 cannot be (easily) expressed in a single byte. This means that the length field must be at least 2 bytes wide.

While Frame Relay is still in use, today, it is not as common as it used to be. There are better options.

A much more common L2 protocol is Ethernet. Its PDU is called a frame. The frame consists of a preamble, start frame delimiter, destination address, source address, an optional 802.1Q tag, type or length, payload, CRC, and an inter-frame gap.

As originally defined, an Ethernet frame had a maximum payload of 1500 octets.

Packet Size

In networking, we talk about sending a packet. A packet is a more generic term for “frame”. We have packets at the data link layer and at the network layer.

Every packet contains enough information to identify the source and destination of the packet, the length of the packet, and the payload. There will often be a header to identify more about the type of the packet.

As a packet moves through a network, it might be “fragmented” as it passes through a network segment which has an MTU smaller than the packet size.

There must be enough information to reconstruct the packet, even when the packet has become fragmented.

Fragmenting is something we want to avoid, if possible.

To that end, a part of the connection process is to discover the MTU for each device.

Consider a simple network segment. A network segment is a piece of the network that is connected at L2.

We have devices A and B. Device A is using a fiber physical layer and device B is using a copper physical layer. B is attached to switch 2, switch 2 is connected to switch 1, and switch 1 is connected to device A.

If all four devices are using old style Ethernet frames, then the MTU will default to 1500. A simple database backup is 3.3 GB. This means we will have to transmit at least 2,305,845 packets.

This requires each device to handle 2.3 million interrupts.

On the other hand, if we were to use jumbo packets, then we reduce this to around 384,307 packets. This is a huge savings in load on the network segment.
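
The arithmetic is simple enough to sanity-check in the shell. Header overhead is ignored here, so these counts come out slightly lower than the figures quoted above:

echo $(( 3300000000 / 1500 ))   # ~2.2 million packets at the default 1500-byte MTU
echo $(( 3300000000 / 9000 ))   # ~367 thousand packets with 9000-byte jumbo frames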

The two switches, as L2 devices, are going to either be store and forward switches, or simple hubs. Nobody uses hubs anymore. So they must be switches.

Each switch receives the packet, storing it, then transmits that packet on a different port.

The switch must be able to store the complete packet/frame. If it can not, it will drop the packet.

When designing your network, you want to make sure that all the switches on the network support the largest MTU you might be using.

Devices A and B will discover what their MTUs are. The smaller will rule. The switches, on the other hand, are transparent. They do not get a say in the MTU discovery.

What this means, is that you can have devices on the network that respond to simple testing, such as sending pings, but which fail for larger packets.
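
This is why my MTU test is no longer a plain ping. A full-size, don’t-fragment ping exercises the switches the way real traffic will; the address below is just a placeholder:

ping -c 3 -M do -s 8972 10.0.0.2   # 8972 = 9000 - 20 (IP header) - 8 (ICMP header)
# If an in-path switch cannot carry the frame, this fails while a default
# 56-byte ping sails straight through.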

Conclusion of Rant

I accidentally purchased a switch (L2) when I was intending to purchase a router (L3).

This should not have been an issue. I intended to use some switches, regardless.

The specifications look good. MTU is documented as 12000.

I plug everything together and start testing. My first network test is always “ping”. If ping isn’t working, nothing else will work well enough.

That worked perfectly.

Then I attempted to login to the remote site using SSH. This silently failed, before timing out with destination unreachable.

Ping works, SSH doesn’t?

This makes no sense.

Until I found it. SSH does a key exchange with my RSA public key. The key size is 1679 bytes. This is larger than the supported MTU of switch 2 at 1500.

The network fails, silently.

So I have email out to the manufacturer, hoping for a positive response.


There is a reason…

The problem that people have been attempting to solve, for years, is the lack of space in the IPv4 address space.

There are currently more devices attached to the Internet or “the network” than there are addresses in the IPv4 space. This requires address overlap.

The smallest section of a network is the “subnet”. A subnet can hold anywhere from 2 to over a million devices.

Consider a small business network. They have three networks, a network that is connected to the Internet, labeled DMZ, a network for the security cameras, labeled CCTV, and the working network, labeled Internal.

They have a router between the Internal network and the DMZ. There is another router that takes traffic from the DMZ and transfers it to the Internet.

The CCTV network does not need to ever touch the DMZ network, nor does it really need to touch the Internal network. So they run a completely separate physical network so that CCTV traffic is never available on the Internal or DMZ networks.

This could become costly. Consider a situation where you need to connect multiple buildings. Maybe some of those buildings can be connected with fiber, but others are using radio links. Radio links are expensive.

The traffic is low enough that there is no justification for a second radio link. Besides, it is difficult to run two radio links side-by-side.

The solution that was implemented is the Virtual LAN, or VLAN.

When you define a VLAN, you set a tag in the Ethernet frame, identifying which VLAN this frame belongs to. Now, we can put all the CCTV traffic on a VLAN and use the same physical network as we use for the Internal network. All is good.

This isn’t a complete solution, it is possible to configure a network card to listen to a particular VLAN, even if that device isn’t supposed to be on the VLAN. It is also another configuration point which smaller devices might not support.

As an example, I’ve never found a method to put my cell phone on a particular VLAN. It is likely possible, I’ve just never found it.

Same with my CCTV cameras. They exist only on the default, untagged, network.

One of the very nice parts of using a VLAN, is that you can have overlapping address space. I can have 192.168.88.0/24 on the physical network and 192.168.89.0/22 on the same physical network but with a VLAN tag of 87. They are overlapping address spaces, but they do not interfere with each other.
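
On a Linux host, a tagged sub-interface for that second network is a couple of iproute2 commands. The interface name, the tag of 87, and the address are taken from the example above, nothing special about them:

ip link add link eth0 name eth0.87 type vlan id 87   # frames on eth0.87 carry 802.1Q tag 87
ip addr add 192.168.89.10/22 dev eth0.87
ip link set eth0.87 up
# Untagged traffic on eth0 itself (192.168.88.0/24 in the example) is untouched,
# which is why the two address spaces can overlap without colliding.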

The solution was to allow an L2 switch port to be tagged. Now a device which only uses untagged frames can be plugged into a tagged port. All traffic coming from that port will have a VLAN tag added to it. All traffic sent out that port will have the VLAN tag stripped from it.

This means that a CCTV device sends and receives on the default (no tag) network. It reaches the switch and the packet is now on a VLAN. Another device on the Internal network is also on the same VLAN. That device, a monitoring station, can now see the CCTV footage.

If a port receives a frame that is tagged, it drops the frame. This keeps VLANs from leaking from their approved segment.

If there is a need for a port to accept multiple VLANs, it is configured as a trunk.

Thursday, I attempted to move ceph to an OVN network. This would eliminate the need for a VLAN and would give me a single subnet across multiple physical subnets. It failed.

Friday, I attempted to put a new L2 switch into place. The good news was that I didn’t need to break my entire network to do the testing.

The test computer has two NICs. One is connected to the physical management network, the other to the backplane network. I was able to establish a connection to the management port.

Once there, I could establish that I had full bandwidth to other nodes on the physical network, using the physical subnet. I could even reach multiple subnets using that same interface.

Then I tried the VLAN. The VLAN failed. There was no network traffic passing through.

It also looks like they do not have a large enough MTU.

Conclusion

I’m still black boxing this thing. It has been a painful trip. I have more than a few more tests to run. It is just overly painful trying to get there.


Are Those Level 4 Plates? (I wish, Nerd Babel)

Sunday was supposed to be the day I migrated a couple of machines. I have a new physical device which is described as a Level 2 switch with SFP+ ports.

The idea is to replace my small mixed routers (2 SFP+ ports plus some RJ45 ports) with either an L2 SFP+-only switch or an L3 SFP+-only router. This allows me to move some servers around and to increase the bandwidth from nodes to the backbone.

The switch arrived with a nice little instruction manual which claims I can find a web interface at 192.168.2.1 while the website claims there is no management interface.

Plugging it into an Ethernet port, via an Ethernet SFP module, gives me nothing at 192.168.2.1 and nothing else in 192.168.2.0/24 except my own machine. It looks like it is unmanaged.

This means, it should be a simple plug in replacement for my tiny switch, giving an upgraded data path to the backbone.

It didn’t work.

So now I have to do some more testing. I’ll figure this out, one way or another, but it is another bottleneck in my path to full conversion to fiber from copper.


Why Is It So Slow? Or How Many Bottlenecks?

My mentor, Mike, used to say, “There is always a bottleneck.”

What he meant by this, was that for any system, there will be a place which limits the throughput. If you can find, and eliminate, that bottleneck, then you can improve the performance of the system. Which will then slam into the next bottleneck.

Consider this in light of traffic. It is obvious to everybody, because it happens every day, that traffic does a massive slowdown just past the traffic signal where the road goes from four lanes to two. That is the point which we want to optimize.

The state comes out, evaluates just how bad the bottleneck is. The money people argue, and 15 years later they widen the road.

They widen the road between the first and second signal. Traffic now clears the first traffic signal with no issues.

And the backup is now just past the second signal, where the road narrows again.

We didn’t “solve” the bottleneck, we just moved it.

With computers, there are many bottlenecks that are kept in balance. How fast can we move data to and from the network, how fast can we move data to and from mass storage, how fast can we move data from memory? These all balance.

As a concrete example, the speed of memory is not fixed at the speed of the socket. If there are more memory lanes or wider memory lanes, you can move data faster.

If you have a fast CPU, but it is waiting for data from memory, it doesn’t matter. The CPU has to be balanced against the memory speed.

My mentor was at a major manufacturer, getting a tour and an introduction to their newest machine. He had an actual application that could also be used for benchmarking. One of the reasons it was a powerful benchmarking tool, was that it was “embarrassingly parallel”.

In other words, if it had access to 2 CPUs, it would use them both and the process would run twice as fast. 8 CPUs? 8 times as fast. Since the organization he worked for purchased many big computers (two Crays), and he was the go-to guy for evaluating computers, his opinion meant something.

He ran his code on a two-CPU version and found it adequate. He then requested to look at the actual designs for the machines. He spent an hour or two poring over the design documents and then said:

“We want an 8 CPU version of this. That will match the compute (CPU) power to the memory bandwidth.”

The company wasn’t interested until they understood that the customer would pay for these custom machines.

Six months later, these 8 custom machines were in the QA bay being tested when another customer came by and inquired about them.

When they were told they were custom-builds, they pulled rank and took all 8 of them and ordered “many” more.

What happened, was that my mentor was able to identify the bottleneck. Having identified it, he removed that bottleneck by adding more CPUs. The new bottleneck was no longer the lack of compute power, it was memory access speed.

The Tight Wire Balancing Act

I deal with systems of systems. It is one of the things that I was trained in. I.e., actual classes and instruction.

Most people have no idea of how complex a modern Internet service is. I.e., a website.

This site is relatively simple. It consists of a pair of load balancers sitting in front of an ingress server. The ingress server runs in a replicated container on a clustered set of container servers. The application has a web service provider that handles assets and delegates execution to an execution engine.

This runs a framework (WordPress) under PHP. On top of that is layered my custom code.

The Framework needs access to a database engine. That engine could be unique to just this project, but that is a waste of resources and does not allow for replication. So the DB Engine is a separate system.

The DB could run as a cluster, but that would slow it down and adds a level of complexity that I’m not interested in supporting.

The DB is then replicated to two slaves with constant monitoring. If the Master database engine goes offline, the monitors promote one of the slaves to be the new master. It then isolates the old master so it does not think it is the master anymore.

In addition, the non-promoted slave is pointed at the new master to replicate from it.

I wish it was that simple, but the monitors also need to reconfigure the load balancers to direct database traffic to the new master.

And all of this must be transparent to the website.

One of the issues I have been having recently, is that in the process of making the systems more reliable, I’ve been breaking them. It sounds stupid, but it happens.

So one of the balancing acts, is balancing redundancy against complexity, against security.

As another example, my network is physically secured. I am examining the option of running all my OVN tunnels over IPsec. This would encrypt all traffic. This adds a CPU load. How much will IPsec “cost” on a 10 Gigabit connection?

Should my database engines be using SSD or rust? Should it be using a shared filesystem, allowing the engine to move to different servers/nodes?

It is all a balancing act.

And every decision moves the bottlenecks.

Some bottlenecks are hard to spot. Is it a slow disk or is it slow SATA links or is it slow network speed?

Is it the number of disks? Would it be faster to have 3 8TB drives or 2 12TB drives? Or maybe 4 6TB drives? Any more than 4 and there can be issues.

Are we CPU bound or memory bound? Will we get a speedup if we add more memory?

Conclusion

I have so many bottles in the air that I can’t count them all. It requires some hard thinking to get all the infrastructure “right”.


One Step Forward, ??? Steps Back

Networking used to be simple. It is unclear to me why I think that. Maybe because when I started all of this, it was simple.

Networks are broken down into two major classes, Point-to-Point (P2P) or broadcast. When you transmit on a P2P port, the data goes to a dedicated port on the other side of a physical link. There it comes out.

Each port is provided an IP address. A routing table tells the router which port to transmit on to reach a particular network. A router works in a store and forward procedure. It reads the entire packet from a port, then retransmits that packet, modified as needed, on a different port.

A broadcast network is one where multiple devices are connected to a single physical network. What is transmitted on the link is heard by all the other nodes on the same physical network.

Originally, that physical network was a switch. Your network card would connect to a switch, the switch then transmits everything it receives on one port to all other ports.

Switches could be connected to each other. The only requirement was that of time. The amount of time it takes for a packet to travel from one end of the physical network to the other was limited. If it took more time than that limit, the network became unstable.

This concept of everything going back to a single switch was expensive. The cabling was expensive, the switch was expensive, the network card was expensive. A working network started at around $50,000. $30K for the switch, $10K for each network card. Hundreds of dollars for cabling.

The original Internet protocol was only going to have addressing for 65,000 machines. How many machines would be network attached if each site required $50K just to get one or two machines hooked up? We compromised at 4 billion.

We are working on getting everything onto IP version 6, which uses 128-bit addresses, roughly 3.4 × 10^38 of them. I think somebody told me that that is enough addresses for every atom in the known universe to have an IPv6 address.

From those expensive switches, we moved to 10BASE2 “thin” Ethernet and “thick” 10BASE5 Ethernet. These had the same limitations, but the costs were starting to come down. Something around $1000 to get into thick net and a few hundred to get into thin net.

Routers were still expensive. With the advent of 10baseT, we saw costs drop again. You could get an Ethernet hub for under a hundred dollars. Routers were only a few thousand. The world was good.

The other day I purchased an 8 port 10 Gigabit router for under a hundred dollars. It has 160 Gigabit internal switching. This means it can move 10 Gigabit per second from and to every port.

It cost less than $35 for two fiber transceivers. It cost around $33 for an Intel-based NIC capable of 10 Gigabits.

This means that I can upgrade a server to 10 Gigabit capability for around $60. Not bad.

A Step Forward

My data center was rather small. It was set up as a single /23 (512 addresses) connected via L2 switches. The switches were all one Gigabit copper.

You can buy 10 Gigabit L2 switches, but they are either copper, with limited distances and a need for high-quality cabling, or they are expensive.

Moving to an L3 device got me a better price and more features.

Moving to an L3 router gave me some more options. One of the big ones is the ability to have multiple paths to each device to provide high availability.

This requires that each node have multiple network interfaces, plus multiple routers and switches. The routers are cross-connected, and each node must be able to handle multi-path communications.

This is the step forward.

A step backwards

This High Availability (HA) solution requires multi-path capabilities. This is not always available for every piece of software. I want to keep things simple.

A Solution

A solution is to move from a physical network with multiple paths and redundant capabilities to virtual networking.

Each node will have two physical network interfaces. The interfaces will route using OSPF. This is a quick response system that will find other paths if one link or router fails. This provides the HA I want for the network.

Each node will have two VPCs for the ceph cluster, one or more VPCs for each container system, and one or more VPCs for each VM cluster. A VPC is a “virtual private cloud”: a virtual network that carries only the traffic it is allowed to carry.

You can have multiple networks on a single physical network. For example, you can have 192.168.0.0/24 be your “regular” subnet and 172.16.5.0/24 be your data plane subnet. A network interface configured as 192.168.0.7 will only “hear” traffic on subnet 192.168.0.0/24.

But you can configure a network interface to hear every packet. Allowing a node to “spy” on all traffic.

With a VPC, there is only subnet 192.168.0.0/24 on the one VPC and only 172.16.5.0/24 on the other. Packets are not switched from one VPC to the other. You need a router to move data from one VPC to another. And the two VPCs must have different subnets; otherwise the router doesn’t know what to do.

OVN Logical Switch

It turns out that a VPC is the same as an OVN logical switch. Any traffic on one logical switch is restricted to that switch. You need to send traffic to a logical router to get the traffic in or out of the VPC.

Since the traffic is going through a router, that router can apply many filters and rules to protect the VPC from leaking data or accepting unwanted data.
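
As a sketch of what that looks like in OVN’s northbound database, here is one VPC and its router built with ovn-nbctl. All of the names, the MAC, and the subnet are invented for the example:

ovn-nbctl ls-add vpc-internal                        # the logical switch, i.e. the VPC
ovn-nbctl lr-add gw                                  # the logical (distributed) router
ovn-nbctl lrp-add gw gw-to-internal 02:00:00:00:00:01 172.16.5.1/24
ovn-nbctl lsp-add vpc-internal internal-to-gw        # patch port on the switch side
ovn-nbctl lsp-set-type internal-to-gw router
ovn-nbctl lsp-set-addresses internal-to-gw router
ovn-nbctl lsp-set-options internal-to-gw router-port=gw-to-internal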

I configured 4 VPCs for testing. DMZ is part of the physical network. Any virtual port on the DMZ VPC is exposed to traffic on the physical network. This is how traffic can enter or exit the virtual clouds.

The second VPC is “internal”. This is a network for every physical node to exist. By using the internal VPC, each node can communicate with each other, regardless of the physical topology.

That was working.

There was a data plane VPC and a management VPC. Those VPCs were connected to the DMZ through a router. The router is distributed across multiple nodes. If one node goes down, the other node is ready to take up the traffic.

Falling way back

I now have a VPC for testing. The idea is to test everything extensively before moving any nodes to the virtual network. I need to be able to reboot any node and have everything still function.

The VPC came up perfectly. My notes made it easy to create the VPC and configure it.

The problem began when I added a router to the VPC.

Now I can’t get traffic to flow to the VPC.

WTF?


Bad Hardware Design

I have had good luck with picking up discarded computers, upgrading them, and making them functional members of the computer or services farm.

A computer consists of persistent storage (disk drives and SSD), dynamic storage (memory), a processor (CPU), and I/O devices.

Data is read from disk into memory; the processor then either executes it or processes it, and the results are sent to an output device. I/O devices allow input from disks, keyboards, persistent storage devices, networks, or other devices. They also send output to video devices, networks, printers, and storage devices.

The thing that defines how a computer can be configured is the motherboard. The motherboard accepts one or more processors, one or more memory devices, one or more I/O devices.

Some motherboards come with built-in I/O devices. For example, a motherboard will come with built-in disk controllers, sound, video, USB controllers, PS/2 keyboard and mouse ports, serial ports, and many more. These are the connectors that you see on the back of your computer or elsewhere on the case.

Many of these drivers lead to a connector or a socket. If your motherboard has SATA disk controllers, there will be SATA connectors on the motherboard. If your motherboard has built-in video, the back will have a VGA connector and/or an HDMI connector. It might have a DVI connector as well.

That covers most of what you find on the motherboard. The rest are the important sockets.

There will normally be extension slots. These are where you would plug in extra I/O devices, such as network cards, disk controllers, or video cards. There will normally be memory slots. Depending on the amount of memory supported by the CPU and motherboard, this could be two, four, eight, or even more. Finally, there is normally a socket for the CPU.

For me, I have found that the cheapest way to upgrade a computer is to give it more memory. Most software is memory intensive. If you exceed the amount of memory in your machine, your machine has to make space for the program you want to run. Then it has to read into memory, from disk, the program or its data before it can continue.

The more memory, the less “paging” needs to happen.

Upgrading the CPU is another possibility. This is normally a fairly reasonable thing to do. Consider an AMD Ryzen 7 3700, which is the CPU in one of my machines. It runs $150 on Amazon, today. I purchased it for $310 a few years ago.

Today, I can upgrade to a Ryzen 9 5950x from a Ryzen 7 3700x for $350.

Buying the latest and greatest CPU is expensive. Buying second tier, older CPUs is much more price effective.

The motherboard in this particular server is nearing its end of life. It has an AM4 socket, which has been replaced with the AM5 socket. This means it is unlikely that any “new” CPUs will be released for the AM4.

Bad Design

The first place I see bad computer designs is in the actual case. This is not as bad as it used to be. It used to be that opening an HP case was sure to get you sliced up. Every edge was razor sharp.

The next major “bad design” is a case and motherboard combination which is non-standard. The only motherboard that will ever fit in that case is a motherboard from that company. Likely the only place to get such a motherboard is from E-Bay.

The next issue is when there are not enough memory slots, or worse, not enough memory addressing lines. Apple was actually famous for this.

In the old days, Apple used a 68020 class CPU. The CPU that they were using had a 32-bit address register. This is 4 Gigabytes of addressing. More than enough for the time period. Except…

Apple didn’t use all 32 bits, they only used 24 bits, leaving 8 bits unused. This gives 16 Megabytes of addressable memory. More than enough in a time period where people still remembered Billy saying “Nobody will ever need more than 640 Kilobytes of memory”.

Apple made use of the extra 8 bits in the address register for “Handles”. Not important.

Most CPUs today use 64-bit address registers. I don’t know of a CPU that uses all 64 bits for addressing.

Which takes us to bad designs, again. Some motherboards only bring enough address lines to the memory slots to handle what is the “largest” memory card currently available. This means that you can have slots that support 16 Gigabyte DIMMs, but the motherboard only supports 4 Gigabyte DIMMs.

Often, it is worse. Cheaper motherboards will only have 2 DIMM slots. There is nothing more frustrating than having a machine with 8 GB of memory and finding out that it isn’t one 8 GB DIMM leaving room for another 8 GB, but instead two 4 GB DIMMs. Which means that when you receive that 8 GB DIMM you have 12 GB total instead of the goal of 16 GB, and you have a 4 GB DIMM that isn’t good for anything.

Sub Conclusion

If you want to be able to upgrade your computer, buy a motherboard with the latest socket design. AMD or Intel. Buy one that has enough DIMM slots to handle 4 times the amount of memory you think you are going to need. Buy a CPU that is at 1/4 to 1/3 the price of the top-tier CPU. Depending on the release date, maybe even less than that.

Make sure it has a slot for your video card AND still has one PCIe x16 slot open. You might never use it, but if you need it, you will be very frustrated at having saved yourself $10.

Source of the rant

My wife is using an employer supplied laptop for her work. All of her personal work has to be done on her phone. With the kids off to university, their old HP AIO computer is available.

The only problem is that word “OLD”. A quick online search shows that I should be able to upgrade the memory from 4 GB to 16 GB and the CPU from an old Intel to an i7 CPU. This means that I can bring this shell back to life for my wife to use.

At the same time, I intend to replace a noisy fan.

Looking online, the cost of a replacement CPU will be $25. The cost of the memory, another $25. Plus $25 for a new keyboard and mouse combination. $75 for a renewed computer. Happiness exists.

Before I order anything, I boot into my Linux “rescue/install” USB thumb drive. I run lscpu and it spits out the CPU type. Which is AMD. AMD sockets do NOT support i7 CPUs. This means that my online research does not match what my software is saying. I trust the software more than the research.
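
For anyone doing the same dance, the check takes seconds from any live Linux USB stick; these are stock util-linux commands:

lscpu | grep -E 'Vendor ID|Model name|Socket'
# Vendor ID tells you AMD vs Intel before you order parts; lscpu alone
# prints the full details if you want them.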

Turns out that there are two versions of this particular All In One model. One is AMD-based, the other is Intel-based. The Intel-based version has a socketed CPU. The AMD version has the CPU soldered into place. It cannot be upgraded.

These maroons have rendered this machine locked in the past. With no way to upgrade the CPU, it is too slow for today’s needs. Even with maximum memory.

Conclusion

An old computer is sometimes garbage. Put it out of your misery. Use it for target practice or take it to the dump.