Nerd Babel

Filler

I’m exhausted. I’ve been pulling fiber for the last two days. All part of an infrastructure upgrade.

Normally, pulling cable in a modern datacenter is pretty easy. This is not a modern datacenter.

The original cable runs were CAT6 with RJ45 connectors. When the cables were installed, the installation had to be nondestructive. No holes in walls, no holes in floors. Hide the cables as best you can.

One of the cables we removed was to a defunct workstation. It had been run across the floor and then covered with a protective layer to keep it from getting cut or snagged. The outer insulation had been ripped away. There was bare copper showing. Fortunately, that particular workstation hasn’t been in place for a few years.

The backbone switch was mounted in the basement. Not a real issue. The people who pulled some of the last cable didn’t bother to put in any cable hangers, so there were loops just dangling.

There were drops that could not be identified. Those are now disconnected; nobody has complained, so apparently nothing important was taken offline.

I’ve found a new favorite cable organizer.

Cable Management Wire Organizer

These are reusable. They open fully and will hold a good number of CAT6 cables and even more fiber. They have 3M foam double-sided tape on them, which works great against smooth, clean surfaces.

Where they really shine is that they also have a hole sized for a #6 screw. In places where there were no smooth surfaces, much less clean ones, the sticky tape held them in place long enough to drive a screw.

There are no more dangling cables.

My only hope is that there are no more configuration issues with the new switch. *cough*DHCP*cough*

Networking, interrelationships

Part of the task of making a High Availability system is to make sure there is no single point of failure.

To this end, everything is supposed to be redundant.

So let’s take the office infrastructure as a starting point. We need to have multiple compute nodes and multiple data storage systems.

Every compute node needs access to the same data storage as all the other compute nodes.

We start with a small Ceph storage cluster. There are currently a total of 5 nodes in three different rooms on three different switches. Unfortunately, they are not split out evenly. We should have 9 nodes, 3 in each room.

Each of the nodes currently breaks out as 15 TB, 8 TB, 24 TB, 11 TB, and 11 TB. There are two more nodes ready to go into production, each with 11 TB of storage.

It is currently possible to power off any of the storage nodes without affecting the storage cluster. Having more nodes would make the system more redundant.
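As a back-of-the-envelope check on those numbers, here is a rough capacity sketch. The three-way replication and the 85% full-ratio margin are my assumptions for illustration, not settings read from the actual cluster.

```python
# Back-of-the-envelope capacity math for the cluster described above.
# Replication factor and full-ratio margin are assumptions, not real settings.

current_nodes_tb = [15, 8, 24, 11, 11]   # nodes currently in production
pending_nodes_tb = [11, 11]              # nodes ready to go into production

def usable_tb(nodes, replicas=3, full_ratio=0.85):
    """Very rough usable space: raw capacity / replicas, capped at the full ratio."""
    return sum(nodes) * full_ratio / replicas

print(f"raw now:   {sum(current_nodes_tb)} TB, "
      f"usable ~{usable_tb(current_nodes_tb):.0f} TB")
print(f"raw later: {sum(current_nodes_tb + pending_nodes_tb)} TB, "
      f"usable ~{usable_tb(current_nodes_tb + pending_nodes_tb):.0f} TB")
```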

Unfortunately, today, an entire room went down. What was the failure mode?

DHCP didn’t work. All the nodes in room-3 had been moved to a new 10Gbit switch; actually 4×2.5Gbit copper plus 2×10Gbit SFP+. The four 2.5Gbit ports were used to connect three nodes and one access point. One of the 10Gbit SFP+ ports was used as an uplink to the main switch.

When the DHCP leases expired, all four machines lost their IP addresses. This did not cause me to lose a network connection to them, because they had static addresses on a VLAN.

What did happen is that they lost the ability to talk to the LDAP server on the primary network. With that primary network connection gone: no LDAP, no ability to log in.

The first order of repair was to reboot the primary router. This router serves as our DHCP server. This did not fix the issue.

Next I power cycled the three nodes. This did not fix the issue.

Next I replaced the switch with the old 1Gbit switch (4x1Gbit, 4x1Gbit with PoE). This brought everything back to life.

My current best guess is that the CAT6 cable from room-3 to the main switch is questionable. The strain relief is absent and it feels floppy.
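For the next incident like this, a small reachability check would help separate a DHCP problem from a cabling problem before swapping hardware. A minimal sketch, assuming a Linux node with iproute2; the interface name and the LDAP server address are placeholders:

```python
#!/usr/bin/env python3
"""Quick triage: did this node get a DHCP address, and can it reach LDAP?

The interface name and LDAP server address are placeholders; substitute
whatever the primary (DHCP) interface and directory server really are.
"""
import socket
import subprocess

PRIMARY_IFACE = "eno1"                 # placeholder: primary, DHCP-managed interface
LDAP_SERVER = ("192.168.1.10", 389)    # placeholder: LDAP server on the primary network

def has_ipv4(iface: str) -> bool:
    """True if the interface currently holds an IPv4 address."""
    out = subprocess.run(
        ["ip", "-4", "addr", "show", "dev", iface],
        capture_output=True, text=True, check=False,
    ).stdout
    return "inet " in out

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    print(f"{PRIMARY_IFACE} has an IPv4 address: {has_ipv4(PRIMARY_IFACE)}")
    print(f"LDAP {LDAP_SERVER[0]}:{LDAP_SERVER[1]} reachable: {can_reach(*LDAP_SERVER)}")
```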

More equipment shows up soon. I’ll be pulling my first fiber in 25 years. The new switch will replace the current main switch. This is temporary.

There will be three small switches, one for each room, and then a larger switch to replace the current main switch. The main switch will be linked to the three server rooms with 10Gbit fiber. The other long cables will continue to use copper.

Still, a lesson in testing.

The final configuration will be a 10Gbit backbone over OM4 fiber. The nodes will be upgraded with 10Gbit NICs, which will attach to the room switches via DAC cables. There will also be a 2.5Gbit copper network, which will be the default network used by devices.

The 10Gbit network will be for Ceph and Swarm traffic.
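Keeping the device traffic and the Ceph/Swarm traffic apart also means two address ranges. A sketch of how the addressing could be laid out with Python’s ipaddress module; the ranges and the per-room split are invented for illustration, not the actual plan:

```python
# Illustrative addressing for the two networks described above.
# The ranges and the per-room split are made up for the example.
import ipaddress

default_net = ipaddress.ip_network("10.10.0.0/24")   # 2.5Gbit copper, default for devices
storage_net = ipaddress.ip_network("10.10.10.0/24")  # 10Gbit fiber, Ceph and Swarm traffic

print(f"default : {default_net} ({default_net.num_addresses - 2} usable hosts)")

# Carve the storage network into one /26 per room, leaving one spare block.
for room, subnet in zip(["room-1", "room-2", "room-3"],
                        storage_net.subnets(new_prefix=26)):
    print(f"{room}  : {subnet} ({subnet.num_addresses - 2} usable hosts)")
```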

I’m looking forward to having this all done.


Docker Swarm?

There is this interesting point where you realize that you own a data center.

My data center doesn’t look like that beautiful server farm in the picture, but I do have one.

I have multiple servers, each with reasonable amounts of memory. I have independent nodes, capable of performing as Ceph nodes and as Docker nodes.

Which took me to a step up from K8S.
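The moving parts are pleasantly small: initialize a swarm on one manager, let the other nodes join, and create replicated services. A minimal sketch with the Docker SDK for Python; the advertise address, image, and replica count are placeholders:

```python
# Minimal Docker Swarm bootstrap using the Docker SDK for Python.
# The advertise address, image, and replica count are placeholders.
import docker
from docker.types import EndpointSpec, ServiceMode

client = docker.from_env()

# Turn this node into a swarm manager (run once, on one node only).
client.swarm.init(advertise_addr="192.168.1.21", listen_addr="0.0.0.0:2377")

# A replicated service: three copies of a web container, published on port 80.
client.services.create(
    image="nginx:stable",
    name="web",
    mode=ServiceMode("replicated", replicas=3),
    endpoint_spec=EndpointSpec(ports={80: 80}),
)

# The other nodes join with this token (docker swarm join --token ...).
client.swarm.reload()
print(client.swarm.attrs["JoinTokens"]["Worker"])
```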


High Availability Services

People get very upset when they go to visit Amazon, Netflix, or just their favorite gun blog and the site is down.

This happens when a site is not configured with high availability in mind.

The gist is that we do not want to have a single point of failure, anywhere in the system.

To take a simple example, you have purchased a full network connection to your local office. This means that there is no shared IP address. You have a full /24 (256 addresses, 254 of them usable) to work with.

This means that there is a wire that comes into your office from your provider. This attaches to a router. The router attaches to a switch. Servers connect to the server room switch which connects to the office switch.

All good.

You are running a Windows Server on bare metal with a 3 TB drive.

Now we start to analyze failure points. What if that cable is cut?

This happened to a military installation in the 90s. They had two cables coming to the site. There was one from the south gate and another from the north gate. If one cable was cut, all the traffic could be carried by the other cable.

This was great, except that somebody wasn’t thinking when they ran the last 50 feet into the building. They ran both cables through the same conduit. And when there was some street work a year or so later, the conduit was cut, severing both cables.

The site went down.
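That conduit is exactly the kind of shared dependency a tidy redundancy diagram hides. One way to catch it is to model every physical segment, shared conduits included, as a node in a graph and ask which single node would cut the site off if it failed. A toy sketch in Python; the topology below is invented to mirror the story above:

```python
# Toy single-point-of-failure finder: model the physical path as a graph
# and report any node whose loss disconnects the site from the provider.
# The topology below is invented for illustration.

TOPOLOGY = {
    "provider":       {"north_cable", "south_cable"},
    "north_cable":    {"provider", "shared_conduit"},
    "south_cable":    {"provider", "shared_conduit"},
    "shared_conduit": {"north_cable", "south_cable", "office_router"},
    "office_router":  {"shared_conduit", "office_switch"},
    "office_switch":  {"office_router", "server"},
    "server":         {"office_switch"},
}

def reachable(graph, start, removed):
    """Every node reachable from start when `removed` is treated as failed."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node == removed:
            continue
        seen.add(node)
        stack.extend(graph[node] - seen)
    return seen

def single_points_of_failure(graph, src, dst):
    """Nodes (other than the endpoints) whose loss cuts src off from dst."""
    return [n for n in graph
            if n not in (src, dst) and dst not in reachable(graph, src, n)]

print(single_points_of_failure(TOPOLOGY, "provider", "server"))
# -> ['shared_conduit', 'office_router', 'office_switch']
```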



For Lack of (nerd post)

Oh what a tangled web we weave when first we practice to deceive... er, be a system admin

I’ve been deep into a learning curve for the last couple of months, broken by required trips to see dad before he passes.

The issue at hand is that I need to reduce our infrastructure costs. They are out of hand.

My original thought, a couple of years ago, was to move to K8S. With K8S, I would be able to deploy sites and supporting architecture with ease. One control file to rule them all.

This mostly works. I have a Helm deployment for each of the standard types of sites I deploy, which works well for me.
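For the curious, the rollout side of that is not much more than a loop over sites: one chart per site type, one values file per site. A hedged sketch; the chart paths, release names, and namespaces below are placeholders, not my actual layout:

```python
# Deploy each site from a shared chart plus a per-site values file.
# Chart paths, release names, and namespaces below are placeholders.
import subprocess

SITES = [
    {"release": "blog-example", "chart": "./charts/wordpress-site",
     "values": "sites/blog-example.yaml", "namespace": "blog-example"},
    {"release": "shop-example", "chart": "./charts/static-site",
     "values": "sites/shop-example.yaml", "namespace": "shop-example"},
]

for site in SITES:
    subprocess.run(
        ["helm", "upgrade", "--install", site["release"], site["chart"],
         "-f", site["values"],
         "--namespace", site["namespace"], "--create-namespace"],
        check=True,
    )
```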

The problem is how people build containers.

My old method of building out a system was to create a configuration file for an HTTP/HTTPS server that then served the individual websites. I would put this on a stable OS. We would then do a major OS upgrade every four years, on an OS with a six-year support tail for LTS (Long-Term Support) releases.
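For contrast, the old model is easy to sketch: one long-lived server, one config file, one server block per hosted site. Something like this little generator captures the shape of it; the site names and paths are invented, and the output is nginx-flavored purely as an example:

```python
# Sketch of the old model: emit one HTTP server block per hosted site.
# Site names and paths are invented; the output is nginx-style config.
SITES = ["example-blog.com", "example-shop.com", "example-docs.com"]

VHOST_TEMPLATE = """\
server {{
    listen 80;
    server_name {site} www.{site};
    root /var/www/{site}/public;
    access_log /var/log/nginx/{site}.access.log;
}}
"""

with open("sites-enabled.conf", "w") as conf:
    for site in SITES:
        conf.write(VHOST_TEMPLATE.format(site=site))
```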

This doesn’t work for the new class of developers and software deployments.

Containers are the current answer to all our infrastructure ills.
