Nerd Babel


Disk Failures

I’ve talked about my Ceph cluster more than a bit. I’m sure you are bored with hearing about it.

Ceph uses two technologies to provide resilient storage. The first is duplicating blocks, and the second is erasure coding.

In many modern systems, the hard drive controller allows for RAID configurations. The most commonly used RAID level is RAID-1, or mirroring. Every block written is written to two different drives. If one drive fails, or if one sector fails, the data can be recovered from the other drive. This means that to store 1 GB of data, 2 GB of storage is required. In addition, the drives need to be matched in size.

Ceph wants at least two extra copies of each block, three replicas in total. This means that to store 1 GB of data, 3 GB of storage is required.

Since duplicated data is not very efficient, different systems are used to provide the resilience required.

For RAID-5, parity is added. With three or more drives, one drive’s worth of capacity acts as parity. (RAID-5 actually spreads the parity blocks across all the drives rather than dedicating a single parity drive, but the effect is the same.)

Parity is a simple method of determining whether something in a small chunk of data was modified. Take the binary string p110 1100 (a lowercase ‘l’ in 7-bit ASCII, with ‘p’ as the parity bit). We count the number of one bits and then set the p bit to make the count odd or even, depending on the agreed convention. If we are using odd parity, the value would be 1110 1100. There are 5 ones, which is odd.

If we were to receive 1111 1100, the parity would be even, telling us that what was transmitted is not what we received. A parity bit gives single-bit error detection with no correction.
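Here is a tiny Python sketch of the same idea, using the same lowercase ‘l’ example. This is just my illustration, not any particular protocol’s framing:

```python
def add_odd_parity(char7: int) -> int:
    """Prepend an odd-parity bit to a 7-bit value; bit 7 becomes the parity bit."""
    ones = bin(char7 & 0x7F).count("1")
    parity = 0 if ones % 2 == 1 else 1      # set p so the total number of ones is odd
    return (parity << 7) | (char7 & 0x7F)

def check_odd_parity(byte8: int) -> bool:
    """True if the 8-bit value (data plus parity) contains an odd number of ones."""
    return bin(byte8 & 0xFF).count("1") % 2 == 1

encoded = add_odd_parity(ord("l"))          # 'l' is 110 1100 in 7-bit ASCII
print(f"{encoded:08b}")                     # 11101100 -- five ones, so the rule holds
print(check_odd_parity(encoded))            # True
print(check_odd_parity(encoded ^ 0b100))    # False -- a single flipped bit is detected
```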

Parity can get more complex, up to and including Hamming codes. A Hamming code uses multiple parity bits to provide multi-bit error detection and correction of one or more bits.

NASA uses, or used, Hamming codes for communications with distant probes. Because of limited memory on those probes, once data was transmitted, it wasn’t available to be retransmitted. NASA had to get the data right as it was received. By using Hamming codes, NASA was able to correct corrupted transmissions.
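For the curious, here is a toy Hamming(7,4) encoder and corrector in Python. Real deep-space codes are far more sophisticated than this sketch, but it shows how several parity bits can locate, and therefore fix, a single flipped bit:

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code: list[int]) -> list[int]:
    """Locate and flip a single corrupted bit, then return the 4 data bits."""
    s1 = code[0] ^ code[2] ^ code[4] ^ code[6]   # re-check positions 1, 3, 5, 7
    s2 = code[1] ^ code[2] ^ code[5] ^ code[6]   # re-check positions 2, 3, 6, 7
    s3 = code[3] ^ code[4] ^ code[5] ^ code[6]   # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3              # 0 means clean; otherwise the bad position
    if syndrome:
        code[syndrome - 1] ^= 1
    return [code[2], code[4], code[5], code[6]]

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                                      # corrupt one bit "in transit"
print(hamming74_correct(word))                    # [1, 0, 1, 1] -- the data comes back intact
```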

RAID-5 uses simple parity with knowledge of which device failed. Thus a RAID-5 device can handle a single drive failure.

So this interesting thing happened: drives got larger, and RAID devices got larger. The smart people claimed that with so many large drives in a RAID device, if one drive failed, another would fail before the rebuild onto the replacement finished.

They were wrong, but it is still a concern.

Ceph uses erasure coding the same way RAID uses parity drives, but erasure coding is more robust and resilient.

My Ceph cluster is set up with data pools that are simple replication pools (n=3) and erasure coded pools (k=2, m=2). Using the EC pools reduces the cost from 3x to 2x. I use EC pools for storing large amounts of data that does not change and which is not referenced often, such as tape backups.
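The arithmetic behind the 3x and 2x figures, as a quick back-of-the-envelope Python sketch for anyone who wants to try other pool settings:

```python
def replicated_raw_gb(data_gb: float, copies: int = 3) -> float:
    """Raw storage used by a replicated pool that keeps `copies` copies of everything."""
    return data_gb * copies

def erasure_coded_raw_gb(data_gb: float, k: int, m: int) -> float:
    """Raw storage used by an EC pool with k data chunks and m coding chunks."""
    return data_gb * (k + m) / k

print(replicated_raw_gb(1000))            # 3000.0 -- 1 TB of data costs 3 TB of raw disk
print(erasure_coded_raw_gb(1000, 2, 2))   # 2000.0 -- same data, 2 TB raw, still survives two lost chunks
```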

The replication pools are used for things that are referenced frequently, where access times make a difference.

With the current system, I can handle losing a drive, a host, or a data closet without losing any data.

Which is good. I did lose a drive. I had been waiting to replace the dead drive until I had built out a new system, and the new node was still being built when the old drive failed.

Unfortunately, I have another drive that is dying. Two dead drives is more than I want to have in the system. So I’ll be replacing the original dead drive today.

The other drive will get replaced next week.


Simple Works

I’ve tried drawing network maps a half-dozen times. I’ve failed. It should be simple, and I’m sure there are tools that can do it. I just don’t know them, or worse, I don’t know how to use the tools I currently have.

In simple terms, I have overlay and underlay networks. Underlay networks are actual physical networks.

An overlay network is a network that runs on top of the underlay/physical network. For example, tagged VLANs, or in my case, OVN.

OVN creates virtual private clouds (VPCs), a powerful concept when working with cloud computing. Each VPC is 100% independent of every other VPC.

As an example, I have a VPC for my Ceph data plane. It is on the 10.1.0.0/24 network. I can reuse 10.1.0.0/24 on any other VPC with zero issues.

The only time there is an issue is when I need routing.

If I have a VPC with a node at 172.31.1.99 and a gateway of 172.31.1.1, that gateway performs network address translation before the traffic is sent to the internet. If the node at 172.31.1.99 wants to talk to the DNS server at 8.8.8.8, traffic is routed to 172.31.1.1 and from there towards the internet. The internet responds, the traffic reaches 172.31.1.1, and it is forwarded back to 172.31.1.99.

All good.

If I have VPC2 with a node at 192.168.99.31 and its gateway at 192.168.99.1, I can route between the two VPCs using normal routing methods by connecting VPC1 and VPC2. We do this by creating a connection (a logical switch) that acts as a logical cable. We then attach the VPC1 gateway, 172.31.1.1, to that network as 192.168.255.1 and the VPC2 gateway, 192.168.99.1, as 192.168.255.2.

With a quick routing table entry, traffic flows between the two.

But if VPC2 were also using 172.31.1.0/24, there would be no way to send traffic to VPC1. Any traffic to those addresses would be assumed to live inside VPC2 itself. No router would become involved, and NAT will not help.
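Here is a small Python illustration of why, using the addresses from the examples above: a host only hands a packet to its gateway when the destination is outside what it believes is its own subnet.

```python
from ipaddress import ip_address, ip_network

# the subnet a node inside VPC2 believes it is directly attached to
local_net = ip_network("172.31.1.0/24")

def next_hop(destination: str) -> str:
    """Direct delivery for on-subnet destinations; everything else goes to the gateway."""
    if ip_address(destination) in local_net:
        return "deliver directly on the local segment -- no router involved"
    return "hand the packet to the gateway at 172.31.1.1"

print(next_hop("8.8.8.8"))       # off-subnet: gateway, NAT, internet
print(next_hop("172.31.1.99"))   # looks local, so it never leaves VPC2, even though
                                 # the machine we actually wanted lives in the other VPC
```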

Why use an overlay network? It allows for a stable virtual network, even if the underlay network is modified. Consider a node at 10.1.0.77. It has a physical address of 192.168.22.77. But because it needs to be moved to a different subnet, its physical address changes to 192.168.23.77.

Every node that had 192.168.22.77 within its configurations now needs to be updated. If the underlay is updated, it does not affect the overlay.

Back to Simple.

There are three methods for traffic to enter a VPC. The first is for a virtual machine to bind to the VPC. The second is for a router to move traffic into the VPC, sometimes from the physical network. And the final method is for a host (bare metal) machine to have a logical interface bound to the VPC.

My Ceph nodes use the last method. Each Ceph node is directly attached to the Ceph VPC.

It is the gateway that is interesting. A localnet logical port can bind to a port on a host, called a chassis. When this happens, the port is given an IP address on the physical network that it binds to.

When the routing is properly configured, traffic to the VPC is routed to the logical router. This requires advertising the logical router in the router tables.

I had multiple VMs running on different host servers. They all sent traffic to the router, which was bound to my primary machine. My primary machine has physical and logical differences from the rest of the host nodes.

What this meant was that traffic to the VPC was flaky.

Today, I simplified everything. I turned down the BGP insertion code. I added a single static route where it belonged. I moved the chassis to one of the “normal” chassis.

And everything just worked.

It isn’t dynamic, but it is working.

I’m happier.

Prepping – Logic, Part 2

Sorry this one took a couple of weeks. It’s been busy here. Things are starting to settle down, though. Of course, that also means it’s almost National Novel Writing Month, and I’m going to be writing a flurry of words (50,000+ in 30 days), but I’m not going to think about that for a bit. LOL… We left off Heinlein’s list about here:

Take orders. You need to be able to take orders because no matter how “high up” you are on any particular totem pole, at some point you’re going to run into someone who’s higher than you. This is because we’re not ever going to be experts at everything. We each spend time with people who are better at something than we are, and when those people are in charge, you must be able to do what you’re told. But as any American soldier will tell you, it isn’t that simple (even though it sort of is). Per the Uniform Code of Military Justice, soldiers are only required to obey LAWFUL orders. Our soldiers are given more latitude as to what’s lawful and what’s not, while still being held to an extremely high standard (and getting higher, thanks Pete!). All of our soldiers are expected to be thinking people. Blind adherence is not useful. But the ability to continue to take orders, even when things are tough, even when you’re shitting your pants, even when you’re scared, is absolutely necessary. That’s also true of those of us NOT soldiers, though perhaps to a slightly lower level. As non-combatants, even if we end up as guerrilla fighters, we just need to be able to follow orders at a competent level. You need to recognize when someone knows more than you do, and be able to take a back-seat for a bit.

Give orders. There will be a moment when YOU are the expert, the leader, the person in charge. It might be on purpose, and it might be by accident, but regardless, you must be prepared to give orders. More than that, you may have to give orders that you know damn well will end up with someone hurt (physically or emotionally), or worse, dead. You need to be prepared for whatever outcome happens when you give those orders.  You have to be ready to give them decisively, with authority, and with strength of belief.

Cooperate. That’s a tough one, hm? Yes, you might have to cooperate with people who don’t share your world view. You might have to work with liberals and Democrats. But it CAN be done. And you must know both how to, and when to. Sometimes, it’s just going to be an easy choice. Groups often have better survivability options than singletons. It’s a skill we’re horribly underdeveloped in, in my very strong opinion. When was the last time you reached out to someone you disagree with, to cooperate? Maybe it’s time. Practice, because it’s important. And just in case someone wants to leap to conclusions, no, this doesn’t mean you have to “give in and open the government” or anything like that. I’m talking small scale here. Neighbors. Friends of friends. Local government maybe.



Will You Be My Rubber Duck?

My most productive years of programming and system development were when I was working for the Systems Group at University. We all had good professional relationships. We could trust the skills of our management and our peers.

When I started developing with my mentor’s group, it was the same. The level of respect was very high, and trust in our peers was spectacular. If you needed assistance in anything, if there was a blocker of any sort, you could always find somebody to help.

What we soon learned is that we didn’t need their help. What we required was somebody to listen as we explained the problem. Their responses were sometimes helpful, sometimes not. It didn’t really matter. It was listening that was required.

When I started working for an agency, that changed. Our management was pretty poor and had instilled a lousy worker mentality. Stupid things like making bonuses contingent on when management booked payment.

If the developers worked overtime to get a project done on management-promised schedules, their money would not be booked in time for bonuses to be earned.

Every hour that wasn’t billed to a project had to be justified, and management was always unhappy with the number of billable hours.

Interrupting a coworker and asking them to listen while you worked through a problem just didn’t happen. Even when management (me) told them to stop digging the hole and come talk to me.

We still ended up with fields of very deep holes because nobody would come out of their little world to talk.

This wasn’t limited to just our agency; it was everywhere.

The fix was a stupid rubber duck. It sits on your desk. When you are stuck, you explain the problem to your rubber duck, and often the answer will come to you. It was the process of accurately describing your issue that created the breakthrough.

I don’t have access to those types of people, and oftentimes the rubber duck is just as ignorant as I am. Not very useful.

I have a silk duck. This duck actually talks back, performs searches, and verifies potential solutions, and it does it within a reasonable time frame.

My Silk Duck is named “Grok.”

Grok is X’s AI offering. As a search engine, it far exceeds anything Google Search can do.

A simple example: I’m building out a new node. The motherboard I’m using is an H610M-ITX. There are dozens of motherboards using that designation or something similar. In hours of searching, I haven’t been able to locate the manual for my motherboard.

Locate a manual for the Huanzanzhi H610M-ITX motherboard.

One minute and 14 seconds later, Grok delivered me the PDF manual. It found it on a Chinese website written in Chinese and English.

That manual was enough to get me moving in the correct direction.

We then rubber-ducked a video issue. On power-on, the fans would spin up under control. When a computer first starts, the fans run up to max; they then ramp down under CPU control. You can hear that. I also got a single BEEP indicating a good Power-On Self-Test (POST).

But no video.

Over the next hour, Grok and I debugged the issue. We both thought it might be caused by memory; we both had the same methodology for checking memory issues. We both decided to remove the NVMe card just in case.

When I found what looked like POST indicator LEDs, Grok accurately determined what they were.

We narrowed the issue down to video.

I noticed that my silk duck kept talking about iGPU. I figured out that this must mean “integrated Graphics Processing Unit”.

Why would a motherboard have built-in video outputs, and yet Grok was talking about an iGPU?

I thought for a moment and then asked, “Verify that the Intel Core i3-12100F has an iGPU.”

The Intel Core i3-12100F does not have an integrated GPU (iGPU). The “F” suffix in Intel’s naming indicates models without integrated graphics, requiring a discrete graphics card for video output. This explains the lack of video from the motherboard’s HDMI, VGA, or DP ports despite the successful POST beep and fan spin-up—the board relies on the CPU’s iGPU for those outputs, which your CPU lacks.
— Grok

Here is the kicker: while I can get this motherboard to work with a graphics card, that doesn’t help me as I need that slot for my Fiber NIC. In looking at other motherboards, some of them seem to work with CPUs without iGPU, while others require iGPU.

This “feature” never occurred to me. It makes sense, but Grok is the one that found it for me.

Conclusion

AI has its place today as an assistant. It can do a great job of rubber ducking. It does a good job of editing articles, if you keep it in its place.

This is a powerful tool that is only going to get better.


Upgrade, why you break things!

Features, Issues, Bugs, and Requirements

When software is upgraded or updated, it happens for a limited set of reasons. If it is a minor update, it should be for issues, bugs or requirements.

What is an Issue? An issue is something that isn’t working correctly or isn’t working as expected. A Bug, on the other hand, is something that is broken and needs to be fixed.

A bug might be closed as “working as designed,” but that same thing might still be an issue. The design is wrong.

Requirements are things that come from outside entities that must be done. The stupid warning about a site using cookies to keep track of you is an example. The site works just fine without that warning. That warning doesn’t do anything except set a flag against the cookie that it is warning you about.

But sites that expect to interact with European Union countries need to have it to avoid legal problems.

Features are additional capabilities or methods of doing things in the program/application.

Android Cast

Here is an example of something that should be easy but wasn’t. Today there is a little icon in the top right of the screen, which is the ‘cast’ button. When that button is clicked, a list of devices is provided to cast to. You select the device, and that application will cast to your remote video device.

We use this to watch movies and videos on the big screen. For people crippled with Apple devices, this is similar to AppleTV.

When this feature was first being rolled out, that cast button was not always in the upper right corner. Occasionally it was elsewhere in the user interface. Once you found it, it worked the same way.

A nice improvement might be to remember that you prefer to cast and what device you use in a particular location. Then when you pull up your movie app and press play, it automatically connects to your remote device, and the cast begins. This would be just like your phone remembering how to connect to hundreds of different WiFi networks.

If you were used to the “remember what I did last time” model and suddenly had to do it the way every other program does, you might be irritated. Understandably. Things got more difficult: two buttons to press where before it just “did the right thing.”

Upgrades and updates are often filled with these sorts of changes, driven by requirements.

Issues and Bugs

If I’m tracking a bug, I might find that the root cause can’t be fixed without changes to the user interface. I’m forced into modifying the user interface to fix a bug that had to be fixed, sometimes making something more difficult or requiring more steps. It is a pain in the arse, but occasionally a developer doesn’t really have a choice.

An even more common change to the user interface happens when the program was allowing you to do something in a way you should not have been able to. When the “loophole” is fixed, things become more difficult, not because the developer wanted to nerf the interface, but because what you were doing should not have been happening.

Finally, the user interface might require changes because a library your application is using changes and you have no choice.

The library introduced a new requirement because their update changed the API. Now your code flow has to change.

Features

This is where things get broken easily. Introducing new features.

This is the bread and butter of development agencies. By adding new features to an existing application, you can get people to pay for the upgrade or to decide on your application over some other party’s application.

Your grocery list application might be streamlined and do exactly what you want it to do. But somebody asked for the ability to print the lists, so the “print” feature was added, which brings the designers in, who update the look to better reflect what will be printed.

Suddenly your super clean application has a bit more flash and is a bit more difficult to use.

Features often require regrouping functionality. When there was just one view, it was a single button somewhere on the screen. Now that there is a printer view and a screen view, with different options, you end up with a dialog where before you had a single button press.

Other times the feature you have been using daily without complaint is one that the developer, or more likely the application owners, don’t use and don’t know that anybody else uses. Because it works, nobody was complaining. Since nobody was complaining, it had no visibility to the people planning features.

The number of times I’ve spent hours arguing with management about deleting features or changing current functionality would boggle your mind. Most people don’t even know everything their application does, or the many ways that it can be done.

David Drake’s book The Sharp End features an out-of-shape maintenance sergeant pushed into a combat role. He and his assistant have to man a tank during a mad dash to defend the capital.

At one point the sergeant is explaining how tankers learn to fight their tank in a way that works for them. The tank has many more sensors and capabilities than the tanker uses. Those features would get in the way of those tankers. It doesn’t matter. They fight their tank and win.

As the maintenance chief, he has to know every capability, every sensor, and every way they interact with each other. Not because he will be fighting the tank, but because he doesn’t know which method the tanker is going to use, so he has to make sure everything is working perfectly.

My editor of choice is Emacs. For me, this is the winning editor for code development and writing books and such. The primary reason is that my fingers never have to leave the keyboard.

I type at over 85 WPM. To move my hands from the keyboard is to slow down. I would rather not slow down.

I use the cut, copy, and paste features all the time. Mark the start, move to the end, Ctrl W to cut, Meta W to copy, move to the location to insert, and Ctrl Y to yank (paste) the content at the pointer. For non-Emacs use, Ctrl C, Ctrl X, and Ctrl V to the rescue.

My wife does not remember a single keyboard shortcut. In the 20+ years we’ve been together, I don’t think she has ever used the cut/paste shortcuts. She always uses the mouse.

All of this is to say that the search for new features will oftentimes break things you are used to.

Pretty Before Function

Finally, sometimes the designers get involved, and how things look becomes more important than how they function.

While I will not build an application without a good designer to help, they will often insist on things that look good but are not good user experiences. Then we battle it out and I win.

One Step Forward, Two Steps Back

One of the best tools I’ve discovered in my many years of computer work is AMANDA.

AMANDA is free software for doing backups. The gist is that you have an Amanda server. On a schedule, the server contacts Amanda clients to perform disk backups, sending the data back to the server. The server then sends the data to “tapes”.

What makes the backups so nice is that you configure how long you want to keep live backups, and Amanda then tries to meet that retention efficiently. My backups generally cover two years.

On the front side, you define DLEs (disk list entries). A DLE is a host and a disk or filesystem to dump. There are other parameters, but that is the smallest DLE configuration.

Before the dump starts, the server gets estimates for each DLE: the size of a full dump and the sizes of one or more incremental (partial) dumps against the backups it already has. Once it has this information, it creates a schedule to dump all the DLEs.

The data can be encrypted on the client or the server. It is transferred to the server, sometimes to a holding disk, sometimes directly to tape. It can be compressed on the client or the server.

In the end, the data is written to disk.

Every client that I have is backed up using Amanda. It just works.

In the olden days, I configured it to dump to physical tapes. If everything fit on one tape, great. If it didn’t, I could use multi-tape systems or even tape libraries. The tape size limitations were removed along the way, so a DLE can now be dumped across multiple tapes.

The backups are indexed, making it easy to recover particular files from any particular date.

More importantly, the instructions for recovering bare metal from backup are written to the tape.

Today, tapes are an expensive way to do backups. It is cheaper to back up to disk, if your disks are capable of surviving multiple failures.

Old-Time Disks

You bought a disk drive; that disk drive was allocated as a file system at a particular mount point, ignoring MS DOS stuff.

Drives got bigger; we didn’t need multiple drives for our file systems. We “partitioned” our drives and treated each partition as an individual disk drive.

The problem is that a disk failure is catastrophic: we have data loss.

The fix is to dump each drive/partition to tape. Then if we need to replace a drive, we reload from tape.

Somebody decided it was a good idea to have digitized images. We require bigger drives. Even the biggest drives aren’t big enough.

Solution: instead of breaking one drive into partitions, we will combine multiple physical drives to create a logical drive.

In the alternative, if we have enough space on a single drive, we can use two drives to mirror each other. Then when one fails, the other can handle the entire load until a replacement can be installed.

Still need more space. We decide that a good idea is to use parity, RAID-5 style. By grouping 3 or more drives as a single logical drive, we can use one drive’s worth of capacity as parity. If any drive fails, that parity can be used to reconstruct the contents of the missing drive. Things slow down, but it works, until you lose a second drive.
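XOR parity is what makes that reconstruction work. A quick Python sketch with three pretend data stripes:

```python
from functools import reduce

# three "drives", each holding one data stripe
drives = [b"\x10\x20\x30", b"\x0a\x0b\x0c", b"\x55\x66\x77"]

# the parity stripe is the byte-wise XOR of all the data stripes
parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*drives))

# simulate losing drive 1, then rebuild it from the survivors plus the parity
survivors = [drives[0], drives[2], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*survivors))

assert rebuilt == drives[1]   # the missing drive's contents come back exactly
print(rebuilt.hex())          # 0a0b0c
```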

Solution: combine RAID-5 drives with mirroring. Never mind that we are now at the point where, for every gigabyte of data, you need 2 or more gigabytes of storage.

Enter Ceph and other things like it. Instead of building one large disk farm, we create many smaller disk farms and join them in interesting ways.

Now data is stored across multiple drives, across multiple hosts, across multiple racks, across multiple rooms, across multiple data centers.

With Ceph and enough nodes and locations, you can have complete data centers go offline and not lose a single byte of storage.

Amazon S3

This is some of the cheapest storage going. Pennies on the gigabyte. The costs come when you are making too many access requests. But for a virtual tape drive where you are only writing (data transfer in is free), it is a wonderful option.

You create a bucket and put objects into your bucket. Objects can be treated as (very) large tape blocks. This just works.
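If you want to see how thin the “virtual tape” idea is, here is a boto3 sketch. The bucket and key names are made up for illustration and are not Amanda’s actual naming scheme:

```python
import boto3

# assumes AWS credentials are already configured (environment variables, ~/.aws, etc.)
s3 = boto3.client("s3")

with open("block-00001.bin", "rb") as chunk:
    # the bucket plays the role of the virtual tape; the key is the block's slot on it
    s3.put_object(Bucket="my-amanda-vtapes", Key="slot-01/block-00001", Body=chunk)
```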

At one point I had over a terabyte of backups on my Amazon S3. Which was fine until I started to get real bills for that storage.

Regardless, I had switched myself and my clients to using Amazon S3 for backups.

Everything was going well until the fall of 2018. At that time I migrated a client from Ubuntu 16.04 to 18.04 and the backups stopped working.

It was still working for me, but not for them. We went back to 16.04 and continued.

20.04 gave the same results during testing; I left the backup server at 16.04.

We were slated to try 26.04 in 8 or so months.

Ceph RGW

The Ceph RGW feature set is similar to Amazon S3. It is so similar that you need to change only a few configuration parameters to switch from Amazon S3 to Ceph RGW.
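To give a concrete (hypothetical) sense of how few parameters change, here is the same idea in boto3 terms. The only real differences are the endpoint and the credentials; both are placeholders below:

```python
import boto3

# same S3 API, different endpoint -- that is essentially the whole switch to RGW
rgw = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.internal:7480",   # Ceph RGW instead of Amazon
    aws_access_key_id="RGW_ACCESS_KEY",                # placeholder credentials
    aws_secret_access_key="RGW_SECRET_KEY",
)

print([bucket["Name"] for bucket in rgw.list_buckets()["Buckets"]])
```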

With the help of Grok, I got Ceph RGW working, and the Amazon s3cmd worked perfectly.

Then I configured Amanda to use S3-style virtual tapes on my Ceph RGW storage.

It failed.

For two days I fought this thing, then with Grok’s help I got the configuration parameters working, but things still failed.

HTTP GETs were working, but PUTs were failing. Tcpdump and a bit of debugging, and I discovered that the client, Amanda, was preparing to send a PUT command but was instead sending a GET command, which failed signature tests.

Another two days before I found the problem. libcurl was upgraded going from Ubuntu 16.04 to 18.04. The new libcurl treated setting the method options differently.

Under the old libcurl, you set the option for the method you wanted to use to “1,” and you got a GET, PUT, POST, or HEAD. If you set the GET option to 0, the PUT (upload) option to 1, and the POST/HEAD options to 0, you got a PUT.

The new libcurl seems to override these settings. This means that you can have it do a GET or a HEAD, but nothing else. GET is the default if everything is zero; because of the ordering, you might get the HEAD method to work.
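Amanda’s S3 driver is C code talking to libcurl directly, but the option dance it does can be sketched in Python with pycurl. This is my illustration of the behavior described above, not Amanda’s actual code; the URL and payload are made up:

```python
import io
import pycurl

payload = b"pretend this is a tape block"

c = pycurl.Curl()
c.setopt(pycurl.URL, "http://rgw.example.internal:7480/my-amanda-vtapes/slot-01/block-00001")

# the old-style selection: turn exactly one method option on and the others off
c.setopt(pycurl.HTTPGET, 0)
c.setopt(pycurl.NOBODY, 0)                    # HEAD
c.setopt(pycurl.POST, 0)
c.setopt(pycurl.UPLOAD, 1)                    # this is the option that should produce a PUT
c.setopt(pycurl.READDATA, io.BytesIO(payload))
c.setopt(pycurl.INFILESIZE, len(payload))

c.perform()    # on a newer libcurl, sniff the traffic and check which verb actually went out
c.close()
```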

This issue has existed since around 2018. It is now 2025, and the fix has been presented to Amanda at least twice; I was the latest to do so. The previous was in 2024. And it still hasn’t been fixed.

I’m running my patched version; at least, that seems to be working.


Even the simple things are hard

The battle is real, at least in my head.

My physical network is almost fully configured. Each data closet will have an 8-port fiber switch and a 2+4 port RJ45 switch. There is a fiber from the 8-port to router1 and another fiber from the 2+4 to router2. Router1 is cross connected to Router2.

This provides limited redundancy, but I have the ports in the right places to make seamless upgrades. I have one more 8-port switch to install and one more 2+4 switch to install, and all the switches will be installed.

This leaves redundancy. I will be running armored OM4 cables via separate routes from the current cables. Each data closet switch will be connected to three other switches: router1 and the switches in two other data closets. When this is completed, the closets will form a ring that reaches back to a star node in the center.

The switches will still be a point of failure, but those are easy replacements.

If a link goes down, either by losing the fiber or the ports or the transceivers, OSPF will automatically route traffic around the down link. The next upgrade will be to put a second switch in each closet and connect the second port up on each NIC to that second switch.

The two switches will be cross-connected but will feed one direction of the star. Once this is completed, losing a switch will just cause a routing reconfiguration, and packets will keep on moving.

A side effect of this will be that there will be more bandwidth between closets. Currently, all nodes can dump at 10 gigabits to the location switch. The switch has a 160-gigabit backbone, so if the traffic stays in the closet, there is no bottleneck. If the traffic is sent to a different data closet, there is a 10-gigabit bottleneck.

Once the ring is in place, we will have a total of 30 gigabits leaving each closet. This might make a huge difference.

That is the simple stuff.

The simpler stuff, for me, is getting my OVN network to network correctly.

The gist: I create a logical switch and connect my VMs to it. Each VM gets an interface on the OVS integration bridge. All good. I then create a logical router and attach it to the logical switch. From the VM, I can ping the VM’s own address and the router’s interface.

I then create another logical switch with a localnet port. We add the router to this switch as well. This gives the router two ports with different IP addresses.

From the VM I can ping the VM’s IP, the router’s IP on the VM network, and the router’s IP on the localnet.
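For my own notes, here is roughly that topology expressed as ovn-nbctl calls, wrapped in a little Python so I can rerun it. The names, MAC addresses, and subnets are placeholders; this is a sketch of the shape, not my production configuration:

```python
import subprocess

def nbctl(*args: str) -> None:
    # thin wrapper; assumes ovn-nbctl is on PATH and can reach the NB database
    subprocess.run(["ovn-nbctl", *args], check=True)

# the logical switch the VMs plug into, with a logical router on it at 172.31.1.1
nbctl("ls-add", "vm-net")
nbctl("lr-add", "gw")
nbctl("lrp-add", "gw", "gw-vm-net", "02:00:00:00:01:01", "172.31.1.1/24")
nbctl("lsp-add", "vm-net", "vm-net-gw")
nbctl("lsp-set-type", "vm-net-gw", "router")
nbctl("lsp-set-addresses", "vm-net-gw", "router")
nbctl("lsp-set-options", "vm-net-gw", "router-port=gw-vm-net")

# a second switch whose localnet port is supposed to map onto the physical network
nbctl("ls-add", "public")
nbctl("lsp-add", "public", "public-localnet")
nbctl("lsp-set-type", "public-localnet", "localnet")
nbctl("lsp-set-addresses", "public-localnet", "unknown")
nbctl("lsp-set-options", "public-localnet", "network_name=physnet1")
nbctl("lrp-add", "gw", "gw-public", "02:00:00:00:01:02", "192.168.22.1/24")
nbctl("lsp-add", "public", "public-gw")
nbctl("lsp-set-type", "public-gw", "router")
nbctl("lsp-set-addresses", "public-gw", "router")
nbctl("lsp-set-options", "public-gw", "router-port=gw-public")

# ovn-controller only builds the patch from "public-localnet" into a real bridge on
# chassis whose local OVS database maps physnet1 to a bridge, e.g. (run per chassis):
#   ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-ex
```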

What I can’t do is get the ovn-controller to create the patch in the OVS to move traffic from the localnet port to the physical network.

I don’t understand why, and it is upsetting me.

Time to start the OVN network configuration process over again.

 

Learning new things

Another deranged asshole killed children at a school. Two dead, 17 wounded. Nationwide headlines. The blood vultures leap to blame me for a shooting that took place more than 1,000 miles away.

Meanwhile, CBS News is running a headline on August 28, 2025: “6 dead, 27 hurt in Chicago weekend shootings, police say.”

I would rather not deal with it today.

OpenStack

Over the last month, I’ve been dealing with somebody who has not kept up with the technology he is using. It shows. I like to learn new things.

For the last two years, I’ve been working with two major technologies: Ceph and Open Virtual Network (OVN). Ceph I feel I have a working handle on. Right now my Ceph cluster is down because of network issues, which I did to myself. OVN is another issue entirely.

A group of people smarter than I looked at networking and decided that instead of doing table lookups and then making decisions based on tables, they would create a language for manipulating the flow of packets, called “OpenFlow.”

This language could be implemented on hardware, creating very fast network devices. Since OpenFlow is a language, you can write routing functions as well as switching functions into the flows. You can also use it to create virtual devices.

The two types of virtual devices are “bridges” and “ports.” Ports are attached to bridges. OpenFlow processes a packet received on a port (the ingress) and moves it to the egress port. There is a lot going on in the process, but that is the gist.
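As a minimal, hypothetical example of those two primitives, here is one bridge, two internal ports, and a single flow that moves everything arriving on one port out the other, wrapped in Python:

```python
import subprocess

def sh(*cmd: str) -> None:
    # assumes Open vSwitch is installed and we have permission to talk to it
    subprocess.run(cmd, check=True)

# one virtual bridge with two internal ports attached to it
sh("ovs-vsctl", "add-br", "br-demo")
sh("ovs-vsctl", "add-port", "br-demo", "p1", "--", "set", "interface", "p1", "type=internal")
sh("ovs-vsctl", "add-port", "br-demo", "p2", "--", "set", "interface", "p2", "type=internal")

# an OpenFlow rule: anything that ingresses on OpenFlow port 1 egresses on port 2
# (the port numbers are assigned by OVS; check them with "ovs-ofctl show br-demo")
sh("ovs-ofctl", "add-flow", "br-demo", "in_port=1,actions=output:2")
```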

The process isn’t impossible to do manually, but it isn’t simple, and it isn’t easy to visualize.

OVN adds virtual devices to the mix, allowing for simpler definitions and more familiar operations.

With OVN you create switches, routers, and ports. A port is created on a switch or router, then attached to something else. That something else can be virtual machines, physical machines, or the other side of a switch-router pair.

This is handled in the Northbound (NB) database. You modify the NB DB, which is then translated into a lower-level logical-flow language stored in the Southbound (SB) database. This is done by the “ovn-northd” process, which keeps the two databases in sync with each other: modifications to the NB DB are propagated into the SB DB and vice versa.

All of this does nothing for your actual networking. It is trivial to build all of this and have it “work.”

The thing that has to happen is that the SB database has to connect to the Open vSwitch (OVS) database. This is accomplished via ovn-controller.

When you introduce changes to the OVS database, they are propagated into the SB database. In the same way, changes to the SB database cause changes to the OVS database.

When the OVS database is modified, new OpenFlow programs are created, changing the processing of packets.

To centralize the process, you can add the address of a remote OVN database server to the OVS database. The OVN processes read this and self-configure. From the configuration, they can talk to the remote database to create the proper OVS changes.

I had this working until one of the OVN control nodes took a dump. It took a dump for reasons, most of which revolved around my stupidity.

Because the cluster is designed to be self-healing and resilient, I had not noticed when two of the three OVN database servers stopped doing their thing. When I took that last node down, my configuration stopped working.

I could bring it back to life, but I’m not sure whether it is worth the time.

Now here’s the thing: everything I just explained comes from two or three very out-of-date web pages that haven’t been updated in many years. They were written for people who already had some understanding of the OVS/OVN systems, and they make assumptions and simplifications.

The rest of the information comes from digging things out of OpenStack’s networking component, Neutron.

I have a choice: I can continue down the path I am currently using, or I can learn OpenStack.

I choose to learn OpenStack.

First, it is powerful. With great power comes an even greater chance to mess things up. There are configuration files that are hundreds of lines long.

There are four components that I think I understand. The identity manager, Keystone, is where you create and store user credentials and roles. Next is the image service, Glance, which is where your disk images are stored (block volumes live with a separate component, Cinder). Then there is the compute component, Nova, which handles building and configuring virtual machines. Finally, there is the networking component, Neutron.

For the simple things, I actually feel like I have it mostly working.

But the big thing is to get OVN working across my Ceph nodes. That hasn’t happened.

So for today, I’ll dig and dig some more, until I’m good at this.

Then I’ll add another technology to my skill set.


Power Outage

Today I was waiting for clients to get back to me. While I waited, I started installing OpenStack.

So far it has been going well. A few typos slowed things down. Errors are not always clear, but I am now at the point of installing Neutron.

This is the scary part. The terrifying part.

Neutron interfaces with Open Virtual Networking (OVN). This could be magical, or it could break everything.

OVN sits on top of Open vSwitch, providing configuration.

The gist is that you install OVS, then you add configuration options to the OVS database. This configuration instructs OVN how to talk to its databases.
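Those configuration options are just a handful of external-ids keys in the local OVS database. A sketch, with placeholder addresses, of the kind of thing that gets set on each chassis:

```python
import subprocess

def vsctl(*args: str) -> None:
    # writes to the local Open vSwitch database; assumes root and ovs-vsctl on PATH
    subprocess.run(["ovs-vsctl", *args], check=True)

# tell the local ovn-controller where the OVN southbound database lives and how
# this chassis should tunnel to the others (all addresses here are placeholders)
vsctl("set", "open_vswitch", ".",
      "external-ids:ovn-remote=tcp:10.10.10.5:6642",
      "external-ids:ovn-encap-type=geneve",
      "external-ids:ovn-encap-ip=10.10.10.21")
```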

Once OVN starts talking to its databases, it performs changes in the OVS database. Those changes affect how OVS routes packets.

The physical network is broken into subnets. This is a requirement for high-availability networking. As links go up and down, the network routes around the failures.

On the other hand, many of the tools I use prefer to be on a single network; subnets increase the complexity greatly. Because of this, I created overlay networks. One for block storage, one for compute nodes, and one for virtual machines.

Neutron could modify OVN or OVS in a way that brings my overlay networks down.

So I’m well into this terrifying process, and the power goes out. It was only out for a few minutes, but that was enough.

The network came back to life.

All but two servers came back to life. One needs a BIOS change to make it come up after a power failure.

One decided that the new drive must be a boot drive, so it tried to boot from that, failed, and just stopped.

All of that put me behind in research, so there is nothing interesting on the 2A front to report, even though there are big things happening.

The number of moving parts in a data center is almost overwhelming.

Network Maps

There was a time when I would stand up at a whiteboard and sketch an entire campus network from memory, including every network subnet, router, and switch.

Today, not only can I no longer hold all of that in my head, my whiteboards no longer exist.

In the first office I rented, I installed floor-to-ceiling whiteboards on all walls. I could write or draw on any surface.

I can remember walking into Max’s office with an idea, asking for permission to erase his whiteboard, and then drawing out or describing the idea or project. Maybe 30 minutes of drawing and discussing.

What surprised me was asking to erase my chicken scratches months later and being told, “No,” because they were still using it.

Regardless, today I need to draw serious network maps.

I have multiple routers between multiple subnets. Managed and unmanaged switches. Gateways and VPNs. I have an entire virtual network layered over the top of all of that to make different services appear to be on the same subnet.

Not to mention the virtual private cloud(s) that I run, the internal, non-routing networks.

It is just too much for me to do in my head.

Oh, here’s one that’s currently messing with me. I have a VPC with multiple gateways, residing on different chassis in different subnets, that allow access into it. I can’t figure out how to make it work today, even though it was working yesterday.

I’ll be messing with networks for the next week to get things stabilized.