[Featured image: a chaotic mess of network cables all tangled together]

One Step Forward, ??? Steps Back

Networking used to be simple. I am not sure why I think that. Maybe because, when I started all of this, it was simple.

Networks break down into two major classes: point-to-point (P2P) and broadcast. When you transmit on a P2P port, the data travels across a dedicated physical link and comes out at the single port on the other side.

Each port is assigned an IP address. A routing table tells the router which port to transmit on to reach a particular network. A router works store-and-forward: it reads the entire packet from one port, then retransmits that packet, modified as needed, on a different port.
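
For illustration, this is the kind of routing table a small Linux router holds (the addresses and interface names here are invented):

```
$ ip route show
default via 192.168.0.1 dev eth0
172.16.5.0/24 dev eth1 proto kernel scope link src 172.16.5.1
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.7
```

A packet for 172.16.5.20 matches the second entry and is retransmitted out eth1; anything that matches nothing more specific follows the default route out eth0.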

A broadcast network is one where multiple devices are connected to a single physical network. What is transmitted on the link is heard by all the other nodes on the same physical network.

Originally, that physical network was a switch. Your network card connected to a switch, and the switch retransmitted everything it received on one port out all of the other ports.

Switches could be connected to each other. The only real constraint was timing: there was a limit on how long a packet could take to travel from one end of the physical network to the other, and if the delay exceeded that limit, the network became unstable.

This concept of everything going back to a single switch was expensive. The cabling was expensive, the switch was expensive, the network cards were expensive. A working network started at around $50,000: $30K for the switch, $10K for each network card, and hundreds of dollars for cabling.

The original Internet protocol was only going to have addressing for about 65,000 machines (16 bits). How many machines would be network attached if each site required $50K just to get one or two machines hooked up? We compromised at 4 billion (32 bits).

We are working on getting everything onto IP version 6, which has 2^128 (roughly 340 undecillion) addresses. I think somebody told me that that is enough for every atom in the known universe to have its own IPv6 address.

From those expensive switches, we moved to 10BASE2 ("thin") and 10BASE5 ("thick") Ethernet. These had the same limitations, but the costs were starting to come down: something around $1,000 to get into thick net and a few hundred to get into thin net.

Routers were still expensive. With the advent of 10BASE-T, we saw costs drop again. You could get an Ethernet hub for under a hundred dollars. Routers were only a few thousand. The world was good.

The other day I purchased an 8-port 10 Gigabit router for under a hundred dollars. It has 160 Gigabits of internal switching capacity (8 ports × 10 Gb/s × 2 directions), which means it can move 10 Gigabits per second to and from every port at the same time.

Two fiber transceivers cost less than $35, and an Intel-based NIC capable of 10 Gigabit runs around $33.

This means that I can upgrade a server to 10 Gigabit capability for around $60. Not bad.

A Step Forward

My data center was rather small. It was set up as a single /23 (512 addresses) connected via L2 switches. The switches were all one Gigabit copper.

You can buy 10 Gigabit L2 switches, but they are either copper, with limited distances and a need for high-quality cabling, or they are expensive.

Moving to an L3 device got me a better price and more features.

Moving to an L3 router gave me some more options. One of the big ones is the ability to have multiple paths to each device to provide high availability.

This requires that each node have multiple network interfaces, and that there be multiple routers and switches. The routers are cross-connected, and each node must be able to handle multi-path communication.

This is the step forward.

A Step Backwards

This High Availability (HA) solution requires multi-path capabilities, which are not available in every piece of software. I want to keep things simple.

A Solution

A solution is to move from a physical network with multiple paths and redundant capabilities to virtual networking.

Each node will have two physical network interfaces, with routing between them handled by OSPF. OSPF responds quickly and will find another path if a link or router fails. This provides the HA I want for the network.
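
A rough sketch of what that looks like in FRR's ospfd (the router ID and link subnets below are placeholders, not my real values):

```
router ospf
 ospf router-id 10.255.0.5
 ! first physical link
 network 10.1.1.0/24 area 0
 ! second physical link
 network 10.1.2.0/24 area 0
```

With both links in area 0, OSPF holds a route over each interface and reconverges onto the surviving path when a link or router drops.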

Each node will have two VPCs for the Ceph cluster, one or more VPCs for each container system, and one or more VPCs for each VM cluster. A VPC is a "virtual private cloud": a virtual network that carries only the traffic it is explicitly allowed to carry.

You can have multiple networks on a single physical network. For example, you can have 192.168.0.0/24 be your “regular” subnet and 172.16.5.0/24 be your data plane subnet. A network interface configured as 192.168.0.7 will only “hear” traffic on subnet 192.168.0.0/24.
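
On Linux, that split is nothing more than which address each node configures (the interface name here is an assumption):

```
# A node on the "regular" subnet:
ip addr add 192.168.0.7/24 dev eth0

# A node on the data-plane subnet, plugged into the same physical switch:
ip addr add 172.16.5.10/24 dev eth0

# Each node's IP stack only processes packets for its own subnet,
# even though both subnets share the wire.
```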

But you can configure a network interface to hear every packet, allowing a node to "spy" on all of the traffic.
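
That "spying" is just promiscuous mode; with an assumed interface name, something like:

```
# Put the NIC in promiscuous mode so it accepts every frame on the segment.
ip link set dev eth0 promisc on

# Watch everything on the wire, including the other subnet's traffic.
tcpdump -ni eth0
```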

With a VPC, there is only subnet 192.168.0.0/24 on the one VPC and only 172.16.5.0/24 on the other. Packets are not switched from one VPC to the other. You need a router to move data from one VPC to another. And the two VPCs must have different subnets; otherwise the router doesn’t know what to do.

OVN Logical Switch

It turns out that a VPC is the same as an OVN logical switch. Any traffic on one logical switch is restricted to that switch. You need to send traffic to a logical router to get the traffic in or out of the VPC.
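
Wiring one up with ovn-nbctl looks roughly like this; the switch, router, and address names are placeholders for illustration, not my actual configuration:

```
# The VPC itself: a logical switch.
ovn-nbctl ls-add vpc-data

# A logical router, with a port holding the VPC's gateway address.
ovn-nbctl lr-add vpc-rtr
ovn-nbctl lrp-add vpc-rtr lrp-data 00:00:00:00:01:01 172.16.5.1/24

# Patch the switch to the router port; this is the VPC's only way out.
ovn-nbctl lsp-add vpc-data data-to-rtr
ovn-nbctl lsp-set-type data-to-rtr router
ovn-nbctl lsp-set-addresses data-to-rtr router
ovn-nbctl lsp-set-options data-to-rtr router-port=lrp-data
```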

Since the traffic is going through a router, that router can apply many filters and rules to protect the VPC from leaking data or accepting unwanted data.

I configured four VPCs for testing. The first is the DMZ, which is part of the physical network: any virtual port on the DMZ VPC is exposed to traffic on the physical network. This is how traffic enters and exits the virtual clouds.
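
In OVN terms, the DMZ switch carries a "localnet" port bound to the physical network. A sketch, with the network and bridge names being assumptions:

```
# The DMZ logical switch, bridged to the physical network.
ovn-nbctl ls-add dmz
ovn-nbctl lsp-add dmz dmz-physnet
ovn-nbctl lsp-set-type dmz-physnet localnet
ovn-nbctl lsp-set-addresses dmz-physnet unknown
ovn-nbctl lsp-set-options dmz-physnet network_name=physnet

# On each chassis, map that network name to a real OVS bridge.
ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet:br-ex
```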

The second VPC is "internal". This is a network that every physical node has a presence on. Over the internal VPC, the nodes can communicate with each other regardless of the physical topology.

That was working.

There was also a data plane VPC and a management VPC. Those VPCs were connected to the DMZ through a router. That router is distributed across multiple nodes, so if one node goes down, another node is ready to take over the traffic.
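
The failover piece comes from giving the router's DMZ-facing port more than one gateway chassis; a sketch with placeholder port and node names:

```
# Let either node host the gateway role; the higher priority wins
# until that node goes away, then the other chassis takes over.
ovn-nbctl lrp-set-gateway-chassis lrp-dmz node4 20
ovn-nbctl lrp-set-gateway-chassis lrp-dmz node5 10
```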

Falling Way Back

I now have a VPC for testing. The idea is to test everything extensively before moving any nodes to the virtual network. I need to be able to reboot any node and have everything still function.

The VPC came up perfectly. My notes made it easy to create the VPC and configure it.

The problem began when I added a router to the VPC.

Now I can’t get traffic to flow to the VPC.

WTF?


Comments

3 responses to “One Step Forward, ??? Steps Back”

  1. Jolie

    Are they using the same routing protocol? I mean that may be a stupid question but can't hurt to check the basics, right? Says the person that's more rusty than the Tin Woodsman at this. Oh, and I'm sure there's no way you have duplicate IPs anywhere since you are using 192. and 172.

    1. This is turning into issues with the cluster and which node has control of the databases. Interactions between OSPF in FRR and the OVN configuration.

      Not 10 minutes after I finished this post, I had packets routing correctly.

      Today, DHCP came to life, which is a big one.

      I am now in the process of documenting everything I did, in hopes that I can reproduce it.

      No IP duplication, this time. No MAC duplication, this time.

      The actual issue was that node5 had a static route in its OSPF configuration. It was being added(?) but it wasn't working. Moving the static route to node4 and letting node4 advertise it via OSPF resolved the issue.

  2. Straight Shootr

    Thick net?

    “Yah, I vant to suck your bits!!”

    /ducks behind couch to avoid being hit by a terminator…