
I’ve tried drawing network maps a half-dozen times. I’ve failed. It should be simple, and I’m sure there are tools that can do it. I just don’t know them, or worse, I don’t know how to use the tools I currently have.

In simple terms, I have overlay and underlay networks. Underlay networks are actual physical networks.

An overlay network is a network that runs on top of the underlay/physical network. For example, tagged VLANs, or in my case, OVN.

OVN creates virtual private clouds (VPCs), a powerful concept when working with cloud computing. Each VPC is 100% independent of every other VPC.

As an example, I have a VPC for my Ceph data plane. It is on the 10.1.0.0/24 network. I can reuse 10.1.0.0/24 on any other VPC with zero issues.
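A VPC here is, at its core, an OVN logical switch, which makes the overlap easy to demonstrate. A rough sketch with hypothetical switch, port, and MAC/IP values:

```
# Two isolated logical switches can carry the same subnet.
ovn-nbctl ls-add vpc-ceph
ovn-nbctl ls-add vpc-other

# The same 10.1.0.0/24 address on both, with zero conflict.
ovn-nbctl lsp-add vpc-ceph ceph-node1
ovn-nbctl lsp-set-addresses ceph-node1 "0a:00:00:01:00:05 10.1.0.5"

ovn-nbctl lsp-add vpc-other other-node1
ovn-nbctl lsp-set-addresses other-node1 "0a:00:00:02:00:05 10.1.0.5"
```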

The only time there is an issue is when I need routing.

If I have a VPC with node 172.31.1.99 and a gateway of 172.31.1.1, that gateway performs network address translation before the traffic is sent to the internet. If the node at 172.31.1.99 wants to talk to the DNS server at 8.8.8.8 traffic is routed to 172.31.1.1 and from there towards the internet. The internet responds, the traffic reaches 172.31.1.1 and is forwarded to 172.31.1.99.
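In OVN terms, that gateway is a logical router with an SNAT rule. A minimal sketch, assuming a router I'll call vpc1-router; 203.0.113.10 is a placeholder for the router's outward-facing address:

```
# Hypothetical names and addresses.
ovn-nbctl lr-add vpc1-router
ovn-nbctl lrp-add vpc1-router lrp-vpc1 0a:00:00:01:01:01 172.31.1.1/24

# SNAT everything from the VPC subnet out through the external address.
ovn-nbctl lr-nat-add vpc1-router snat 203.0.113.10 172.31.1.0/24
```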

All good.

If I have VPC2 with a node at 192.168.99.31 and its gateway at 192.168.99.1, I can route between the two VPCs using normal routing methods by connecting them. We do this by creating a connection (a logical switch) that acts as a logical cable. We then attach the first gateway, 172.31.1.1, to that network as 192.168.255.1 and the VPC2 gateway, 192.168.99.1, as 192.168.255.2.

With a quick routing table entry, traffic flows between the two.
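A sketch of that wiring, again with hypothetical names (the /30 transit subnet is my own choice here):

```
# A small "transit" switch acts as the logical cable.
ovn-nbctl ls-add transit

# Patch VPC1's router into the transit switch at 192.168.255.1.
ovn-nbctl lrp-add vpc1-router lrp-vpc1-transit 0a:00:00:ff:00:01 192.168.255.1/30
ovn-nbctl lsp-add transit transit-vpc1
ovn-nbctl lsp-set-type transit-vpc1 router
ovn-nbctl lsp-set-options transit-vpc1 router-port=lrp-vpc1-transit

# Same for VPC2's router at 192.168.255.2.
ovn-nbctl lrp-add vpc2-router lrp-vpc2-transit 0a:00:00:ff:00:02 192.168.255.2/30
ovn-nbctl lsp-add transit transit-vpc2
ovn-nbctl lsp-set-type transit-vpc2 router
ovn-nbctl lsp-set-options transit-vpc2 router-port=lrp-vpc2-transit

# The quick routing table entries.
ovn-nbctl lr-route-add vpc1-router 192.168.99.0/24 192.168.255.2
ovn-nbctl lr-route-add vpc2-router 172.31.1.0/24 192.168.255.1
```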

But if VPC2 were also using 172.31.1.0/24, there would be no way to send traffic to the first VPC. Any traffic to a 172.31.1.x address would be assumed to live inside VPC2 itself, so no router would become involved. And NAT would not help.

Why use an overlay network? It allows for a stable virtual network even when the underlay network is modified. Consider a node at 10.1.0.77. It has a physical address of 192.168.22.77. But because it needs to be moved to a different subnet, its physical address changes to 192.168.23.77.

Every node that had 192.168.22.77 in its configuration now needs to be updated. But anything that talks to the overlay address, 10.1.0.77, is untouched: updating the underlay does not affect the overlay.

Back to Simple.

There are three methods for traffic to enter a VPC. The first is for a virtual machine to bind to the VPC. The second is for a router to move traffic into the VPC, sometimes from the physical network. And the final method is for a host (bare metal) machine to have a logical interface bound to the VPC.

My Ceph nodes use the last method. Each Ceph node is directly attached to the Ceph VPC.
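One common way to make that attachment is an OVS internal port whose iface-id matches a logical switch port; ovn-controller then binds it. A sketch of the pattern, run on the host itself, with hypothetical names, and assuming the logical port's registered MAC matches the interface:

```
# On the host: add an internal port to br-int and claim the logical
# switch port "ceph-node1" for it.
ovs-vsctl add-port br-int ceph0 -- \
    set interface ceph0 type=internal external_ids:iface-id=ceph-node1

# Give the host its overlay address on that interface.
ip link set ceph0 up
ip addr add 10.1.0.5/24 dev ceph0
```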

It is the gateway that is interesting. A localnet logical port can bind to a physical bridge on a host (a chassis, in OVN terms). When this happens, the port is given an IP address on the physical network it binds to.
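A sketch of that localnet plumbing, with assumed names (physnet1 and br-ex are placeholders for the physical network and its bridge, and 192.168.22.40 is a made-up physical-side address):

```
# On the chassis: map the physical network name to a real bridge.
ovs-vsctl set open . external-ids:ovn-bridge-mappings=physnet1:br-ex

# In the northbound DB: a localnet port on a switch that fronts
# the logical router.
ovn-nbctl ls-add ext-net
ovn-nbctl lsp-add ext-net ext-localnet
ovn-nbctl lsp-set-type ext-localnet localnet
ovn-nbctl lsp-set-addresses ext-localnet unknown
ovn-nbctl lsp-set-options ext-localnet network_name=physnet1

# And a router port facing that switch, holding the physical address.
ovn-nbctl lrp-add vpc1-router lrp-vpc1-ext 0a:00:00:16:00:28 192.168.22.40/24
ovn-nbctl lsp-add ext-net ext-to-router
ovn-nbctl lsp-set-type ext-to-router router
ovn-nbctl lsp-set-options ext-to-router router-port=lrp-vpc1-ext
```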

When the routing is properly configured, traffic to the VPC is routed to the logical router. This requires advertising the logical router in the routing tables.
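That advertising can be dynamic (BGP) or just a static route on the physical side. The static version is a one-liner on the upstream router, reusing the made-up 192.168.22.40 as the logical router's address on the physical network:

```
# On the upstream/physical router: send the VPC subnet toward the
# logical router's physical-facing IP (addresses are hypothetical).
ip route add 172.31.1.0/24 via 192.168.22.40
```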

I had multiple VMs running on different host servers. They all sent traffic to the router, which was bound to my primary machine. My primary machine has physical and logical differences from the rest of the host nodes.

What this meant was that traffic to the VPC was flaky.

Today, I simplified everything. I turned down the BGP insertion code. I added a single static route where it belonged. I moved the gateway binding to one of the “normal” chassis.
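In OVN, the chassis move is a single command, and the logical router also needs its own route out. A sketch using the same hypothetical names as above; host2 stands in for one of the normal chassis, and 192.168.22.1 for the physical gateway:

```
# Re-pin the router's external port to a normal chassis (priority 10).
ovn-nbctl lrp-set-gateway-chassis lrp-vpc1-ext host2 10

# A default route out of the logical router toward the physical gateway.
ovn-nbctl lr-route-add vpc1-router 0.0.0.0/0 192.168.22.1
```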

And everything just worked.

It isn’t dynamic, but it is working.

I’m happier.
