When I started babysitting Cray supercomputers, it was just another step: a massive mainframe handling many users and doing many things.
But I quickly learned that there are ways of making “supercomputers” that don’t require massive mainframes. My mentor used to say, “Raytracing is embarrassingly parallel.”
He meant that every ray fired is completely independent of every other ray fired. His adjunct program, rrt, was able to distribute work across thousands of compute nodes.
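To make "embarrassingly parallel" concrete, here is a toy sketch. It is not rrt, whose interface isn't described here. One sphere is traced per pixel, rows are split into chunks, and each chunk is handed to a separate worker. Threads stand in for compute nodes, so this shows that the chunks are independent, not that they run faster; CPython threads won't speed up CPU-bound work, and a real farm would use processes or separate machines. The scene, image size, and function names are all made up for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 32, 16
SPHERE_CENTER = (0.0, 0.0, -3.0)  # one sphere straight ahead of the camera
SPHERE_RADIUS = 1.0


def trace_pixel(x: int, y: int) -> int:
    """Fire one ray from the origin through pixel (x, y); return 1 on a hit, 0 on a miss.

    The answer depends only on (x, y) and the scene, never on any other ray.
    """
    # Direction through the pixel centre on an image plane at z = -1.
    dx = (x + 0.5) / WIDTH * 2.0 - 1.0
    dy = 1.0 - (y + 0.5) / HEIGHT * 2.0
    dz = -1.0
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm
    # Solve |t*d - c|^2 = r^2, i.e. t^2 - 2(d.c)t + (|c|^2 - r^2) = 0.
    cx, cy, cz = SPHERE_CENTER
    d_dot_c = dx * cx + dy * cy + dz * cz
    c_term = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS ** 2
    disc = d_dot_c * d_dot_c - c_term  # quarter of the usual discriminant
    if disc < 0:
        return 0
    t_near = d_dot_c - math.sqrt(disc)
    return 1 if t_near > 0 else 0


def render_rows(rows):
    """One unit of work: trace every ray in the given rows."""
    return {y: [trace_pixel(x, y) for x in range(WIDTH)] for y in rows}


def render_serial():
    return [[trace_pixel(x, y) for x in range(WIDTH)] for y in range(HEIGHT)]


def render_distributed(workers: int = 4):
    # Interleave rows into chunks; no chunk needs another chunk's results,
    # so they can finish in any order on any worker.
    chunks = [range(start, HEIGHT, workers) for start in range(workers)]
    image = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(render_rows, chunks):
            image.update(partial)
    return [image[y] for y in range(HEIGHT)]


if __name__ == "__main__":
    picture = render_distributed()
    assert picture == render_serial()
    for row in picture:
        print("".join("#" if hit else "." for hit in row))
```

Because the distributed image always matches the serial one, you can add workers (or nodes) without changing the result. Only the cost of shipping chunks out and results back grows.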
We were constantly attempting to improve our ability to throw more compute power at any problem we were encountering. It was always about combining more and more nodes to create more and more powerful compute centers.
Each improvement just moved the bottleneck. We went from being CPU-starved to memory-starved to network-starved. So we added more network bandwidth until everything balanced out again, until the network became the bottleneck once more.
After his passing, I worked with a company that supported multiple large corporations.
There I was introduced to VMware, a virtualization platform.
Instead of taking “small” computers and joining them together to create larger computers, we were taking “medium” computers and breaking them into small virtual devices.
What is a virtual device
A virtual device is typically a network interface, a virtual disk drive, or a compute instance.
To create a virtual computer (instance), you tell your VM manager to create a virtual drive, attach it to a virtual computer, attach a virtual DVD drive, allocate a virtual network interface, and boot.
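With libvirt, those steps come down to describing the machine in domain XML. The sketch below builds a minimal KVM domain with a file-backed virtual drive, a virtual DVD drive holding an install image, and a virtual network interface on libvirt's "default" network. The VM name, sizes, and paths are hypothetical, and a production definition would usually carry more settings (machine type, graphics, console). Only the standard library is used, so it runs without libvirt installed.

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name, memory_mib, vcpus, disk_path, iso_path, network="default"):
    """Return libvirt domain XML for a small KVM guest (a sketch, not a full definition)."""
    domain = ET.Element("domain", type="kvm")
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "memory", unit="MiB").text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)

    os_el = ET.SubElement(domain, "os")
    ET.SubElement(os_el, "type", arch="x86_64").text = "hvm"
    # Boot from the virtual DVD first (so an installer can run), then the virtual drive.
    ET.SubElement(os_el, "boot", dev="cdrom")
    ET.SubElement(os_el, "boot", dev="hd")

    devices = ET.SubElement(domain, "devices")

    # 1. The virtual drive, backed here by a qcow2 file on the host.
    disk = ET.SubElement(devices, "disk", type="file", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="qcow2")
    ET.SubElement(disk, "source", file=disk_path)
    ET.SubElement(disk, "target", dev="vda", bus="virtio")

    # 2. The virtual DVD drive, read-only, holding the install image.
    cdrom = ET.SubElement(devices, "disk", type="file", device="cdrom")
    ET.SubElement(cdrom, "driver", name="qemu", type="raw")
    ET.SubElement(cdrom, "source", file=iso_path)
    ET.SubElement(cdrom, "target", dev="sda", bus="sata")
    ET.SubElement(cdrom, "readonly")

    # 3. The virtual network interface.
    iface = ET.SubElement(devices, "interface", type="network")
    ET.SubElement(iface, "source", network=network)
    ET.SubElement(iface, "model", type="virtio")

    return ET.tostring(domain, encoding="unicode")


if __name__ == "__main__":
    print(build_domain_xml(
        "demo", 2048, 2,
        "/var/lib/libvirt/images/demo.qcow2",        # hypothetical path
        "/var/lib/libvirt/images/installer.iso",     # hypothetical path
    ))
```

If you save the output as `demo.xml`, `virsh define demo.xml` registers the machine and `virsh start demo` boots it. The qcow2 file itself has to exist first, for example one created with `qemu-img create -f qcow2`.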
The virtual drive can be a physical drive on the host computer, a partition on a physical drive, a file on the host computer, or a network-attached drive.
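In libvirt terms, each of those backing types is a differently shaped `<disk>` element. A whole drive and a partition are both block devices, a file is a file, and a network-attached drive can be, for example, a Ceph RBD (RADOS Block Device) image. The sketch below shows one way to express all four. Device paths, pool and image names, and monitor hostnames are hypothetical, and the cephx `<auth>` section a real RBD disk normally needs has been left out.

```python
import xml.etree.ElementTree as ET


def disk_element(kind, location, target="vdb", ceph_monitors=()):
    """Return a libvirt <disk> element for one of four backing types (illustrative sketch)."""
    if kind in ("physical", "partition"):
        # A whole drive (/dev/sdb) or one partition (/dev/sdb1) is passed as a block device.
        disk = ET.Element("disk", type="block", device="disk")
        ET.SubElement(disk, "driver", name="qemu", type="raw")
        ET.SubElement(disk, "source", dev=location)
    elif kind == "file":
        # An image file on the host; qcow2 assumed here, raw files also work.
        disk = ET.Element("disk", type="file", device="disk")
        ET.SubElement(disk, "driver", name="qemu", type="qcow2")
        ET.SubElement(disk, "source", file=location)
    elif kind == "network":
        # A Ceph RBD image named "pool/image"; cephx <auth> omitted for brevity.
        disk = ET.Element("disk", type="network", device="disk")
        ET.SubElement(disk, "driver", name="qemu", type="raw")
        source = ET.SubElement(disk, "source", protocol="rbd", name=location)
        for host in ceph_monitors:
            ET.SubElement(source, "host", name=host, port="6789")
    else:
        raise ValueError(f"unknown backing type: {kind}")
    ET.SubElement(disk, "target", dev=target, bus="virtio")
    return disk


if __name__ == "__main__":
    examples = [
        ("physical", "/dev/sdb"),
        ("partition", "/dev/sdb1"),
        ("file", "/var/lib/libvirt/images/data.qcow2"),
        ("network", "rbd/data-disk"),
    ]
    for kind, loc in examples:
        el = disk_element(kind, loc, ceph_monitors=["mon1.example", "mon2.example"])
        print(ET.tostring(el, encoding="unicode"))
```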
If the drive lives on the host computer, you can only move it to other instances on that same host.
If it is a network-attached drive, you can move it to any instance whose host has access to that network storage.
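The two rules above amount to a small predicate: can this drive be attached, in place and without copying its contents, to an instance on a given host? The model below is hypothetical and is not a libvirt API. A "local" drive records the host it lives on, and a "network" drive records which storage system holds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    storage: frozenset = frozenset()  # network storage systems this host can reach


@dataclass(frozen=True)
class Drive:
    name: str
    backing: str   # "local": physical drive, partition, or file on one host
                   # "network": network-attached (e.g. a Ceph cluster)
    location: str  # host name for local drives, storage system name for network drives


def can_attach(drive: Drive, target: Host) -> bool:
    """True if `drive` can be attached in place to an instance running on `target`."""
    if drive.backing == "local":
        # Local storage only exists on the host it lives on.
        return drive.location == target.name
    if drive.backing == "network":
        # Network storage works on any host that can reach it.
        return drive.location in target.storage
    raise ValueError(f"unknown backing: {drive.backing}")


if __name__ == "__main__":
    host_a = Host("host-a", frozenset({"ceph-prod"}))
    host_b = Host("host-b", frozenset({"ceph-prod"}))
    scratch = Drive("scratch", "local", "host-a")
    data = Drive("data", "network", "ceph-prod")
    print(can_attach(scratch, host_b), can_attach(data, host_b))
```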
I use libvirt as my virtual machine manager. If I expect an instance to stay on the same host, I back its drive with a file on the host computer. That is easy.
If I need to be able to migrate the virtual computer to different machines, I use a Ceph RADOS Block Device (RBD) or a file on a shared filesystem.
What are the cons of using a virtual machine
It can be slower than a physical device. It doesn’t have to be, but sometimes it is.
While you can oversubscribe CPUs, you can’t safely oversubscribe memory. Memory is always an issue with virtual machines.
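The asymmetry shows up in a simple placement check. vCPUs are allowed to exceed physical cores by some ratio, because idle vCPUs cost little. Memory gets no overcommit at all: the guests plus a reserve for the host itself must fit in RAM. The 4:1 CPU ratio and the 2 GiB host reserve are assumptions chosen for illustration, not recommendations from any particular VM manager.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    memory_gib: int


def check_capacity(host_cpus, host_memory_gib, vms, cpu_ratio=4.0, host_reserve_gib=2):
    """Return (cpu_ok, memory_ok) for placing all `vms` on one host.

    CPU may be oversubscribed by `cpu_ratio`; memory may not be oversubscribed,
    and `host_reserve_gib` is held back for the host itself.
    """
    total_vcpus = sum(vm.vcpus for vm in vms)
    total_memory = sum(vm.memory_gib for vm in vms)
    cpu_ok = total_vcpus <= host_cpus * cpu_ratio
    memory_ok = total_memory + host_reserve_gib <= host_memory_gib
    return cpu_ok, memory_ok


if __name__ == "__main__":
    # A 16-core, 64 GiB host: eight 4-vCPU / 8 GiB guests fit on CPU but not in memory.
    guests = [VM(f"vm{i}", 4, 8) for i in range(8)]
    print(check_capacity(16, 64, guests))
```

In this example, memory runs out long before CPU does, which is the usual pattern on a virtualization host.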
When the network isn’t fast enough, network-attached drives will feel slower.
And the big one: if the Network Attached Storage (NAS) fails, all instances depending on it will also fail. That is why I use Ceph, which can survive multiple drive or node failures.
Another big con: if a host computer fails, it will cause all virtual computers running on that host to also fail.
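Both failure modes can be reasoned about the same way: an instance goes down if its host fails or if the network storage behind its drive fails. The sketch below computes that "blast radius" from a placement map. Instance, host, and storage names are hypothetical. Replicated storage like Ceph makes a whole-cluster failure much less likely, but once the cluster is gone, the model is the same.

```python
def affected_instances(placements, failed_host=None, failed_storage=None):
    """Return the names of instances that go down when a host and/or a storage system fails.

    `placements` maps instance name -> (host name, network storage name or None).
    None means the instance's drive lives on its own host.
    """
    down = set()
    for name, (host, storage) in placements.items():
        if failed_host is not None and host == failed_host:
            down.add(name)
        elif failed_storage is not None and storage == failed_storage:
            down.add(name)
    return down


if __name__ == "__main__":
    placements = {
        "web1": ("host-a", "nas-1"),
        "web2": ("host-b", "nas-1"),
        "db1": ("host-a", None),
        "cache1": ("host-c", None),
    }
    print(sorted(affected_instances(placements, failed_host="host-a")))
    print(sorted(affected_instances(placements, failed_storage="nas-1")))
```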
What are the pros of using a virtual machine
It is trivial to provision virtual machines. There is an entire framework, OpenStack, that does exactly this. With OpenStack, you can provision an instance with just a few simple commands.
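As an illustration of "a few simple commands," the sketch below assembles the OpenStack CLI calls to boot a server and, optionally, to create and attach a data volume. It only builds and prints the commands, so it runs without a cloud. The image, flavor, network, and volume names are hypothetical placeholders for whatever exists in your cloud. In practice, the new volume may need to reach the "available" state before the attach succeeds.

```python
import shlex


def provision_commands(name, image, flavor, network, volume_gib=None):
    """Return OpenStack CLI invocations (as argument lists) to provision one instance."""
    cmds = [[
        "openstack", "server", "create",
        "--image", image, "--flavor", flavor, "--network", network,
        "--wait",  # block until the server is ACTIVE (or errors)
        name,
    ]]
    if volume_gib:
        volume = f"{name}-data"  # hypothetical naming convention
        cmds.append(["openstack", "volume", "create", "--size", str(volume_gib), volume])
        cmds.append(["openstack", "server", "add", "volume", name, volume])
    return cmds


if __name__ == "__main__":
    # To actually run them: subprocess.run(cmd, check=True) for each cmd.
    for cmd in provision_commands("web01", "ubuntu-22.04", "m1.small", "private", volume_gib=20):
        print(shlex.join(cmd))
```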
You can migrate an instance from one host computer to another. Even if the disk drive is located on the host computer, it is possible to move the contents of that drive to another host computer.
If you are using a NAS, you can attach a virtual drive to an instance, work on it with that instance, then detach the drive and attach it to a different instance. This means you don’t have to copy the data over the wire.
You can also grow a virtual drive, and once the guest extends its partition and filesystem, the instance can use the extra space without a reboot or any downtime.
Besides growing existing drives, you can attach new ones.
This means that storage management is much easier.
Virtual Networks
The host computer lives on one or more physical networks. The instances can be bridged onto that physical network.
The instance can also be protected behind a Network Address Translation (NAT) service. This gives complete outbound connectivity but requires extra configuration for inbound.
An instance can also be placed within a Virtual Private Cloud (VPC). A VPC gives the instance (or instances) its own complete IP address space, isolated from every other VPC.
This means that user A can have their instances on 192.168.100.x and user B can have their instances on 192.168.100.x without collisions.
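One way to picture the isolation: the virtual network looks up destinations by the pair (VPC, IP) rather than by IP alone, so the same address in two VPCs never collides. Real platforms do this with mechanisms such as VLAN or VXLAN tagging or Linux network namespaces. The class below is only a toy model with hypothetical names.

```python
import ipaddress


class VirtualFabric:
    """Toy model of per-VPC addressing: every lookup is scoped to one VPC."""

    def __init__(self):
        self._ports = {}  # (vpc_id, IPv4Address) -> instance name

    def attach(self, vpc_id, ip, instance):
        key = (vpc_id, ipaddress.ip_address(ip))
        if key in self._ports:
            raise ValueError(f"{ip} is already in use in {vpc_id}")
        self._ports[key] = instance

    def deliver(self, src_vpc, dst_ip):
        # The lookup never leaves the sender's VPC, so the same IP elsewhere is invisible.
        return self._ports.get((src_vpc, ipaddress.ip_address(dst_ip)))


if __name__ == "__main__":
    fabric = VirtualFabric()
    fabric.attach("vpc-a", "192.168.100.10", "a-web")
    fabric.attach("vpc-b", "192.168.100.10", "b-db")  # same IP, different VPC: no collision
    print(fabric.deliver("vpc-a", "192.168.100.10"), fabric.deliver("vpc-b", "192.168.100.10"))
```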
None of user A’s traffic appears in user B’s VPC.
VPCs can be connected through gateways so they can share traffic. When this is done, the connected VPCs must use non-overlapping subnets.
In other words, if both VPCs use 192.168.100.x, then 192.168.100.1 on user A’s VPC cannot communicate with an instance on user B’s VPC at address 192.168.100.55: A’s VPC treats that address as one of its own and never sends the traffic to the gateway.
But if user A agrees to use 192.168.100.x and user B agrees to use 192.168.99.x, then the VPCs can be connected with a (virtual) router.
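Python's standard `ipaddress` module is enough to check the non-overlap rule before wiring VPCs together, and to see why overlapping subnets break routing. The VPC names and CIDRs below are examples: user A on 192.168.100.0/24 and user B on either the same /24 or 192.168.99.0/24.

```python
import ipaddress
from itertools import combinations


def can_interconnect(subnets):
    """Return (ok, clashes) for a mapping of VPC name -> CIDR.

    VPCs can only be joined by a router if no two of their subnets overlap.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    clashes = [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]
    return not clashes, clashes


def stays_local(own_subnet, dst_ip):
    """True if a VPC would deliver dst_ip inside itself instead of sending it to the router."""
    return ipaddress.ip_address(dst_ip) in ipaddress.ip_network(own_subnet)


if __name__ == "__main__":
    print(can_interconnect({"A": "192.168.100.0/24", "B": "192.168.100.0/24"}))
    print(can_interconnect({"A": "192.168.100.0/24", "B": "192.168.99.0/24"}))
    print(stays_local("192.168.100.0/24", "192.168.100.55"))
```

The last line is the overlap problem in miniature: from inside A's 192.168.100.0/24, the address 192.168.100.55 looks local, so traffic for B's instance never reaches the router.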
Using a VPC means that the user must go through a gateway to talk to any other VPC or physical network, and that gateway runs a NAT service.
An address on the physical network is assigned to the gateway, which forwards traffic to one or more VPC IPs.
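The inbound half of that gateway can be modelled as a destination-NAT table. Packets addressed to the gateway's physical-network IP on a given protocol and port are rewritten to a VPC IP and port. This toy class ignores connection tracking and source NAT. The address 203.0.113.10 comes from the documentation range and stands in for whatever physical address your gateway actually gets, and the forwarding rules are hypothetical.

```python
import ipaddress


class NatGateway:
    """Toy destination-NAT table for a VPC gateway."""

    def __init__(self, physical_ip):
        self.physical_ip = ipaddress.ip_address(physical_ip)
        self._forwards = {}  # (protocol, gateway port) -> (VPC IP, VPC port)

    def forward(self, protocol, gateway_port, vpc_ip, vpc_port):
        self._forwards[(protocol, gateway_port)] = (ipaddress.ip_address(vpc_ip), vpc_port)

    def inbound(self, protocol, dst_ip, dst_port):
        """Return the rewritten (VPC IP, port) for an inbound packet, or None to drop it."""
        if ipaddress.ip_address(dst_ip) != self.physical_ip:
            return None  # not addressed to this gateway
        return self._forwards.get((protocol, dst_port))


if __name__ == "__main__":
    gw = NatGateway("203.0.113.10")
    gw.forward("tcp", 443, "192.168.100.20", 443)  # web server inside the VPC
    gw.forward("tcp", 2222, "192.168.100.30", 22)  # SSH to a second instance
    print(gw.inbound("tcp", "203.0.113.10", 2222))
```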
Conclusion
Every infrastructure manager (network manager) needs to know their VM manager, but they all work in similar ways. If you know the basics, the rest is just a matter of finding the correct button or command.
This stuff is easy once the infrastructure is set up.