One of the best tools I’ve discovered in my many years of computer work is AMANDA.
AMANDA is free software for doing backups. The gist is that you have an Amanda server. On a schedule, the server contacts Amanda clients to perform disk backups; the clients send the data back to the server, and the server then writes it to “tapes”.
What makes Amanda so nice is that you configure how long you want to keep live backups, and it then tries to meet that retention efficiently. My backups are generally kept for two years.
On the front side, you define DLEs. A DLE is a host plus a disk or filesystem to dump. There are other parameters, but that is the smallest DLE configuration.
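For illustration, a minimal disklist file might look like this; the hostnames are placeholders and the dumptype names depend on what your amanda.conf defines:

    # disklist: one DLE per line — host, disk (or device), dumptype
    client1.example.com  /home     comp-user-tar
    client1.example.com  /etc      comp-user-tar
    dbhost.example.com   /var/lib  comp-user-tar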
Before the dump starts, the server gets an estimate for each DLE: what a full dump would cost, and what one or more levels of partial dump would cost. Once it has this information, it creates a schedule to dump all the DLEs.
The data can be encrypted on the client or the server, and it can be compressed on the client or the server as well. It is transferred to the server, sometimes to a holding disk, sometimes directly to tape.
In the end, the data is written to disk.
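Where compression and encryption happen is set in the dumptype. A sketch, using option names from the dumptype section of amanda.conf (the encryption program path is an assumption; adjust for your installation):

    define dumptype comp-encrypt-tar {
        program "GNUTAR"                     # back up with GNU tar
        compress client fast                 # compress on the client before transfer
        encrypt client                       # encrypt on the client
        client_encrypt "/usr/sbin/amcrypt"   # encryption helper (assumed path)
        client_decrypt_option "-d"           # flag passed to the helper when restoring
        index yes                            # keep an index for amrecover
    }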
Every client that I have is backed up using Amanda. It just works.
In the olden days, I configured it to dump to physical tapes. If everything fit on one tape, great. If it didn’t, I could use multi-tape setups or even tape libraries. The tape-size limitations were removed along the way, so DLEs can now be dumped across multiple tapes.
The backups are indexed, making it easy to recover particular files from any particular date.
More importantly, the instructions for recovering a bare-metal system from backup are written to the tape.
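Recovery is driven by amrecover on the client. A session to pull back one file from a given date might look roughly like this (config name, host, and paths are placeholders):

    # run as root on the client being restored
    amrecover MyConfig
    amrecover> sethost client1.example.com
    amrecover> setdisk /home
    amrecover> setdate 2025-06-01
    amrecover> cd alice/projects
    amrecover> add important-report.odt
    amrecover> extract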
Today, tapes are an expensive way of doing backups. It is cheaper to back up to disk, if your disks are capable of surviving multiple failures.
Old-Time Disks
You bought a disk drive; that disk drive was allocated as a file system at a particular mount point (ignoring MS-DOS partitioning stuff).
Drives got bigger; we didn’t need multiple drives for our file systems. We “partitioned” our drives and treated each partition as an individual disk drive.
The problem is that a disk failure is catastrophic: we have data loss.
The fix is to dump each drive/partition to tape. Then if we need to replace a drive, we reload from tape.
Somebody decided it was a good idea to have digitized images. We require bigger drives. Even the biggest drives aren’t big enough.
Solution: instead of breaking one drive into partitions, we will combine multiple physical drives to create a logical drive.
Alternatively, if a single drive has enough space, we can use two drives that mirror each other. Then when one fails, the other can handle the entire load until a replacement is installed.
Still need more space. We decide that a good idea is to use an error-correcting code, in practice simple parity. By grouping three or more drives as a single logical drive, we can use one drive as a parity drive. If any single drive fails, that parity drive can be used to reconstruct the contents of the missing drive. Things slow down, but it works, until you lose a second drive.
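The arithmetic behind that parity drive is plain XOR, which is why any one missing drive can be rebuilt from the survivors. A toy illustration in C:

    #include <stdio.h>

    int main(void) {
        /* One "block" from each of three data drives. */
        unsigned char d0 = 0x5A, d1 = 0x3C, d2 = 0xF0;

        /* The parity drive stores the XOR of the data blocks. */
        unsigned char parity = d0 ^ d1 ^ d2;

        /* Drive 1 dies; its block is recovered from parity plus the survivors. */
        unsigned char rebuilt_d1 = parity ^ d0 ^ d2;

        printf("original d1 = 0x%02X, rebuilt d1 = 0x%02X\n", d1, rebuilt_d1);
        return 0;
    }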
Solution: combine RAID-5 groups with mirroring. Never mind that we are now at the point where, for every gigabyte of data, you need two or more gigabytes of storage.
Enter Ceph and other things like it. Instead of building one large disk farm, we create many smaller disk farms and join them in interesting ways.
Now data is stored across multiple drives, across multiple hosts, across multiple racks, across multiple rooms, across multiple data centers.
With Ceph and enough nodes and locations, you can have complete data centers go offline and not lose a single byte of data.
Amazon S3
This is some of the cheapest storage going: pennies per gigabyte. The costs come when you make too many access requests. But for a virtual tape drive where you are mostly just writing (inbound transfer is free), it is a wonderful option.
You create a bucket and put objects into your bucket. Objects can be treated as (very) large tape blocks. This just works.
At one point I had over a terabyte of backups in Amazon S3, which was fine until I started getting real bills for that storage.
Regardless, I had switched myself and my clients to using Amazon S3 for backups.
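Configuring Amanda for S3 virtual tapes boils down to pointing the tape device at a bucket and handing it credentials. A minimal sketch, assuming Amanda’s S3 device driver is installed; bucket name and keys are placeholders, and the exact property names should be checked against amanda-devices(7) for your version:

    # amanda.conf on the server
    tapedev "s3:my-backup-bucket/DailySet1-"          # objects under this prefix act as the "tape"
    device_property "S3_ACCESS_KEY" "AKIAEXAMPLEKEY"  # placeholder credentials
    device_property "S3_SECRET_KEY" "example-secret-key"
    device_property "S3_SSL" "YES"                    # talk to S3 over HTTPS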
Everything was going well until the fall of 2018. At that time I migrated a client from Ubuntu 16.04 to 18.04 and the backups stopped working.
It was still working for me, but not for them. We went back to 16.04 and continued.
20.04 gave the same results during testing; I left the backup server at 16.04.
We were slated to try 26.04 in 8 or so months.
Ceph RGW
The Ceph RGW feature set is similar to Amazon S3. It is so similar that you need to change only a few configuration parameters to switch from Amazon S3 to Ceph RGW.
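The change is mostly a matter of pointing the same S3 device at the RGW endpoint instead of Amazon. A sketch, again with placeholder values and property names to be verified against amanda-devices(7) for your Amanda version:

    # amanda.conf: same S3 device, different endpoint
    device_property "S3_HOST" "rgw.example.internal:7480"   # local Ceph RGW gateway (placeholder)
    device_property "S3_ACCESS_KEY" "rgw-access-key"
    device_property "S3_SECRET_KEY" "rgw-secret-key"
    device_property "S3_SSL" "NO"                           # if the gateway speaks plain HTTP

s3cmd needs the analogous change: host_base and host_bucket in ~/.s3cfg pointed at the gateway.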
With the help of Grok, I got Ceph RGW working, and s3cmd, the command-line S3 client, worked against it perfectly.
Then I configured Amanda to use S3-style virtual tapes backed by my Ceph RGW storage.
It failed.
For two days I fought this thing, then with Grok’s help I got the configuration parameters working, but things still failed.
HTTP GETs were working, but PUTs were failing. With tcpdump and a bit of debugging, I discovered that the client (Amanda) was preparing to send a PUT request but was instead sending a GET, which then failed the signature checks.
It took another two days to find the problem: libcurl was upgraded in the move from Ubuntu 16.04 to 18.04, and the new libcurl treats the method-selection options differently.
Under the old libcurl, you set the option for the method you wanted to 1 and the others to 0, and you got that method: GET, PUT, POST, or HEAD. Set GET to 0, PUT to 1, and POST/HEAD to 0, and you got a PUT.
The new libcurl seems to override those settings: setting an option to 0 no longer switches that method off. You can get it to do a GET or a HEAD, but nothing else. GET is the default when everything is zero, and because of the order in which the options are applied, HEAD may still come out right.
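A sketch of the distinction in C, not the actual Amanda patch: with the newer libcurl, the reliable approach is to enable only the option for the verb you want, rather than setting the other method options to 0 and expecting that to switch the method.

    #include <string.h>
    #include <curl/curl.h>

    /* Pick the HTTP method for the next request on a (possibly reused) handle.
     * Old code set CURLOPT_HTTPGET / CURLOPT_UPLOAD / CURLOPT_POST / CURLOPT_NOBODY
     * to 0 or 1 in a fixed order and got the method it asked for.  Per the behavior
     * described above, newer libcurl does not honor the zeroed options the same
     * way, so set only the one you actually want. */
    static void select_method(CURL *curl, const char *method)
    {
        /* Clear any method left over from a previous request.  Note that this
         * also clears the URL and other options, so re-set those afterwards. */
        curl_easy_reset(curl);

        if (strcmp(method, "PUT") == 0)
            curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);   /* body supplied via CURLOPT_READFUNCTION */
        else if (strcmp(method, "POST") == 0)
            curl_easy_setopt(curl, CURLOPT_POST, 1L);
        else if (strcmp(method, "HEAD") == 0)
            curl_easy_setopt(curl, CURLOPT_NOBODY, 1L);
        else
            curl_easy_setopt(curl, CURLOPT_HTTPGET, 1L);  /* GET; also libcurl's default */
    }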
This issue has existed since around 2018. It is now 2025, and the fix has been presented to the Amanda project at least twice; I was the latest to do so, and the previous attempt was in 2024. It still hasn’t been fixed.
I’m running my patched version; at least that seems to be working.