It was just a little issue…

It is 2100, and after 6 hours of working with our cloud provider, everything is back up.

There was a hardware glitch that caused a node to fail. The website automatically moved to a new node and attempted to restart. Unfortunately, the same hardware glitch caused the cluster to believe that the failed node was still there and still working. Since the cluster thought the old node was healthy, none of the resources (disk space) used by GFZ were released.

Because the disk space was never released, the website on the new node would not start.

Linode took 8 calls from me, handled 22 ticket updates, and worked the entire 6 hours to get things working again.

I’m sorry the site was down for so long. I’m working with Linode management to make sure it doesn’t happen again. I’m also looking at options for shared file systems so that a pod can move from node to node seamlessly.

AWA


Comments

4 responses to “It was just a little issue…”

  1. RicktheBear

    No worries. As they say in respiratory therapy: spit happens.

  2. RufusJ

    Thanks for doing all of this! You are appreciated!!

  3. B.Zh

    yikes. galactic bit plumbing!

  4. My Man, Never had a doubt.