Site Status

By Anonymous Web Angel(GFZ)

Oct 10, 2023

The site has not been as stable as I want it to be. We are experiencing a failure about once every 48-72 hours. The outage normally lasts less than 5 minutes. Today it lasted longer than 5 minutes.

I know what the issue is. K8S is killing off parts of the infrastructure. Normally, it is the database engine.

When the database goes down, the site tells K8S that it is sick. This results in the 503 errors you might have seen.
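For readers curious how a site "tells K8S that it is sick": Kubernetes can poll an HTTP endpoint on each pod (a readiness or liveness probe), and any status outside 200-399 counts as a failure. With a readiness probe, a failing pod is pulled out of its Service's endpoints, which is one common way visitors end up seeing 503s. Below is a minimal Python sketch of such an endpoint. It assumes the health check is a plain TCP connect to the database; the host `db.example.internal`, port `3306`, the `/healthz` path, and the helper names are all hypothetical, not this site's actual configuration.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values for illustration only; not this site's real setup.
DB_HOST = "db.example.internal"
DB_PORT = 3306


def database_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the database can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def health_status(host: str, port: int) -> tuple[int, str]:
    """Map database reachability to the HTTP status a probe would see."""
    if database_reachable(host, port):
        return 200, "ok"
    # 503 Service Unavailable: Kubernetes treats this as a failed probe.
    return 503, "database unavailable"


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 or 503 depending on the database."""

    def do_GET(self) -> None:
        if self.path != "/healthz":
            self.send_error(404)
            return
        code, body = health_status(DB_HOST, DB_PORT)
        payload = body.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Run the health endpoint until interrupted (blocks forever)."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

To try it, call `serve(8080)` and point a probe's `httpGet` at `/healthz` on that port. A TCP connect only proves the database port is open, not that queries succeed; a stricter check might run a trivial query instead.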

The root cause is that K8S doesn’t think there are enough resources available and “reaps” something, normally the RDBMS.

The fix for this is to move from rook-ceph with an internal cluster to rook-ceph with an external cluster. The advantage of an external cluster is that it requires fewer resources within K8S, and I have better control over it.

I have created an external cluster within my own K8S test system. I’m in the process of documenting how to bring up a K8S external cluster. It isn’t working yet. I’ll get there.

2 thoughts on “Site Status”

it's just Boris says:

2023-10-11 at 06:39

Thanks, both for the effort and the explanation.

Bad Dancer says:

2023-10-11 at 08:25

I feel like a bullfrog someone is trying to explain the finer points of organic chemistry to, but I appreciate the work ya do keeping this place of respite up.

The Vine of Liberty