# May 16: FRA managed Postgres control-plane outage (12:23 UTC)
Managed Postgres clusters in the FRA region became intermittently unreachable after the regional Kubernetes control plane (FKS) became overloaded and stalled, and requests to the Kubernetes API began timing out. In this setup Patroni uses Kubernetes for coordination, so those API failures prevented clusters from reliably determining which member was the primary and which were replicas, causing widespread connection failures and flapping health status. We recovered by reducing resource pressure, defragmenting the affected etcd instances, restoring the control plane’s ability to reconcile, and then repairing clusters one by one.
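
To illustrate why API timeouts turn into role flapping, here is a minimal sketch of a lease-based leader loop. It is a toy model, not Patroni's code: the `Member` class, the `ApiTimeout` exception, the `simulate` helper, and the 30-second TTL and 10-second loop interval are all hypothetical names and values chosen for the illustration. It assumes a single member that keeps its primary role only while it can renew a lease through the coordination API, and steps down once the lease expires without a successful renewal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class ApiTimeout(Exception):
    """Raised by the hypothetical coordination API when a call times out."""


@dataclass
class Member:
    name: str
    role: str = "replica"
    lease_expires_at: float = 0.0

    def tick(self, now: float, ttl: float, renew: Callable[[str], None]) -> str:
        """Run one loop iteration and return the role held afterwards."""
        try:
            renew(self.name)  # take or refresh the leader lease
            self.lease_expires_at = now + ttl
            self.role = "primary"
        except ApiTimeout:
            # Leadership can no longer be proven; step down once the lease runs out.
            if now >= self.lease_expires_at:
                self.role = "replica"
        return self.role


def simulate(api_up: list[bool], ttl: float = 30.0, interval: float = 10.0) -> list[str]:
    """Replay a sequence of API availability states, one per loop iteration."""

    def make_renew(up: bool) -> Callable[[str], None]:
        def renew(name: str) -> None:
            if not up:
                raise ApiTimeout(name)
        return renew

    member = Member("pg-0")  # hypothetical member name
    return [
        member.tick(i * interval, ttl, make_renew(up))
        for i, up in enumerate(api_up)
    ]


if __name__ == "__main__":
    # Two healthy iterations, four timeouts, then the API recovers.
    print(simulate([True, True, False, False, False, False, True]))
```

In this model the member rides out short API blips that fit inside the lease TTL, gives up the primary role when the outage outlasts it, and takes the role back as soon as a renewal succeeds. With an API that alternates between timing out and responding, that produces the kind of repeated role changes clients see as connection failures.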