May 6: Machines API hitting failed hosts in SIN

May 6: Machines API hitting failed hosts in SIN (00:01 UTC)

Some ListMachines API calls returned 500s, primarily for apps and orgs that had Machines in the sin region. The cause was that these API queries hit failed hosts in SIN. For context: when an organization has Machines in regions far away from the one that handled the Machines API request, we forward (replay) that request to each of those regions to get more up-to-date information. At the time of the incident, one host in SIN appeared up (it accepted TCP connections) but responded to every connection attempt with a reset. fly-proxy, our load-balancer component, had an independent bug that prevented it from treating these failed requests as retryable, so they surfaced as errors instead of being retried. Cordoning that host mitigated the incident, and the fly-proxy bug has since been fixed.
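To illustrate the failure mode, here is a minimal sketch of how a forwarding layer might classify a connection reset as retryable and move on to another host. This is not fly-proxy's actual code: the names `is_retryable`, `forward_with_retry`, `hosts`, and `send` are hypothetical, and the sketch assumes the forwarded request is a read (like ListMachines) that is safe to replay.

```python
import errno

# Assumption for this sketch: these connection-level errors are treated as
# safe to retry on another host, because the request is an idempotent read.
RETRYABLE_ERRNOS = {errno.ECONNRESET, errno.ECONNREFUSED, errno.ETIMEDOUT}


def is_retryable(exc: BaseException) -> bool:
    """Return True if the failure happened at the connection level."""
    if isinstance(exc, (ConnectionResetError, ConnectionRefusedError, TimeoutError)):
        return True
    return isinstance(exc, OSError) and exc.errno in RETRYABLE_ERRNOS


def forward_with_retry(hosts, send):
    """Try each host in order; move on to the next when the failure is retryable.

    `send` is a hypothetical callable that forwards the request to one host
    and either returns a response or raises an OSError subclass.
    """
    last_error = None
    for host in hosts:
        try:
            return send(host)
        except OSError as exc:
            if not is_retryable(exc):
                raise  # not a connection-level failure; surface it as-is
            last_error = exc
    raise RuntimeError("all hosts failed") from last_error
```

With a classifier like this, a host that accepts TCP connections and then resets them costs one extra attempt instead of failing the request outright. A real load balancer would also need to consider non-idempotent requests and how much of the request had already been sent before the reset.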