April 28: Machines API bug caused `fly deploy` to create duplicate Machines (23:41 UTC)

On April 28th, some `fly deploy` runs incorrectly received an empty Machines list from the Machines API for apps that had already been deployed. When that happened, flyctl created new Machines instead of updating the existing ones, resulting in duplicates.
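To illustrate why an empty list leads to duplicates, here is a minimal sketch of a deploy planner. It is not flyctl's actual code: `Machine`, `planDeploy`, and the `desired` count are hypothetical names made up for this example, and the only assumption is that a deploy updates whatever Machines the API reports and creates the rest.

```go
package main

import "fmt"

// Machine is a hypothetical, minimal stand-in for a Machine record.
type Machine struct {
	ID     string
	Region string
}

// planDeploy is an illustrative planner, not flyctl's implementation.
// It updates every Machine the API reported and creates new ones only
// to make up the difference to the desired count.
func planDeploy(existing []Machine, desired int) (update []string, create int) {
	for _, m := range existing {
		update = append(update, m.ID)
	}
	if len(existing) < desired {
		create = desired - len(existing)
	}
	return update, create
}

func main() {
	deployed := []Machine{{ID: "m1", Region: "yyz"}, {ID: "m2", Region: "yyz"}}

	// Correct listing: both Machines are updated in place, none created.
	update, create := planDeploy(deployed, 2)
	fmt.Println(len(update), create) // prints "2 0"

	// Erroneous empty listing: nothing to update, so two new Machines
	// are created alongside the two that already exist.
	update, create = planDeploy(nil, 2)
	fmt.Println(len(update), create) // prints "0 2"
}
```

The planner behaves correctly for the input it is given in both cases; the duplicates come entirely from the list being wrong.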

We traced this to new affinity behavior in flaps (our Machines API service). The details will likely make their way to a Fresh Produce near you sometime soon, but in short: after some operations, API requests are replayed to the same flaps instance for a brief period while state propagates through Corrosion, our distributed database.

flaps also uses `fly-replay` when fanning out to list Machines from multiple regions, and there it used the replay itself as a signal to return strictly the Machines local to its region. When the list endpoint received one of the new affinity replays, it handled it as a fanout replay: it returned only local Machines and did not list Machines from other regions. So if all your Machines were in yyz but you had affinity with a flaps instance in ord, flyctl was given an empty list of Machines.
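The interaction described above can be sketched as follows. This is a simplified model, not flaps source code: `ReplayKind`, `Request`, `listBuggy`, and `listFixed` are hypothetical names, and returning `all` stands in for the real cross-region fanout. `listFixed` shows one possible way to tell the two replay types apart; it is an assumption, not necessarily the fix that shipped.

```go
package main

import "fmt"

// Machine is a hypothetical, minimal stand-in for a Machine record.
type Machine struct {
	ID     string
	Region string
}

// ReplayKind is a hypothetical label for why a request was replayed.
type ReplayKind int

const (
	NotReplayed    ReplayKind = iota
	FanoutReplay              // sent by a peer that is collecting per-region results
	AffinityReplay            // sent to pin a client to one instance for a short time
)

// Request is a hypothetical request carrying only the replay kind.
type Request struct {
	Replay ReplayKind
}

// localOnly returns the Machines in the given region.
func localOnly(region string, all []Machine) []Machine {
	var out []Machine
	for _, m := range all {
		if m.Region == region {
			out = append(out, m)
		}
	}
	return out
}

// listBuggy models the faulty behavior: any replay at all is taken to
// mean "this is a fanout request, answer with local Machines only".
func listBuggy(req Request, region string, all []Machine) []Machine {
	if req.Replay != NotReplayed {
		return localOnly(region, all)
	}
	return all // stands in for fanning out to every region
}

// listFixed models one possible correction (an assumption): only a
// fanout replay restricts the answer to the local region.
func listFixed(req Request, region string, all []Machine) []Machine {
	if req.Replay == FanoutReplay {
		return localOnly(region, all)
	}
	return all
}

func main() {
	all := []Machine{{ID: "m1", Region: "yyz"}, {ID: "m2", Region: "yyz"}}
	req := Request{Replay: AffinityReplay} // client has affinity with an instance in ord

	fmt.Println(len(listBuggy(req, "ord", all))) // prints 0
	fmt.Println(len(listFixed(req, "ord", all))) // prints 2
}
```

With every Machine in yyz and the request pinned to ord, the buggy path filters down to an empty list, which is exactly what flyctl then acted on.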

At 00:28 UTC on April 29th, we mitigated the issue by disabling app affinity in the API. The bug has since been fixed, but duplicate Machines created during the incident will persist until they are removed manually. Customers who ran `fly deploy` on flyctl v0.4.41 or v0.4.42 between April 28th and April 29th can run `fly scale show` to review their apps’ Machine counts. If an app has more Machines than intended, the extras can be removed individually with `fly machines destroy <id> --force`.