Tips to fine-tune and (not) benchmark your app on Fly.io

It’s safe to assume Fly.io can support thousands of requests or connections per second and can keep hundreds of thousands of them live at any given moment. That’s even more likely to hold when clients connect from multiple geographic locations.

That settled, you can still use the following guidelines to gather data to fine-tune and optimize your app and configuration to run on Fly.io.

Fly.io metrics

We have various ways to access metrics for the Fly Proxy, your app, and Fly Machines. Learn more about Fly Metrics.

  • Edge HTTP response time is the total time it takes to respond with the first HTTP response headers.
  • App HTTP response time is the same, but only considers your app’s response time, not the overhead of routing, load balancing, retries, and so on.

CPU and memory resource usage are interesting, but are rarely the culprit unless you have everything else in your benchmark right. When CPU and memory usage is excessive, the application logic is often the issue.

Routing

Check which Fly.io region you’re reaching.

The region you reach should be the fastest region for you, which may not be the closest geographically. You can use the debug.fly.dev server to check your region:

curl debug.fly.dev

Use traceroute or mtr to see the routing to Fly.io.

For example:

traceroute debug.fly.dev

Check the base latency for the region you’re reaching:

ping debug.fly.dev

Important: Do you have a stable internet connection? ping can jitter quite a bit on bad routes.
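
If you want a number for how much your connection jitters, you can summarize a handful of round-trip times. The following is a minimal sketch, not an official Fly.io tool: the function name summarize_rtts and the sample values are made up for illustration, and "jitter" here is assumed to mean the average absolute difference between consecutive samples.

```python
import statistics

def summarize_rtts(rtts_ms):
    """Summarize round-trip times (in milliseconds) from a series of pings."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two RTT samples")
    # Jitter (assumed definition): mean absolute difference between
    # consecutive samples.
    deltas = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return {
        "min": min(rtts_ms),
        "mean": statistics.fmean(rtts_ms),
        "max": max(rtts_ms),
        "jitter": statistics.fmean(deltas),
    }

# Hypothetical samples, as if copied from the time= field of ping output.
stable = [20.1, 20.3, 19.9, 20.2, 20.0]
noisy = [20.1, 48.7, 21.0, 95.3, 22.4]

print(summarize_rtts(stable))
print(summarize_rtts(noisy))
```

If the jitter is large compared to the minimum round-trip time, your benchmark numbers will mostly reflect your connection rather than your app.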

Application logic and performance

Some common culprits of excessive CPU and memory usage are:

  • Using a single thread
  • Blocking
  • Various misconfigurations, for example, wrong or missing environment variables
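
To see why a single thread that blocks can hold back an app, here is a minimal sketch that runs the same blocking work serially and then on a small thread pool. It assumes the work is I/O-bound, with time.sleep standing in for something like a database query. The function names and the delays are hypothetical and not taken from any Fly.io tooling.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(delay_s):
    """Stand-in for blocking I/O such as a database query or upstream request."""
    time.sleep(delay_s)
    return delay_s

def run_serial(delays):
    """Run every call one after another on the current thread."""
    start = time.perf_counter()
    results = [blocking_call(d) for d in delays]
    return results, time.perf_counter() - start

def run_threaded(delays, workers):
    """Run the calls on a thread pool so that waits overlap."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(blocking_call, delays))
    return results, time.perf_counter() - start

delays = [0.1] * 4  # hypothetical: four requests that each block for 100 ms
serial_results, serial_s = run_serial(delays)
threaded_results, threaded_s = run_threaded(delays, workers=4)
print(f"serial: {serial_s:.2f}s, threaded: {threaded_s:.2f}s")
```

The serial run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay. CPU-bound work behaves differently, so treat this as an illustration of blocking only.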

HTTP response times

Fly.io HTTP response time metrics measure how long it takes for your app to start responding with headers, not the response body.
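
To get a feel for the difference between time to headers and time to the full response, you can measure both from a client. This is a minimal sketch using Python's standard http.client module. The function name time_request is hypothetical, and the client-side timing includes connection setup (and name resolution and TLS where they apply), so don't expect it to match the server-side metrics exactly.

```python
import http.client
import time

def time_request(host, path="/", port=None, use_tls=True, timeout=10):
    """Return (seconds until headers arrived, seconds until body was read, status)."""
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        # request() opens the connection if needed, so connection setup is
        # included in both timings.
        conn.request("GET", path)
        # getresponse() returns once the status line and headers are parsed;
        # the body has not necessarily been read yet.
        response = conn.getresponse()
        headers_s = time.perf_counter() - start
        response.read()  # drain the body
        total_s = time.perf_counter() - start
        return headers_s, total_s, response.status
    finally:
        conn.close()

# Example call, left commented out so the sketch makes no network request
# when imported:
# print(time_request("debug.fly.dev"))
```

For an endpoint that streams a large body, the second number can be much larger than the first, and only the first is comparable to a time-to-headers metric.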

Multithreading

If your app has multiple threads, make sure that contention isn’t an issue. You can also scale CPU resources as needed.

Fly.io configuration

The following configuration settings are the most likely to affect app performance and benchmarking.

Machine sizing

Right-size Machines to use more parallel processing. Learn more about Machine sizing.

Concurrency settings for connections or requests

Refer to our guidelines for concurrency settings.

Auto stop and start feature

Auto stop and start can be turned off if you want to test the performance of your app without that variable.

If you don’t turn off auto stop and start, then you’ll likely have some pretty high p99 values due to the initial cost of the Fly Proxy queueing connections while waiting for the Machine to start.
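
To see how a few slow starts can dominate p99 while barely moving the median, here is a minimal sketch with made-up numbers. The latencies, the share of requests that waited for a Machine to start, and the nearest-rank percentile method are all assumptions for illustration, not measurements from Fly.io.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 980 requests served by a running
# Machine and 20 requests that waited for a Machine to start.
warm = [25.0] * 980
cold = [1500.0] * 20
all_requests = warm + cold

print("p50 with cold starts:", percentile(all_requests, 50))     # 25.0
print("p99 with cold starts:", percentile(all_requests, 99))     # 1500.0
print("p99 without cold starts:", percentile(warm, 99))          # 25.0
```

In this made-up data set only 2% of requests hit a cold start, yet p99 reports the cold start latency. That's why it helps to decide up front whether your benchmark is meant to include Machine starts.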

Distributed systems

Fly.io has rate limits and hard limits outside of the control of each app; these limits are per edge and per service. If you test from a single location, then you’ll hit our rate limits much faster than if you test from multiple locations.

Pay attention to how your benchmarking nodes are routing. Inefficient routing adds latency and will skew your results. If incorrect routing is causing issues with your app testing or your production apps, then post in community or contact support (depending on your plan) and we’ll look into it with our network providers.

Challenges with benchmarking

You probably won’t be able to realistically benchmark your app for a combination of reasons, including:

  • Users will have a large variety of ISPs and be routed differently based on ad hoc peering agreements, sometimes in non-optimal ways.
  • It’s difficult to remove the Anycast routing factor from benchmarking; you have to live with it.

In some cases, a better option might be to send part of your real traffic to Fly.io and then compare metrics.