Global Container Runtime

Fly is a global application runtime that runs your code close to users and scales compute in cities where your app is busiest. Write your code, package it into a Docker image, and deploy it to worldwide infrastructure that keeps it snappy.

Get started for free: launch a container on Fly.

How it works

You can run most applications with a Dockerfile using the flyctl command. The first time you deploy an app, we assign it a global IP address. By default, apps listen for HTTP and HTTPS connections, though they can be configured to accept any kind of TCP traffic.

When users connect to your global IPs, we dynamically assign compute resources in datacenters closest to them. More users might create demand for more resources in multiple locations worldwide, while low-traffic applications may only require a small amount of resources in a single location.

Auto-scaling

Fly auto-scaling is designed to be simple and easy to understand. When more users connect to your app, we add CPUs and memory. By default, we allocate one CPU to the first 20 TCP connections, and then one more CPU for every additional 20 TCP connections.

You can customize the concurrency threshold per application. If you're running a simple proxy service, you might want to allow 100 connections per CPU. If you're doing heavy CPU work, it might be best to run 4 concurrent connections per CPU.
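As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name cpus_needed and the per_cpu parameter are hypothetical, and the rounding rule (a partially filled block of connections still gets its own CPU) is an assumption, not a statement of how Fly computes it.

```python
import math


def cpus_needed(connections: int, per_cpu: int = 20) -> int:
    """Estimate CPUs for a number of concurrent TCP connections.

    Assumption: every full or partial block of `per_cpu` connections
    gets one CPU, so 1-20 connections need 1 CPU and 21-40 need 2.
    """
    if connections < 0:
        raise ValueError("connections must not be negative")
    if per_cpu < 1:
        raise ValueError("per_cpu must be at least 1")
    return math.ceil(connections / per_cpu)


if __name__ == "__main__":
    print(cpus_needed(45))               # default threshold of 20
    print(cpus_needed(100, per_cpu=100)) # simple proxy service
    print(cpus_needed(10, per_cpu=4))    # heavy CPU work
```

Under these assumptions, a proxy allowing 100 connections per CPU serves 100 connections on one CPU, while a CPU-heavy app limited to 4 connections per CPU needs three CPUs for 10 connections.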

Technical details

MicroVMs

Application code runs in Firecracker microVMs. These are lightweight, secure virtual machines based on strong hardware virtualization. Your workloads are safely isolated no matter where they’re running on our infrastructure.

MicroVMs provide strong security and workload isolation based on hardware virtualization, which allows us to safely run applications from different customers on shared hardware.

We make a best-effort attempt to dedicate hardware resources to only one microVM at a time. CPU cores, for instance, should only ever be doing work for one microVM, so your apps don't have to contend with CPU steal.

The virtualized applications run on dedicated physical servers with 8 to 32 physical CPU cores and 32 to 256 GB of RAM.

Compute scaling

Each microVM gets 1 CPU and 1 GB of memory by default. We scale up and down by adding or removing application processes, each in its own microVM.

We use "concurrent connections" to determine microVM capacity. By default, we allocate one microVM per 20 concurrent connections. This is a reasonably good number for most apps, but it is configurable for apps with different needs.

When a client connects, we send them to the nearest microVM with capacity. If the existing VMs are at capacity, we launch more in the busiest regions. When there are idle VMs, we shut them off.
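The routing rule described above can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the MicroVM class, the pick_vm function, the region codes, and the use of a latency table as the measure of "nearest" are all assumptions, not Fly's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MicroVM:
    region: str        # hypothetical region code
    connections: int   # current concurrent connections
    limit: int = 20    # default concurrency threshold from the text

    def has_capacity(self) -> bool:
        return self.connections < self.limit


def pick_vm(vms: List[MicroVM], latency_ms: Dict[str, float]) -> Optional[MicroVM]:
    """Return the lowest-latency microVM that still has capacity.

    `latency_ms` maps a region to the estimated latency from the client.
    Returns None when every microVM is full, which a scheduler could
    treat as the signal to launch another one.
    """
    candidates = [vm for vm in vms if vm.has_capacity()]
    if not candidates:
        return None
    return min(candidates, key=lambda vm: latency_ms[vm.region])


if __name__ == "__main__":
    vms = [MicroVM("dfw", 20), MicroVM("ord", 5), MicroVM("ams", 0)]
    latency = {"dfw": 2.0, "ord": 20.0, "ams": 110.0}
    chosen = pick_vm(vms, latency)
    print(chosen.region if chosen else "launch a new microVM")
```

In this example the closest microVM is already at its limit, so the client is sent to the next-closest one with spare capacity.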

BGP Anycast

We announce ranges of IP addresses (both IPv4 and IPv6) from all our datacenters and accept traffic for them in each one. When we receive a connection on one of those IPs, we match it back to an active customer application, and then proxy the TCP connection to the closest available microVM.

Proxy

Every server in our infrastructure runs a Rust-based proxy named fly-proxy. The proxy is responsible for accepting client connections, matching them to customer applications, applying handlers (e.g., TLS termination), and handling backhaul between servers.

Backhaul

If you have users in Dallas and an available microVM in Chicago, we will accept traffic in Dallas, terminate TLS (unless you've disabled that handler), and then connect to your microVM over a WireGuard tunnel between datacenters. WireGuard allows us to pass along almost any kind of network connection with very little additional latency.
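The Dallas-to-Chicago example can be written out as an ordered list of steps. This is only a sketch: the connection_path function and its parameters are hypothetical names, and the step descriptions simply restate the paragraph above.

```python
from typing import List


def connection_path(edge: str, vm_location: str, terminate_tls: bool = True) -> List[str]:
    """List the steps a connection takes from the edge to a microVM.

    Assumption: backhaul is only needed when the accepting datacenter
    differs from the datacenter that hosts the microVM.
    """
    steps = [f"accept connection in {edge}"]
    if terminate_tls:
        steps.append(f"terminate TLS in {edge}")
    if edge != vm_location:
        steps.append(f"backhaul {edge} -> {vm_location} over WireGuard")
    steps.append(f"deliver to microVM in {vm_location}")
    return steps


if __name__ == "__main__":
    for step in connection_path("Dallas", "Chicago"):
        print(step)
```

Passing terminate_tls=False models an app that has disabled the TLS handler, in which case the connection is forwarded without being terminated at the edge.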