GPUs on Fly.io are available to everyone!

A cartoon hot air balloon with a bundle of sandwiches to share with the world.
Image by Annie Ruygt

Fly.io makes it easy to spin up compute around the world, now including powerful GPUs. Unlock the power of large language models, text transcription, and image generation with our datacenter-grade muscle!

GPUs are now available to everyone!

We know you’ve been eager to use GPUs on Fly.io, and we’re happy to announce that they’re now available to everyone. You can spin up GPU instances with any of the following cards:

  • Ampere A100 (40GB): a100-40gb
  • Ampere A100 (80GB): a100-80gb
  • Lovelace L40S (48GB): l40s

To use a GPU instance today, set vm.size in your fly.toml for one of your apps or processes to any of the GPU kinds above. Here’s how you can spin up an Ollama server in seconds:

app = "your-app-name"
primary_region = "ord"
vm.size = "l40s"

[http_service]
  internal_port = 11434
  force_https = false
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

[build]
  image = "ollama/ollama"

[mounts]
  source = "models"
  destination = "/root/.ollama"
  initial_size = "100gb"

Deploy this and bam: large language model inference from anywhere. If you want a private setup, see the article "Scaling Large Language Models to zero with Ollama" for more information. You never know when you’ll have a sandwich emergency and need to figure out what you can make with what you have on hand.
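Once the app is deployed, you can talk to it over HTTP. Here’s a rough sketch of what a request to Ollama’s /api/generate endpoint might look like from Python, using only the standard library. The hostname and model name below are placeholders we made up for illustration: swap in your own app’s hostname and a model you’ve already pulled onto the volume. The first request may take a while if the Machine has to start up from zero.

```python
import json
import urllib.request

# Placeholders (assumptions, not real values): use your app's hostname
# and a model that you have already pulled onto the volume.
BASE_URL = "https://your-app-name.fly.dev"
MODEL = "llama2"


def build_generate_request(base_url, model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks Ollama for one JSON object instead of a stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url, model, prompt, timeout=300):
    """Send the prompt and return the generated text."""
    request = build_generate_request(base_url, model, prompt)
    # A generous timeout leaves room for the Machine to start from zero.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]


if __name__ == "__main__":
    print(generate(BASE_URL, MODEL, "What can I make with bread, cheese, and a tomato?"))
```

Keep in mind that the config above exposes the Ollama API publicly, so anyone who knows the hostname could send it requests. That’s fine for a quick experiment, but it’s a good reason to look at the private setup for anything longer-lived.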

We’re working on bringing some lower-cost A10 GPUs online in the next few weeks. We’ll update you when they’re ready.

If you want to explore the possibilities of GPUs on Fly.io, here are a few articles that may give you ideas:

Depending on factors such as your organization’s age and payment history, you may need to go through additional verification steps.

If you’ve been experimenting with Fly.io GPUs and have made something cool, let us know on the Community Forums or by mentioning us on Mastodon! We’ll boost the cool ones.