Fly GPUs quickstart
GPUs are deprecated and will be unavailable after August 1.
You can use any base image for your Dockerfile, but it is convenient to base it on `ubuntu:22.04` and install libraries from NVIDIA's official apt repository. `RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8` is usually enough.

Notes:
- Do not install meta packages like `cuda-runtime-*`. `cuda-libraries-12-2` is good, but a bulky start. Once you know which libraries are needed at build time and at runtime, pick them individually to keep the final image small (see the `ldd` sketch below).
- Use multi-stage Docker builds as much as possible.
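One hedged way to decide which packages to keep: inspect which shared libraries your compiled binary actually links against with `ldd` inside the build container. The binary path below is a placeholder, not something this quickstart produces:

# Placeholder path: substitute your own binary. This lists the CUDA-related
# shared libraries it links against, so you can install only the packages
# that provide them instead of a meta package.
ldd /usr/local/bin/your-app | grep -iE 'cuda|cublas|cudnn'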
From flyctl, create an app using either `fly launch` or `fly apps create`.

Note: GPUs are not available in all regions. The available GPU types are: Nvidia A10, L40S, A100-PCIe-40GB, and A100-SXM4-80GB.

Currently GPUs are available in the following regions:
- a10: ord
- l40s: ord
- a100-40gb: ord
- a100-80gb: iad, sjc, syd, ams
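For example, creating the app non-interactively with a placeholder name (the same name is reused in the fly.toml below):

fly apps create my-gpu-app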
Create or modify the `fly.toml` config file in the project source directory, replacing values with your own:

app = "my-gpu-app"
primary_region = "ord"
vm.size = "a100-40gb"

# Use a volume to store LLMs or any big file that doesn't fit in a Docker image
[[mounts]]
source = "data"
destination = "/data"

[http_service]
internal_port = 8080
auto_stop_machines = false

Notes:
- Make sure `vm.size` is set in `fly.toml`; valid values are `a10`, `l40s`, `a100-40gb`, and `a100-80gb`.
- Make sure to include a `[[mounts]]` section in `fly.toml`.
- The volume gets created automatically by `fly deploy`.
- Use the volume to store models and large files that can't be shipped in a Docker image.
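To check the valid sizes from your terminal, flyctl can list the platform's VM size presets, GPU sizes included:

fly platform vm-sizes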
Deploy your app:
fly deploy
That's pretty much all it takes to get an app running on a Machine with a GPU.
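One way to confirm the Machine actually sees the GPU, assuming your image includes the deviceQuery binary from the example Dockerfile at the end of this page:

# Runs the CUDA deviceQuery sample on the deployed Machine over SSH;
# it should report the attached GPU and its CUDA capabilities
fly ssh console -C /usr/local/bin/deviceQuery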
Volumes and GPU Machines
Important: If you create any additional volumes, they need to be created with the same constraints as your Machine.
Here's an example of creating a new 100 GB volume for storing ML models in the ord region, for a Machine with an a100-40gb GPU:
fly volumes create models \
--size 100 \
--vm-gpu-kind a100-40gb \
--region ord
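Afterwards you can confirm the volume's size, region, and attached Machine with:

fly volumes list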
Example Dockerfile:
FROM ubuntu:22.04 as base
# Install NVIDIA's apt repository keyring so CUDA packages are available
# to every stage that builds on this one
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# Build stage: needs the CUDA compiler, which the runtime stage can skip
FROM base as builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
    make && install -m 755 deviceQuery /usr/local/bin

# Runtime stage: ships only the compiled binary; uncomment the apt line if
# your app needs these libraries at run time
FROM base as runtime
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]
Examples using Fly GPUs
- Elixir Llama2-13b on Fly GPUs: https://gist.github.com/chrismccord/59a5e81f144a4dfb4bf0a8c3f2673131
- GitHub fly-apps repos with the `gpu` topic: https://github.com/orgs/fly-apps/repositories?q=topic%3Agpu
- Fly.io CUDA example: https://gist.github.com/dangra/f8123001fe0f2453a8cd638b89738465
- Deploying CLIP on Fly.io: https://gist.github.com/simonw/52c7734e34cac2b26ea1378845674edc