Fly GPUs quickstart

GPUs are deprecated and will be unavailable after August 1.

  1. You can use any base image for your Dockerfile, but it is convenient to base it on ubuntu:22.04 and install libraries from NVIDIA’s official apt repository. RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8 is usually enough; see the sketch after the notes below for adding the repository.

    Notes:

    • Avoid meta packages like cuda-runtime-*; they pull in far more than most apps need.
    • cuda-libraries-12-2 is a good, if bulky, starting point. Once you know which libraries you need at build time and at runtime, install only those to keep the final image small.
    • Use multi-stage Docker builds as much as possible.
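
    To add the repository, a minimal sketch (the keyring URL is the same one used in the full example Dockerfile at the end of this page):

    RUN apt update -q && apt install -y ca-certificates wget && \
        wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
        dpkg -i /cuda-keyring.deb && apt update -q && \
        apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8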
  2. Using flyctl, create an app with either fly launch or fly apps create.

    Note: GPUs are not available in all regions. The following GPU types are available: NVIDIA A10, L40S, A100-PCIe-40GB, and A100-SXM4-80GB.

    Currently, GPUs are available in the following regions:

    • a10: ord
    • l40s: ord
    • a100-40gb: ord
    • a100-80gb: iad, sjc, syd, ams
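
    For example, to create the app used in the config below (the name my-gpu-app is a placeholder; pick your own):

    fly apps create my-gpu-app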
  3. Create or modify the fly.toml config file in the project source directory, replacing values with your own:

    app = "my-gpu-app"
    primary_region = "ord"
    vm.size = "a100-40gb"
    
    # Use a volume to store LLMs or any big file that doesn't fit in a Docker image
    [[mounts]]
    source = "data"
    destination = "/data"
    
    [http_service]
    internal_port = 8080
    auto_stop_machines = false
    

    Notes:

    • Make sure vm.size is set in fly.toml; valid values are a10, l40s, a100-40gb, and a100-80gb.
    • Make sure to include a [[mounts]] section in fly.toml.
    • The volume gets created automatically by fly deploy.
    • Use the volume to store models and other large files that can’t be shipped in a Docker image.
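
    After the first deploy, you can confirm the volume was created and see where it landed:

    fly volumes list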
  4. Deploy your app:

    fly deploy
    

That’s pretty much all it takes to get an app running on a GPU Machine.

Volumes and GPU Machines

Important: Any additional volumes you create must use the same constraints (region and GPU kind) as your Machine.

Here’s an example of creating a new 100 GB volume for storing ML models in the ord region, for a Machine with an a100-40gb GPU:

fly volumes create models \
  --size 100 \
  --vm-gpu-kind a100-40gb \
  --region ord
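
To attach the volume to your app, reference it by name in a [[mounts]] section of fly.toml. A sketch, assuming /models as the mount point:

[[mounts]]
source = "models"
destination = "/models"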

Example Dockerfile:

FROM ubuntu:22.04 AS base
# Add NVIDIA's official apt repository so CUDA packages can be installed.
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# Build stage: compile the deviceQuery sample with the CUDA compiler.
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
    make && install -m 755 deviceQuery /usr/local/bin

# Runtime stage: ships only the compiled binary, not the build tools.
FROM base AS runtime
# deviceQuery needs no extra runtime libraries; uncomment if your app needs cuDNN/cuBLAS.
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]

Examples using Fly GPUs