Fly GPUs quickstart

GPUs are deprecated and will be unavailable after August 1.

  1. You can use any base image for your Dockerfile, but it is convenient to base it on ubuntu:22.04 and install libraries from NVIDIA’s official apt repository. RUN apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8 is usually enough; see the sketch after the notes below for adding the repository.

    Notes:

    • Avoid meta packages like cuda-runtime-*; they pull in far more than most apps need.
    • cuda-libraries-12-2 is a good, if bulky, starting point. Once you know which libraries you need at build time and at runtime, install only those to keep the final image small.
    • Use multi-stage Docker builds as much as possible.
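
    To add the repository, a minimal sketch (the keyring URL is the same one used in the full example Dockerfile at the end of this page):

    RUN apt update -q && apt install -y ca-certificates wget && \
        wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
        dpkg -i /cuda-keyring.deb && apt update -q && \
        apt install -y cuda-nvcc-12-2 libcublas-12-2 libcudnn8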
  2. Using flyctl, create an app with either fly launch or fly apps create.

    Note: GPUs are not available in all regions. The following GPU types are available: NVIDIA A10, L40S, A100-PCIe-40GB, and A100-SXM4-80GB.

    Currently, GPUs are available in the following regions:

    • a10: ord
    • l40s: ord
    • a100-40gb: ord
    • a100-80gb: iad, sjc, syd, ams
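
    For example, to create the app used in the config below (the name my-gpu-app is a placeholder; pick your own):

    fly apps create my-gpu-app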
  3. Create or modify the fly.toml config file in the project source directory, replacing values with your own:

    app = "my-gpu-app"
    primary_region = "ord"
    vm.size = "a100-40gb"
    
    # Use a volume to store LLMs or any big file that doesn't fit in a Docker image
    [[mounts]]
    source = "data"
    destination = "/data"
    
    [http_service]
    internal_port = 8080
    auto_stop_machines = false
    

    Notes:

    • Make sure vm.size is set in fly.toml; valid values are a10, l40s, a100-40gb, and a100-80gb.
    • Make sure to include a [[mounts]] section in fly.toml.
    • The volume gets created automatically by fly deploy.
    • Use the volume to store models and other large files that can’t be shipped in a Docker image.
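
    After the first deploy, you can confirm the volume was created and see where it landed:

    fly volumes list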
  4. Deploy your app:

    fly deploy
    

That’s pretty much all it takes to get an app running on a GPU Machine.

Volumes and GPU Machines

Important: Any additional volumes you create must use the same constraints (region and GPU kind) as your Machine.

Here’s an example of creating a new 100 GB volume for storing ML models in the ord region, for a Machine with an a100-40gb GPU:

fly volumes create models \
  --size 100 \
  --vm-gpu-kind a100-40gb \
  --region ord
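
To attach the volume to your app, reference it by name in a [[mounts]] section of fly.toml. A sketch, assuming /models as the mount point:

[[mounts]]
source = "models"
destination = "/models"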

Example Dockerfile:

FROM ubuntu:22.04 AS base
# Add NVIDIA's official apt repository so CUDA packages can be installed.
RUN apt update -q && apt install -y ca-certificates wget && \
    wget -qO /cuda-keyring.deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb && \
    dpkg -i /cuda-keyring.deb && apt update -q

# Build stage: compile the deviceQuery sample with the CUDA compiler.
FROM base AS builder
RUN apt install -y --no-install-recommends git cuda-nvcc-12-2
RUN git clone --depth=1 https://github.com/nvidia/cuda-samples.git /cuda-samples
RUN cd /cuda-samples/Samples/1_Utilities/deviceQuery && \
    make && install -m 755 deviceQuery /usr/local/bin

# Runtime stage: ships only the compiled binary, not the build tools.
FROM base AS runtime
# deviceQuery needs no extra runtime libraries; uncomment if your app needs cuDNN/cuBLAS.
#RUN apt install -y --no-install-recommends libcudnn8 libcublas-12-2
COPY --from=builder /usr/local/bin/deviceQuery /usr/local/bin/deviceQuery
CMD ["sleep", "inf"]

Examples using Fly GPUs