Using GPUs with FKS

Fly Kubernetes is in beta and not recommended for critical production usage. To report issues or provide feedback, email us at beta@fly.io.

GPUs are available on Fly Kubernetes. Information about our GPUs can be found in our Fly GPUs documentation.

GPUs are consumed by requesting a GPU resource, similar to requesting cpu or memory. A custom resource, gpu.fly.io/<gpu type>, is used to request GPUs. Note that:

  • The available GPU types are a100-40gb, a100-80gb, and l40s. Check the Fly GPUs documentation for the regions in which each type is available.
  • By default, Pods are deployed in the same region as your cluster. To place a Pod in a specific region, add a fly.io/region: <region> annotation to the Pod’s metadata.
  • GPU resources should be specified only in the limits section.
  • You can specify CPU and memory resources alongside GPU resources (see the second example below). The minimum supported number of CPU cores is 2 and the minimum amount of memory is 4096 MiB. The deployed VMs are always performance Machines.
  • The valid numbers of GPUs you can request are 1, 2, 4, and 8.
  • You can request only one type of GPU at a time.

Below is an example of a Pod with a GPU:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    fly.io/region: ams # optional
spec:
  containers:
    - name: nginx
      image: nginx:latest
      resources:
        limits:
          gpu.fly.io/a100-80gb: 1

This will deploy a GPU Machine of size a100-80gb with 1 GPU in the ams region.
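
If you need multiple GPUs, or want to size the CPU and memory of the underlying Machine, you can add those limits alongside the GPU resource. The sketch below is illustrative: the Pod name training-job, container name trainer, and image pytorch/pytorch:latest are placeholders you would replace with your own. It requests 2 a100-80gb GPUs, 4 CPU cores, and 16 GiB of memory, which satisfy the minimums listed above.

apiVersion: v1
kind: Pod
metadata:
  name: training-job # placeholder name
  annotations:
    fly.io/region: ams # optional
spec:
  containers:
    - name: trainer
      image: pytorch/pytorch:latest # placeholder image
      resources:
        limits:
          cpu: "4" # minimum is 2 cores
          memory: 16Gi # minimum is 4096 MiB
          gpu.fly.io/a100-80gb: 2 # valid GPU counts: 1, 2, 4, 8

As with any Kubernetes manifest, you can save this to a file and deploy it to your FKS cluster with kubectl apply -f <file>.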