Whisper

Since you are now officially a Fly.io expert, let’s dive right in and run the following two commands in the project directory of your deep dive demo app:

fly launch --attach --from https://github.com/rubys/cog-whisper
fly deploy

Next, try capturing a new audio clip. It will be transcribed automatically by OpenAI Whisper, an automatic speech recognition (ASR) model that converts speech into text.

What just happened

You provisioned a new Machine, this time with an NVIDIA L40S GPU. It stops when it isn't in use and restarts automatically when a new request comes in. It is reachable only on your private network, through Flycast.

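It runs OpenAI Whisper behind a Cog interface. Cog is Replicate's open-source tool for packaging machine learning models into containers that serve a standard HTTP prediction API.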
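To make the Cog interface concrete, here is a minimal TypeScript sketch of a client for it. Cog serves predictions at `POST /predictions`, with the model's inputs nested under `input`. The `audio` input name and the `output.transcription` response field are assumptions based on the public openai/whisper model schema, not something this guide confirms, and `transcribe` is a hypothetical helper rather than the demo app's code. The `fetchFn` parameter should accept Node's built-in `fetch` (or a stub in tests); on your private network the base URL would be the Whisper app's Flycast address, such as `http://<your-whisper-app>.flycast`.

```typescript
// Minimal sketch of a client for a Cog server's prediction endpoint.
// ASSUMPTION: the Whisper model takes an "audio" input holding a URL and
// returns { transcription: string } as its output (openai/whisper schema).

// Only the slice of the Fetch API this sketch needs, so it compiles without DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Shape of the fields we read from Cog's prediction response.
interface CogResponse {
  status?: string;
  output?: { transcription?: string };
  error?: string | null;
}

async function transcribe(
  baseUrl: string,
  audioUrl: string,
  fetchFn: FetchLike,
): Promise<string> {
  // Cog accepts predictions at POST /predictions with inputs under "input".
  const res = await fetchFn(`${baseUrl.replace(/\/+$/, "")}/predictions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input: { audio: audioUrl } }),
  });
  if (!res.ok) throw new Error(`Cog request failed with HTTP ${res.status}`);

  const body = (await res.json()) as CogResponse;
  const text = body.output?.transcription;
  // Treat an explicit failure, or a missing transcript, as an error.
  if (body.status === "failed" || typeof text !== "string") {
    throw new Error(`Whisper prediction failed: ${body.error ?? body.status ?? "no transcription"}`);
  }
  return text;
}
```

Keeping the HTTP call behind an injected `fetchFn` and a configurable base URL means the same helper can point at a Flycast address or at any other host that speaks Cog's API.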

This workflow takes audio clips from Tigris, passes them to Whisper, and updates Postgres with the results. The Node version of this code is about two dozen lines; the Rails version is about a dozen.
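The three steps can be sketched in TypeScript as a single function. This is a hypothetical outline, not the demo app's actual code: `ClipStore`, `ClipTable`, `Transcriber`, and `processClip` are illustrative names. In a real app, `urlFor` might return a presigned URL from `getSignedUrl` in `@aws-sdk/s3-request-presigner` (Tigris is S3-compatible), and `saveTranscription` might run an `UPDATE` through a Postgres client; those table and column details are not specified here.

```typescript
// Hypothetical sketch of the clip-transcription workflow.
// All interface and function names are illustrative assumptions.

interface Clip {
  id: number;
  key: string; // object key of the audio clip in the Tigris bucket
}

interface ClipStore {
  // Returns a URL Whisper can fetch the audio from (e.g. a presigned GET URL).
  urlFor(key: string): Promise<string>;
}

interface ClipTable {
  // Persists the transcript for a clip (e.g. an UPDATE against Postgres).
  saveTranscription(clipId: number, text: string): Promise<void>;
}

// Sends an audio URL to Whisper and resolves with the transcript.
type Transcriber = (audioUrl: string) => Promise<string>;

async function processClip(
  clip: Clip,
  store: ClipStore,
  transcribe: Transcriber,
  table: ClipTable,
): Promise<string> {
  // 1. Locate the audio clip in object storage (Tigris).
  const audioUrl = await store.urlFor(clip.key);
  // 2. Hand it to Whisper and wait for the transcript.
  const text = await transcribe(audioUrl);
  // 3. Record the result in the database (Postgres); skipped if step 2 throws.
  await table.saveTranscription(clip.id, text);
  return text;
}
```

Because each step is passed in, moving Whisper to another host only changes the `transcribe` function; the storage and database steps stay as they are.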

And, as always, there’s no lock-in: you can replace this Machine with one hosted by Replicate or elsewhere.

Next: Recap the deep dive