Observing Elixir in Production

Fly networking lets you VPN in and run Observer directly in production. Deploy your Elixir application and try it out!

Elixir, Erlang, and really the BEAM itself have a feature called “Observer”. It’s fun showing it to people new to Elixir because it’s just so cool! It’s a wxWidgets graphical interface that connects in real time to a running Erlang node and lets you “observe” what’s going on. It has some limited ability to modify things as well. Most notably, you can kill running processes. This can help when something is misbehaving or you just want to play “chaos monkey” and kill parts of the system to see how it recovers.

This picture shows a process tree for the application. Using this I can inspect individual processes or even kill them!

Running GameServer highlighted

One very cool way to run Observer is to run it on your local machine (which has the ability to display the UI) and connect to a production server (with no windowing UI available) and “observe” it from a distance. So yeah… have a problem in production? Not sure what’s going on? You can tunnel in, crack the lid and poke, prod, and peek around to see what’s going on.

The Fly platform makes it easy to do this for your own applications!

What We Will Do

Fly.io natively supports WireGuard, Jason Donenfeld’s amazing VPN protocol. If you’ve ever lost hours of your life trying to set up an IPSec VPN, you’ll be blown away by how easy WireGuard is. It’s so flexible and performant that Fly uses it as our network fabric. And it’s supported on every major platform, including macOS, iOS, Windows, and Linux. What that means for you is that if your app runs on Fly, you can open a secure, private, direct connection from your dev machine to your production network, in less time than it took me to write this paragraph. Cool, right?

This is what we’re going to do.

WireGuard observer connection

We will bring up a secure WireGuard tunnel that links to your servers on Fly. In this graphic, there are two my_app Elixir nodes clustered together running on Fly.

From the local machine, we can open an IEx terminal configured to join that cluster of remote Elixir nodes. Our local machine supports running Observer and drawing the UI. We use our local Observer to talk to the remote nodes in the cluster!

Making It Happen

To test this out, I followed this guide and applied the changes to the multi-region Tic-Tac-Toe game created here. The GitHub repo for the project is here.

Here’s what we do:

  1. Configure an Elixir release to use a cookie value we provide.
  2. Set up WireGuard for Fly. This is a VPN technology that lets us connect directly to the production private network.
  3. Create a simple script to launch Observer for us.
  4. Launch Observer and explore!

Again, follow the guide here for a step-by-step breakdown of how to do it for your project.
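
To give a feel for what the launch step boils down to once the tunnel is up, here is a minimal sketch of the Elixir side. It is not the script from the guide. The module name `ObserverConnect`, the node name, and the cookie value are all hypothetical placeholders, and it assumes your local IEx session was started as a distributed node (for example with `iex --name`).

```elixir
defmodule ObserverConnect do
  # Hypothetical helper, not part of the guide: joins a remote node
  # and opens Observer on the local machine.
  def connect_and_observe(remote_node, cookie)
      when is_atom(remote_node) and is_atom(cookie) do
    if Node.alive?() do
      # The cookie must match the one the deployed release uses.
      Node.set_cookie(cookie)

      case Node.connect(remote_node) do
        true ->
          # Opens the Observer window locally; remote nodes become
          # selectable from its "Nodes" menu.
          :observer.start()
          {:ok, Node.list()}

        _ ->
          {:error, :connection_failed}
      end
    else
      # The local node was not started with --name or --sname.
      {:error, :not_distributed}
    end
  end
end

# Placeholder values; substitute your own node name and cookie.
IO.inspect(ObserverConnect.connect_and_observe(:"my_app@example.invalid", :my_secret_cookie))
```

Returning tagged tuples instead of raising makes it easy to see why a connection attempt failed: either the local node is not distributed, or the remote node could not be reached with the given cookie.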

Multi-Region Support?

When Elixir nodes are clustered together and running in different regions, Observer can connect to any node in the cluster.
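
The same cluster membership that Observer relies on is visible from IEx. As a rough sketch (the `ClusterPeek` module is a made-up name for illustration), you can ask every node you are connected to for its process count:

```elixir
defmodule ClusterPeek do
  # Hypothetical helper: reports the process count of the local node
  # and of every node currently connected to it.
  def process_counts do
    for target <- [Node.self() | Node.list()], into: %{} do
      {target, :erpc.call(target, :erlang, :system_info, [:process_count])}
    end
  end
end

IO.inspect(ClusterPeek.process_counts(), label: "process counts")
```

Run outside a cluster, this only reports the local node. Once connected to the remote nodes, each region's node shows up as its own entry.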

After making the changes to the TicTac project and deploying it to multiple regions, let’s see what it looks like.

fly status

79510f86 17      fra    run     running 1 total, 1 passing 0        23m39s ago
df93ea35 17      lax(B) run     running 1 total, 1 passing 0        24m3s ago

I have the game scaled out to two regions. One is running in fra (Frankfurt, Germany) and the other is running in lax (Los Angeles, California (US)).

When I open Observer locally, I see two remote instances of tictactoe!

Instances in different regions

Exploring My App

I start playing a game on the web and use Observer to browse around and find the game. It’s highlighted in blue and linked from GameRegistry.

Running GameServer highlighted

Using Observer, I can double-click the selected game process and even view the GenServer’s state. This gives a snapshot of the state at the time I double-clicked it. I highlighted in yellow some interesting parts of the game state.
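
You can take a comparable snapshot without the UI by calling `:sys.get_state/1` on the process. The sketch below uses a stand-in GenServer called `DemoGame` with an invented state shape; the real game server's state will look different.

```elixir
defmodule DemoGame do
  use GenServer

  # Hypothetical stand-in for a game server; the state shape is invented.
  def start_link(code), do: GenServer.start_link(__MODULE__, code)

  @impl true
  def init(code), do: {:ok, %{code: code, status: :not_started, players: []}}
end

{:ok, pid} = DemoGame.start_link("ABCD")

# :sys.get_state/1 returns the state as it is at the moment of the call.
# Like the Observer view, it is a snapshot and does not update afterwards.
snapshot = :sys.get_state(pid)
IO.inspect(snapshot, label: "snapshot")
```

`:sys.get_state/1` is meant for debugging, which is exactly the situation here.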

Game state displayed

If you’re wondering why the state data looks strange (at least different from Elixir), it’s because that’s the Erlang representation of those data types.
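
To see the difference for yourself, you can print Elixir terms using Erlang's own formatting. This is a small sketch with a hypothetical `ErlangView` helper; the values are made up and are not taken from the game state in the screenshot.

```elixir
defmodule ErlangView do
  # Formats a term with Erlang's ~p directive, which is how Erlang
  # tooling typically prints terms.
  def format(term), do: :io_lib.format("~p", [term]) |> IO.iodata_to_binary()
end

IO.puts(ErlangView.format("ABCD"))           # <<"ABCD">>  (strings are binaries)
IO.puts(ErlangView.format(:not_started))     # not_started (atoms lose the colon)
IO.puts(ErlangView.format(%{code: "ABCD"}))  # #{code => <<"ABCD">>}
IO.puts(ErlangView.format(DateTime))         # 'Elixir.DateTime' (module names are atoms)
```

Structs show up the same way as maps, with an extra `'__struct__'` key holding the module name.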

Playing Chaos Monkey

Something fun you can do with Observer is identify processes, inspect them, and even kill them. This can help when something is misbehaving and you want to see more about what’s going on. This can also help you test how your system recovers from unexpected failures.

I’ve already identified the running game process. By right-clicking it I see I have the option to “kill” it. What will happen when a game server dies?

Kill process menu option

After killing the game process, I see that a new process was immediately started. How can I tell? The PID (Process ID) value is different.
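
The same experiment can be reproduced in a few lines without Observer. This is a minimal sketch, assuming a single named worker under a `:one_for_one` supervisor. It is not the game's actual supervision tree, and `DemoWorker` and `ChaosDemo` are hypothetical names.

```elixir
defmodule DemoWorker do
  use GenServer

  # Registered under its module name so it can be found again after a restart.
  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}
end

defmodule ChaosDemo do
  # Kills the named process and returns {old_pid, new_pid} once the
  # supervisor has started a replacement.
  def kill_and_wait(name) do
    old_pid = Process.whereis(name)
    ref = Process.monitor(old_pid)
    Process.exit(old_pid, :kill)

    receive do
      {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
    after
      1_000 -> raise "process did not die"
    end

    {old_pid, wait_for_restart(name, old_pid, 100)}
  end

  defp wait_for_restart(_name, _old_pid, 0), do: raise("process was not restarted")

  defp wait_for_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        # Not registered yet; give the supervisor a moment and retry.
        Process.sleep(10)
        wait_for_restart(name, old_pid, attempts - 1)
    end
  end
end

{:ok, _sup} = Supervisor.start_link([DemoWorker], strategy: :one_for_one)
{old_pid, new_pid} = ChaosDemo.kill_and_wait(DemoWorker)
IO.inspect({old_pid, new_pid}, label: "before/after")
```

The two PIDs differ, which is the same signal Observer gives you: the old process is gone and the supervisor has started a fresh one with fresh state.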

Process restarted

So, it looks like my system isolates the damage in that no other running games are impacted when one crashes. Yay! My system stays up and running!

From the perspective of the user playing the game, it’s not so graceful. The player is able to recover, but it requires them to reload their page or restart the game-joining process. I see that my UX can be improved to make crash recovery better for the user.

That was a productive experiment!

Now You Try

Deploy a Phoenix application to Fly, set up your WireGuard VPN, and start observing your app in production!

Fly makes using Observer easy

With Elixir you can build resilient systems! Fly makes observing them easy. What will you find digging around in the state pile?
