Elixir, Erlang, and really just the BEAM have a feature called "Observer". It's fun showing it to people new to Elixir because it's just so cool! It's a wxWidgets graphical interface that connects in real time to a running Erlang node and lets you "observe" what's going on. It also has some limited ability to modify things; most notably, you can kill running processes. This can help when something is misbehaving or you just want to play "chaos monkey" and kill parts of the system to see how it recovers.
This picture shows a process tree for the application. Using this I can inspect individual processes or even kill them!
One very cool way to run Observer is to run it on your local machine (which has the ability to display the UI) and connect to a production server (with no windowing UI available) and "observe" it from a distance. So yeah... have a problem in production? Not sure what's going on? You can tunnel in, crack the lid and poke, prod, and peek around to see what's going on.
The Fly platform makes it easy to do this for your own applications!
What We Will Do
Fly.io natively supports WireGuard, Jason Donenfeld’s amazing VPN protocol. If you’ve ever lost hours of your life trying to set up an IPSec VPN, you’ll be blown away by how easy WireGuard is. It’s so flexible and performant that Fly uses it as our network fabric. And it’s supported on every major platform, including macOS, iOS, Windows, and Linux. What that means for you is that if your app runs on Fly, you can open a secure, private, direct connection from your dev machine to your production network, in less time than it took me to write this paragraph. Cool, right?
This is what we're going to do.
We will bring up a secure WireGuard tunnel that links to your servers on Fly. In this graphic, there are two my_app Elixir nodes clustered together running on Fly.
From the local machine, we can open an IEx terminal configured to join that cluster of remote Elixir nodes. Our local machine supports running Observer and drawing the UI. We use our local Observer to talk to the remote nodes in the cluster!
Making It Happen
To test this out, I followed this guide and applied the changes to the multi-region Tic-Tac-Toe game created here. The GitHub repo for the project is here.
Here's what we do:
- Configure an Elixir release to use a cookie value we provide.
- Set up WireGuard for Fly. This is a VPN technology that lets us directly connect to the production private network.
- Create a simple script to launch Observer for us.
- Launch Observer and explore!
Again, follow the guide here for a step-by-step breakdown of how to do it for your project.
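As a rough illustration of the last two steps, here is a minimal Elixir sketch of what a launch helper might do once the WireGuard tunnel is up. The module name `RemoteObserver` and its arguments are hypothetical and not taken from the guide. It assumes the local IEx session was started as a distributed node (with `--name` or `--sname`) and that you pass in the same cookie the release was configured with.

```elixir
defmodule RemoteObserver do
  @moduledoc """
  Hypothetical helper: joins a remote node and opens Observer locally.
  """

  # remote: the remote node name as an atom, e.g. :"my_app@<private address>"
  # cookie: the release cookie as an atom
  def open(remote, cookie) when is_atom(remote) and is_atom(cookie) do
    if Node.alive?() do
      # Both sides must share the same cookie or the connection is refused.
      Node.set_cookie(cookie)

      case Node.connect(remote) do
        true ->
          # Observer runs locally; pick the remote node from its "Nodes" menu.
          :observer.start()
          {:ok, Node.list()}

        _false_or_ignored ->
          {:error, :connect_failed}
      end
    else
      # The local VM was not started with --name / --sname.
      {:error, :not_distributed}
    end
  end
end
```

Called from a plain, non-distributed `iex` session, this returns `{:error, :not_distributed}` instead of crashing, which makes a missing `--name` flag easy to spot.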
When Elixir nodes are clustered together and running in different regions, Observer can connect to any node in the cluster.
After making the changes to the TicTac project and deploying it to multiple regions, let's see what it looks like.
...
Instances
ID       VERSION REGION DESIRED STATUS  HEALTH CHECKS      RESTARTS CREATED
79510f86 17      fra    run     running 1 total, 1 passing 0        23m39s ago
df93ea35 17      lax(B) run     running 1 total, 1 passing 0        24m3s ago
I have the game scaled out to two regions. One is running in fra (Frankfurt, Germany) and the other is running in lax (Los Angeles, California (US)).
When I open Observer locally, I see two remote instances of
Exploring My App
I start playing a game on the web and use Observer to browse around and find the game. It's highlighted in blue and linked from
Using Observer, I can double-click the selected game process and even view the GenServer's state. This gives a snapshot of the state at the time I double-clicked it. I highlighted in yellow some interesting parts of the game state.
If you're wondering why the state data looks strange (at least different from Elixir), it's because that's the Erlang representation of those data types.
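To see the same effect outside of Observer, here is a small sketch that takes a state snapshot with `:sys.get_state/1` and prints it with Erlang's `~p` formatter. The Agent and its state below are made up for illustration and are not the real game state; `MyApp.Game` is likewise a hypothetical module name.

```elixir
# A stand-in process holding made-up state (not the real game state).
{:ok, pid} = Agent.start_link(fn -> %{status: {:playing, "Ann"}} end)

# :sys.get_state/1 returns a snapshot of the process state, taken at call time.
state = :sys.get_state(pid)

# Render a term the way Erlang prints it, via the ~p control sequence.
erl = fn term -> "~p" |> :io_lib.format([term]) |> IO.chardata_to_string() end

IO.puts(erl.(state))        # prints: #{status => {playing,<<"Ann">>}}
IO.puts(erl.(MyApp.Game))   # prints: 'Elixir.MyApp.Game'
IO.puts(erl.([turn: nil]))  # prints: [{turn,nil}]
```

Elixir strings show up as Erlang binaries, atoms lose their leading colon, module names appear as quoted `'Elixir.…'` atoms, and keyword lists become lists of two-element tuples.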
Playing Chaos Monkey
Something fun you can do with Observer is identify processes, inspect them, and even kill them. This can help when something is misbehaving and you want to see more about what's going on. This can also help you test how your system recovers from unexpected failures.
I've already identified the running game process. By right-clicking it I see I have the option to "kill" it. What will happen when a game server dies?
After killing the game process, I see that a new process was immediately started. How can I tell? The PID (Process ID) value is different.
So, it looks like my system isolates the damage in that no other running games are impacted when one crashes. Yay! My system stays up and running!
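The restart-and-isolate behavior can be reproduced in miniature. This is only a sketch, assuming a plain `Supervisor` with a `:one_for_one` strategy and two named Agents standing in for game servers. The names are hypothetical, and the real project may supervise its games differently.

```elixir
defmodule ChaosDemo do
  # Starts two stand-in "games", kills one, and reports the PIDs before and after.
  def run do
    children =
      for name <- [:game_a, :game_b] do
        %{id: name, start: {Agent, :start_link, [fn -> %{} end, [name: name]]}}
      end

    {:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

    before_a = Process.whereis(:game_a)
    before_b = Process.whereis(:game_b)

    # Roughly what choosing "kill" in Observer does: an untrappable exit signal.
    Process.exit(before_a, :kill)

    after_a = await_restart(:game_a, before_a)
    after_b = Process.whereis(:game_b)

    %{killed: {before_a, after_a}, sibling: {before_b, after_b}}
  end

  # Polls until the name is registered to a new PID, or gives up and returns nil.
  defp await_restart(name, old_pid, attempts \\ 100)
  defp await_restart(_name, _old_pid, 0), do: nil

  defp await_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        await_restart(name, old_pid, attempts - 1)
    end
  end
end
```

`ChaosDemo.run()` returns the killed game's old and new PIDs, which differ, and the sibling's PIDs, which stay the same because `:one_for_one` restarts only the child that died.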
From the perspective of the user playing the game, it's not so graceful. The player is able to recover, but it requires them to reload their page or restart the game-joining process. I see that my UX can be improved to make crash recovery better for the user.
That was a productive experiment!
Now You Try
Deploy a Phoenix application to Fly, set up your WireGuard VPN, and start observing your app in production!