Not every Dependency is worth it.

Cute Elixir droplet getting a checkup at the bird doctor's office.
Image by Annie Ruygt

We’re Fly.io. We run apps for our users on hardware we host around the world. Fly.io happens to be a great place to run Phoenix applications. Check out how to get started!

Elixir has been quietly building out a truly impressive list of packages, and while a deep resource of packages that solve common or complex problems is good, for some people it’s not enough and want more. Which is to be expected, the Elixir community is not the same size as Python or JavaScript, we don’t have client libraries for every single service or startup or library. While a deep set of client libraries is handy for quick iterations, it can gradually turn into a liability. With a little thinking and time at the keyboard we can often come to a solution that fits our problem neatly with something we can fully understand.

As Gary Bernhardt recently posted on twitter

it is always a good reminder, that our dependencies are our problem.

In this post today, we’ll walk through how I evaluate dependencies and then discuss how I go about replacing them in my own projects.“

S3

At some point every application needs a place to put files either for long term backup or public consumption, and for the time being S3 is the place to do it. AWS API’s are fairly notorious for being a little obtuse to work with so the first instinct is to grab a package, time to search Hex.pm.

29 total packages and while this is specific to s3, if we search aws instead we’ll find 150 total packages and the currently best supported option of aws. Which is an Erlang project to autogenerate the AWS Library using aws-codegen into modules and functions we can call.

This is a red flag for me as in my experience I find the "code-gen” style libraries a little obtuse to work with, and usually poorly documented. Let’s give the benefit of the doubt though and give it a swing!

A good practice before adding a package is to dig into the dependency tree to make sure we know what we’re getting into.

  • aws_signature
  • finch (http library I chose)
    • castore
    • mime
    • mint
      • castore
      • hpax
    • nimble_options
    • nimble_pool
    • telemetry
  • hackney (optional not used)
  • jason
    • decimal

Most of these packages are fairly familiar and eventually are included is nearly every application as they grow. One exception is aws_signature, let’s take a peek into that quick. It has no dependencies which is great and the description is:

Request signature implementation for authorizing AWS API calls.

Which is great because that step is often one of the more confusing and complex parts of using the AWS API. Taking a quick pass over the code it looks like something I would write to sign some AWS code, and is well tested/documented. Great!

This has taken all of a couple minutes and we’ve learned a few interesting things:

  • aws uses code-gen for code and docs. Which is a great way for a consistent and up to date API, it’s not great from a UX Perspective.
  • The aws package appears to allow for configuration of HTTP Client’s which is a good sign, we’ll get in to why later.
  • The existence of the aws_signature library, made by the authors, is noteworthy. This is a handy library for when working with AWS, and can be an error prone process if done manually.

Luckily for us the authors of the aws library have S3 examples in the README! I’ll reproduce the code from their example below we can talk about it:

iex> client = AWS.Client.create("your-access-key-id", "your-secret-access-key", "us-east-1") |> AWS.Client.put_http_client({AWS.HTTPClient.Finch, []})
iex> file = File.read!("./tmp/your-file.txt")
iex> md5 = :crypto.hash(:md5, file) |> Base.encode64()
iex> AWS.S3.put_object(client, "your-bucket-name", "foo/your-file-on-s3.txt",
  %{"Body" => file, "ContentMD5" => md5})

Here we see them creating the AWS.Client, setting Finch to the http client, reading the file into memory, hashing it as an optional step and putting it up to their bucket with filename.

This is actually extremely good, in three lines we’re able to upload file, with minimal fanfare. Frankly, I was expecting way more configuration steps and navigating poorly written docs to get this far. If you decide to stop here and use aws for your S3/AWS needs, I believe you’d be making a reasonable and sound choice.

Let’s dig into the pros and cons of this library as I see it.

What has aws done well?

Avoids HTTP Client Bloat

One issue that arises as applications grow is what I like to call ‘http client bloat’. Every dependency that wraps a 3rd party service uses their own HTTP Client, with its own dependencies, connection pools, processes, telemetry and error handling. This makes configuration and observability very difficult.

aws is explicit about requiring you to choose your own client, giving you two sensible defaults and an escape hatch to include you own.

Bare bones API with few Custom Structs/API’s

Library authors of client dependencies love to add “idiomatic” data structures to their client’s that make their project “feel” more native. Often what happens is it obscures the underlying API and slowly drifts as the API changes and the author loses interest in maintaining every single little change. I won’t call anyone out specifically but there are several highly used Elixir client packages that commit this crime.

The authors to aws have done a good job here, the struct’s they created are generic to all aws services and the generated code makes use of simple map‘s for responses. This let’s you navigate the AWS docs and see the same key names you’d expect in the request and response. Without converting them to someone else’s hand made Structs.

What is so bad about code-gen?

From a package maintainer perspective it is fantastic. We can put out a fully functional client library, with functions and arguments and docs all validated at compile time, and with minimal effort.

From a user perspective, we can identify two problems: the generated code and documentation are often subpar in fitting the language, and the library suddenly includes the entire AWS ecosystem

To illustrate the issue of the generated code let’s look at a screenshot of the AWS hexdocs page:

While nothing is empirically bad here, it’s worth noting that, although I needed an S3 package, the entire AWS ecosystem is now part of my dependency tree! Every single API is here and even if I don’t use them I now download and compile them.

Finally the module docs in Elixir projects typically include examples, explanation, gotcha’s and advice for using the module, we don’t get that here. Outside of the README examples it’s mostly up to you to figure it out. One nice bit here is when you click into the function you will get the whole REST API Documentation from AWS for that function which is nice, not always helpful, but better than clicking around the AWS website!

Let’s take ownership

Thing’s happen and in the real world all project dependencies slowly become liabilities. Security issues creep up, maintainers burn out, bugs get introduced and general lack of budget towards maintaining all code catch up to us.

Let’s imagine we’ve had enough, and while this package is great, we want to take ownership of our very small usage of S3 into our own hands. The surface area is small and we know how to use an HTTP Library, Like Gary said

“Its not our fault, but it is our problem”

and if its our problem, lets own it.

Prior Art

Whenever I go down this path I like to take a minute and look for some prior art. And these days the best place to look can end up being the LiveBook project. I know they have S3 Integration and sure enough they have their own hand rolled client! Coming in at a paltry 360 lines of Elixir plus a bunch of code we don’t care about handling XML response parsing.

Zooming in the LiveBook team also use the aws_signature library and have a configurable http client. Further if we look at just the code we care about its under 70 total lines that do what we want! Let’s take a look at the core if the request function:

url = build_url(file_system, path, query)
headers = sign_headers(file_system, method, url, headers, body)
body = body && {"application/octet-stream", body}

result = request(method, url, headers: headers, body: body, timeout: timeout)

When building our own application this will the basis for our own client!

  1. Step one build the url which included bucket name + key and query string.
  2. Create the signed headers, which the aws_signature library does most of the heavy lifting for us there.
  3. And finally make the request!

Implementation

You won’t be surprised to know that I’m a big fan of the Elixir HTTP client Library Req. Req is a high level abstraction over top of Finch the slightly more barebones HTTP Client. It describes it’s self as the “batteries included” HTTP Client, and its what I’ll be using here.

def put_object(credentials, bucket, file_name, mime, file) do
  now = NaiveDateTime.utc_now() |> NaiveDateTime.to_erl()
  host = "https://#{bucket}.s3.amazonaws.com"
  url = "#{host}/#{file_name}"
  headers = [{"Host", host}, {"Content-Type", mime}]
  headers =
    :aws_signature.sign_v4(
      credentials.access_key_id,
      credentials.secret_access_key,
      credentials.region,
      "s3",
      now,
      "put",
      url,
      headers,
      file,
      uri_encode_path: false,
      session_token: Map.get(credentials, :token, nil)
    )

  Req.put(url, headers: headers, body: file)
end

And boom! We have a fully functional and “minimal” AWS.S3.put_object function! There are numerous ways we could improve this and make it more generic, support more options or functions but this is all we needed! If we need more, we have prior art to with LiveBook to build off of, if we don’t need more we’re done!

Wrap up

While this may feel like a contrived example I assure you it is not. Every HTTP-based API client library has its own idiosyncrasies, but once you get past the basics, you can implement any service API and gain full control with minimal fuss.

What dependencies have you included to only end up using one function? Could you re-implement that function yourself? Assuming the license is open could you maybe vendor JUST the functions you need?

Once we start down this path we will be asking ourselves these questions constantly. We are programmers and we can solve problems with code! We don’t simply need to “plumb” together several libraries, and we might benefit from it!

Look into your own codebases, logs, and error handling and find troubled dependencies that you can learn more about and possibly take ownership for!

Fly.io ❤️ Elixir

Fly.io is a great way to run your Phoenix LiveView apps. It’s really easy to get started. You can be running in minutes.

Deploy a Phoenix app today!