Connecting Elixir Containers, With DNS-Based Discovery

More and more of us are writing distributed software these days, and BEAM languages like Elixir and Erlang are seeing growth as a result. OTP, the Erlang framework for building fault-tolerant distributed applications, provides many of the abstractions and means of combination that we need to write maintainable apps that can work together across a cluster of nodes.

However, when deploying in containers, getting these apps to find and connect with each other is harder than it needs to be.

If you've felt this pain, read on.

Setting up infrastructure can be easy

Setting up your infrastructure is easier every day. If you would like to spin up a production-ready cluster of 3 VMs that support the deployment of containerized applications, and you have a credit card, the following all fit the bill:

  • Kubernetes (works everywhere, with especially good dev ux on Google's cloud)
  • Convox (top-notch dev experience, Amazon's cloud only, with overlay container networking available using the Weave ECS AMI at setup)
  • Flynn (for many clouds), and others.

It's easy to get your application deployed and running on these solutions. But once the application is deployed -- let's say 3 instances running, one on each node -- how can those instances discover and connect to each other?

Setting up discovery can add complexity

"Use an external service discovery solution: etcd, discoverd...."

"Use the orchestrator's API, you know, 'Cloud Native' app!"

Both answers have their place, but both are disheartening for newcomers.

Both communicate that deploying distributed software involves either extending the platform or integrating with that platform. And both can involve significant yak shaves.

Widely-available overlay networks to the rescue!

Fortunately, there is a great approach that's widely supported: DNS-based service discovery.

Kubernetes, Weave and Flynn all support it. That is, from a container deployed on any of the aforementioned cluster solutions, something like this works:

container-sh$ nslookup my-app-service

  Address: 216.16.11.82    # IP addresses of the
  Address: 216.16.11.81    # containers backing this service
  Address: 216.16.11.79    #
  ...

This feature will inevitably be everywhere, because overlay networks seem to be a good fit for app orchestration, and once you're using one, adding DNS-based discovery is too big a win to ignore.

Kubernetes calls this out on their homepage:

No need to modify your application to use an unfamiliar service discovery mechanism. Kubernetes gives containers their own IP addresses and a single DNS name for a set of containers.

Flynn does something similar, again because it's a core part of how their stuff works:

Services are exposed via DNS to allow clients not aware of discoverd's HTTP API to find and communicate with each other.

Weave loves DNS-based service discovery:

Weave Net implements service discovery by providing a fast "micro DNS" server at each node. You simply name containers and everything ‘just works.’

This is a future I want to be a part of -- a world where IP-per-container and DNS-based discovery are simply table stakes.

Huh. So Can Elixir Do That?

Yes; in fact, since Erlang can resolve addresses, the technique is as simple as this:

$ iex --name app@127.0.0.1
iex> :inet_res.lookup('localhost',:in,:a) |>       #<- dns lookup  
  Enum.each( fn {a,b,c,d} ->                       #<- for each ipv4 addr
     IO.inspect Node.connect( :"app@#{a}.#{b}.#{c}.#{d}" ) #<- connect
  end)
true  
:ok

We do a DNS lookup on a name, in this case 'localhost', and connect to each IP address associated with it. Here there's only one (127.0.0.1). However, when deploying an app as a service, the name we look up would be whatever we've configured, as in this Kubernetes service definition:

kind: Service  
metadata: { name: router }  # <-- :inet_res.lookup('router', :in, :a)  
spec: { clusterIP: None }   # <-- 'headless' service  

Or this Convox/Weave docker-compose.yml example:

router:                     # <-- :inet_res.lookup('router', :in, :a)  
  build: apps/router
  ports:
    - 9100

And the returned IP addresses would be for all of the running containers that make up the Service.
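To make the idea concrete, here is a minimal sketch of doing that lookup-and-connect on a timer, so that containers started later are picked up too. The module name `DnsDiscoverySketch`, its options, and the 5-second default interval are all hypothetical choices for illustration; this is not how any particular library implements it.

```elixir
defmodule DnsDiscoverySketch do
  @moduledoc """
  Illustrative only: periodically resolve a DNS name and try to connect
  to a node on every IPv4 address returned. All names here are
  hypothetical, not part of any library's API.
  """
  use GenServer

  # Build node names such as :"app@10.0.0.7" from IPv4 tuples.
  def node_names(ips, app_name) do
    for {a, b, c, d} <- ips, do: :"#{app_name}@#{a}.#{b}.#{c}.#{d}"
  end

  # Resolve A records; :inet_res expects the name as a charlist.
  def lookup(dns_name) do
    :inet_res.lookup(String.to_charlist(dns_name), :in, :a)
  end

  def start_link(opts), do: GenServer.start_link(__MODULE__, Map.new(opts))

  @impl true
  def init(%{dns_name: _, app_name: _} = state) do
    # Kick off the first poll immediately.
    send(self(), :poll)
    {:ok, Map.put_new(state, :interval, 5_000)}
  end

  @impl true
  def handle_info(:poll, state) do
    state.dns_name
    |> lookup()
    |> node_names(state.app_name)
    # Skip ourselves and nodes we are already connected to.
    |> Enum.reject(&(&1 == Node.self() or &1 in Node.list()))
    |> Enum.each(&Node.connect/1)

    # Schedule the next poll.
    Process.send_after(self(), :poll, state.interval)
    {:noreply, state}
  end
end
```

Started with something like `DnsDiscoverySketch.start_link(dns_name: "router", app_name: "app")`, each instance would keep retrying, which matters because containers rarely all come up at the same moment.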

Are There Libraries That Support That?

The Peerage cluster formation library comes with support for DNS-based discovery, using the Peerage.Via.Dns provider. (It also supports other providers, like UDP multicast or list-of-nodes, and it's easy to add your own.)

With the :peerage package added to your app's mix.exs file, like so:

def application do
  [applications: [:peerage]] 
end
def deps do
  [{:peerage, "~> 0.3.1"}]
end

configuring DNS-based discovery is just:

# prod.exs
config :peerage, via: Peerage.Via.Dns,  
  app_name: "my-app",             #  connect to 'my-app@CONTAINER_IP'
  dns_name: "my-app.cluster..."   #  for each IP of dns_name, periodically.

    # Aside: if you're using releases, `dns_name: "${DNS_NAME}"`
    # to make the dns name configurable, and you can test
    # the production release locally with DNS_NAME=localhost
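
If you are doing the lookup by hand rather than relying on release-time variable substitution, a similar effect can be had by reading the name from the environment at runtime. The sketch below assumes a `DNS_NAME` environment variable and a fallback of "localhost"; the `DnsNameConfig` module is hypothetical and only for illustration.

```elixir
defmodule DnsNameConfig do
  # Hypothetical helper: take the DNS name from the DNS_NAME environment
  # variable, falling back to "localhost" for local testing.
  # The env map is injectable so the function is easy to test.
  def dns_name(env \\ System.get_env()) do
    Map.get(env, "DNS_NAME", "localhost")
  end

  # :inet_res.lookup/3 expects a charlist, so convert before resolving.
  def dns_charlist(env \\ System.get_env()) do
    env |> dns_name() |> String.to_charlist()
  end
end
```

With this, `:inet_res.lookup(DnsNameConfig.dns_charlist(), :in, :a)` resolves whatever name the environment provides.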

Running locally is easy

Of course, unless you've played with your system's DNS considerably, localhost only returns one IP address! Never fear. To test multi-node in the dev environment, the simplest option is the 'List' provider:

# dev.exs
config :peerage, via: Peerage.Via.List, node_list: [  
  :"app_node1@127.0.0.1", :"app_node2@127.0.0.1"
]

$ iex --name app_node1@127.0.0.1 -S mix

[debug] Peerage.Server has attempted discovery. Results:

    NAME                  RESULT OF ATTEMPT
    app_node1@127.0.0.1   true
    app_node2@127.0.0.1   false

...

Would you rather test with multiple containers locally, in Docker? (Not all stacks need this. Kubernetes has minikube, where DNS discovery will 'just work' with multiple nodes. But Convox doesn't have local scaling.)

In that case, have a look at the UDP provider:

# dev.exs
config :peerage, via: Peerage.Via.Udp, serves: true

$ iex --name app_node1@127.0.0.1 -S mix

[debug]   -> Peerage.Via.Udp sees: app_node1@127.0.0.1
[debug]   -> Peerage.Via.Udp sees: app_node1@127.0.0.1
[debug] Peerage.Server has attempted discovery. Results:

     NAME                  RESULT OF ATTEMPT
     app_node1@127.0.0.1   true
...    

UDP is allowed between Docker containers on your machine, even without configuration (no need to link the containers), so running docker run twice should give you connected nodes! And, of course, it works in other situations where multicast UDP could be helpful.
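
For a feel of what UDP-based discovery involves, here is a rough sketch of the announce-and-listen idea: each node sends its own node name in a datagram, and whoever receives it can try to connect. This is a simplification under stated assumptions, not Peerage's implementation. It sends plain unicast datagrams (a multicast version would also need socket options such as `add_membership`), and the `UdpAnnounceSketch` module and its functions are hypothetical.

```elixir
defmodule UdpAnnounceSketch do
  # The announcement payload is just the node name as a binary.
  def encode(node_name) when is_atom(node_name), do: Atom.to_string(node_name)

  # Accept only payloads shaped like "name@host".
  # Note: String.to_atom/1 on untrusted input can exhaust the atom table;
  # acceptable for a dev-only sketch, not for production.
  def decode(payload) when is_binary(payload) do
    case String.split(payload, "@") do
      [name, host] when name != "" and host != "" -> {:ok, String.to_atom(payload)}
      _ -> :error
    end
  end

  # Open a passive UDP socket; port 0 lets the OS pick a free port.
  def open(port \\ 0) do
    :gen_udp.open(port, [:binary, active: false, reuseaddr: true])
  end

  # Send our node name to a given address and port.
  def announce(socket, to_ip, to_port, node_name) do
    :gen_udp.send(socket, to_ip, to_port, encode(node_name))
  end

  # Wait for one announcement and decode it.
  def receive_one(socket, timeout \\ 1_000) do
    with {:ok, {_ip, _port, payload}} <- :gen_udp.recv(socket, 0, timeout) do
      decode(payload)
    end
  end
end
```

A receiver would pass each decoded name to `Node.connect/1`, much like the DNS approach does with resolved addresses.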

Distribute the wealth

I hope this is helpful to somebody out there. It's a great time to write software, but some of the frontiers are 'obscured by clouds.'

The quality and availability of major abstractions continue to rise, and the means of combining them in a sane way are doing a fair job of keeping up. Let's not kid ourselves: our programming forebears would have loved to have all of this available.

At the same time, we can admit that there's a lot to take in.

Here's to the things that make it all a bit easier. Port 53, you're one of the good ones.

Happy clustering. :)