elixir

Understanding Elixir’s GenStages - Querying the Blockchain

Miguel Palhas

Miguel Palhas on

Understanding Elixir’s GenStages - Querying the Blockchain

In this edition of Elixir Alchemy, we'll dive into Elixir's GenStage module. Along the way, we'll explain backpressure and we'll write a Genstage to query the blockchain. Let's start by discussing how using a GenStage can solve buffering problems.

What Is a GenStage?

Imagine you’re consuming data from an external source. That source could be anything “streamable” - such as reading a file line-by-line, a table in a database, or even a sequence of requests to a 3rd party API.

In such scenarios, where you need to stream data into your system, and probably do some processing on each data point, it’s common to use a buffer to read in a few items, process the whole batch, and then fetch a new set into the buffer. I remember, from the time I was learning C/C++, that this would be a common, although arguably naive way to do things.

With that approach, you may run into one of two problems: the buffer can get too small, or the buffer van get too large.

  1. Buffer Too Small This happens if you read too few items at a time. Since you’re switching back and forth between reading and processing items, there will be a performance cost from the task switching. In the example of reading a file, your hardware or Operating System may be reading more data than what you’re actually requesting, resulting in sub-optimal performance, in addition to having to fetch the same part of the file later on.

  2. Buffer Too Large In this case, you request too much from your data source. You may end up either creating a bottleneck (e.g. having to wait for your hard drive to read everything you requested), or not being able to process all the data in an efficient manner. If you’ve ever heard of a buffer overflow (a common performance and security concern), this is it. You’re reading more than what your system can keep up with, resulting in all kinds of problems, from performance to actual failures.

The Solution: Backpressure

The term backpressure refers to the behavior of a system that builds up input, then halts the receiving of new data once the buffer is full, resuming it once again when the system is ready to handle it.

This is the core idea behind Elixir’s GenStage.

GenStage

GenStage is an abstraction built on top of GenServer to provide a simple way to create a Producer/Consumer architecture, while automatically managing the concept of backpressure.

In a GenStage, you create a pipeline of multiple Producers & Consumers. Producers generate data points, or read them from a source, and then pass them down to the pipeline. They can then be sent through one or more Consumers that will do whatever processing you need done.

The idea of backpressure is applied in the way items are created in a Producer. When the pipeline is ready to receive new items, the handle_demand/2 function of the Producer is called, requesting a specific amount of items.

The amount requested is decided internally (although you can specify a maximum value), and the function is called whenever there is room for them in the pipeline. If items take too long to process, Producers end up being idle for a while, thus relieving some pressure from the system.

Use Case

As an example of what a GenStage can be useful for, let’s consider reading chunks of data from an external data source. In this case, we’ll use the Ethereum blockchain, since it fits this concept nicely.

A blockchain is composed of a series of blocks, each one containing multiple transactions. If we want to process the entire blockchain (for example, to look up all transactions involving a given address, or to listen to it continuously when integrating with your application), a GenStage is a perfect fit.

In this context, each block can be considered as a single data item. Let’s see how this can be achieved.

Querying the Blockchain

We’re going to use Infura’s public HTTP API to interact with the Ethereum blockchain. Let’s start by building a wrapper to its interface. I’ll be using the Tesla library for this (this is just a personal preference, feel free to choose your own).

1defmodule EthSync.Infura do
2  use Tesla
3
4  plug(Tesla.Middleware.BaseUrl, "https://ropsten.infura.io/")
5
6  # encode/decode body as json
7  # Infura doesn't set the "content-type" header to "application/json"
8  # so we need to tell Tesla that we want text/plain requests to be decoded as well
9  plug(Tesla.Middleware.JSON, decode_content_types: ["text/plain; charset=utf-8"])
10
11  @doc "Get an entire block"
12  def get_block(number) do
13    case call(:eth_getBlockByNumber, [to_hex(number), true]) do
14      {:ok, nil} ->
15        {:error, :block_not_found}
16
17      error ->
18        error
19    end
20  end
21
22  @doc "Sends a JSON-RPC call to the server"
23  defp call(method, params \\ []) do
24    case post("", %{jsonrpc: "2.0", id: "call_id", method: method, params: params}) do
25      {:ok, %Tesla.Env{status: 200, body: %{"result" => result}}} ->
26        {:ok, result}
27
28      {:error, _} = error ->
29        error
30    end
31  end
32
33  @doc "Converts integer values to hex strings"
34  def to_hex(decimal), do: "0x" <> Integer.to_string(decimal, 16)
35end

We’ll only need a single endpoint for this: getting a block’s data, given its index on the chain. The block number must be given in hexadecimal format, so we also need a helper method to handle the conversion.

We can verify that this is working via iex:

1iex(1)> EthSync.Infura.get_block(1)
2{:ok,
3 %{
4   "number" => "0x1",
5   "transactions" => [],
6   # ...
7 }
8}
9
10iex(2)> EthSync.Infura.get_block(1_000_000_000_000)
11{:error, :block_not_found}

Building the Producer

Our Producer will be a process with the responsibility of fetching Ethereum blocks.

1defmodule EthSync.Producer2 do
2  alias EthSync.Infura
3  use GenStage
4
5  def init(_) do
6    {:producer, 1}
7  end
8
9  def handle_demand(demand, next_block) when demand > 0 do
10    IO.puts("Demanding #{demand}")
11
12    blocks =
13      next_block..(next_block - 1 + demand)
14      |> Enum.map(fn n ->
15        IO.puts("Fetching block #{n}")
16        Infura.get_block(n)
17      end)
18
19    {:noreply, blocks, next_block + length(blocks)}
20  end
21end

Building the Consumer

The Consumer will receive lists of blocks and then process them. In the example, we’ll use :timer.sleep/1 to simulate processing time since we’re not doing any actual work. Keep in mind that the list of blocks received is not necessarily the same as what was sent in the Producer. Items can be buffered according to the GenStage’s internal rules. It may also happen that you have multiple Consumers and items get split between them.

1defmodule EthSync.Consumer do
2  alias EthSync.Infura
3  use GenStage
4
5  def init(_) do
6    {:consumer, nil}
7  end
8
9  def handle_events(blocks, _from, state) do
10    blocks
11    |> Enum.each(fn
12      {:ok, %{"number" => n}} ->
13        IO.puts("Received block #{n}")
14        :timer.sleep(1_000)
15    end)
16
17    {:noreply, [], state}
18  end
19end

Wiring It All Up

To start the pipeline, we need to start the processes for our Producer & Consumer, and then link them together, so that items produced by the former get sent out to the latter:

1iex> {:ok, producer} = GenStage.start_link(Producer2, [])
2{:ok, #PID<0.160.0>}
3
4iex> {:ok, consumer} = GenStage.start_link(Consumer2, [])
5{:ok, #PID<0.162.0>}
6
7iex> GenStage.sync_subscribe(consumer, to: producer, max_demand: 3)
8{:ok, #Reference<0.2486793675.579338241.116277>}
9Demanding 3
10Received block 0x1
11Received block 0x2
12Received block 0x3
13Demanding 1
14Received block 0x4
15Received block 0x5
16Demanding 1

Notice that even though we start the Producer at the beginning, it only started fetching blocks once we wired the Consumer to it. That’s because there was no demand until that point. Additionally, even though we specify max_demand: 3, that’s not necessarily the amount requested at all times. Since we only have a single Consumer, and it takes 1 second to process each block, the GenStage is smart enough not to overflow it with too many blocks. It adjusts the number of events as needed.

Consumed the coolness?

With the Producer, Consumer and having wired them together we've created a basic GenServer example. We love how GenStages provides an elegant way to create a producer/consumer architecture that automatically manages Backpressure.

This post is written by guest author Miguel Palhas. Miguel is a professional over-engineer @subvisual and organizes @rubyconfpt and @MirrorConf.

If you'd love to read more Elixir Alchemy, be sure to subscribe to the mailing list!

PS We love to support development communities, so we sponsor Code Beam Lite this year (November 30th, Amsterdam). If you want to go there, use the coupon code 'AppSignalNEWS' to get 10% of your ticket

Share this article

RSS
Miguel Palhas

Miguel Palhas

Guest author Miguel is a professional over-engineer at Portuguese-based Subvisual. He works mostly with Ruby, Elixir, DevOps, and Rust. He likes building fancy keyboards and playing excessive amounts of online chess.

All articles by Miguel Palhas

AppSignal monitors your apps

AppSignal provides insights for Ruby, Rails, Elixir, Phoenix, Node.js, Express and many other frameworks and libraries. We are located in beautiful Amsterdam. We love stroopwafels. If you do too, let us know. We might send you some!

Discover AppSignal
AppSignal monitors your apps