An Introduction to Metaprogramming in Elixir

Jia Hao Woo

In this world, there are many mysteries — but few are as elusive as metaprogramming in Elixir.

In this four-part series, we'll start by looking at core concepts and then explore how metaprogramming operates in Elixir specifically.

Let's develop an understanding of metaprogramming and uncover some Elixir metaprogramming secrets!

Introducing Metaprogramming in Elixir

According to Harald Sondergaard, metaprogramming is:

a programming technique in which computer programs have the ability to treat other programs as their data; meaning that a program can be designed to read, generate, analyze, or transform other programs, and even modify itself while running.

In essence, metaprogramming — much like metadata — revolves around "a set of programs that describe and give information about other programs" (adapted from the Oxford dictionary definition of metadata).

Types of Metaprogramming

There are two types of metaprogramming that prescribe varying degrees of control over a given program:

Introspection

Introspection refers to a program revealing metadata about other programs or itself.

This definition broadly covers the first part of the definition of metaprogramming: "a program can be designed to read...[and] analyze...other programs". The program has access to information about itself or other programs.
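As a small illustration of introspection in Elixir itself, the sketch below asks a standard-library module which functions it exports. It only uses `__info__/1`, `Code.ensure_loaded/1`, and `function_exported?/3`; choosing `Enum` as the module to inspect is arbitrary.

```elixir
# Ask a standard-library module about itself at run-time.
functions = Enum.__info__(:functions)   # keyword list of {name, arity} pairs

IO.inspect({:map, 2} in functions)      # is Enum.map/2 among them?
IO.inspect(Enum.__info__(:module))      # the module's own name

# function_exported?/3 answers the same question for a loaded module.
Code.ensure_loaded(Enum)
IO.inspect(function_exported?(Enum, :map, 2))
```

Nothing is modified here: the program only reads metadata about another piece of code.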

Reflection

Reflection refers to a program modifying other programs or itself.

If a program can modify other programs or itself, it — by definition — has access to the metadata of the program, revealing information like names of functions.

By looking at the two types of metaprogramming, we can conclude that reflection encompasses introspection, or introspection is a subset of reflection:

[Figure: Types of metaprogramming]

Why Do Metaprogramming?

We will look at how to use metaprogramming specifically in Elixir. But first, let's cover some general concepts and benefits of metaprogramming.

According to Wikipedia, metaprogramming can be used to achieve the following, among other things:

  1. Move computations from run-time to compile-time
  2. Generate code using compile-time computations
  3. Enable self-modifying code

Observe that among these use cases (as well as across metaprogramming), the terms "run-time" and "compile-time" are used frequently. Let's see what they mean.

Defining Run-time and Compile-time

We can broadly classify metaprogramming into two categories: compile-time and run-time metaprogramming.

But what exactly are run-time and compile-time? They both refer to stages of a program's life cycle.

Compile-time is the stage at which source code is converted to binary code, or to an intermediate form such as bytecode, for a machine or virtual machine to execute. Run-time refers to when the code executes.

The program life cycle includes the following steps:

[Figure: Typical program lifecycle. Source: https://en.wikipedia.org/wiki/Program_lifecycle_phase]

Note that this is not a complete representation of the entire program life cycle, just a simplified one.

Compilation "sets the program in stone" by converting it into binary code. Metaprogramming exposes this process to allow developers to "move computation from run-time to compile-time" or "generate code using compile-time computations". This essentially allows the modification of the source code before/during compile-time, meaning that the generated binary code is slightly different.
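To make "moving computation from run-time to compile-time" concrete, here is a minimal sketch in Elixir. It assumes a hypothetical module named `Squares`; the lookup table is built by a module attribute, which is evaluated while the module compiles.

```elixir
defmodule Squares do
  # This expression runs once, while the module is being compiled.
  @table Map.new(1..10, fn n -> {n, n * n} end)

  # At run-time the function only looks up the precomputed map.
  def of(n), do: Map.fetch!(@table, n)
end

IO.inspect(Squares.of(7))  # 49
```

The multiplication never happens when `Squares.of/1` is called. Its results are already part of the compiled module.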

Self-modifying code is rather unique. In essence, it performs reflection during run-time. The life cycle of a self-modifying program looks a little different:

[Figure: Program lifecycle of a self-modifying program]

Note: You can substitute the "binary" for any intermediate language generated by the compiler, such as JVM bytecode or, in Elixir's case, BEAM VM bytecode.
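Elixir code rarely rewrites itself, but the BEAM can compile and load new modules while a program is running. The sketch below is not self-modification in the strict sense. It only shows code being generated after the program has started, using `Module.create/3`. The module name `RuntimeGreeter` is made up for illustration, and `quote` is explained later in this article.

```elixir
# Build the body of a module as data.
contents =
  quote do
    def greet, do: "hello from a module created at run-time"
  end

# Module.create/3 compiles and loads the module while the program is running.
{:module, mod, _bytecode, _result} =
  Module.create(RuntimeGreeter, contents, Macro.Env.location(__ENV__))

IO.puts(apply(mod, :greet, []))
```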

Interesting Uses of Metaprogramming

The general definition of metaprogramming also encompasses the tools used in a program's life cycle.

For instance, a language compiler is a metaprogramming application designed to receive another program as input and generate binary as output.

We can narrow down broad metaprogramming use cases to the following, more specific, applications in a programming language context:

Code generation

By generating code dynamically during compile-time, it's available during run-time. When the nature of the code that's generated is not fixed, this can prove especially useful. For instance, you can use code generation to design domain-specific languages (DSLs) or generate functions based on input such as files or APIs.
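A minimal sketch of generating functions from input data might look like this. The module name `HttpStatus` and the list of statuses are assumptions for illustration; in practice, the data could just as well be read from a file during compilation. `unquote` is covered later in this article.

```elixir
defmodule HttpStatus do
  # Hypothetical input data, available while the module compiles.
  @statuses [ok: 200, not_found: 404, internal_server_error: 500]

  # One function clause is generated per entry during compilation.
  for {name, number} <- @statuses do
    def code(unquote(name)), do: unquote(number)
  end
end

IO.inspect(HttpStatus.code(:not_found))  # 404
```

Adding a new status only requires changing the data. The matching function clause is generated on the next compilation.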

Code instrumentation

Code instrumentation refers to measuring a program's performance, diagnosing errors, and logging trace information.

Metaprogramming enables this through dynamic program analysis — software analysis performed by running software through a real or virtual processor.

Code instrumentation enables features like code coverage, memory error detection, fault localization, and concurrency error detection.

Behavioral changes

This refers to changing the behavior of a program through metaprogramming. Behavioral changes can include feature toggling, where a given feature is toggled on/off through a flag that is read during compile-time/run-time.
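As a rough sketch of a compile-time feature toggle, the module below picks one of two function definitions based on an environment variable that is read while the module compiles. The module name `Greeter` and the variable name `FANCY_GREETING` are invented for this example.

```elixir
defmodule Greeter do
  # Read once, during compilation. FANCY_GREETING is a made-up variable name.
  @fancy System.get_env("FANCY_GREETING") == "1"

  # Only one of these definitions ends up in the compiled module.
  if @fancy do
    def greet(name), do: "Greetings and salutations, #{name}!"
  else
    def greet(name), do: "Hello, #{name}"
  end
end

IO.puts(Greeter.greet("Ada"))
```

Because the flag is read at compile-time, changing the variable afterwards has no effect until the module is recompiled.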

This article series is about metaprogramming within Elixir, so our key focus will be on code generation.

Metaprogramming in Elixir: The Basics

Elixir applies a style of metaprogramming known as macro system metaprogramming (also used in other languages like Rust and Lisp).

In Elixir, metaprogramming allows developers to leverage existing features to build new features that suit their individual business requirements.

The foundation of metaprogramming in Elixir is macros.

Defining Macros

According to the official documentation:

Macros are compile-time constructs that are invoked with Elixir's AST as input and a superset of Elixir's AST as output.

There are two critical components to this definition. Let's break them down:

  • Compile-time constructs - evaluated and available during compile-time
  • Elixir's AST - Abstract Syntax Trees (ASTs) are tree representations of the abstract syntax structure of the source code

We use the representations of the source code as building blocks for compile-time constructs. Since the compiler reasons with the source code through ASTs, we effectively "speak" the compiler's language to build constructs that it can directly reason with.

In Elixir, ASTs are tuples, so we reason with the compiler in a manner that is familiar to us. We do not need to deviate from Elixir's syntax to begin writing macros, which lowers the barrier to entry for learning them. On top of that, we do not even need to write ASTs ourselves. There are constructs in Elixir to handle all of that heavy lifting for us.

The above definition also mentions how a macro receives an AST as input and returns a superset of AST as output. So, you can think of a macro as a regular function with inputs, behavior, and an output. The overall goal is to use a given AST to generate a new AST for the compiler to use.

There is more to come on the compilation process of Elixir programs in part two of this series.

Starting Small with Macros

Now that we understand macros, let's dip our toes into the water and implement a basic macro.

We'll start with a very basic comparison of a macro to a regular function.

The Elixir documentation inspires this code example:

defmodule Foo do
  defmacro macro_inspect(value) do
    IO.inspect(value)
    value
  end

  def func_inspect(value) do
    IO.inspect(value)
    value
  end
end

To define a macro, we use defmacro and declare the parameters just as we would a regular function.

Running the macro in IEx yields the following results:

iex(1)> import Foo
iex(2)> macro_inspect(1 + 2)
{:+, [context: Elixir, import: Kernel], [1, 2]}
3
iex(3)> func_inspect(1 + 2)
3
3

Observe that rather than printing the result of 1 + 2, the macro prints a tuple instead (the AST as input that we defined earlier).

When a macro is invoked, its arguments are passed in as ASTs, so you don't need to parse them manually. The arguments are not evaluated beforehand.

The macro still has to return an AST, and here it returns the one it received, unchanged. The compiler places that returned AST at the call site, where it is compiled and evaluated as ordinary code. That is why the call ultimately yields 3: the expression 1 + 2 is evaluated only after the macro has returned.

Once we understand the basic syntax and declaration of a macro, we can explore the structure of the AST.

AST Structure

As mentioned earlier, the AST is the representation of the source code as a syntax tree. In the example above, we inspect the AST of the expression 1 + 2.

We can break down the AST structure into three components:

  1. Atom — representing the name of the operation
  2. Metadata of the expression
  3. Arguments of the operation

{
  :+,                                 # operation name
  [context: Elixir, import: Kernel],  # metadata
  [1, 2]                              # operation arguments
}

While it helps to understand what an AST comprises, we rarely need to read or write raw ASTs.

Elixir makes it ridiculously easy to interface with macros, so we hardly even need to think about the structure of the AST that we are working on — everything is handled for us.
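Even so, it can be useful to see that the three components are ordinary data. The sketch below uses `quote` (introduced in the next section) to obtain an AST and then takes it apart with plain pattern matching.

```elixir
# A call is a three-element tuple: {name, metadata, arguments}.
{name, metadata, args} = quote(do: 1 + 2)

IO.inspect(name)               # :+
IO.inspect(is_list(metadata))  # true
IO.inspect(args)               # [1, 2]

# Some literals are their own AST: atoms, numbers, lists, two-element tuples.
IO.inspect(quote(do: :ok))        # :ok
IO.inspect(quote(do: [1, 2, 3]))  # [1, 2, 3]
IO.inspect(quote(do: {1, 2}))     # {1, 2}
```

The exact contents of the metadata list can differ between Elixir versions, so the example only checks that it is a list.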

Interacting with ASTs

As mentioned earlier, ASTs represent the source code and are the input and output of macros. They are the cornerstone of macros. We need to interact with the AST representations of expressions freely, without getting bogged down by reading and writing the ASTs ourselves.

This is where quote and unquote come into the picture.

To generate the AST representation of an expression or body, we use quote:

quote do
  1 + 2 * 3
end

{:+, [context: Elixir, import: Kernel],
 [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]}

When we use quote, we build an AST. While the example above is relatively simple, we will soon discover that quote can be used to build much more complex ASTs.
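To check that a quoted expression really is just another form of the source code, we can convert it back to text or evaluate it. This short sketch uses `Macro.to_string/1` and `Code.eval_quoted/1` on the same expression as above.

```elixir
ast = quote(do: 1 + 2 * 3)

# Turn the AST back into source text.
IO.puts(Macro.to_string(ast))  # 1 + 2 * 3

# Evaluate the AST; the result is a {value, bindings} tuple.
{value, _bindings} = Code.eval_quoted(ast)
IO.inspect(value)              # 7
```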

What if we have a value we want to use in our quote, such as a macro's arguments? We can introduce an external variable (one defined outside of quote) into quote by using unquote.

unquote evaluates its argument, which is an expression, and injects the result (as an AST) into the AST being built. As everything in Elixir is an expression, we evaluate expressions to inject the results.

For instance, if unquote receives a variable, it evaluates the variable to the value it holds and injects that value.

If unquote receives a full expression like 1 + 2 * 3, we will evaluate that to 7 and inject that. unquote expects that the result of the expression is a valid AST.
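The difference is easiest to see side by side. In this sketch, the pattern matches act as checks. The metadata is ignored with `_` because its exact contents vary between Elixir versions.

```elixir
x = 5

# Without unquote, x stays a reference to a variable named x.
{:+, _, [{:x, _, _}, 1]} = quote(do: x + 1)

# With unquote, the current value of x is injected into the AST.
{:+, _, [5, 1]} = quote(do: unquote(x) + 1)

# A full expression is evaluated first, then its result is injected.
7 = quote(do: unquote(1 + 2 * 3))
```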

In part two of this series, we'll discuss the consequences of having an invalid AST and delve into macros more deeply.

Do you recall that macros automatically convert arguments into their AST forms? We will leverage that behavior:

defmodule Foo do
  defmacro foo(exp) do
    quote do
      doubled = unquote(exp) * 2
      doubled
    end
  end
end

# A module must be required before its macros can be called
require Foo
Foo.foo(1 + 2 * 3)
14

As you can see, we have built a macro called foo which receives an expression as an argument. Then, we begin to build an AST for the macro in quote. We use unquote(exp) to inject the value of the exp argument into the AST.

You might ask yourself: How do I know that the expression is injected and not evaluated right away?

Well, we can use a handy tool to inspect the AST of the macro and understand how it works under the hood:

iex(1)> require Foo
iex(2)> ast = quote do: Foo.foo(1 + 2 * 3)
iex(3)> ast |> Macro.expand(__ENV__)
{:__block__, [],
 [
   {:=, [],
    [
      {:doubled, [counter: -576460752303423358], Foo},
      {:*, [context: Foo, import: Kernel],
       [
         {:+, [context: Elixir, import: Kernel],
          [1, {:*, [context: Elixir, import: Kernel], [2, 3]}]},
         2
       ]}
    ]},
   {:doubled, [counter: -576460752303423358], Foo}
 ]}

First, we generate the AST of the macro call and assign it to a variable.

Then, with our ast variable, we will use Macro.expand to expand the AST to its fullest form.

We'll look at macro expansion next time. For now, think of it as peeling back the layers of an AST to its most fundamental components.

As you can see, the expanded form of the Foo.foo call contains the AST of 1 + 2 * 3. This proves that unquote only injected the AST of the expression into the quote AST, but didn't evaluate it. The evaluation is performed later on (we will get into this in part two as well).
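Another way to convince yourself is a macro that uses its argument both as data and as code. The sketch below assumes a hypothetical helper module `Debug` with a `show` macro (neither is part of Elixir): it renders the received AST as text inside the macro and also injects the same AST so that it is evaluated at the call site.

```elixir
defmodule Debug do
  # `show` is a hypothetical helper, not part of Elixir.
  defmacro show(expr) do
    # Here expr is still an AST, so we can render its source text.
    source = Macro.to_string(expr)

    quote do
      # The AST is injected here and only evaluated at the call site.
      {unquote(source), unquote(expr)}
    end
  end
end

require Debug
IO.inspect(Debug.show(1 + 2 * 3))  # {"1 + 2 * 3", 7}
```

If the argument had been evaluated before reaching the macro, the macro would only ever see 7 and could not recover the original source text.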

Note: Macro.expand will only attempt to perform expansion on the root node of the AST.

You can find more information about Macro.expand in the docs.

quote Options in Elixir

Now that we understand the fundamentals of macros, we can start to look at our quote options.

While there are several options with quote, we will focus on the three most frequently used and introduce the concepts behind each option.

  • unquote

Toggles the unquoting behavior in quote. By disabling it, any unquote call is converted to an AST of the macro call (as with any other macro/function call).

This defers the evaluation of unquote to a later point. I'll explain why you'd want to do so in the next part of this series.

For now, let's look at the following example:

iex(1)> a = [foo: 1, bar: 1]
iex(2)> ast = quote do: unquote(a)
[foo: 1, bar: 1]
iex(3)> ast = quote unquote: false, do: unquote(a)
{:unquote, [], [{:a, [], Elixir}]}

When we leave the unquoting behavior enabled (iex(2)), unquote(a) will evaluate a as an expression. This returns the keyword list, which is then injected into the quote AST — and the result is as expected.

However, when we disable the unquoting behavior (iex(3)), unquote(a) is converted into another AST expression, which is injected into the quote AST as-is.

  • bind_quoted

Disables unquoting behavior in the quote and binds given variables in the body of quote.

Binding moves the variable initialization into the body of quote.

We can observe this behavior using Macro.to_string:

iex(1)> a = [foo: 1, bar: 2]
iex(2)> ast = quote bind_quoted: [a: a], do: IO.inspect(a)
iex(3)> ast |> Macro.to_string |> IO.puts
(
  a = [foo: 1, bar: 2]
  IO.inspect(a)
)
:ok

As you can see, bind_quoted adds a "copy" of a into the body of quote by assigning it in the body of quote.

In a macro, this is equivalent to binding the variable to the caller context, as the variable is initialized during the evaluation of the callsite.
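One practical consequence is worth sketching. Because bind_quoted assigns the value to a variable once, an expression passed to the macro is evaluated once, whereas every unquote of the same argument injects the expression again. The module `Twice` and the counter function `next` are hypothetical names used only to make the evaluations visible.

```elixir
defmodule Twice do
  # Injects the expression in two places, so it runs twice.
  defmacro with_unquote(expr) do
    quote do
      {unquote(expr), unquote(expr)}
    end
  end

  # Assigns the expression to a variable once, then reuses the variable.
  defmacro with_bind(expr) do
    quote bind_quoted: [expr: expr] do
      {expr, expr}
    end
  end
end

# A counter kept in the process dictionary: each call returns the next number.
next = fn ->
  n = Process.get(:counter, 0) + 1
  Process.put(:counter, n)
  n
end

require Twice
IO.inspect(Twice.with_unquote(next.()))  # {1, 2}
IO.inspect(Twice.with_bind(next.()))     # {3, 3}
```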

Note: Contexts will be discussed in greater detail next time.

defmodule Foo do
  defmacro foo(x) do
    quote bind_quoted: [x: x] do
      IO.inspect(x)
    end
  end
end

  • location

This option controls whether run-time errors from a macro are reported from the caller or inside the quote.

By setting this option to :keep, error messages report specific lines in the macro that cause the error, rather than the line of the callsite.

You can see a code example in the docs.

Build a Simple Macro in Elixir

We should now be able to build a simple macro that mimics the behavior of an if statement.

Recall that an if statement consists of the following components:

if (condition) do
  # body
else
  # body
end

We can replicate this structure using our own macro:

defmodule NewIf do
  defmacro if?(condition, do: block, else: other) do
    quote do
      cond do
        unquote(condition) == true -> unquote(block)
        unquote(condition) == false -> unquote(other)
      end
    end
  end
end

iex(1)> require NewIf
iex(2)> NewIf.if? 4 == 5, do: :yes, else: :no
:no

This macro takes three inputs (strictly speaking, a condition plus a keyword list holding the two blocks):

  • condition - predicate to evaluate if? statement against
  • do - block to execute when condition is true
  • else - block to execute when condition is false

In Elixir, do and else blocks reach the macro as a keyword list, so each block can be captured by pattern matching with the syntax <formal name>: <variable name>. The formal name is the key used when you call the macro. The variable name is the name used inside the macro to reference the block.

After receiving these three arguments, we start by building an AST using quote.

Using a cond statement, we determine which body if? should execute. We use unquote to inject the values of condition, block, and other into the AST we are building.

In doing so, when the macro is evaluated, the condition is evaluated to be true/false, and, based on that result, we will either execute block or other.

We wrap up this behavior into an AST returned by quote (which is the return value of the macro).
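One detail of the cond version is that the condition is injected twice, so a condition with side effects may run twice. As an alternative sketch (not the approach used above), the hypothetical module `OnceIf` below injects the condition a single time by matching on it with case. The `check` function exists only to count evaluations.

```elixir
defmodule OnceIf do
  # A variant of if? that injects the condition a single time.
  defmacro if?(condition, do: block, else: other) do
    quote do
      case unquote(condition) do
        true -> unquote(block)
        false -> unquote(other)
      end
    end
  end
end

require OnceIf

# The condition has a visible side effect so we can count evaluations.
check = fn ->
  Process.put(:checks, Process.get(:checks, 0) + 1)
  false
end

IO.inspect(OnceIf.if?(check.(), do: :yes, else: :no))  # :no
IO.inspect(Process.get(:checks))                        # 1
```

Like the cond version, this variant only accepts true or false. Any other value raises an error at run-time instead of being treated as truthy.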

Next Up: Macros in Detail

Now we have a good grasp on the foundations of metaprogramming in general and specifically in Elixir.

Join me for the next part of this series, where we'll look into the intricacies behind macros and how everything works.

Until next time!

Jia Hao Woo is a developer from the little red dot — Singapore! He loves to tinker with various technologies and has been using Elixir and Go for about a year.
