When building software, Elixir (or Erlang) offers great benefits, including concurrency, scalability and reliability.
In this series, we will examine how to make the most of these benefits when upgrading production code. This article focuses on hot code reloading and upgrades. But before we dive in, let's quickly define OTP.
What Is OTP in Elixir?
Formally, Erlang/OTP is a specific implementation of the Erlang runtime system, that is, a set of libraries, compilers, and a VM implementation.
Informally, OTP often denotes a set of principles to build robust apps in Erlang and the corresponding set of built-in libraries.
Hot Code Reload: Tackling the Uncertainties
There is a bit of uncertainty about this concept.
When we speak of hot code reload or hot code upgrade, we usually mean the ability to change the behavior of a running process without any negative impact on that process. For example, we may change the behavior of a process that holds a TCP connection without terminating the connection.
The uncertainty has to do with scale, namely whether we want to upgrade:
- a module
- a package (an application in terms of OTP)
- a whole running instance (a release in terms of OTP)
OTP offers tools for upgrading at any scale. In this series, we will work our way up to application-level upgrades.
How Are OTP and Hot Code Upgrades Related?
As we will see, hot code upgrades on a larger scale (application and release levels) work only for systems built according to OTP principles.
Hot Code Upgrades: The Basics
A good starting point for understanding hot code reloading is the article Hot Code Reloading in Elixir.
It explains the following key points:
- How to reload code for a single module
- How two versions of code exist after loading a new version of the module: new code and old code
- The importance of external calls, which make it possible to transition from old code to new code
- How GenServer helps us make such a transition seamlessly
At this point, I'd like to highlight one important concept in depth: code purge.
Should You Code Purge in Elixir?
What happens if we want to upgrade code two or more times?
Let's create a small mix project:
mix new code_purge
cd code_purge
Then update lib/code_purge.ex to the following:
# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.14
  end
end
Now launch an IEx shell with mix:
iex -S mix
iex(1)> CodePurge.pi
3.14
Then update lib/code_purge.ex to:
# lib/code_purge.ex
defmodule CodePurge do
  def pi do
    3.142
  end
end
And recompile the project in a separate shell:
mix compile
In our iex shell, we reload the module code:
iex(2)> :code.load_file(CodePurge)
{:module, CodePurge}
iex(3)> CodePurge.pi
3.142
Everything worked as expected. :code.load_file/1 found the updated Elixir.CodePurge.beam in the _build/dev/lib/code_purge/ebin folder (mix sets up the code paths for us) and reloaded it.
But what happens if we try to reload this module once more, without actually changing it?
iex(4)> :code.load_file(CodePurge)
{:error, :not_purged}
What Went Wrong Here?
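Wow, that doesn't work. This is because the Erlang VM keeps at most two versions of a module: the current code and the old code. After the first reload, the original version became old. Loading yet another version would require keeping a second old version, which isn't allowed, and this holds even when the bytecode is identical.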
To overcome this, the :code module provides two more functions: :code.purge/1 and :code.soft_purge/1.
A purge evicts the old code:
iex(5)> :code.purge(CodePurge)
false
iex(6)> :code.load_file(CodePurge)
{:module, CodePurge}
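To reproduce this rule outside a mix project, here is a minimal, self-contained sketch you could run with elixir purge_demo.exs. PurgeDemo and the file names are hypothetical. A script has no .beam file on the code path, so the sketch reloads the compiled binary with :code.load_binary/3 instead of :code.load_file/1. As with the unchanged file above, reloading the very same bytecode still counts as a new version.

```elixir
# purge_demo.exs: a sketch of the "at most two versions per module" rule.
# Code.compile_string/1 compiles *and* loads PurgeDemo, so at this point the
# module has a current version and no old one.
[{PurgeDemo, beam}] =
  Code.compile_string("""
  defmodule PurgeDemo do
    def pi, do: 3.14
  end
  """)

# Load the same bytecode again (stands in for :code.load_file/1 in a mix project).
reload = fn -> :code.load_binary(PurgeDemo, ~c"purge_demo.beam", beam) end

old_before = :erlang.check_old_code(PurgeDemo)
# First reload: the current version becomes old, the reloaded one becomes current.
first = reload.()
old_after = :erlang.check_old_code(PurgeDemo)
# Second reload: an old version already exists, so the load is refused.
second = reload.()
# Purge evicts the old version; false means no process had to be killed.
purged = :code.purge(PurgeDemo)
# With the old slot free again, reloading works.
third = reload.()

IO.inspect(old_before, label: "old code before reload")  # false
IO.inspect(first, label: "first reload")                 # {:module, PurgeDemo}
IO.inspect(old_after, label: "old code after reload")    # true
IO.inspect(second, label: "second reload")               # {:error, :not_purged}
IO.inspect(purged, label: "purge")                       # false
IO.inspect(third, label: "third reload")                 # {:module, PurgeDemo}
```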
We can upgrade the code of the module again after the purge. But why do we even need to control that? Why not purge code automatically?
Well, there may still be processes running old code, and we should decide what to do with them during the upgrade. This is also why there are two functions:
- :code.purge/1 kills any processes running old code, then removes the old code
- :code.soft_purge/1 fails (returns false and keeps the old code) if any process is still running old code
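The difference between the two functions is easy to observe in a standalone sketch. Lingerer is a hypothetical module whose process sits in a receive. After a reload, that process is running old code, so :code.soft_purge/1 declines to purge, while :code.purge/1 kills the process first. As before, reloading the same compiled binary with :code.load_binary/3 stands in for loading a new .beam file.

```elixir
# soft_vs_hard_purge.exs: a sketch contrasting :code.soft_purge/1 and :code.purge/1.
[{Lingerer, beam}] =
  Code.compile_string("""
  defmodule Lingerer do
    # Tells the parent it is running, then waits for a message that never comes.
    def wait(parent) do
      send(parent, :ready)

      receive do
        :stop -> :ok
      end
    end
  end
  """)

parent = self()
pid = spawn(fn -> Lingerer.wait(parent) end)

# Make sure the process is executing Lingerer code before we reload it.
receive do
  :ready -> :ok
end

# After this reload, the waiting process is running the *old* version.
{:module, Lingerer} = :code.load_binary(Lingerer, ~c"lingerer.beam", beam)

# soft_purge refuses to remove old code that a process is still running...
soft = :code.soft_purge(Lingerer)
alive_after_soft = Process.alive?(pid)

# ...whereas purge kills such processes and then removes the old code.
ref = Process.monitor(pid)
hard = :code.purge(Lingerer)

exit_reason =
  receive do
    {:DOWN, ^ref, :process, ^pid, reason} -> reason
  after
    1000 -> :still_running
  end

IO.inspect(soft, label: "soft_purge")                    # false
IO.inspect(alive_after_soft, label: "alive after soft")  # true
IO.inspect(hard, label: "purge killed a process")        # true
IO.inspect(exit_reason, label: "exit reason")            # :killed
```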
This has an important consequence: if we want to upgrade our code more than once and rely on :code.purge/1, any process still running old code will be killed during the upgrade.
Let's illustrate this.
How Not to Do a Code Upgrade
First, add the file lib/code_purge/pi.ex to your toy project with the following content:
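# lib/code_purge/pi.ex
defmodule CodePurge.Pi do
  # Spawns a bare, linked server process (no OTP behaviour involved)
  def start_link do
    spawn_link(&server/0)
  end

  def server do
    receive do
      {:get, from} ->
        send(from, {:ok, 3.14})
        # External (fully qualified) call: the only point where the loop
        # can switch to a newly loaded version of this module
        CodePurge.Pi.server()
    end
  end

  # Client API: asks the server for the value, giving up after one second
  def get(pid) do
    send(pid, {:get, self()})

    receive do
      {:ok, value} ->
        {:ok, value}
    after
      1000 ->
        :error
    end
  end
end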
Then run an IEx shell, spawn a server, and check that everything works:
iex(1)> pid = CodePurge.Pi.start_link()
#PID<0.140.0>
iex(2)> CodePurge.Pi.get(pid)
{:ok, 3.14}
Now, reload the module once (without any actual changes to functions) and try to purge it so that you can do the next 'upgrade':
iex(3)> :code.load_file(CodePurge.Pi)
{:module, CodePurge.Pi}
iex(4)> :code.purge(CodePurge.Pi)
** (EXIT from #PID<0.152.0>) shell process exited with reason: killed
What Happened Here?
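As expected, your server just died. Because it was started with spawn_link, it took the shell process down with it. Even the external call to CodePurge.Pi.server/0 couldn't save it. The server was blocked in receive inside the original version of the code, and since it received no messages, it never reached that external call. It therefore never transitioned to the new code after the first upgrade, so the purge found it still running old code and killed it.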
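The same mechanics can be seen side by side in a standalone sketch. The Loops module and its two loops are hypothetical. Both loops block in receive. The only difference is that one recurses with a local call and the other with a fully qualified external call. Once each loop has handled messages after the reload, the external one is running the current code and survives :code.purge/1. The local one is still in old code and gets killed.

```elixir
# loops_demo.exs: a local-call loop vs an external-call loop across a reload.
[{Loops, beam}] =
  Code.compile_string("""
  defmodule Loops do
    # Local call: keeps running whichever version of the code it started in.
    def local(parent) do
      receive do
        :tick ->
          send(parent, {:tock, self()})
          local(parent)
      end
    end

    # External (fully qualified) call: always enters the current version.
    def external(parent) do
      receive do
        :tick ->
          send(parent, {:tock, self()})
          __MODULE__.external(parent)
      end
    end
  end
  """)

# Sends :tick and waits until the loop has handled it.
tick = fn pid ->
  send(pid, :tick)

  receive do
    {:tock, ^pid} -> :ok
  after
    1000 -> raise "no reply from #{inspect(pid)}"
  end
end

local_pid = spawn(Loops, :local, [self()])
external_pid = spawn(Loops, :external, [self()])
# One round trip each: both processes are now running Loops code.
Enum.each([local_pid, external_pid], tick)

# Reload: the version both loops were running becomes old.
{:module, Loops} = :code.load_binary(Loops, ~c"loops.beam", beam)

# Two more round trips each: the first lets the external loop jump to the
# current version, the second is served entirely by the current version.
Enum.each([local_pid, external_pid, local_pid, external_pid], tick)

ref = Process.monitor(local_pid)
killed_any = :code.purge(Loops)

local_exit =
  receive do
    {:DOWN, ^ref, :process, ^local_pid, reason} -> reason
  after
    1000 -> :still_running
  end

IO.inspect(killed_any, label: "purge killed a process")           # true
IO.inspect(local_exit, label: "local loop exit")                  # :killed
IO.inspect(Process.alive?(external_pid), label: "external alive") # true
```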
This isn't robust. One obvious reason for the failure is that we didn't use the OTP libraries (GenServer and related modules) designed for building this kind of server.
Avoid Spawn in Real-World Software Development
In many books and articles, we see code examples demonstrating the power of Elixir or Erlang: tons of processes easily spawned directly with spawn or spawn_link.
However, in real-world software development, we generally should avoid creating home-brewed servers or other long-running processes, and should instead use OTP libraries.
Even for 'one-off' asynchronous tasks, we shouldn't directly use spawn or spawn_link.
Elixir has a great alternative to spawn, though: the Task module (covered in depth in the AppSignal article Demystifying processes in Elixir).
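For reference, here is a small sketch of Task-based alternatives to bare spawn calls. The job bodies are placeholders. In a real application, the Task.Supervisor would normally be started as a child of your supervision tree under a name of your choosing. Here it is started directly so that the script is self-contained.

```elixir
# task_demo.exs: Task-based replacements for bare spawn/spawn_link.
parent = self()

# Linked to the caller: fine for work the caller is going to wait on anyway.
sum = Task.async(fn -> Enum.sum(1..10) end) |> Task.await()

# A Task.Supervisor. In an application you would list something like
# {Task.Supervisor, name: MyApp.TaskSupervisor} among your children
# (MyApp.TaskSupervisor is a placeholder name).
{:ok, sup} = Task.Supervisor.start_link()

# Fire-and-forget one-off job, running under the supervisor.
{:ok, _pid} = Task.Supervisor.start_child(sup, fn -> send(parent, :job_done) end)

job_done =
  receive do
    :job_done -> true
  after
    1000 -> false
  end

# A result we need, without a link to the caller: a crash in the task is
# reported as {:exit, reason} instead of taking the caller down.
task = Task.Supervisor.async_nolink(sup, fn -> Enum.sum(1..100) end)

total =
  case Task.yield(task, 1000) || Task.shutdown(task) do
    {:ok, result} -> result
    {:exit, _reason} -> :failed
    nil -> :timeout
  end

IO.inspect({sum, job_done, total})  # {55, true, 5050}
```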
How To Do a Code Upgrade Using GenServer
Let's create a better version of our server in lib/code_purge/pi_gs.ex:
# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  @impl true
  def init(value) do
    {:ok, value}
  end

  @impl true
  def handle_call(:get, _from, value) do
    {:reply, value, value}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end
end
And now, try to upgrade/purge the code of a running process several times:
iex(1)> {:ok, pid} = CodePurge.PiGs.start_link()
{:ok, #PID<0.161.0>}
iex(2)> CodePurge.PiGs.get(pid)
3.14
iex(3)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(4)> :code.purge(CodePurge.PiGs)
false
iex(5)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(6)> :code.purge(CodePurge.PiGs)
false
iex(7)> CodePurge.PiGs.get(pid)
3.14
Nothing bad happens! The reason is easy to understand. Our pid process doesn't spin in CodePurge.PiGs code. It runs the GenServer loop (implemented in Erlang's gen_server module), and we don't update that code at all. CodePurge.PiGs is only a callback module, and its name is kept in the GenServer's internal state. GenServer makes external calls to CodePurge.PiGs functions when serving requests, so every request is handled by whatever version of the callback module is current at that moment.
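A standalone sketch makes this visible. PiServer is a hypothetical stand-in for CodePurge.PiGs, and reloading the same binary stands in for loading a recompiled file. Between requests, the server process waits inside gen_server's receive loop. After a reload, then, no process is running the old callback code, and even :code.soft_purge/1 succeeds.

```elixir
# callback_demo.exs: a GenServer process does not run its callback module's
# code between requests, so reloading that module leaves nothing in old code.
[{PiServer, beam}] =
  Code.compile_string("""
  defmodule PiServer do
    use GenServer

    def start_link(value), do: GenServer.start_link(__MODULE__, value)
    def get(pid), do: GenServer.call(pid, :get)

    @impl true
    def init(value), do: {:ok, value}

    @impl true
    def handle_call(:get, _from, value), do: {:reply, value, value}
  end
  """)

{:ok, pid} = PiServer.start_link(3.14)
before_upgrade = PiServer.get(pid)

# While idle, the process should be waiting in gen_server's own loop,
# not in any PiServer function.
IO.inspect(Process.info(pid, :current_function), label: "idle in")

# Make the loaded version old by reloading the same bytecode.
{:module, PiServer} = :code.load_binary(PiServer, ~c"pi_server.beam", beam)

# No process is running old PiServer code, so a soft purge succeeds...
soft = :code.soft_purge(PiServer)
# ...and the server keeps answering, now via the current version.
after_upgrade = PiServer.get(pid)

IO.inspect({before_upgrade, soft, after_upgrade})  # {3.14, true, 3.14}
```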
The main challenge is keeping the state of GenServer processes compatible with the new code. For a single GenServer, this can be done through the :sys module and the code_change callback of GenServer. This is covered in depth in the previously mentioned hot code reloading article. Here, we'll only demonstrate it briefly.
Without closing the previous IEx session, let's update lib/code_purge/pi_gs.ex to the following and compile it:
# lib/code_purge/pi_gs.ex
defmodule CodePurge.PiGs do
  use GenServer

  def start_link(value \\ 3.14) do
    GenServer.start_link(__MODULE__, value)
  end

  @impl true
  def init(value) do
    # New state shape: the value wrapped in a list
    {:ok, [value]}
  end

  @impl true
  def handle_call(:get, _from, st) do
    [value] = st
    {:reply, value, st}
  end

  def get(pid) do
    GenServer.call(pid, :get)
  end

  # Invoked via :sys.change_code/4 to migrate the old state (a bare value)
  @impl true
  def code_change(_old_vsn, value, _extra) do
    {:ok, [value]}
  end
end
In code_change, we migrate the old state by simply wrapping it in a list. We also updated the init and handle_call callbacks to work with the new state shape.
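As an aside, code_change/3 is also called when rolling back. When downgrading, OTP passes {:down, vsn} as the first argument instead of the old version. A reversible and idempotent migration for the same state change might look like the sketch below. PiGsReversible is a hypothetical name, and whether you need downgrades at all depends on your release process.

```elixir
# A sketch of a reversible state migration for the PiGs example.
defmodule PiGsReversible do
  use GenServer

  def start_link(value \\ 3.14), do: GenServer.start_link(__MODULE__, value)
  def get(pid), do: GenServer.call(pid, :get)

  @impl true
  def init(value), do: {:ok, [value]}

  @impl true
  def handle_call(:get, _from, [value] = state), do: {:reply, value, state}

  # Downgrade: unwrap the list to restore the old state shape.
  @impl true
  def code_change({:down, _vsn}, [value], _extra), do: {:ok, value}
  # Downgrade of a state that is already in the old shape: leave it alone.
  def code_change({:down, _vsn}, value, _extra), do: {:ok, value}
  # Upgrade of a state that is already migrated: leave it alone.
  def code_change(_old_vsn, [_] = state, _extra), do: {:ok, state}
  # Upgrade from the old shape: wrap the bare value in a list.
  def code_change(_old_vsn, value, _extra), do: {:ok, [value]}
end
```

Since code_change/3 is an ordinary function, the migration can be unit-tested without a running server. For example, PiGsReversible.code_change(nil, 3.14, []) returns {:ok, [3.14]}, and PiGsReversible.code_change({:down, nil}, [3.14], []) returns {:ok, 3.14}.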
Now, in the existing IEx session, run:
iex(8)> :code.purge(CodePurge.PiGs)
false
iex(9)> :sys.suspend(pid)
:ok
iex(10)> :code.load_file(CodePurge.PiGs)
{:module, CodePurge.PiGs}
iex(11)> :sys.change_code(pid, CodePurge.PiGs, nil, [])
:ok
iex(12)> :sys.resume(pid)
:ok
iex(13)> CodePurge.PiGs.get(pid)
3.14
iex(14)> :sys.get_state(pid)
[3.14]
Everything works fine, and the last call to :sys.get_state/1 demonstrates that the state has actually changed.
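The manual sequence above (purge, suspend, load, change_code, resume) can be wrapped in a small helper. The sketch below is one way to do it, under a few assumptions. HotUpgrade.upgrade/4 and the PiHolder module are hypothetical. The caller supplies a function that loads the new code, for example fn -> :code.load_file(CodePurge.PiGs) end in the mix project. To keep the demo self-contained, reloading the same bytecode stands in for compiling a new version.

```elixir
# hot_upgrade.exs: a sketch wrapping the upgrade steps from the IEx session.
defmodule HotUpgrade do
  # Upgrades a single GenServer process `pid` whose callback module is `mod`.
  # `load_fun` must load the new code and return {:module, mod}.
  def upgrade(pid, mod, load_fun, extra \\ []) do
    # Evict any previous old version so the load cannot fail with :not_purged.
    :code.purge(mod)
    :ok = :sys.suspend(pid)

    try do
      {:module, ^mod} = load_fun.()
      # nil is passed as the old version, just like in the IEx session.
      :sys.change_code(pid, mod, nil, extra)
    after
      # Resume even if loading or the state migration failed.
      :sys.resume(pid)
    end
  end
end

# Demo callback module: stores a bare value; code_change/3 wraps it in a list.
[{PiHolder, beam}] =
  Code.compile_string("""
  defmodule PiHolder do
    use GenServer

    @impl true
    def init(value), do: {:ok, value}

    @impl true
    def handle_call(:get, _from, state), do: {:reply, state, state}

    @impl true
    def code_change(_old_vsn, value, _extra), do: {:ok, [value]}
  end
  """)

{:ok, pid} = GenServer.start_link(PiHolder, 3.14)

result =
  HotUpgrade.upgrade(pid, PiHolder, fn ->
    :code.load_binary(PiHolder, ~c"pi_holder.beam", beam)
  end)

IO.inspect(result, label: "upgrade")             # :ok
IO.inspect(:sys.get_state(pid), label: "state")  # [3.14]
```

Note that the initial :code.purge/1 will kill any process still running the previous old version. A more cautious variant could call :code.soft_purge/1 instead and abort the upgrade if it returns false.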
Wrap-up
In the first part of this series, we've seen that building servers with GenServer, rather than with hand-rolled spawn loops, is key to effective hot code upgrades. We've also demonstrated how to upgrade a single GenServer instance consistently.
Upgrading an individual process, together with its callback module, can be used as a 'tactical weapon' to fix localized bugs or add some logging.
But updating a system at a greater scale, on a regular basis, requires more powerful tools. In the next part of the series, I'll delve into the world of supervisors in Elixir.
I hope you found this run-through of hot code reloading useful. See you next time for supervisors!
P.S. If you'd like to read Elixir Alchemy posts as soon as they get off the press, subscribe to our Elixir Alchemy newsletter and never miss a single post!