Introduction to Garbage Collection (Part I)

Thijs Cadier

Whenever you run your code, you use memory. When you write in a language like Ruby, it seems like the memory available to you is infinite. You can just keep going without thinking about the fixed amount of memory the system running your code has. In this Ruby Magic episode we'll explain how this works!

A bit of history

Back in the day, scripting languages such as Ruby did not exist yet. People mostly wrote code in languages such as C, a low-level programming language. One of the things that makes these languages low-level is that you have to clean up after yourself. For example, whenever you allocate memory to store a string, you also have to decide when to free it.

Manual cleanup

This looks a little something like the following mock Ruby code. It declares a variable and calls a free method (which does not actually exist in Ruby) to release the memory once we're done with the variable.

1_000_000.times do |i|
  variable = "Variable #{i}"
  puts variable
  free(variable)
end
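
Ruby has no free for its own objects, but the Fiddle library that ships with Ruby wraps C's malloc and free, so the manual style can be tried out for real. The following is a minimal sketch, assuming CRuby with Fiddle available; the helper name manual_round_trip is made up for illustration.

```ruby
require "fiddle"

# Hypothetical helper: copies text into manually allocated memory,
# reads it back, and frees the memory by hand.
def manual_round_trip(text)
  bytes = text.bytesize + 1                  # one extra byte for the trailing NUL
  address = Fiddle.malloc(bytes)             # raw address as an Integer, like C's malloc
  pointer = Fiddle::Pointer.new(address, bytes)
  pointer[0, bytes] = text + "\0"            # copy the bytes into the allocated block
  copy = pointer.to_s                        # read up to the NUL into a normal Ruby String
  copy
ensure
  # This is our job, not the GC's: skip it and the block stays
  # allocated until the process exits.
  Fiddle.free(address) if address
end

puts manual_round_trip("Variable 1")
```

The ensure clause is what keeps this honest: without it, an exception between malloc and free would leave the memory allocated for the rest of the process's life.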

A tedious way of programming

You might have already realized there's a risk here: what if you forget to free the variable? In that case the content of that variable will just stick around in memory until the process exits. If you do this often enough, you will run out of memory and your process will crash.

The next example demonstrates another common issue:

1_000_000.times do |i|
  variable = "Variable #{i}"
  free(variable)
  puts variable
end

We declare the variable and free it. But then we try to use it again, even though the memory behind it has already been released. In C this is a use-after-free bug: the behavior is undefined, so your program might crash with a segfault or quietly read garbage. Oops!

Humans are mistake machines

Humans are notoriously bad at avoiding these kinds of mistakes. Hence the need for a way to automatically clean up memory. The most popular way to do this (also used in Ruby) is Garbage Collection (GC).

How Garbage Collection (GC) works

In a language that uses GC, you can create objects without manually cleaning them up. Whenever you create an object, it's registered with the Garbage Collector. The GC keeps track of all references to this object. When it determines that you're not using the object anymore, the object is marked for cleanup. Every once in a while the Garbage Collector pauses your program and cleans up all the marked objects.
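
To see this happen, CRuby exposes a few counters through GC.stat. The following is a rough sketch, assuming CRuby (other Ruby implementations report different statistics); the helper name objects_freed is made up for illustration, and GC.start is only called here to force a collection that Ruby would normally schedule on its own.

```ruby
# Hypothetical helper: counts how many objects the GC frees while
# running the block plus one forced collection.
def objects_freed
  before = GC.stat(:total_freed_objects)
  yield
  GC.start # force a full collection; normally Ruby decides when to run one
  GC.stat(:total_freed_objects) - before
end

freed = objects_freed do
  # Each string is unreferenced as soon as its iteration ends.
  100_000.times { |i| "Variable #{i}" }
end

puts "Objects freed: #{freed}"
```

The exact number varies between runs and Ruby versions, because the interpreter allocates and frees objects of its own along the way.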

Looking at some examples

In the simple loop we used earlier, the GC's job is fairly easy. Once an iteration of the loop has finished, the string assigned to the variable isn't used anywhere anymore, so it can immediately be marked for cleanup.

1_000_000.times do |i|
  variable = "Variable #{i}"
  puts variable
end

In the next example we pass the variable into the puts_later method, which starts a thread that waits for 30 seconds and then prints the variable.

def puts_later(variable)
  Thread.new do
    sleep 30
    puts variable
  end
end

1_000_000.times do |i|
  variable = "Variable #{i}"
  puts_later variable
end

The Garbage Collector's job is already pretty complicated in this relatively simple example. It has to understand that the block inside the puts_later method still references the variable. Because the method starts a thread, the Garbage Collector has to keep track of the thread and wait for it to finish. Only then can the variable be marked for cleanup.
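
One way to watch this from the outside is WeakRef from Ruby's standard library, which points at an object without keeping it alive. The following is a small sketch, assuming CRuby; value_later and survives_gc_while_thread_sleeps? are made-up names, and value_later is a shortened variant of puts_later that returns the string instead of printing it.

```ruby
require "weakref"

# Variant of puts_later with a configurable delay. The block captures
# `variable`, so the sleeping thread holds a reference to the string.
def value_later(variable, delay)
  Thread.new do
    sleep delay
    variable
  end
end

# Hypothetical helper: checks whether the string is still alive after a
# forced collection while the thread is asleep.
def survives_gc_while_thread_sleeps?(delay = 0.2)
  string = "Variable #{Process.pid}"
  weak = WeakRef.new(string)   # observes the string without keeping it alive
  thread = value_later(string, delay)
  string = nil                 # drop our own reference
  GC.start                     # force a collection while the thread sleeps
  alive = weak.weakref_alive?
  thread.join
  alive
end

puts "Still alive during the sleep: #{survives_gc_while_thread_sleeps? ? 'yes' : 'no'}"
```

The reverse check (that the string disappears once the thread has finished) is deliberately left out: CRuby scans the stack conservatively, so when exactly an unreferenced object gets collected is not something a short script can rely on.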

When it gets complicated

Without getting into complex examples, trust me when I say the Garbage Collector's job is really hard. This also explains why GC can cause overhead and problems in your production environment. It needs to have a very detailed understanding of what's happening in your program to properly clear memory, which takes quite a few CPU cycles to get right. But hey, it beats cleaning up after yourself!
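
To get a first impression of that overhead, CRuby ships GC::Profiler and GC.count. The following is a rough sketch, assuming CRuby; the helper name gc_cost is made up for illustration, and the numbers it reports depend heavily on the Ruby version and the machine.

```ruby
# Hypothetical helper: runs the block and reports how many GC runs
# happened and roughly how much time they took.
def gc_cost
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count
  yield
  { runs: GC.count - runs_before, seconds: GC::Profiler.total_time }
ensure
  GC::Profiler.disable
end

cost = gc_cost do
  # The array keeps every string alive, so the GC has a growing
  # set of live objects to walk each time it runs.
  strings = Array.new(500_000) { |i| "Variable #{i}" }
  strings.size
end

puts "GC ran #{cost[:runs]} times, taking about #{(cost[:seconds] * 1000).round(2)} ms"
```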

There's more to Garbage Collection

This was only our introduction to Garbage Collection. In a future article we'll look at how exactly this works in Ruby, and how you can measure and tune GC to improve the performance of your application.

Update: The next episode is available here.

Thijs Cadier

Thijs is a co-founder of AppSignal who sometimes goes missing for months on end to work on our infrastructure. Makes sure our billions of requests are handled correctly. Holds the award for best drummer in the company.
