Welcome to a new Ruby Magic article! This time we'll look at how Ruby interprets our code and how we can use that knowledge to our advantage, from spotting subtle bugs to writing faster code.
A subtle difference between symbols
In a previous Ruby Magic article about Escaping characters in Ruby, there was an example of escaping line breaks.
In the example below, two strings are combined into one String across multiple lines, using either the plus (+) symbol or the backslash (\).
```ruby
"foo" +
  "bar"
# => "foobar"

# versus

"foo" \
  "bar"
# => "foobar"
```
These two examples may look similar, but they behave quite differently. To understand how each of them is read and interpreted, you'd normally need to know the nitty-gritty of the Ruby interpreter. Or we can just ask Ruby what the difference is.
InstructionSequence
Using the RubyVM::InstructionSequence class, we can ask Ruby how it interprets the code we give it. This class gives us a tool set to get a glimpse of Ruby's internals.
What the example below returns is Ruby code as the YARV interpreter understands it.
YARV interpreter
YARV (Yet Another Ruby VM) is the Ruby virtual machine introduced in Ruby 1.9. It replaced the original interpreter of MRI (Matz's Ruby Interpreter), which executed programs by walking their syntax tree.
Interpreted languages don't compile a program ahead of time into optimized machine code, the way compiled languages such as C, Rust and Go do.
Instead, Ruby first translates a program into an instruction set for the Ruby VM and then executes those instructions immediately. These instructions are an intermediate step between your Ruby code and what the Ruby VM actually runs.
These instructions let the Ruby VM run Ruby code without having to deal with syntax-specific interpretation, which is handled while the instructions are created. An instruction sequence is a list of optimized operations that represents the interpreted code.
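As a minimal sketch of that two-step process, the snippet below compiles a small piece of Ruby code into an instruction sequence, prints its instructions, and only then executes them with RubyVM::InstructionSequence#eval. The code being compiled (x = 6 * 7) is an arbitrary example, and the printed instructions will differ between Ruby versions.

```ruby
# Step 1: translate Ruby source code into a YARV instruction sequence.
iseq = RubyVM::InstructionSequence.compile("x = 6 * 7\nx")

# Step 2: inspect the instructions YARV will run (output varies by Ruby version).
puts iseq.disasm

# Step 3: execute the compiled instructions; eval returns the last expression's value.
result = iseq.eval
puts result
```

Nothing runs until eval is called: compiling and disassembling only produce and show the instructions.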
During the normal execution of a Ruby program we don't see these instructions, but by viewing them we can check whether Ruby has interpreted our code the way we intended. With InstructionSequence, it's possible to see which instructions YARV creates before it executes them.
You don't need to understand every YARV instruction to follow along; most of them speak for themselves.
```ruby
"foo" +
  "bar"
RubyVM::InstructionSequence.compile('"foo" + "bar"').to_a
# ... [:putstring, "foo"], [:putstring, "bar"] ...

# versus

"foo" \
  "bar"
RubyVM::InstructionSequence.compile('"foo" "bar"').to_a
# ... [:putstring, "foobar"] ...
```
The real output contains a few more setup instructions that we'll look at later, but here we can already see the real difference between "foo" + "bar" and "foo" "bar" (the exact instruction names can vary between Ruby versions).
The former creates two strings and combines them into a third, new string. The latter creates only one string, because Ruby joins adjacent string literals while parsing. This means that with "foo" "bar" we only create one string, rather than three with "foo" + "bar".
```
  1       2           3
  ↓       ↓           ↓
"foo" + "bar" # => "foobar"
```
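We can also check this programmatically. The sketch below collects the string literals each snippet pushes onto the stack. It assumes the instruction list is the last element of InstructionSequence#to_a, and string_literals is our own helper name. Because the push instruction may be called putstring or something similar depending on your Ruby version, the helper matches any instruction starting with "put" and only looks at its String operand.

```ruby
# Collect the String operands pushed by "put*" instructions in a snippet.
# Assumes the instruction list is the last element of InstructionSequence#to_a.
def string_literals(code)
  instructions = RubyVM::InstructionSequence.compile(code).to_a.last
  instructions
    .select { |insn| insn.is_a?(Array) && insn.first.to_s.start_with?("put") }
    .map { |insn| insn[1] }
    .select { |operand| operand.is_a?(String) }
end

p string_literals('"foo" + "bar"') # prints ["foo", "bar"]
p string_literals('"foo" "bar"')   # prints ["foobar"]
```

The third string in the + version, "foobar", never appears as a literal: it is created at runtime by the addition.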
Of course, this is just about the most basic example we can use, but it shows how a small detail in the Ruby language can potentially have a lot of impact:
- More allocations: every String object is allocated separately.
- More memory usage: every allocated String object takes up memory.
- Longer garbage collection: every object, even a short-lived one, takes time for the garbage collector to clean up. More allocations mean longer garbage collection times.
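Ruby can count these allocations for us. Below is a rough sketch using GC.stat(:total_allocated_objects), a running total of every object allocated so far; the allocations helper is our own. Exact numbers can differ between Ruby versions and settings such as frozen string literals, so treat them as indicative. On a default setup, "foo" + "bar" should allocate more objects than "foo" "bar".

```ruby
# Count the objects allocated while running the given block.
# GC.stat(:total_allocated_objects) is a running total of all allocations.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Warm up once so one-time setup work isn't counted in the real measurements.
allocations { nil }

plus     = allocations { "foo" + "bar" }
adjacent = allocations { "foo" "bar" }

puts "\"foo\" + \"bar\" allocated #{plus} object(s)"
puts "\"foo\" \"bar\" allocated #{adjacent} object(s)"
```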
Disassembling
Another use case is debugging a logic issue. The following is an easy mistake to make, which can have big consequences. Can you spot the difference?
```ruby
1 + 2 * 3
# versus
(1 + 2) * 3
```
We can use Ruby to find out the difference in this slightly more complex example.
By disassembling the code, we can get Ruby to print a more readable table of the instructions it performs.
```ruby
# The disassembly below was produced by an older Ruby version;
# newer versions print somewhat different instructions.
1 + 2 * 3
# => 7
puts RubyVM::InstructionSequence.compile("1 + 2 * 3").disasm
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace            1                                      (   1)
# 0002 putobject_OP_INT2FIX_O_1_C_
# 0003 putobject        2
# 0005 putobject        3
# 0007 opt_mult         <callinfo!mid:*, argc:1, ARGS_SIMPLE>
# 0009 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>
# 0011 leave

# versus

(1 + 2) * 3
# => 9
puts RubyVM::InstructionSequence.compile("(1 + 2) * 3").disasm
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace            1                                      (   1)
# 0002 putobject_OP_INT2FIX_O_1_C_
# 0003 putobject        2
# 0005 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>
# 0007 putobject        3
# 0009 opt_mult         <callinfo!mid:*, argc:1, ARGS_SIMPLE>
# 0011 leave
```
The example above involves a few more YARV instructions, but just from the order in which they are printed and executed, we can see the difference a pair of parentheses makes.
With the parentheses around 1 + 2, the addition is performed first, overriding the usual precedence rule that multiplication comes before addition.
Note that you don't actually see the parentheses in the disassembly output itself, only their effect on the rest of the code.
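To confirm the ordering without reading the whole table, a small helper (arithmetic_ops, our own name) can pull just the opt_plus and opt_mult instructions out of InstructionSequence#to_a, in the order YARV executes them. It assumes, as in the to_a output earlier, that instructions appear as arrays in the last element of the result.

```ruby
# Return the arithmetic instructions of a snippet, in execution order.
# Assumes instructions are the Array entries in the last element of to_a.
def arithmetic_ops(code)
  RubyVM::InstructionSequence.compile(code).to_a.last
    .select { |entry| entry.is_a?(Array) }
    .map(&:first)
    .select { |name| %i[opt_plus opt_mult].include?(name) }
end

p arithmetic_ops("1 + 2 * 3")   # prints [:opt_mult, :opt_plus]
p arithmetic_ops("(1 + 2) * 3") # prints [:opt_plus, :opt_mult]
```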
Disassembly
The disassembly output prints a lot of things that might not immediately make sense.
In the printed table, every line starts with the position of the instruction in the sequence, followed by the instruction name and finally its operands. The number in parentheses at the end of a line, such as (   1), is the source line number.
A small sample of the instructions we've seen so far:
- trace: emit a trace event that hooks such as TracePoint can listen to. See the docs on TracePoint for more information.
- putobject: push an object onto the stack.
- putobject_OP_INT2FIX_O_1_C_: push the Integer 1 onto the stack. This is an optimized operation (0 and 1 are optimized).
- putstring: push a string onto the stack.
- opt_plus: addition operation (internally optimized).
- opt_mult: multiplication operation (internally optimized).
- leave: leave the current code context.
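If you'd rather work with the instructions as data than read the table, InstructionSequence#to_a returns them as nested arrays. The sketch below assumes, as in the to_a output earlier, that instructions appear as arrays in the last element of the result, with the instruction name first and its operands after it. Specialized names such as putobject_OP_INT2FIX_O_1_C_ may be spelled differently, or appear as a plain putobject, depending on your Ruby version.

```ruby
# Compile the snippet and keep only the instruction entries (Arrays);
# line numbers, events and labels are Integers and Symbols.
iseq = RubyVM::InstructionSequence.compile("1 + 2 * 3")
instructions = iseq.to_a.last.select { |entry| entry.is_a?(Array) }

# Print each instruction name followed by its operands, one per line.
instructions.each do |name, *operands|
  puts [name, *operands.map(&:inspect)].join(" ")
end
```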
Now that we know how the Ruby interpreter converts our developer-friendly, readable Ruby code into YARV instructions, we can use this to optimize our applications.
It's possible to pass entire methods and even entire files to RubyVM::InstructionSequence.
```ruby
# foo and /tmp/hello.rb are placeholders for your own method and file
puts RubyVM::InstructionSequence.disasm(method(:foo))
puts RubyVM::InstructionSequence.compile_file("/tmp/hello.rb").disasm
```
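Here is a self-contained version of the same idea. A hypothetical greet method stands in for foo, and a temporary file created with Ruby's standard Tempfile library stands in for /tmp/hello.rb. Note that compile_file only compiles the file; it doesn't run it.

```ruby
require "tempfile"

# A hypothetical method standing in for foo.
def greet(name)
  "Hello, #{name}!"
end

# Disassemble a single method.
method_asm = RubyVM::InstructionSequence.disasm(method(:greet))
puts method_asm

# Disassemble a whole file, using a temporary file in place of /tmp/hello.rb.
file_asm = nil
Tempfile.create(["hello", ".rb"]) do |file|
  file.write("puts 1 + 2\n")
  file.flush
  file_asm = RubyVM::InstructionSequence.compile_file(file.path).disasm
end
puts file_asm
```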
Use this to find out why one piece of code works and another doesn't, and why certain symbols make code behave differently than others. The devil is in the details, and it's good to know how your Ruby code behaves in your app and whether you can optimize it in any way.
Optimizations
Besides viewing your code at the interpreter level and optimizing for it, you can use InstructionSequence to optimize your code even further.
With InstructionSequence, it's possible to optimize certain instructions using Ruby's built-in performance optimizations. The full list of available optimizations is in the RubyVM::InstructionSequence.compile_option= method documentation.
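To see which options your Ruby version supports, you can read the current defaults back with the RubyVM::InstructionSequence.compile_option getter. This is a small sketch; the exact set of keys varies between Ruby versions.

```ruby
# Print every compile option and its current default value.
options = RubyVM::InstructionSequence.compile_option
options.each do |name, value|
  puts "#{name}: #{value.inspect}"
end
```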
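One of these optimizations is Tail Call Optimization: when a method call is the very last thing a method does, the current stack frame can be reused instead of a new one being pushed, so deep tail recursion doesn't overflow the stack.
The RubyVM::InstructionSequence.compile method accepts options to enable this optimization, like so: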
```ruby
some_code = <<-EOS
def fact(n, acc = 1)
  return acc if n <= 1
  fact(n - 1, n * acc)
end
EOS

# Compile once with tail call optimization on and trace instructions off
iseq = RubyVM::InstructionSequence.compile(
  some_code, nil, nil, nil,
  tailcall_optimization: true, trace_instruction: false
)
puts iseq.disasm # inspect the generated instructions
iseq.eval        # run the compiled code, which defines fact
```
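To see what the optimization buys us, the sketch below compiles the same tail-recursive method twice, once without and once with tailcall_optimization, and calls both with a recursion depth large enough to overflow the stack normally. The method names and the depth of 100,000 are our own choices. Without the optimization we'd expect a SystemStackError, but the exact stack limit depends on your Ruby version and settings.

```ruby
# Source for a tail-recursive countdown: the recursive call is the
# very last thing the method does, so it is in tail position.
def countdown_source(name)
  <<~RUBY
    def #{name}(n)
      return :done if n.zero?
      #{name}(n - 1)
    end
  RUBY
end

# Compile and evaluate the source, which defines the method at the top level.
def define_countdown(name, tco)
  RubyVM::InstructionSequence.compile(
    countdown_source(name), nil, nil, nil,
    tailcall_optimization: tco, trace_instruction: false
  ).eval
end

define_countdown(:countdown_plain, false)
define_countdown(:countdown_tco, true)

depth = 100_000

# Without the optimization, every call pushes a new stack frame.
plain_result =
  begin
    countdown_plain(depth)
  rescue SystemStackError
    :stack_overflow
  end

# With the optimization, each tail call reuses the current frame.
tco_result = countdown_tco(depth)

puts "without TCO: #{plain_result}"
puts "with TCO:    #{tco_result}"
```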
You can even turn this optimization on for all your code with RubyVM::InstructionSequence.compile_option=. The option only applies to code compiled after it's set, so make sure this runs before the rest of your code is loaded.
```ruby
RubyVM::InstructionSequence.compile_option = {
  tailcall_optimization: true,
  trace_instruction: false
}
```
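As a rough check that the global setting takes effect, the sketch below sets the option and then compiles new code, which picks up the new default without passing any options to compile. countdown is a hypothetical method for illustration; in a real app, the code compiled afterwards would typically be the files you require after setting the option.

```ruby
# Enable tail call optimization for everything compiled from now on.
RubyVM::InstructionSequence.compile_option = {
  tailcall_optimization: true,
  trace_instruction: false
}

# Code compiled after the option is set picks up the new default.
RubyVM::InstructionSequence.compile(<<~RUBY).eval
  def countdown(n)
    return :done if n.zero?
    countdown(n - 1)
  end
RUBY

puts RubyVM::InstructionSequence.compile_option[:tailcall_optimization]
result = countdown(100_000) # deep tail recursion without a SystemStackError
puts result
```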
For more information about how Tail Call Optimization works in Ruby check out these articles: Tail Call Optimization in Ruby and Tail Call Optimization in Ruby: Background.
Conclusion
Learn more about how Ruby interprets your code with RubyVM::InstructionSequence, and see what your code is really doing so you can make it more performant.
This introduction to InstructionSequence might also be a fun way to learn more about how Ruby works under the hood. Who knows? You might even be interested in working on some of Ruby's code itself.
That concludes our short introduction to code compilation in Ruby. We'd love to know how you liked this article, if you have any questions about it, and what you'd like to read about next, so be sure to let us know at @AppSignal.