How do I benchmark Ruby code?

by Jesse Storimer on June 19, 2013

How do I benchmark Ruby code?

This guest post is by Jesse Storimer. He’s the author of Working With Unix Processes, a gentle introduction to Unix system programming for Ruby programmers. Jesse has been programming Ruby since joining Shopify in 2008 and is still going strong, always looking for a chance to dig lower down into the stack. He lives way in the backwoods of southern Ontario, Canada with his wife and two daughters. Jesse blogs at jstorimer.com and has authored a few other books for Ruby developers.

Jesse Storimer So you’ve got some Ruby code and you want to make it faster. Maybe you’ve already got a new implementation in mind, or maybe you’re still cooking that up. But how do you make certain that your new implementation is faster?

Science, of course! Ruby’s standard library comes with a benchmarking library fit for measuring the execution time of your Ruby code. The Benchmark module offers several different ways for you to benchmark your code. I’ll take you through the different options and their use cases.

Getting started

The Benchmark module is in the standard library, so you don’t need to install any gems to get it. Here’s the documentation from the standard library.

The simplest way to measure your Ruby code is with Benchmark.measure.

require 'benchmark'
require 'bigdecimal/math'

# calculate pi to 10k digits
puts Benchmark.measure { BigMath.PI(10_000) }

This will return something that looks like this:

  0.310000   0.040000   0.350000 (  0.339958)

With no context, these might look like magic numbers. Here’s what they mean:

Benchmark numbers breakdown

Generally, the number farthest to the right is the most important one. It tells how long it actually took to perform the operation. If you’re curious about why the clock time is so high, the other numbers can help you drill down to see if you’re spending time in system functions or your own code.

Now that you know what those magic numbers mean, we can move on to the core Benchmark API. The truth is that I rarely use the measure method on its own. It only prints the benchmark for a single block of code. The most common way to use Benchmark is to compare the execution time of different approaches to the same problem.

Benchmark has some built-in methods for this exact purpose.

Benchmark#bm

This method lets you define several blocks of code to benchmark, then prints the results side-by-side in the same format you saw earlier.

require 'benchmark'

iterations = 100_000

Benchmark.bm do |bm|
  # joining an array of strings
  bm.report do
    iterations.times do
      ["The", "current", "time", "is", Time.now].join(" ")
    end
  end

  # using string interpolation
  bm.report do
    iterations.times do
      "The current time is #{Time.now}"
    end
  end
end

This will print the following result:

       user     system      total        real
   0.540000   0.010000   0.550000 (  0.556572)
   0.410000   0.010000   0.420000 (  0.413467)

Notice that this is the same format I outlined earlier, but now you have little hints about each of the numbers.

The core API here is this:

Benchmark.bm do |bm|
  bm.report { first_approach }
  bm.report { alternative_approach }
end

You call the Benchmark#bm method passing a block. The block variable is a special object provided by Benchmark. It gives you a report method that you call with the block of code that you want to measure. Benchmark then runs both blocks of code and prints their execution times side-by-side.

A note about iterations: Often, when doing benchmarks that test code that executes very quickly, you need to do many iterations to see a meaningful number. In this case, I did 100,000 iterations of each variant just to get the execution time up to half a second so I could grasp the difference.

Labels

In that last benchmark, I buried some comments in the source that said what each block of code was doing. That’s not so helpful when looking at the results! Benchmark allows you to pass in a label to the report method that will be printed along with the results.

require 'benchmark'

iterations = 100_000

Benchmark.bm(27) do |bm|
  bm.report('joining an array of strings') do
    iterations.times do
      ["The", "current", "time", "is", Time.now].join(" ")
    end
  end

  bm.report('string interpolation') do
    iterations.times do
      "The current time is #{Time.now}"
    end
  end
end

I’ve now removed the comments describing the blocks and pass them in to the report method as an argument. Now the output describes itself:

                                  user     system      total        real
joining an array of strings   0.550000   0.010000   0.560000 (  0.565089)
string interpolation          0.410000   0.010000   0.420000 (  0.416324)

There’s one more important change I made in that last example that may have gone unnoticed. I passed 27 as an argument to the Benchmark.bm method. This signifies how much padding the header labels should have in the result output. If you pass labels to report, but don’t set this value high enough, your output won’t line up properly.

Let’s see an example with no argument passed to Benchmark.bm.

       user     system      total        real
joining an array of strings  0.520000   0.010000   0.530000 (  0.541942)
string interpolation  0.390000   0.010000   0.400000 (  0.394111)

That’s certainly not right. Make sure you pass a value that’s greater than the length of your longest label. That’s the happy path.

Benchmark#bmbm

The Benchmark#bm you just saw is really the core of Benchmark, but there’s one more method I should mention: Benchmark#bmbm. That’s right it’s the same method name, repeated twice.

Sometimes, with a benchmark that creates a lot of objects, the results start to get skewed because of interactions with Ruby’s memory allocation or garbage collection routines. When creating a lot of objects, one block may need to run garbage collector, while the other doesn’t; or just one block may get stuck with the cost of allocating more memory for Ruby to use.

In this case, the benchmark can produce unbalanced results. This is when you want to use Benchmark#bmbm.

The method name is suitable because it actually benchmarks your blocks of code twice. First, it runs the code as a ‘rehearsal’ to force any initialization that needs to happen, then it forces the GC to run, then it runs the benchmark again ‘for real’. This ensures that the system is fully initialized and the benchmark is fair.

This last example benchmark allocates a lot of objects. When this runs at the rehearsal stage, Ruby has to allocate more memory to make room for all the objects. Then when the ‘real’ benchmark happens, the memory is already available and just the actual implementation is tested.

require 'benchmark'

array = Array(1..10_000_000)

Benchmark.bmbm(7) do |bm|
  bm.report('reverse') do
    array.dup.reverse
  end

  bm.report('reverse!') do
    array.dup.reverse!
  end
end

And here’s the result:

Rehearsal --------------------------------------------
reverse    0.020000   0.020000   0.040000 (  0.050908)
reverse!   0.030000   0.020000   0.050000 (  0.048042)
----------------------------------- total: 0.090000sec

               user     system      total        real
reverse    0.010000   0.000000   0.010000 (  0.015385)
reverse!   0.030000   0.000000   0.030000 (  0.023973)

Notice the discrepancy between the rehearsal and the benchmark! Thanks bmbm!

Conclusion

When you want to try your hand at speeding up some of your Ruby code, make sure that you measure, measure, measure to prove that your new implementation is faster than the old one. This great little benchmarking library ships with Ruby right in the standard library, so there’s no excuses!

I hope you found this article valuable. Feel free to ask questions and give feedback in the comments section of this post. Thanks!

Technorati Tags: , , , ,

Posted by Jesse Storimer

{ 9 comments… read them below or add one }

John June 19, 2013 at 5:51 pm

This was a great intro to benchmark. Super helpful, thanks!

Why isn’t the total elapsed time the same as user + CPU time? Looks like elapsed time is shorter…

Reply

Jesse Storimer June 19, 2013 at 7:59 pm

The total elapsed time is more precise. Both are correct, but user + CPU time are rounded up.

I believe this is because the elapsed real time is measured as intervals on the system clock, so it’s as precise as your system clock can be Whereas the user and CPU time are provided right from the guts of the CPU, with less guarantees as to their granularity.

Reply

Len Jaffe June 21, 2013 at 1:24 am

Real time keeps ticking while other processes have the CPU. The system and user time only accumulate while your process is the currently executing process.

Reply

TM June 21, 2013 at 12:47 am

In time jargon, total = user + sys. Elapsed time is called “real.”

Why the distinction? It can happen that `total < real`, from blocking I/O, or that `total > real`, from parallel execution.

Reply

Jesse Storimer June 21, 2013 at 1:05 am

Right! Forgot about parallel execution. 2 threads could use 2s of CPU time in 1s of real time.

Reply

Douglas A. June 20, 2013 at 3:38 am

Thanks for the blog. It just so happens that today I was doing this code.
http://www.codecademy.com/courses/ruby-beginner-en-Qn7Qw/2/3?curriculum_id=5059f8619189a5000201fbcb
Note sure if you can see this one but it is The Marriage of Hashes and Symbols, by Eric Weinstein , Dare to Compare

Reply

Andrew Vit June 21, 2013 at 4:02 am

Any thoughts on tracing memory bloat in ruby?

Reply

Benoit Daloze June 21, 2013 at 2:03 pm

Just for info, #bmbm does not need the max label width, it can compute it (#bm cannot unless evaluating the block to #bm twice which might be counter-intuitive).

I would strongly recommend to not trust a single run, but always to measure many and compare them to have an idea of the error (even more important on implementations with JIT which need (possibly long) warmup).

user/system time is usually more reproducible (does depend much less of external factors such as other processes) but has indeed less granularity, would be 0 for #sleep for instance and would sum user/system time of each thread.

Reply

Akshay September 13, 2013 at 6:08 pm

Good explanation of Benchmarking in Ruby. Best way to measure performance of code blocks.

Reply

Leave a Comment

{ 3 trackbacks }

Previous post:

Next post: