May 2011 Archives

rake performance: sh...I'm working

In "Ruby Performance: What a difference the compiler makes" we saw how Ruby performance on Windows benefits from the newer MinGW compiler.

The measurements were made using our current build system with our production code. That’s ~36 MB (37,331,657 bytes at the time of writing) of sources and configuration files to parse and compile. After changing RubyInstaller versions the full build was still taking four to five minutes.

I’ll make a small parenthesis here to remark that the previous Dell Latitude series (the E64 and E65) have huge performance issues. Identical laptops with identical software deliver wildly different numbers:

The Y axis is seconds. Each color is a different laptop and the ten clusters represent our ten most used rake tasks.

What is astonishing in the above graph is that both the fastest and the second slowest times come from identical laptops (E6510 Latitudes). The slowest laptop is an E6410 Latitude which exhibits serious overheating.

To return from the brink of hardware apoplexy, the full build was still slow for our tastes. At this point the code is clean and very, very DRY. A few experiments with minimizing Rake::Task look-ups, cached Rake::FileList instances etc. showed exactly 0 improvement.

I like to think this means there is nothing more to do to streamline the logic, but it probably means that I have reached the end of my optimizing skillz. In line with the “reduce disk I/O” mantra, we started looking at the places where we interact heavily with the operating system.

File reads and writes from within Ruby we already had under control, but what about all those commands we delegated to the shell?

Rake#sh allows you to shell out for a command and conveniently raises an exception when the exit code of that command is non-zero. Given a single string it will spawn a full shell (on Windows that means cmd.exe), whereas given the command and its arguments as separate strings (for example a splatted array) it runs the command directly, with Kernel#system semantics.

Turns out (doh!) that using Kernel#system semantics is a tick faster. It gets buried in the noise if you only make a couple of calls, but when compiling we tend to do two to three calls to Rake#sh per library (compiler, assembler and archiver). With about 130 libraries, switching from sh(String) to sh(Array) gave us a 7% performance boost, which translates to builds that are more than 20 seconds faster.
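The two call styles can be sketched with plain Kernel#system. This is a minimal sketch, not the code from our build system: the helper name `run_direct` is hypothetical, the gcc command lines in the comments are purely illustrative, and the Ruby interpreter stands in for a compiler so that the example runs anywhere.

```ruby
require "rbconfig"

# In a Rakefile the two styles would look roughly like this (illustrative):
#   sh "gcc -c foo.c -o foo.o"               # one string
#   sh "gcc", "-c", "foo.c", "-o", "foo.o"   # separate strings

# Hypothetical helper mimicking the separate-strings form: every argument is
# handed to Kernel#system on its own, so no shell is needed to parse a
# command line, and a failing command raises an exception.
def run_direct(*argv)
  ok = system(*argv) # false on non-zero exit, nil if the command cannot run
  raise "Command failed: #{argv.join(' ')}" unless ok
  ok
end

# The running Ruby interpreter stands in for a compiler or archiver.
run_direct(RbConfig.ruby, "-e", "exit 0")
```

As a back-of-the-envelope check on the numbers above: about 130 libraries at two to three calls each is 260 to 390 spawned processes per full build, so a saving of 20 seconds works out to somewhere between 50 and 80 ms per call.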

Short of mangling my clean dependency structures and hacking the code to bits (meaning I will be damned to an eternity of maintaining it by myself) I can’t see how I can improve performance further.

Fortunately all of this currently happens on normal hard disks and the order for an SSD has already been placed, so there is still hope. After that, parallel is the only way to go.

I have now actually spent almost a full four weeks in a measure-analyze-refactor loop. I've gained three insights as far as performance optimization is concerned:

  1. Keeping the code DRY with clean abstractions and proper design is infinitely more useful and productive than hacking tricks for performance’s sake. None of the tricks I used made any difference while they made the code a lot harder to read and maintain. Refactoring for DRYness and testability on the other hand proved a boon: It isolated the suspects, allowed granular benchmarking and made experiments a hell of a lot easier.

  2. Measure, measure, measure. It doesn’t matter what you've read or what you think looks faster in your source files. Between the source and its output lie several layers (interpreter, shell, operating system) over which you have little or no control. Unless you see those numbers going down, no change is worth committing.

  3. Optimizing performance is HARD and it takes TIME. Look for the usual suspects (disk I/O, I'm looking at you) and avoid trying to be clever.
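The "measure, measure, measure" step needs very little tooling. The following is a minimal sketch using the standard Benchmark module; the helper name `best_time` is hypothetical and the spawned command is only a stand-in for a real build step.

```ruby
require "benchmark"
require "rbconfig"

# Hypothetical helper: run the block `runs` times and return the fastest
# wall-clock time in seconds. Taking the minimum damps noise caused by
# whatever else the machine happens to be doing.
def best_time(runs = 5)
  Array.new(runs) { Benchmark.realtime { yield } }.min
end

# Stand-in measurement: the cost of spawning one short-lived process.
# Wrap each variant you want to compare in its own best_time call.
spawn_cost = best_time { system(RbConfig.ruby, "-e", "exit 0") }
puts format("spawning one process: %.3fs", spawn_cost)
```

The absolute numbers depend entirely on the machine, the operating system and the Ruby version, so only comparisons made on the same system mean anything.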

There is a fourth insight, pertaining to build systems: Consistent, reliable incremental builds are unbeatable. The less often you do a rake clean the less time you lose.
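Incremental builds rest on a simple timestamp comparison between a target and its sources. The sketch below shows that idea in plain Ruby; `out_of_date?` is a hypothetical helper written for illustration, not the implementation Rake uses, and the file names are made up.

```ruby
require "tmpdir"

# Hypothetical helper: a target needs rebuilding when it does not exist or
# when any of its sources was modified after the target was written.
def out_of_date?(target, sources)
  return true unless File.exist?(target)
  target_time = File.mtime(target)
  sources.any? { |src| File.mtime(src) > target_time }
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "foo.c")
  obj = File.join(dir, "foo.o")
  File.write(src, "int main(void) { return 0; }\n")

  puts out_of_date?(obj, [src]) # true: there is no object file yet

  File.write(obj, "")
  now = Time.now
  File.utime(now, now - 60, src) # make the source a minute older
  File.utime(now, now, obj)

  puts out_of_date?(obj, [src]) # false: the object is newer than its source
end
```

Every target for which this check comes back false is a compiler call that never happens, which is exactly the time a rake clean throws away.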


Posted by Vassilis Rizopoulos on May 20, 2011

Ruby Performance: What a difference the compiler makes

For a long time Ruby performance was not an issue. That changed as soon as I moved from Windows to Mac and started supporting development teams on different systems.

Ruby performance under Windows was (and is) an embarrassment, when it isn’t a reason to change development environment.

When writing software like build systems there is the factor of perceived performance which can be best defined as the amount of time between starting a program and the moment the first output is given to the user (the startup time) or the amount of time between screen outputs.
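Perceived performance in this sense can be measured directly as the time until a command prints its first byte. The following is a minimal sketch under that definition; `time_to_first_output` is a hypothetical helper, and the Ruby one-liner stands in for a real command such as a rake invocation.

```ruby
require "rbconfig"

# Hypothetical helper: start a command (program and arguments as separate
# strings) and return the seconds that pass until it produces its first
# byte of output, or nil if it exits without printing anything.
def time_to_first_output(*argv)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  IO.popen(argv) do |io|
    return nil if io.read(1).nil? # end of file before any output
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    io.read # drain the remaining output so the child can finish
    elapsed
  end
end

# Stand-in command: a Ruby one-liner that prints a single line.
startup = time_to_first_output(RbConfig.ruby, "-e", "puts 'ready'")
puts format("first output after %.3fs", startup)
```

Pointing the helper at the real entry point of a build system gives a number to track over time instead of a feeling.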

Providing clues to the user might actually slow down your scripts (all those stdout prints cost time) but it gives your user the sense that something (hopefully useful) is happening.

Then there is the real performance and lately I have reduced everything down to a single rule: minimize disk I/O.

For a build system that compiles stuff from source there is only so much you can do, but the rule holds.

When my team labeled our startup time the “Ruby Tribute Minute” I knew I couldn’t ignore the problem anymore. In order to keep up with a rapidly growing code base I had to improve performance: run fast in order to stay put.

The platform is 32-bit Windows 7 and, to make matters worse I/O-wise, it’s a virtual machine.

RubyInstaller 1.8.7p249 with DevKit 3.4.5 was the original installation. The update is RubyInstaller 1.8.7p334 with DevKit 4.5.1 and recompiled native gems.

Using the same codebase and running over the same amount of code the following graph shows the performance boost that Ruby gets from using a newer compiler on Windows.

That first bar is very important. It’s a rake -T, the closest thing to measuring the RTM (the Ruby Tribute Minute). The p249 version needs 11 seconds while the p334 version needs under 5. The same operation on my MacBook Pro needs 2 seconds, but the comparison is useless since there are fewer cores and no VM.

The test is very realistic, in that it’s a sequence of rake calls from a living, growing build system: -T, clean, generate, build-1, build-1, build-2, build-2.

The last four measure time for a full rebuild and the minimum incremental build time for our two main targets. Where the compiler actually does the bulk of the job (bars 5 and 7) there is no gain but in every other aspect the new RubyInstaller version is faster and feels faster – it actually feels faster than it is, which goes a long way to placate my team members.

I'm actually quite interested to see what 1.9.2 performance looks like, but I need to work out some incompatibilities first.

All I can do is send a great big thanks to Luis Lavena and the RubyInstaller team for doing such a great job and improving the Ruby experience on Windows immensely.


Posted by Vassilis Rizopoulos on May 10, 2011