mdub@DogBiscuit.org
... mmm, crunchy!

Introducing ShamRack

The system I'm currently working on integrates with several external systems, over HTTP, using simple (RESTish) web-services. I really don't want to involve those external systems while testing my own, though; I want to stub 'em out.

My first attempt involved stubbing out HTTP calls using my mocking framework of choice. I'm using RestClient, which I like a lot, and stubbing out RestClient API calls worked quite well. It kept on working quite well for several hours, until I decided to refactor a little, using RestClient in a slightly different way, at which point it broke completely. Bother. I really don't like having tests coupled to implementation details, so I went searching for another way.

FakeWeb looked pretty good, in that it stubs things out at the Net::HTTP layer, which I'm unlikely to refactor out of the picture. In the end, though, it's not really what I wanted. I wanted to be able to do things like:

  • verify the body (and mime-type) of a POST/PUT request
  • dynamically generate responses, based on some aspect of the request (e.g. query parameters)

In short, I wanted a Fake Object, rather than a simple stub.

It occurred to me around about then that we already have plenty of tools for describing the behaviour of web-applications: they're called web-application frameworks! Many of them are too heavy-weight for my purposes, but Sinatra is nicely minimal. So, 60 lines of Ruby code later, I had a little web-app that mimicked one of those external web-services sufficiently for my testing. Win!

But waitaminut. I really don't want to have to start a separate process running my fake web-service, and talk to it using HTTP. That's going to be slow: network I/O isn't cheap. Isn't there some way I can use something like Sinatra but still keep everything in-process?

There is now. ShamRack plumbs Net::HTTP directly into applications built to run on Rack. Which includes all Sinatra apps, as well as Rails, Merb, etc.

Using ShamRack, I avoid the network traffic, making the tests a whole lot faster (about 25 times faster, in my case). Plus, I avoid the complication of having to start and stop an external web-server. Finally, because my fake web-service app is in-process, I get a handy back-channel I can use to set up or inspect its state during tests.
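Why is staying in-process even possible? Because a Rack application is just an object that responds to call(env) and returns a [status, headers, body] triple, so nothing forces a socket into the picture. The sketch below is plain Ruby and does not use ShamRack's API at all; FakeGreetingService and in_process_get are made-up names, and the env hash is a cut-down version of what a real Rack server would supply. It shows a response generated from a query parameter, plus the "back-channel" idea of inspecting the fake's state afterwards.

```ruby
require 'uri'

# Hypothetical fake of an external service, written as a bare Rack-style app.
class FakeGreetingService
  # Back-channel: tests can inspect every request the fake received.
  attr_reader :requests

  def initialize
    @requests = []
  end

  # Rack contract: take an env hash, return [status, headers, body].
  def call(env)
    @requests << env
    params = URI.decode_www_form(env['QUERY_STRING'].to_s).to_h
    name = params.fetch('name', 'world')
    [200, { 'Content-Type' => 'text/plain' }, ["Hello, #{name}!"]]
  end
end

# Hypothetical helper: build a minimal env and call the app directly,
# with no HTTP server and no network I/O involved.
def in_process_get(app, path, query = '')
  env = {
    'REQUEST_METHOD' => 'GET',
    'PATH_INFO'      => path,
    'QUERY_STRING'   => query
  }
  status, headers, body = app.call(env)
  [status, headers, body.join]
end

service = FakeGreetingService.new
status, _headers, body = in_process_get(service, '/greet', 'name=mdub')
puts "#{status} #{body}"
puts "requests seen: #{service.requests.size}"
```

What ShamRack adds on top of this idea is the plumbing, so that code calling Net::HTTP ends up at the Rack app without knowing anything has changed.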

If you find ShamRack handy, or have ideas about how it could improve, let me know!

Spying on your code with RR

A while back, Melbourne's own Pete Yandell created Not A Mock, an extension to RSpec that supports test-spies. And a damn fine idea it was, too.

I've recently discovered that my current favourite stub/mock framework, Brian Takita's RR, can do test-spies too!

Huh?

What's this "spy" business about? Well, when mocking, before triggering the behaviour you're testing, you set up expectations that certain methods of collaborating objects will be invoked, with the specified parameters. Like so:

describe TransferEverything do

  before do
    @account1 = Account.new
    @account2 = Account.new
    @transfer = TransferEverything.new(:from => @account1, :to => @account2)
  end

  describe "#execute" do

    it "moves all funds from one account to the other" do

      all_the_money = 1.42
      stub(@account1).balance { all_the_money }

      mock(@account1).withdraw(all_the_money)   # <= set expectations
      mock(@account2).deposit(all_the_money)

      @transfer.execute                         # <= execute
      
    end                                         # <= verify expectations

  end

end

The expectations are typically verified auto-magically, by the mocking framework, at the end of your test.

The spy alternative

Setting up expectations before a call always feels clumsy. Using a test spy makes tests flow more naturally:

  1. Stub out collaborators, setting up canned responses where required.
  2. Execute the code you're testing.
  3. Verify the results, including both:
    • the outputs (return values, or resulting state)
    • the interactions (i.e. the method-invocations you expected your fake collaborators to receive).

Fur egg-sample:

describe TransferEverything do

  # ...

  describe "#execute" do

    it "moves all funds from one account to the other" do

      all_the_money = 1.42
      stub(@account1).balance { all_the_money }
      stub(@account1).withdraw
      stub(@account2).deposit

      @transfer.execute

      @account1.should have_received.withdraw(all_the_money)
      @account2.should have_received.deposit(all_the_money)

    end

  end

end

One thing I find particularly useful about this technique is the ability to execute code in a setup block, then verify the various aspects of its behaviour in separate test-cases.

describe TransferEverything do

  # ...

  describe "#execute" do

    before do
      @all_the_money = 1.42
      stub(@account1).balance { @all_the_money }
      stub(@account1).withdraw
      stub(@account2).deposit
      @transfer.execute
    end

    it "withdraws all funds from source account" do
      @account1.should have_received.withdraw(@all_the_money)
    end

    it "deposits funds in receiving account" do
      @account2.should have_received.deposit(@all_the_money)
    end

  end

end

This results in smaller, more coherent test-cases.
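If the mechanics feel a bit magical, it may help to see a spy stripped down to its essentials: an object that answers any message, remembers what it was sent, and lets you ask about it afterwards. The sketch below is a hand-rolled illustration in plain Ruby, not how RR implements spies; RecordingSpy and its received? method are invented for the purpose, and the TransferEverything body is a guess at the class under test.

```ruby
# Hypothetical hand-rolled spy: answers any message and records it.
class RecordingSpy
  attr_reader :calls

  # canned_responses maps method names to the values they should return.
  def initialize(canned_responses = {})
    @canned_responses = canned_responses
    @calls = []
  end

  # True if the spy was sent `name` with exactly these arguments.
  def received?(name, *args)
    @calls.include?([name, args])
  end

  def method_missing(name, *args)
    @calls << [name, args]
    @canned_responses[name]
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Stand-in for the class under test (assumed implementation).
class TransferEverything
  def initialize(options)
    @from = options[:from]
    @to = options[:to]
  end

  def execute
    amount = @from.balance
    @from.withdraw(amount)
    @to.deposit(amount)
  end
end

# 1. Stub out collaborators, with canned responses where required.
account1 = RecordingSpy.new(:balance => 1.42)
account2 = RecordingSpy.new

# 2. Execute the code being tested.
TransferEverything.new(:from => account1, :to => account2).execute

# 3. Verify the interactions, after the fact.
raise 'expected withdraw(1.42)' unless account1.received?(:withdraw, 1.42)
raise 'expected deposit(1.42)' unless account2.received?(:deposit, 1.42)
puts 'all interactions verified'
```

The three numbered comments line up with the stub, execute, verify flow described above; a real framework adds nicer failure messages and argument matchers.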

Using RR test-spies in RSpec

If you're using RSpec, you'll need to use the adapter class that comes with RR, rather than the one that comes with RSpec; it's the RR adapter that provides access to the have_received matcher. That is, in your spec_helper.rb, do this:

require 'rr'
Spec::Runners.configure do |config|
  config.mock_with RR::Adapters::Rspec
end

Spying on Java

Honourable mention: if you're lucky (*cough*) enough to be coding Java, I HIGHLY recommend Mockito, which also implements test-spies, and is easily the best Java mocking/stubbing library around.

Faster project-wide searching in Textmate

Update: I'm now using Ack in Project, which is even better!

Textmate is a nice editor, but its "Find in Project" (⇧⌘F) function is annoyingly slow in large projects.

So, I'm happy to have found an alternative: GrepInProject++, which leverages the raw power of find(1) and grep(1) for super fast searching. The original GrepInProject was created by Henrik Nyh; Robert Thurnher added a better UI and some other features.

Enjoy it thusly:

$ sudo mv GrepInProjectSearch.nib /Applications/TextMate.app/Contents/SharedSupport/Support/nibs/
  • Open GrepInProject.tmCommand with Textmate

By default the "Grep in Project" command is bound to ⇧⌘F, replacing the built-in "Find in Project" command.

Note: my version is ever-so-slightly different to Robert's; I changed the find to use name rather than path matching (saving a few precious milliseconds), and removed a redundant "recursive" option from the grep.

Rsync: 1, Time Machine: 0

I recently bought a Mac Mini to serve various purposes about the house - not least of which, as a remote backup server for my MacBook Pro.

At which point I spent several evenings wrestling with Time Machine, with limited success. I moved my existing (500G, external) drive to the Mac Mini, shared it, and nominated it as my backup volume. But:

  • Time Machine wouldn't recognise the existing backups on that drive, and insisted on starting again from scratch (because it creates sparsebundle disk images for remote backup clients, but not for the local system). Annoying.
  • The initial backup took forever, because TM backs up everything not specifically excluded. (Granted, I'm backing up over an 802.11g wireless network).
  • Incremental backups kicked in every hour, and even when I hadn't been altering files, seemed to take an excessive amount of time to complete, i.e. around 15 minutes. Much of this time was spent "preparing", and affected the performance of both my laptop and the network. I don't need or want hourly backup, but TM provides no way to set a less demanding schedule.
  • Several times things got borked when I interrupted a backup midway, and I had to reboot, remount or otherwise intervene to get things working again.

Eventually, I gave up, and went looking for alternatives. After flirting with rdiff-backup and rsnapshot, I eventually did a little research and rolled my own rsync backup script:

#! /bin/sh

set -e 

snapshot_host=theLoungeRoomMac.local
snapshot_dir=/Volumes/WD_500/Snapshots/woollyams
snapshot_user=root
ssh_user=$snapshot_user@$snapshot_host

ping -o $snapshot_host > /dev/null || {
  echo "WARNING: can't see $snapshot_host -- skipping backup"
  exit 1
}

ssh $ssh_user "test -d $snapshot_dir" || {
  echo "ERROR: can't see $ssh_user:$snapshot_dir" >&2
  exit 2
}
  
snapshot_id=`date +%Y%m%d%H%M`

/usr/bin/rsync --archive --verbose \
  --delete --delete-excluded \
  --numeric-ids --extended-attributes \
  --one-file-system \
  --partial \
  --link-dest ../current/ \
  --relative \
  --max-size=50M \
  --exclude ".git" \
  --exclude ".svn" \
  /private/etc /Users/mdub \
  $ssh_user:$snapshot_dir/in-progress/

ssh $ssh_user "cd $snapshot_dir; rm -fr $snapshot_id; mv in-progress $snapshot_id; rm -f current; ln -s $snapshot_id $snapshot_dir/current"

Advantages over Time Machine are:

  • I can run this as often or as infrequently as I like.
    • I'm currently running it out of /etc/daily.local, which is run by periodic, which is run by launchd.
    • It doesn't get in my way by running while I'm actively using my machine.
  • I can use the full power of rsync filter rules to exclude uninteresting files (e.g. "--exclude .git --exclude .svn").
  • I can even filter by file size ("--max-size=50M") to skip things like big downloads and VMware images, without having to explicitly nominate them.
  • It takes less than 3 minutes to perform an incremental backup (providing I haven't changed too much).
  • I can safely interrupt the backup process, or pull the plug, or whatever, and it's robust enough to carry on where it left off next time.
  • I can keep as many time-stamped snapshots as I wish.
  • It's relatively efficient space-wise, due to the use of hard-links to share unchanged files between snapshots (not as efficient as Time Machine, though, which hard-links entire directories).
  • Each snapshot is a simple, easy-to-browse, easy-to-search directory, containing plain old files and directories. It gives me comfort that I wouldn't need a spiffy GUI to locate a file I was looking to restore.
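
The hard-link trick is worth seeing in isolation. With --link-dest, rsync hard-links files that are unchanged since the previous snapshot rather than copying them, so both snapshots point at the same data on disk. The Ruby sketch below is only a toy illustration of that idea, not a replacement for the script: link_snapshot is a made-up helper, it treats every file as unchanged, it skips dotfiles, and it assumes a filesystem that supports hard links.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: create `current` as a tree in which every file is a
# hard link to the matching file in `previous` (i.e. nothing has changed).
def link_snapshot(previous, current)
  Dir.glob('**/*', base: previous).sort.each do |relative|
    source = File.join(previous, relative)
    target = File.join(current, relative)
    if File.directory?(source)
      FileUtils.mkdir_p(target)
    else
      FileUtils.mkdir_p(File.dirname(target))
      File.link(source, target) # new directory entry, same data on disk
    end
  end
end

Dir.mktmpdir do |root|
  first  = File.join(root, '200901010000')
  second = File.join(root, '200901020000')
  FileUtils.mkdir_p(File.join(first, 'docs'))
  File.write(File.join(first, 'docs', 'notes.txt'), 'unchanged content')

  link_snapshot(first, second)

  old_stat = File.stat(File.join(first, 'docs', 'notes.txt'))
  new_stat = File.stat(File.join(second, 'docs', 'notes.txt'))
  puts "same inode: #{old_stat.ino == new_stat.ino}"
  puts "link count: #{new_stat.nlink}"
end
```

Each snapshot still looks like a complete, ordinary directory tree, which is exactly why browsing and restoring need nothing fancier than a shell.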

"Continuous Integration" might not mean what you think it means

Continuous Integration is a common practice in Agile development circles, but I think people (especially those new to agile thinking) sometimes miss the point.

Problem is, the term has become synonymous with build-servers such as CruiseControl (etc, etc), which frequently grab the latest code, build it, and execute automated tests. These are often referred to as "continuous-integration servers", which IMHO is a really bad name, 'cos if there's one thing these servers typically don't do, it's integrate.

And the point of continuous-integration is just that: Integrating. Continuously! Which means:

  • developers frequently updating their working-areas (or personal branches) with the latest code on the mainline branch (typically many times a day), and
  • frequently merging their own changes back into the mainline (typically several times a day).

Unless you're doing this, you ain't "doing continuous integration", however frequently you're running automated builds!

Integrating continuously can be difficult. In particular, it forces you to chunk larger changes and features into small, bite-sized pieces that can be drip-fed into the codebase. And, you have to deal with other developers changing stuff all the time. Build-servers and automated tests are essential tools here, because they help keep the team honest, ensuring that everyone has a stable (if evolving) base to work on.

There are plenty of upsides to frequent integration:

  • each individual integration is smaller, and therefore easier
  • design issues (including differences of opinion) are identified earlier
  • developers can leverage each other's work earlier
  • changes can be tested (and bugs detected) earlier
  • software can be deployed more frequently

In summary: check it in already!

Attacking slow-running builds (notes from CITCON)

Last weekend I went along to CITCON here in Melbourne. Which was great fun, by the way.

There I ran a session on "Attacking slow-running CI builds". It was a small group, but an interesting discussion, I think. Here are my (rough, unedited) notes:

WHAT is the impact of a slow build?

  • fewer checkins
  • more waiting
  • context switching
  • discourages integration
  • discourages writing of additional tests
  • more chance of overlapping checkins
  • more build breakages
  • more time required to get the build fixed
  • reduced productivity
  • WASTE!

WHY is the build slow?

  • slow tests (particularly acceptance tests)
    • over-testing (testing the same code-paths repeatedly)
    • expensive set-up and tear-down
    • too much testing via the user-interface
    • tests that pause, sleep, or poll (e.g. to deal with AJAX)
  • too much I/O!
  • use of slow infrastructure components (database servers, application servers, etc.)
  • slow hardware

HOW can we make it faster?

  • faster hardware
  • run tests in parallel
  • distribute tests
  • fail fast
    • selective testing: run tests most likely to fail first
      • could use dependency-analysis to identify which tests were affected by recent commits
  • refactor story-based acceptance tests into scenario-based tests
    • bigger tests, with more assertions, offsets set-up/tear-down costs
      • but makes tests more complex
  • share test fixtures between a group of tests
    • but breaks test isolation
  • avoid I/O
    • in-memory database
    • in-memory file-store (RAM disk?)
    • stub out infrastructure components
      • avoid testing these components by side-effect
  • populate the database directly, rather than using the user-interface to set-up for a test
  • separate your system into components that can be tested independently
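
To make the "run tests most likely to fail first" idea a little more concrete, here is a rough sketch of ordering tests by how many recently-changed files they depend on. Everything in it is hypothetical: the TEST_DEPENDENCIES map and the file names are invented, and in practice the map would have to come from coverage data or some form of dependency analysis.

```ruby
# Hypothetical data: which source files each test exercises.
TEST_DEPENDENCIES = {
  'account_test'  => ['account.rb'],
  'transfer_test' => ['account.rb', 'transfer.rb'],
  'report_test'   => ['report.rb']
}.freeze

# Order tests so that those touching the most changed files run first.
def prioritise(test_dependencies, changed_files)
  test_dependencies.keys.sort_by do |test|
    overlap = (test_dependencies[test] & changed_files).size
    [-overlap, test] # most overlap first; name breaks ties deterministically
  end
end

puts prioritise(TEST_DEPENDENCIES, ['transfer.rb', 'account.rb']).inspect
# prints ["transfer_test", "account_test", "report_test"]
```

The full suite still runs; the ordering just means a breakage caused by the latest commit is likely to show up in the first few seconds rather than the last.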

Thinking about this later ...

There are two types ...

The suggestions for improving build times seemed to fall into two categories:

  1. optimise the build/tests
  2. throw additional hardware at the problem

My problem with the "throw hardware at it" approach is that it typically only helps for the build-server machine; the poor old developers are still left with a slow-running build, and therefore many of the productivity issues still exist.

Another idea

It occurs to me now that we missed a fairly fundamental trick to improve test times: improve the performance of the system-under-test itself. It's a great excuse to start thinking about performance earlier in the project.

"Customer Acceptance Test" does not need to mean end-to-end

On all the projects I've been on in recent years, we've ended up with the majority of the tests being either "developer unit tests", which run super-fast, or "customer acceptance tests" which test end-to-end (browser-to-database) and run super-slow.

Methinks it should be less black-and-white. If we can demonstrate functionality that the customer cares about by calling the underlying logic directly (i.e. at unit-test level), rather than by exercising the user-interface, then what's wrong with that? (We just need one test to prove that the underlying logic has been properly integrated into the UI.)

Railsconf 2008 Highlights

I was lucky enough to be at Railsconf 2008 in Portland last weekend (along with Marty, Rob, Trav and Abhi).

Highlights

  • Meeting other Ruby/Rails enthusiasts from all over. (Well, all over the US, at least).
  • Joel Spolsky's opening keynote was hilarious (in a good way). Some other commentators found it low on content, but I thought it had a strong message: usability matters!
  • Seeing Kent Beck present was fantastic. He had the audience hanging on his every word, as he described how "anything he'd done had taken 20 years to have an impact".
  • Ezra's talk on Vertebra, his XMPP-based "cloud control" project, was fascinating. What a great abuse of technology!
  • The JRuby and Rubinius teams are co-operating closely, in a spirit of friendly, respectful rivalry. Particularly notable is their effort to collaborate (with each other, and Matz) on a rigorous set of executable specs for the Ruby language.
  • The upcoming version of Phusion Passenger will support not only Rails applications, but also Rack (and therefore Merb, Sinatra, Camping), and (get this) WSGI (and therefore a bunch of Python frameworks, including Django)!
  • There are increasingly varied options for deploying Rails apps, including the traditional {Apache,nginx}+{mongrel,thin}, JRuby WARs in a servlet container, Passenger, and the Amazon-EC2-based services like RightScale and Heroku. Heroku's deployment model is pretty damn clever: just "git push".

Regrets

With 4 streams going on, the talks I got to were naturally out-numbered by those I missed. Some of the ones I really wish I'd seen include:

  • MagLev: Gemstone's Ruby implementation-in-progress, based on their Smalltalk VM
  • Scott Chacon on "Using Git" (apparently he went into mind-bending detail of the Git internals)
  • Justin Gehtland's "Small Things, Loosely Joined, and Written Fast"

Git (on the Mac)

Git is the hype. I'm just starting to use it for a couple of projects, both directly, and as a local facade to Subversion.

Here are some suggestions on using git under Mac OS X.

Installation

Installation using MacPorts is pretty painless. Ensure you choose the "svn" variant if you want Git/Subversion integration.

sudo port install git +svn +doc

Another option is the native installer, available at http://code.google.com/p/git-osx-installer/

Textmate

If you use Textmate, the Git Textmate bundle is rather nice.

cd ~/Library/Application\ Support/TextMate/Bundles
git clone git://gitorious.org/git-tmbundle/mainline.git Git.tmbundle

Remember to set the TM_GIT variable (to "/opt/local/bin/git" or "/usr/local/bin/git", as the case may be), otherwise stuff won't work.

Shell completion

For command-line (bash) users, there's TAB-completion available, which is pretty handy. I'm using it directly from my local clone of the git source tree, like this:

# in .bashrc ...

git_completion_script=$HOME/OpenSource/kernel.org/git/contrib/completion/git-completion.bash
if test -f $git_completion_script; then
  source $git_completion_script
fi

GitNub for history browsing

GitNub is a sweet little UI for browsing history of git commits.

Using Git

So far, I haven't talked at all about how you actually USE the thing, and don't intend to, since there are already so many great resources out there on the subject. Some I've found useful are:

ReadOnlyFormBuilder

For RubyOnRails developers, form_for and fields_for are the accepted way of DRYing up form templates. You know the deal; you code

<% form_for :customer, :url => customers_path() do |customer_form| %>
  <p>
    <label>Name:</label> 
    <%= customer_form.text_field :first_name, :size => 15 %>
    <%= customer_form.text_field :last_name, :size => 20 %>
  </p>
  ... etc ...
<% end %>

and you get

<form action="/customers" method="post">
  <p>
    <label>Name:</label> 
    <input id="customer_first_name" name="customer[first_name]" size="15" type="text" />
    <input id="customer_last_name" name="customer[last_name]" size="20" type="text" value="" />
  </p>
  ... etc ...
</form>

Rails generates sensible field names and ids for you, and slurps existing values out of the model object. So far, so good.

Lately, I've taken to using the same trick when presenting data, not just when editing it. So, whereas before I might have written:

  <p>
    <label>Name:</label> 
    <span id="customer_first_name"><%= h @customer.first_name %></span>
    <span id="customer_last_name"><%= h @customer.last_name %></span>
  </p>
  ... etc ...

I'll now code it up as:

<% fields_for :customer, :builder => ReadOnlyFormBuilder do |customer_form| %>
  <p>
    <label>Name:</label> 
    <%= customer_form.text_field :first_name, :size => 15 %>
    <%= customer_form.text_field :last_name, :size => 20 %>
  </p>
  ... etc ...
<% end %>

and get the same output. (In case you're wondering, the ids are there to help with automated testing).

Note the similarity between the last code snippet and the first one on this page; apart from the first line they're identical. Usually, I'll put the field-declarations themselves in a partial that's shared between "new", "edit" and "show" actions. That way, your "show" page automatically gets identical layout to the others, just with raw values in place of editable fields.

The ReadOnlyFormBuilder class itself is fairly straightforward - I'm planning to wrap it up into a plugin sometime soon. In the meantime, the implementation of text_field looks something like this:

def text_field(attribute, options={})
  content_tag("span", html_escape(value_of(attribute)), :id => "#{@object_name}_#{attribute}")
end

def value_of(attribute)
  model.send(attribute)
end

def model
  @object || @template.instance_variable_get("@#{@object_name}")
end
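
For anyone who wants to poke at the idea outside of Rails, here is a framework-free approximation. It is a sketch only, not the real ReadOnlyFormBuilder: PlainReadOnlyBuilder and the Customer struct are made up, ERB::Util.html_escape stands in for the Rails helper, and the options argument is accepted but ignored, on the assumption that things like :size mean nothing for a span.

```ruby
require 'erb'

# Stand-in for a model object (hypothetical).
Customer = Struct.new(:first_name, :last_name)

# Same text_field signature as a form builder, but it emits a <span>
# holding the escaped value instead of an editable <input>.
class PlainReadOnlyBuilder
  def initialize(object_name, object)
    @object_name = object_name
    @object = object
  end

  # Options such as :size are accepted for compatibility and ignored.
  def text_field(attribute, _options = {})
    value = ERB::Util.html_escape(@object.public_send(attribute).to_s)
    %(<span id="#{@object_name}_#{attribute}">#{value}</span>)
  end
end

form = PlainReadOnlyBuilder.new(:customer, Customer.new('Fred', 'Smith & Sons'))
puts form.text_field(:first_name, :size => 15)
puts form.text_field(:last_name, :size => 20)
```

Because the method signature matches, the same field declarations can be handed either kind of builder, which is the whole point of sharing the partial.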

Rake profiling

Where's the bottleneck in your Rake build? Let's find out. Drop (or include) this in your Rakefile:

module Rake
  class Task
    def execute_with_timestamps(*args)
      start = Time.now
      execute_without_timestamps(*args)
      execution_time_in_seconds = Time.now - start
      printf("** %s took %.1f seconds\n", name, execution_time_in_seconds)
    end
    
    alias :execute_without_timestamps :execute
    alias :execute :execute_with_timestamps 
  end
end