mdub@DogBiscuit.org
... mmm, crunchy!

Rake profiling

Where's the bottleneck in your Rake build? Let's find out. Drop (or include) this in your Rakefile:

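module Rake
  class Task
    # Wrap the original execute so that every task reports how long it took.
    def execute_with_timestamps(*args)
      start = Time.now
      result = execute_without_timestamps(*args)
      execution_time_in_seconds = Time.now - start
      printf("** %s took %.1f seconds\n", name, execution_time_in_seconds)
      result   # preserve the original return value
    end

    # Alias chain: keep the original under a new name, then swap in the wrapper.
    alias :execute_without_timestamps :execute
    alias :execute :execute_with_timestamps
  end
end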
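
If you'd rather see the slowest tasks in one place than scan the log, the same wrapping can be sketched with Module#prepend, collecting timings and printing them slowest-first. This is only a sketch: TimedExecution and StandInTask are hypothetical names, and StandInTask stands in for Rake::Task so that the example runs without Rake loaded.

```ruby
# Sketch: the same wrapping done with Module#prepend instead of an alias chain.
# TimedExecution and StandInTask are hypothetical names for illustration.
module TimedExecution
  # Elapsed seconds per task name, shared by every class that prepends this module.
  TIMINGS = {}

  def execute(*args)
    start = Time.now
    super                               # call the original execute with the same arguments
  ensure
    TIMINGS[name] = Time.now - start    # recorded even if the task raised
  end

  # Print the recorded timings, slowest task first.
  def self.report(io = $stdout)
    TIMINGS.sort_by { |_, seconds| -seconds }.each do |task_name, seconds|
      io.printf("** %s took %.1f seconds\n", task_name, seconds)
    end
  end
end

# Stand-in for Rake::Task: a named task whose action is a block.
class StandInTask
  attr_reader :name

  def initialize(name, &action)
    @name = name
    @action = action
  end

  def execute(*args)
    @action.call(*args)
  end

  prepend TimedExecution
end

StandInTask.new("compile") { sleep 0.02 }.execute
StandInTask.new("test")    { sleep 0.05 }.execute
TimedExecution.report
```

In a Rakefile, Rake::Task.prepend(TimedExecution) plus at_exit { TimedExecution.report } should give the same effect, assuming Rake::Task#execute keeps its current shape.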

How I Learned to Stop Worrying and Love the Mac

In my new job, a Mac is the preferred tool of the trade. So now I'm learning to use a nice shiny MacBook Pro, and after years developing on Windoze, it's a very pleasant experience. Here are some of the things that are making my life just that little bit more delightful:

  • It's Unix. On Windoze, Cygwin helped a little, but this is soooo much better.
  • Launchbar - an application launcher and more. I bought this almost immediately after getting my Mac, based on some random recommendation somewhere, and haven't regretted it - I probably use it once every 5 minutes, on average. I've subsequently tried Quicksilver, but it didn't feel immediately "right" in the same way Launchbar does.
  • Textmate - "the missing editor". I've been a dedicated Emacs user for about 20 years now, but had to give Textmate a try, given all the hype. It's noice! Many of the features I know and love from Emacs are there (albeit bound to different cryptic key-combinations), and the UI is clean and Mac-savvy.
  • 1Password - a password manager. I got this handy little utility as part of a bundle from macheist.com. It stores all your passwords in a (secure) keychain, indexed by website, making it really easy to log back in next time you visit. Best of all, it integrates with most browsers, meaning you only need to store passwords once. It can store multiple sets of login details per site, too, which is very useful when testing web-apps.
  • Safari is a nice little web browser, and once you turn on the debug menu, it's even better. Normally, I'd reach for Firebug for this sort of functionality, but the built-in Safari equivalent is almost as good.

Diving (thoughtlessly) back into the workforce

As of last week, I'm developing software for money again, after a nine-month break.

At the same time, I'm saying goodbye to ThoughtWorks, which is not easy, since I've really enjoyed my time there. I joined TW to hang out with good people, and wasn't disappointed: the faces have changed a little over the years (as people come and go), but TW still employs some of the most talented and passionate people I've ever had the opportunity to work with.

In the end, though, I figured it was time to try something new. I'm now working for Cogent Consulting. I've known Steve and Marty, who run the company, since the early days of the Melbourne eXtreme Programming group, and have a lot of respect for them both. They've started to assemble a very interesting, talented bunch of individuals, and I'm looking forward to the ride.

Among other things, I'm going to be doing a whole lot more Ruby and Rails work than I have to date. Which feels good, since I've been blathering about Ruby for a few years now.

Ant build tips

During my past few Java projects, I've developed some guidelines which I find make builds faster, more reliable and easier to maintain. The details are specific to Ant, but hopefully the principles are transferable to other software build systems.

These ideas may seem blindingly obvious to some readers, but I suspect they'll appear new-and-strange, and perhaps even bad-and-wrong, to others. In any event, I hope to trigger some thought/discussion.

Principles

My build approach is based on two simple principles:

  • Efficiency - don't rebuild up-to-date outputs
  • Safety - do rebuild out-of-date outputs

(By "output", I mean some artifact produced by the build. I'm avoiding the word "target" here, since it has specific meaning in Ant.)

Efficiency - DON'T rebuild up-to-date outputs

Quick builds, and rapid feedback, are important for developer productivity. Using a build system that recreates everything from scratch after even a minor change is a great way to kill productivity.

Re-executing a single build step is typically not the end of the world, but many outputs are also inputs to other build steps, so unnecessarily rebuilding an output early on during the build can trigger rework all the way through.

Safety - DO rebuild out-of-date outputs

On the flip side, when a key input DOES change, you need to ensure that all the derived outputs are rebuilt, or at least revalidated. Otherwise, your build becomes "flaky" and unpredictable.

A flaky build forces developers to compensate somehow, e.g. by explicitly running "clean" builds every time, which impacts productivity.

Tips

Explicitly declare dependencies between your targets

Some people are reluctant to declare dependencies, because declaring them introduces overhead. But not doing so is unsafe, because it opens the door to build steps being executed with stale inputs, resulting in confusing, frustrating, non-deterministic build behaviour.

If you've followed the "Don't rebuild up-to-date outputs" rule, then dependencies should be safe/cheap, i.e. there's minimal overhead, and no reason not to declare them.

Targets should be Nouns, not Verbs

Typically, programmers name Ant targets by what they do, e.g. "compile", "test". However, this tends to produce very procedural builds.

So instead, I recommend choosing names describing what the target produces, e.g. "classes", "test/report". Perhaps it's just because I spent so many years automating builds using make, but I find that such noun-ish targets help in various ways:

  • it's easier to understand what outputs each target produces (for obvious reasons)
  • intermediate targets tend to become useful in their own right
  • dependencies become clearer, as it makes more sense to depend on a concrete input, rather than a process

If you've read this far, go read Martin Fowler's "OutputBuildTarget" article; he explores the subject more eloquently than I'm capable of.

Some targets might not produce a concrete artifact (or the artifact might not be the main point of the target). In such cases, I'll sometimes name them based on the condition they produce, or ensure. For example, a target using Simian to check for duplication might be called "minimal-duplication" (as opposed to "simian").

Use <uptodate> to avoid unnecessary rework

Most Ant tasks include dependency-checking based on file timestamps, and will avoid rework. But some tasks aren't so clever. For instance, the <junit> task will happily re-run all your tests, even if they all passed last time, and neither code nor tests have changed.

The <uptodate> task can help fill the gap. It compares the timestamps of specified input and output files, and sets a property indicating that work can be avoided.

Here's an example where <uptodate> is used to avoid unnecessary re-generation of XML-mapping code:

<target name="xml-module/check"
        depends="properties">
    <uptodate property="xml-module.uptodate"
              targetfile="${xml-module.jar}">
        <srcfiles dir="spec" includes="**/*.xsd"/>
    </uptodate>
</target>

<target name="xml-module"
        depends="xml-module/check, xmlbean/taskdef"
        unless="xml-module.uptodate">
    <xmlbean destfile="${xml-module.jar}"
             classpathref="xmlbeans.classpath">
        <fileset dir="spec" includes="**/*.xsd"/>
    </xmlbean>
</target>
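
The timestamp comparison behind <uptodate> is simple enough to sketch outside Ant. The Ruby below is an approximation of the idea, not Ant's implementation: up_to_date? is a hypothetical helper, and the file names are made up for the demonstration.

```ruby
require "fileutils"
require "tmpdir"

# True when target exists and is at least as new as every source file,
# i.e. the build step that produces target can be skipped.
# (Hypothetical helper; an approximation of the check, not Ant's own code.)
def up_to_date?(target, sources)
  return false unless File.exist?(target)
  target_time = File.mtime(target)
  sources.all? { |source| File.mtime(source) <= target_time }
end

Dir.mktmpdir do |dir|
  schema = File.join(dir, "schema.xsd")      # made-up input file
  jar    = File.join(dir, "xml-module.jar")  # made-up output file
  now    = Time.now

  FileUtils.touch(schema)
  puts up_to_date?(jar, [schema])   # false: the jar does not exist yet

  FileUtils.touch(jar)
  File.utime(now, now - 60, schema) # schema last changed before the jar was built
  File.utime(now, now, jar)
  puts up_to_date?(jar, [schema])   # true: nothing to do

  File.utime(now, now + 60, schema) # schema edited after the jar was built
  puts up_to_date?(jar, [schema])   # false: the jar is stale
end
```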

Use <touch> to record a completed task

Although it's unusual, some build steps have no output: they are simply processes that must be executed, e.g. validating the format of a file, or verifying adherence to coding standards (Checkstyle, Simian). Other build steps can produce many outputs, e.g. code-generation tools.

In these cases, where there's no identifiable primary output, it can be useful to invent a placeholder output-file using Ant's <touch> task. The resulting file is empty, but its timestamp can be used for dependency-checking, to determine if/when the build step needs to be re-run.

<touch> is most useful in conjunction with <uptodate>, as in the following example:

<target name="libs/check">
    <uptodate property="libs.uptodate">
        <srcfiles dir="." includes="ivy.xml"/>
        <mapper type="merge" to="lib/.done"/>
    </uptodate>
</target>

<target name="libs" description="retrieve dependencies with ivy"
        depends="libs/check" unless="libs.uptodate">
    <ivy:retrieve pattern="lib/[conf]/[artifact].[ext]" />
    <touch file="lib/.done" />
</target>    

Here we're using Ivy to download third-party libraries. After download, we create a touch-file to mark the job as done. On subsequent runs, the library resolution and download process will be skipped, unless the "ivy.xml" control-file has been changed.
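
The touch-file pattern is not specific to Ant. Here is a minimal Ruby sketch of the same idea, using FileUtils.uptodate? and FileUtils.touch from the standard library. The run_unless_done helper is hypothetical, and the block passed to it stands in for whatever expensive step you want to skip.

```ruby
require "fileutils"
require "tmpdir"

# Run the block only when marker is missing or not newer than every input,
# then touch marker to record that the step completed.
# Returns true if the step ran, false if it was skipped.
# (Hypothetical helper, for illustration only.)
def run_unless_done(marker, inputs)
  return false if FileUtils.uptodate?(marker, inputs)
  yield
  FileUtils.mkdir_p(File.dirname(marker))
  FileUtils.touch(marker)
  true
end

Dir.mktmpdir do |dir|
  control = File.join(dir, "ivy.xml")
  marker  = File.join(dir, "lib", ".done")

  # Pretend the control file was last edited two minutes ago.
  FileUtils.touch(control, mtime: Time.now - 120)

  run_unless_done(marker, [control]) { puts "retrieving libraries" }  # runs: no marker yet
  run_unless_done(marker, [control]) { puts "retrieving libraries" }  # skipped: marker is newer

  # Simulate a later edit to the control file.
  FileUtils.touch(control, mtime: Time.now + 120)
  run_unless_done(marker, [control]) { puts "retrieving libraries" }  # runs again
end
```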

As I alluded to earlier, I have also used the combination of <touch> and <uptodate> to:

  • skip code-style checks when code hasn't changed
  • skip tests when neither code nor tests have changed

Use <dependset> to remove out-of-date outputs

When Ant is not clever enough to determine when something needs re-doing, the <dependset> task is useful for mopping up stale outputs.
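
To make the idea concrete, here is a rough Ruby sketch of the mopping-up behaviour: if any source is newer than any target (or a source is missing), all the targets are deleted, so that the next build step has to recreate them. It is a simplification written for illustration, not Ant's implementation; remove_if_stale and the file names are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Delete all existing targets if any source is missing, or if any source is
# newer than the oldest target. Returns the list of deleted paths.
# (Hypothetical helper; a simplified approximation of the idea.)
def remove_if_stale(targets, sources)
  existing = targets.select { |path| File.exist?(path) }
  return [] if existing.empty?

  oldest_target  = existing.map { |path| File.mtime(path) }.min
  missing_source = sources.any? { |path| !File.exist?(path) }
  stale = missing_source || sources.any? { |path| File.mtime(path) > oldest_target }
  return [] unless stale

  FileUtils.rm_f(existing)
  existing
end

Dir.mktmpdir do |dir|
  settings = File.join(dir, "build.properties")  # made-up input file
  config   = File.join(dir, "app-config.xml")    # made-up generated output

  FileUtils.touch(config, mtime: Time.now - 60)  # output generated a minute ago
  FileUtils.touch(settings)                      # input edited since then

  removed = remove_if_stale([config], [settings])
  puts "removed: #{removed.map { |path| File.basename(path) }.join(', ')}"
end
```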

Pitfalls

Avoid "private" targets

Many builds include "private" or "hidden" targets, that are unsafe to call directly. A common convention in the Ant world is to name these targets starting with '-', since that makes them inaccessible from the command-line.

I think private targets are a smell: they indicate that implicit dependencies are present in the build. Hiding the unsafe targets makes sense, in a way ... but I much prefer to make the dependencies explicit, as described above, at which point it's safe to let every target be called directly (which often comes in handy when testing some aspect of the build process).

Avoid targets depending on "clean"

Having popular targets depend on "clean" is a bad smell. You DO need to avoid using artifacts from previous builds which have passed their use-by date, but starting the whole build from scratch is overkill, when proper dependencies and careful timestamp-checking can ensure that just the stale stuff is rebuilt.

Avoid <copy overwrite="true">

An anti-pattern I often encounter (and a pet peeve) is:

<copy overwrite="true" ...>
    ...
    <filterset>
        <filter token="PASSWORD" value="${db.password}"/>
        ...
    </filterset>
</copy>

The "overwrite" attribute causes Ant to copy files every time, ignoring the usual timestamp-checking that prevents re-generation of up-to-date files. Using "overwrite" can easily cause most of your jars/wars/ears/etc to be updated with every build.

Instead, use <dependset> to invalidate the outputs in the case that ${db.password} has changed.

method_missing magic - emulating Groovy's "it" in Ruby

Inspired variously by:

I've cooked up a shortcut for generating simple blocks, meaning that rather than

people.select { |x| x.name.length > 10 }

I can write such things as:

people.select(&its.name.length > 10)

Disclaimer: I think this is more "cool hack" than useful tool; it's probably too much of an alien artifact to be useful in real life. And it's not generally applicable, like "it" in Groovy. And really, it's not that much more verbose to use a block. Aaaaaanyway ...

The trick is that the above is parsed as

people.select(&(its.name.length.>(10)))

The "its" method creates a MessageBuffer object, which records the messages (method invocations) sent its way:

irb(main):001:0> require 'message_buffer'
=> true
irb(main):002:0> its
=> #<MessageBuffer:0x6b40b44 @messages=[]>
irb(main):003:0> its.name.length < 10
=> #<MessageBuffer:0x6b3e678 @messages=[[:name], [:length], [:<, 10]]>

Now, the "&" operator coerces its argument to a Proc, and MessageBuffer#to_proc generates a Proc that replays all the recorded messages. Q.E.D.

The full source-code is fairly short, so I'll include it inline:

class MessageBuffer 

  instance_methods.each do |m|
    undef_method m unless m =~ /^(__|respond_to|inspect)/ 
  end
  
  def initialize
    @messages = []
  end

  def method_missing(*message)
    @messages << message        # record the message
    self                        # return self so we can keep recording
  end
  
  def __replay_all_messages__(obj)
    # send each recorded message in turn; each result becomes the next receiver
    @messages.inject(obj) do |receiver, message|
      receiver.__send__(*message)
    end
  end
  
  def to_proc
    proc { |x| __replay_all_messages__(x) }
  end

end

def its
  MessageBuffer.new
end


Update: Florian Gross suggested a better way to replay recorded messages, using inject, and I've updated the code accordingly.
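
For anyone who wants to try this without pasting the whole thing, here is a self-contained sketch with a small usage example. It uses a trimmed-down copy of MessageBuffer that skips the undef_method step, so it only records methods that Object doesn't already define (which is enough for this demo). The Person struct and the names are made up.

```ruby
# Trimmed-down copy of MessageBuffer: no undef_method step, so only messages
# that Object does not already respond to are recorded.
class MessageBuffer
  def initialize
    @messages = []
  end

  def method_missing(*message)
    @messages << message   # record the message name and its arguments
    self                   # return self so we can keep recording
  end

  def respond_to_missing?(_name, _include_private = false)
    true                   # every unknown message is accepted and recorded
  end

  def to_proc
    messages = @messages
    # replay the recorded messages; each result becomes the next receiver
    proc { |x| messages.inject(x) { |receiver, message| receiver.__send__(*message) } }
  end
end

def its
  MessageBuffer.new
end

# Made-up data for the demonstration.
Person = Struct.new(:name)
people = [Person.new("Al"), Person.new("Bartholomew"), Person.new("Christopher")]

long_named = people.select(&its.name.length > 10)
puts long_named.map(&its.name.upcase).join(", ")
```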

Selenium Core 0.8.0

The Selenium Core team (of which I'm a sometime member) released version 0.8.0 last week.

Highlights include:

  • a "multiWindow" option which places the application-under-test in a separate window, allowing testing of "frame-busting" apps;
  • more reliable page-load detection for popup windows;
  • new cookie-management actions;
  • a run-speed slider and "Pause" button which replace the old Run/Walk/Step radio-buttons;
  • many bug-fixes and stability improvements;
  • tested against latest versions of Firefox, IE6, Opera, Konqueror, Safari and WebKit.

The multi-window layout option is a great step forward, since it was a limitation that prevented many people from using Selenium.

You can download the new version at:

http://release.openqa.org/selenium-core/0.8.0/

(Yes, the documentation and website still suck. Sorry.)

Presentation on Ruby/Rails at EJA

A couple of months ago I gave a presentation on Ruby and Rails to a local Java user-group. My slides are now online:

The presentation contains a few examples showing how expressive Ruby can be, when compared to Java.

I hate "frameworks"

Give me a "toolkit" or "library" over a "framework" any hour of the day.

A software framework offers to solve 80% of my problem, but usually without understanding what my problem actually is.

A toolkit is a collection of tools. I can pick them up and use them as I see fit. I can use individual tools/components, without needing to adopt them all. I can use them in conjunction with other tools I have, without voiding any warranties.

Grumble.

BasketCase

The project I'm currently on uses Rational ClearCase to manage its source-code.

Now, I'm sure there are many great things about ClearCase. Not that any spring to mind. (Wait, here's one: it keeps ClearCase administrators in a job! Whew, I knew there had to be a silver lining somewhere.)

What I can tell you, though, is that it sucks for agile, team development. So, I've wrapped a script around the ClearCase command-line, to make the wannabe Subversion user in me feel more at home.

(Ruby: the glue that doesn't set).

The result is BasketCase (at RubyForge). It's still very much a work-in-progress (and when I no longer have need of it, will likely become a work-in-abandonment), but I hope it may provide a glimmer of hope to anyone else who finds themselves in a similar predicament.

By the way, the less observant of you may not have noticed that the name is a clever play on the word "rational". Geddit? Oh, nevermind ...

Crimes committed in the name of "Consistency"

In developing software, consistency often helps:

  • Refactoring your code to reduce duplication makes your system easier to extend, and provides bugs fewer places to hide.
  • Solving similar problems in similar ways (e.g. using design patterns) promotes conceptual consistency, allowing teams to communicate and collaborate more easily.
  • Adhering to user-interface guidelines can make your application more predictable, and therefore, more comfortable to use.

That's great. But don't lose sight of the real goals, like usability and maintainability. Consistency is just a strategy; if it's allowed to become a goal in its own right, things can start to go awry:

  • Comments are sometimes helpful, but making them mandatory for every method/procedure often reduces maintainability, by making the code "noisy".
  • Parts of your system may benefit from declarative security and transaction management, clustering, and all those other tempting features provided by EJBs, but that's no reason to use them everywhere.
  • An ORM tool like Hibernate is great for building object-oriented, domain-driven, RDBMS-backed enterprise applications, but if all you're doing is dumping data as CSV files, perhaps it's overkill.
  • Your defect-tracking system might be good for tracking, er, defects ... but that doesn't mean you should use it to manage all your work.
  • Re-use is nice, where appropriate, but some things that look conceptually similar from the ivory penthouse can turn out to be quite different once you get into the details. If it requires more code to use an existing library, than to implement your feature directly, then the library isn't adding value.

In summary: keep your eye on the ball. (Whoops, there it goes ...)