mdub@DogBiscuit.org
... mmm, crunchy!

Ant build tips

During my past few Java projects, I've developed some guidelines which I find make builds faster, more reliable and easier to maintain. The details are specific to Ant, but hopefully the principles are transferable to other software build systems.

These ideas may seem blindingly obvious to some readers, but I suspect they'll appear new-and-strange, and perhaps even bad-and-wrong, to others. In any event, I hope to trigger some thought/discussion.

Principles

My build approach is based on two simple principles:

  • Efficiency - don't rebuild up-to-date outputs
  • Safety - do rebuild out-of-date outputs

(By "output", I mean some artifact produced by the build. I'm avoiding the word "target" here, since it has specific meaning in Ant.)

Efficiency - DON'T rebuild up-to-date outputs

Quick builds, and rapid feedback, are important for developer productivity. Using a build system that recreates everything from scratch after even a minor change is a great way to kill productivity.

Re-executing a single build step is typically not the end of the world, but many outputs are also inputs to other build steps, so unnecessarily rebuilding an output early on during the build can trigger rework all the way through.

Safety - DO rebuild out-of-date outputs

On the flip side, when a key input DOES change, you need to ensure that all the derived outputs are rebuilt, or at least revalidated. Otherwise, your build becomes "flaky" and unpredictable.

A flaky build forces developers to compensate somehow, e.g. by explicitly running "clean" builds every time, which impacts productivity.

Tips

Explicitly declare dependencies between your targets

Some people are reluctant to declare dependencies, because declaring them introduces overhead. But not doing so is unsafe, because it opens the door to build steps being executed with stale inputs, resulting in confusing, frustrating, non-deterministic build behaviour.

If you've followed the "Don't rebuild up-to-date outputs" rule, then dependencies should be safe/cheap, i.e. there's minimal overhead, and no reason not to declare them.
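To make the "safe/cheap" claim concrete, here's a minimal Java sketch of a make-style evaluator. It's not how Ant is implemented: `BuildGraph`, `define()` and `build()` are hypothetical names, and each target's up-to-date check is just a `BooleanSupplier` standing in for a timestamp comparison. Every declared dependency is visited on every run, but once it's up to date, the visit costs only that check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

/** Hypothetical make-style evaluator: targets declare dependencies and an up-to-date check. */
class BuildGraph {
    private static final class Target {
        final List<String> deps;
        final BooleanSupplier upToDate;
        final Runnable action;

        Target(List<String> deps, BooleanSupplier upToDate, Runnable action) {
            this.deps = deps;
            this.upToDate = upToDate;
            this.action = action;
        }
    }

    private final Map<String, Target> targets = new HashMap<>();
    /** Names of targets whose action actually ran, in execution order. */
    final List<String> executed = new ArrayList<>();

    void define(String name, List<String> deps, BooleanSupplier upToDate, Runnable action) {
        targets.put(name, new Target(deps, upToDate, action));
    }

    /** Builds a target after its dependencies; each target is visited at most once per run. */
    void build(String name) {
        build(name, new HashSet<>());
    }

    private void build(String name, Set<String> visited) {
        if (!visited.add(name)) {
            return; // already handled during this run
        }
        Target t = targets.get(name);
        if (t == null) {
            throw new IllegalArgumentException("unknown target: " + name);
        }
        for (String dep : t.deps) {
            build(dep, visited); // dependencies are always declared and always visited...
        }
        if (t.upToDate.getAsBoolean()) {
            return; // ...but an up-to-date target costs only this check
        }
        t.action.run();
        executed.add(name);
    }
}
```

With "test/report" depending on "classes", the first build runs both; a second build re-checks "classes", finds it up to date, and runs only what is still stale.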

Targets should be Nouns, not Verbs

Typically, programmers name Ant targets by what they do, e.g. "compile", "test". However, this tends to produce very procedural builds.

So instead, I recommend choosing names describing what the target produces, e.g. "classes", "test/report". Perhaps it's just because I spent so many years automating builds using make, but I find that such noun-ish targets help in various ways:

  • it's easier to understand what outputs each target produces (for obvious reasons)
  • intermediate targets tend to become useful in their own right
  • dependencies become clearer, as it makes more sense to depend on a concrete input, rather than a process

If you've read this far, go read Martin Fowler's "OutputBuildTarget" article; he explores the subject more eloquently than I'm capable of.

Some targets might not produce a concrete artifact (or the artifact might not be the main point of the target). In such cases, I'll sometimes name them based on the condition they produce, or ensure. For example, a target using Simian to check for duplication might be called "minimal-duplication" (as opposed to "simian").

Use <uptodate> to avoid unnecessary rework

Most Ant tasks include dependency-checking based on file timestamps, and will avoid rework. But some tasks aren't so clever. For instance, the <junit> task will happily re-run all your tests, even if they all passed last time, and neither code nor tests have changed.

The <uptodate> task can help fill the gap. It compares the timestamps of specified input and output files, and sets a property indicating that work can be avoided.

Here's an example where <uptodate> is used to avoid unnecessary re-generation of XML-mapping code:

<target name="xml-module/check"
        depends="properties">
    <uptodate property="xml-module.uptodate"
              targetfile="${xml-module.jar}">
        <srcfiles dir="spec" includes="**/*.xsd"/>
    </uptodate>
</target>

<target name="xml-module"
        depends="xml-module/check, xmlbean/taskdef"
        unless="xml-module.uptodate">
    <xmlbean destfile="${xml-module.jar}"
             classpathref="xmlbeans.classpath">
        <fileset dir="spec" includes="**/*.xsd"/>
    </xmlbean>
</target>
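Under the hood, the decision <uptodate> makes is a timestamp comparison. Here's a simplified Java sketch of it, assuming a hypothetical `UpToDate.isUpToDate()` helper; it's not Ant's code, and it ignores refinements such as the allowance Ant makes for coarse file-timestamp granularity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical helper mirroring the idea behind Ant's <uptodate> task. */
final class UpToDate {
    private UpToDate() {}

    /**
     * Returns true if {@code target} exists and no source was modified after it.
     * A missing target is always out of date, so the work will run.
     */
    static boolean isUpToDate(Path target, List<Path> sources) throws IOException {
        if (!Files.exists(target)) {
            return false;
        }
        FileTime targetTime = Files.getLastModifiedTime(target);
        for (Path source : sources) {
            if (Files.getLastModifiedTime(source).compareTo(targetTime) > 0) {
                return false; // a source (e.g. an .xsd) is newer: the output (e.g. the jar) is stale
            }
        }
        return true;
    }
}
```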

Use <touch> to record a completed task

Although it's unusual, some build steps have no output: they are simply processes that must be executed, e.g. validating the format of a file, or verifying adherence to coding standards (Checkstyle, Simian). Other build steps can produce many outputs, e.g. code-generation tools.

In these cases, where there's no identifiable primary output, it can be useful to invent a placeholder output-file using Ant's <touch> task. The resulting file is empty, but its timestamp can be used for dependency-checking, to determine if/when the build step needs to be re-run.

<touch> is most useful in conjunction with <uptodate>, as in the following example:

<target name="libs/check">
    <uptodate property="libs.uptodate">
        <srcfiles dir="." includes="ivy.xml"/>
        <mapper type="merge" to="lib/.done"/>
    </uptodate>
</target>

<!-- assumes xmlns:ivy="antlib:org.apache.ivy.ant" is declared on the enclosing <project> -->
<target name="libs" description="retrieve dependencies with ivy"
        depends="libs/check" unless="libs.uptodate">
    <ivy:retrieve pattern="lib/[conf]/[artifact].[ext]" />
    <touch file="lib/.done" />
</target>

Here we're using Ivy to download third-party libraries. After download, we create a touch-file to mark the job as done. On subsequent runs, the library resolution and download process will be skipped, unless the "ivy.xml" control-file has been changed.
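Stripped of Ant, the touch-file pattern is: skip the step if the marker is at least as new as the control file; otherwise run the step, then stamp the marker. A minimal Java sketch, with `MarkerFile.runIfStale()` as a hypothetical helper (not part of Ant or Ivy):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

/** Hypothetical helper: run a step only when its control file is newer than a marker file. */
final class MarkerFile {
    private MarkerFile() {}

    /** A build step that may fail with an I/O error. */
    interface Step {
        void run() throws IOException;
    }

    /** Returns true if the step ran, false if it was skipped as up to date. */
    static boolean runIfStale(Path control, Path marker, Step step) throws IOException {
        if (Files.exists(marker)
                && Files.getLastModifiedTime(control).compareTo(Files.getLastModifiedTime(marker)) <= 0) {
            return false; // marker is at least as new as the control file (e.g. ivy.xml)
        }
        step.run();
        // Equivalent of <touch>: create the empty marker if needed, then stamp it with "now".
        if (!Files.exists(marker)) {
            Files.createDirectories(marker.toAbsolutePath().getParent());
            Files.createFile(marker);
        }
        Files.setLastModifiedTime(marker, FileTime.from(Instant.now()));
        return true;
    }
}
```

Because the marker is only stamped after the step returns normally, a failed download leaves the old marker (or no marker) in place, so the next build tries again.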

As I alluded to earlier, I have also used the combination of <touch> and <uptodate> to:

  • skip code-style checks when code hasn't changed
  • skip tests when neither code nor tests have changed

Use <dependset> to remove out-of-date outputs

When Ant is not clever enough to determine when something needs re-doing, the <dependset> task is useful for mopping up stale outputs.
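The core rule of <dependset> is roughly: if any source file is newer than any target file, remove all the targets, so later steps have to regenerate them. Here's a simplified Java sketch of that rule; `DependSet.removeIfStale()` is a hypothetical name, and the sketch skips details of the real task, such as its handling of missing files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;

/** Hypothetical sketch of the rule behind Ant's <dependset>. */
final class DependSet {
    private DependSet() {}

    /**
     * If any source is newer than the oldest existing target, delete every target.
     * Returns true if the targets were removed.
     */
    static boolean removeIfStale(List<Path> sources, List<Path> targets) throws IOException {
        FileTime oldestTarget = null;
        for (Path t : targets) {
            if (Files.exists(t)) {
                FileTime time = Files.getLastModifiedTime(t);
                if (oldestTarget == null || time.compareTo(oldestTarget) < 0) {
                    oldestTarget = time;
                }
            }
        }
        if (oldestTarget == null) {
            return false; // no existing targets, nothing to remove
        }
        boolean stale = false;
        for (Path s : sources) {
            if (Files.getLastModifiedTime(s).compareTo(oldestTarget) > 0) {
                stale = true; // "any source newer than any target"
                break;
            }
        }
        if (stale) {
            for (Path t : targets) {
                Files.deleteIfExists(t);
            }
        }
        return stale;
    }
}
```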

Pitfalls

Avoid "private" targets

Many builds include "private" or "hidden" targets that are unsafe to call directly. A common convention in the Ant world is to name these targets starting with '-', since that makes them inaccessible from the command-line.

I think private targets are a smell: they indicate that implicit dependencies are present in the build. Hiding the unsafe targets makes sense, in a way ... but I much prefer to make the dependencies explicit, as described above, at which point it's safe to let every target be called directly (which often comes in handy when testing some aspect of the build process).

Avoid targets depending on "clean"

Having popular targets depend on "clean" is a bad smell. You DO need to avoid using artifacts from previous builds which have passed their use-by date, but starting the whole build from scratch is overkill, when proper dependencies and careful timestamp-checking can ensure that just the stale stuff is rebuilt.

Avoid <copy overwrite="true">

An anti-pattern I often encounter (and a pet peeve) is:

<copy overwrite="true" ...>
    ...
    <filterset>
        <filter token="PASSWORD" value="${db.password}"/>
        ...
    </filterset>
</copy>

The "overwrite" attribute causes Ant to copy files every time, ignoring the usual timestamp-checking that prevents re-generation of up-to-date files. Using "overwrite" can easily cause most of your jars/wars/ears/etc to be updated with every build.

Instead, use <dependset> to invalidate the outputs in the case that ${db.password} has changed.
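Since <dependset> only looks at timestamps, a changed property first has to show up as a changed file. One approach, sketched here in Java with a hypothetical `StampFile.writeIfChanged()` helper, is to write the value to a stamp file only when it differs from the file's current content; the stamp file's timestamp then moves only when the value really changes, and it can serve as the <dependset> source. (Storing a digest rather than the password itself would probably be wiser.)

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: record a build setting in a stamp file whose mtime tracks real changes. */
final class StampFile {
    private StampFile() {}

    /** Rewrites {@code stamp} only if its content differs; returns true if it was (re)written. */
    static boolean writeIfChanged(Path stamp, String value) throws IOException {
        byte[] wanted = value.getBytes(StandardCharsets.UTF_8);
        if (Files.exists(stamp) && Arrays.equals(Files.readAllBytes(stamp), wanted)) {
            return false; // unchanged: keep the old timestamp so dependent outputs stay up to date
        }
        Files.write(stamp, wanted);
        return true;
    }
}
```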


TestGroups for JUnit

New users of JUnit often assume that there will only be one instance of their TestCase class (I did, at first).

In fact, each test-method is represented by a separate instance of the test-class. This isolation of test-methods is actually pretty sensible, since it means that (from the horse's mouth)

... each test will run with a fresh fixture and the results of one test can't influence the result of another.

If your tests are truly unit-tests, then re-creating a fresh fixture for every method should be fairly cheap, so it's not a big deal. BUT, it's a slightly different story if you're using JUnit as a framework for acceptance tests, or integration tests, or any scenario in which creating the required fixture/resource objects is costly.
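To see where the cost comes from, here's a stdlib-only Java sketch that imitates this behaviour with reflection. It isn't JUnit's code; `MiniRunner`, its `Case` base class and the `ScreenTest` example are hypothetical. Each test-method gets a freshly constructed instance, so setUp() (and whatever expensive work it does) runs once per test-method.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for JUnit 3's behaviour: one fresh instance per test-method. */
final class MiniRunner {
    private MiniRunner() {}

    /** Base class with JUnit-3-style hooks (hypothetical). */
    abstract static class Case {
        protected void setUp() throws Exception {}
        protected void tearDown() throws Exception {}
    }

    /** Runs every public no-arg "test*" method, each on a newly constructed instance. */
    static List<String> runAll(Class<? extends Case> type) throws Exception {
        List<String> ran = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (!m.getName().startsWith("test") || m.getParameterCount() != 0) {
                continue;
            }
            Case fixture = type.getDeclaredConstructor().newInstance(); // fresh fixture per method
            fixture.setUp();
            try {
                m.invoke(fixture);
            } finally {
                fixture.tearDown();
            }
            ran.add(m.getName());
        }
        return ran;
    }
}

/** Example test-class (hypothetical) that counts how often its "expensive" setUp() runs. */
class ScreenTest extends MiniRunner.Case {
    static int setUps = 0;

    @Override
    protected void setUp() {
        setUps++; // stands in for expensiveSetUpCode()
    }

    public void testPolicyType() {}

    public void testWindscreenOption() {}
}
```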

My problem

On my current project, we have a large suite of web-app acceptance-tests written using HtmlUnit. We started off writing tests something like this:

public class PolicySelectionScreenTest extends TestCase {
    public void setUp() throws Exception {
        expensiveSetUpCode();
    }
    public void testPolicyTypeDefaultsToStandard() {
        assertEquals("STD", screenFixture.getPolicyType());
    }
    public void testWindscreenOptionDefaultsToNo() {
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

It soon became obvious that re-running the expensiveSetUpCode() for each test was - well - expensive, so we started looking for ways to reduce that overhead. An obvious way to do it is to bundle several asserts into the one test, e.g.

public class PolicySelectionScreenTest extends TestCase {
    public void testInitialScreenStateIsCorrect() {
        expensiveSetUpCode();
        assertEquals("STD", screenFixture.getPolicyType());
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

There are a couple of problems with this, though:

  • Test-methods get bloaty, and their names become less informative. This isn't ideal, as I prefer short test-methods, with names that describe the intended behaviour.
  • Testing of a scenario may halt prematurely, when it could usefully run further and provide more feedback about what is or isn't working.

A solution

So, I developed a way of aggregating a number of related test-methods into a "TestGroup". Now our tests look more like this:

public class PolicySelectionScreenTests extends TestGroup {
    public void groupSetUp() throws Exception {
        expensiveSetUpCode();
    }
    public void testPolicyTypeDefaultsToStandard() {
        assertEquals("STD", screenFixture.getPolicyType());
    }
    public void testWindscreenOptionDefaultsToNo() {
        assertEquals("N", screenFixture.getWindscreenOption());
    }
}

A TestGroup instance can be converted into JUnit-ese easily, by calling its asTest() method:

public static Test suite() {
    TestSuite suite = new TestSuite();
    // ... etc ...
    suite.addTest(new PolicySelectionScreenTests().asTest());
    return suite;
}

Alternatively, we have an extended TestSuite implementation that makes this a little easier:

public static Test suite() {
    TestSuite suite = new GroupAwareTestSuite();
    // ... etc ...
    suite.addTestSuite(PolicySelectionScreenTests.class);
    return suite;
}

Now, our original tests run faster (since the expensiveSetUpCode() is only run once), but the test-methods remain short and well-named. Woo-hoo! [cue weird little dance of joy].

But wait, there's more

As you might have guessed, there's a groupTearDown() to match groupSetUp(). The normal setUp() and tearDown() hooks are also supported, and run before/after each test, as you'd expect.

A warning

Once we start sharing test-fixtures like this, we're effectively removing JUnit's built-in safety harness, and thus running the risk of tests infecting the results of other tests by "polluting" the fixture. There's no easy solution: you just have to be really careful. Guidelines:

  • If possible, avoid putting any code that alters the fixture's state into the test-methods of a TestGroup.
  • If that's not possible, ensure you reset the fixture to a known state in the setUp() hook.

A peek inside

In my first attempt at TestGroups, I simply implemented the Test interface. Unfortunately, it's a fairly thin interface, and doesn't provide an API for navigating the hierarchical structure of a test-suite. If you want to explore the hierarchy, you'll have to assume that your test-suite will be constructed from TestCase and TestSuite objects - perhaps with the odd TestDecorator thrown in - and perform the required instanceof checks. If some new, unknown implementation of Test comes along, your assumptions are shot. Most IDEs are in this position, as they typically display the test-hierarchy. Thus, my original implementation didn't play nicely in an IDE environment.

So, instead, TestGroup.asTest() creates a structure that adapts the TestGroup to look like a TestSuite. The suite is wrapped by a TestSetup decorator that fires the groupSetUp() and groupTearDown() hooks. The TestCases in the suite are simple proxies that invoke methods on the shared TestGroup instance. Or, in pictures:

[Figure: TestGroup structure]

Because the result is just an aggregate of core JUnit objects, it doesn't confuse IDEs in the way I described earlier.
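The resulting call sequence can be sketched without JUnit at all. The Java below is a hypothetical, stdlib-only illustration rather than the actual TestGroup code: `Group.runGroup()` plays the part of both the TestSetup decorator (groupSetUp()/groupTearDown() once) and the proxy test-cases (setUp(), test-method, tearDown(), all against the one shared instance).

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical TestGroup-like base class: one shared instance for all its test-methods. */
abstract class Group {
    final List<String> events = new ArrayList<>(); // records hook calls, for illustration only

    protected void groupSetUp() throws Exception {}
    protected void groupTearDown() throws Exception {}
    protected void setUp() throws Exception {}
    protected void tearDown() throws Exception {}

    /** Outer hooks play the decorator's role; the inner loop plays the proxy test-cases' role. */
    static void runGroup(Group group) throws Exception {
        Method[] tests = Arrays.stream(group.getClass().getMethods())
                .filter(m -> m.getName().startsWith("test") && m.getParameterCount() == 0)
                .sorted(Comparator.comparing(Method::getName)) // deterministic order for the sketch
                .toArray(Method[]::new);
        group.groupSetUp();                  // once, before any test-method
        try {
            for (Method test : tests) {      // every "proxy" delegates to the same instance
                group.setUp();
                try {
                    test.invoke(group);
                } finally {
                    group.tearDown();
                }
            }
        } finally {
            group.groupTearDown();           // once, after all test-methods
        }
    }
}

/** Example group (hypothetical) that logs the order in which its hooks are called. */
class PolicyScreenGroup extends Group {
    @Override protected void groupSetUp() { events.add("groupSetUp"); }
    @Override protected void groupTearDown() { events.add("groupTearDown"); }
    @Override protected void setUp() { events.add("setUp"); }
    @Override protected void tearDown() { events.add("tearDown"); }

    public void testPolicyType() { events.add("testPolicyType"); }
    public void testWindscreenOption() { events.add("testWindscreenOption"); }
}
```

For PolicyScreenGroup, the recorded events are groupSetUp, then setUp/test-method/tearDown for each of the two test-methods, then groupTearDown.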

The code

If you're interested in using TestGroups, or just want to take a look at the code, you can get it here.

In praise of the Java Generics FAQ

I've been attempting to make QDox a little more tolerant of Java code containing generic types. In the process I found Angelika Langer's Java Generics FAQ. There's a wealth of information here about the subject, but it's extremely well-organised and a joy to navigate.

Great job, Angelika ... thank you very much for your help.

Jetty as a test-suite decorator

Marty Andrews and I have been working on a small project together. It's primarily intended as a demo of continuous integration, but has also given us the opportunity to play with some new technologies/ideas.

One of the coolest tricks we picked up (from Cactus) was to start/stop a web-server as part of running the tests, rather than depending on having one running already.

(In the past I've typically written Ant scripts that dump a WAR-file in a magic directory, and wait "a bit" for the server to auto-deploy it, before running my HTTP-based acceptance-tests. This is way nicer.)

The key is a test decorator that starts Jetty to serve our web-app:

package com.thoughtworks.todolist;

import junit.extensions.TestSetup;
import junit.framework.Test;
import org.mortbay.jetty.Server;
import org.mortbay.util.InetAddrPort;

public class JettyTestSetup extends TestSetup {

    private Server _server;

    public JettyTestSetup(Test test) {
        super(test);
    }

    protected void setUp() throws Exception {
        _server = new Server();
        _server.addListener(new InetAddrPort(9999));
        _server.addWebApplication(
            "/todolist", "build/todolist.war"
        );
        _server.start();
    }

    protected void tearDown() throws Exception {
        _server.stop();
        _server = null;
    }

}

As you can see, it's not hard to get a Jetty server going. Jetty is nice and lightweight, too: it's small (less than 600k), and starts up fast (less than a second here).

Now, it's a simple matter to decorate our test-suite with JettyTestSetup:

import junit.framework.Test;
import junit.framework.TestSuite;

public class AllAcceptanceTests {

    public static Test suite() throws Exception {
        TestSuite suite = new TestSuite();
        suite.addTestSuite(ViewListTest.class);
        suite.addTestSuite(AddItemTest.class);
        // ... etc ...
        return new JettyTestSetup(suite);
    }

}

That's it. The server gets started at the beginning of the suite, and stopped afterward.
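The same start-around-the-suite trick doesn't strictly need Jetty. As a swapped-in alternative (not what we used), the JDK ships a small HTTP server in `com.sun.net.httpserver`. The sketch below assumes a hypothetical `ServerFixture` whose `start()`/`stop()` would be called from a TestSetup's setUp()/tearDown(), and it serves a canned response instead of a WAR. Binding to port 0 lets the OS pick a free port, rather than hard-coding 9999.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical fixture: start an in-process HTTP server before the tests, stop it afterwards. */
final class ServerFixture {
    private HttpServer server;

    /** Starts the server on an ephemeral port and returns the port actually bound. */
    int start() throws IOException {
        server = HttpServer.create(new InetSocketAddress("localhost", 0), 0);
        server.createContext("/todolist", exchange -> {
            byte[] body = "todo list".getBytes(StandardCharsets.UTF_8); // canned page, not a real web-app
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server.getAddress().getPort();
    }

    /** Stops the server immediately (no grace period for in-flight requests). */
    void stop() {
        server.stop(0);
        server = null;
    }
}
```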

Exploring the Java Heap (with Ruby and Graphviz)

Recently, I needed to track down a rather nasty memory-leak in a Java app, and ended up rolling together a simple heap-dump explorer.

I could probably have achieved the same result with a commercial Java profiler; in fact, we had one around. But unfortunately, one of the third-party libraries we use started failing strangely when run under the profiler. I didn't particularly want to jump through the hoops of investigating upgrades or alternatives. I was also worried about the overhead of running our app (which gets fairly large) under a full-blown profiler. To cut a long story short, I decided to leverage hprof, the mini-profiler bundled with Sun's JDK.

hprof can be made to dump the state of the heap, using the following incantation:

$ java -Xrunhprof:heap=dump,doe=n my.MainClass

Hit CTRL-Break (or CTRL-\ under *nix), and you'll get a file called "java.hprof.txt", containing, among other things, a heap dump:

HEAP DUMP BEGIN (226541 objects, 12621472 bytes) ...
ROOT 22538ac8 (kind=<thread>, id=29, trace=39041)
ROOT 224de7e0 (kind=<thread>, id=19, trace=39041)
...
ARR 213470f8 (sz=24, trace=11796, nelems=2, 
              elem type=java.lang.Object@a19378)
        [0]             2063d760
        [1]             21349ce0
OBJ 213471c0 (sz=48, trace=11797, 
              class=org.openide.util.WeakSet$Entry@2020e660)
        this$0          21313990
        iterChainPrev   21f1a068
        queue           21313a30
        referent        205520d8
OBJ 2134c880 (sz=24, trace=11799, 
              class=java.lang.ref.WeakReference@ab22c0)
        next            2134c880
        queue           a19900
...

As you can see, it contains info about each object (or array, or class), including size, and references to other objects. If you have time on your hands, you can search around this with a text editor, and get some idea of what's going on. With that many objects to deal with, I wrote a Ruby script (hprofexplore.rb) to help me out.
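The core of such an explorer is small: parse each record and its reference lines, then invert the edges so you can ask "who points at this object?". hprofexplore.rb itself is Ruby; the Java sketch below is a hypothetical re-imagining based only on the dump excerpt above. `HeapIndex` is an invented name, the sketch only handles OBJ and ARR records (it ignores ROOT and CLS), and it assumes each reference line is a field name (or array slot) followed by a hex object-ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal index of an hprof text heap dump: outbound and inbound references. */
final class HeapIndex {
    /** One reference: object {@code from} holds {@code to} in field (or array slot) {@code field}. */
    static final class Ref {
        final String field, from, to;

        Ref(String field, String from, String to) {
            this.field = field;
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return field + " " + from + " -> " + to;
        }
    }

    private static final Pattern RECORD = Pattern.compile("^(OBJ|ARR) ([0-9a-f]+) ");
    private static final Pattern FIELD = Pattern.compile("^\\s+(\\S+)\\s+([0-9a-f]+)\\s*$");

    final Map<String, List<Ref>> outbound = new HashMap<>();
    final Map<String, List<Ref>> inbound = new HashMap<>();

    static HeapIndex parse(List<String> lines) {
        HeapIndex index = new HeapIndex();
        String current = null; // ID of the OBJ/ARR record whose reference lines we are reading
        for (String line : lines) {
            Matcher record = RECORD.matcher(line);
            if (record.find()) {
                current = record.group(2);
                continue;
            }
            if (line.isEmpty() || !Character.isWhitespace(line.charAt(0))) {
                current = null; // any other unindented line (ROOT, CLS, "...") ends the record
                continue;
            }
            Matcher field = FIELD.matcher(line);
            if (current != null && field.matches()) { // skips continuation lines like "class=..."
                Ref ref = new Ref(field.group(1), current, field.group(2));
                index.outbound.computeIfAbsent(ref.from, k -> new ArrayList<>()).add(ref);
                index.inbound.computeIfAbsent(ref.to, k -> new ArrayList<>()).add(ref);
            }
        }
        return index;
    }

    /** References pointing TO the given object, in the spirit of the explorer's "i" command. */
    List<Ref> referencesTo(String id) {
        return inbound.getOrDefault(id, List.of());
    }
}
```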

Usage is fairly simple; first, you point the script at the hprof output, from which it extracts the heap data.

$ hprofexplore.rb java.hprof.txt
loading HPROF data from java.hprof.txt ...
219191 objects loaded

Then you get a command prompt:

>> ?
S <pattern> ... list objects with type matching (glob-style) <pattern>
<id>        ... goto object with specified <id>
O           ... display output references FROM current object
I           ... display input references TO current object
D [<file>]  ... dump a DOT graph of visited objects to <file> 
                (default: last DOT output file)
U           ... un-visit the current object, for graphing purposes
C           ... clear the visited-set; ie. un-visit all objects
Q           ... quit

The "s" (search) command is a good place to start if you have a clue as to what type of object is not getting garbage-collected:

>> s model.Belief
21f5e2a8    OBJECT foo.model.Belief size=40
21f5d7c8    OBJECT foo.model.BeliefMode size=16
21f5d7e8    OBJECT foo.model.BeliefMode size=16
21f5cfa8    CLASS foo.model.Belief
21f5d808    OBJECT foo.model.BeliefMode size=16
21f5d828    OBJECT foo.model.Belief size=40
21f5dd88    OBJECT foo.model.Belief size=40
21f5d788    CLASS foo.model.BeliefMode
8 objects matched

With this starting point, you can focus on a particular object, and get info about the references to/from it:

>> 21f5e2a8 i
21f5e2a8    OBJECT foo.model.Belief size=40
  <- [0]               21fa8408  (ARRAY java.lang.Object)
  <- userObject        220288b0  (OBJECT foo.ui.model.AgentData)
  <- referent          22029c70  (OBJECT java.util.WeakHashMap$Entry)

Explore the reference-graph by typing in object-IDs (cut and paste comes in handy here). Appending "i" or "o" to an object-ID causes input or output references to be printed as well:

>> 220288b0 i
220288b0    OBJECT foo.ui.model.AgentData size=48
  <- [0]               220270c8  (ARRAY java.lang.Object)
  <- source            22028990  (OBJECT java.beans.PropertyChangeSupport)
  <- value             22029c70  (OBJECT java.util.WeakHashMap$Entry)
  <- repositoryElement 22688778  (OBJECT foo.ui.nodes.BeliefNode)

>> 22029c70 i
22029c70    OBJECT java.util.WeakHashMap$Entry size=40
  <- [4]               21fdda28  (ARRAY java.util.WeakHashMap$Entry)

>> 21fdda28 i
21fdda28    ARRAY java.util.WeakHashMap$Entry size=80 n_elements=16
  <- table             21fdd9c8  (OBJECT java.util.WeakHashMap)

>> 21fdd9c8 i
21fdd9c8    OBJECT java.util.WeakHashMap size=48
  <- findViewElementCache 21fdd888  (OBJECT foo.ui.model.Repository)

>> 21fdd888
21fdd888    OBJECT foo.ui.model.Repository size=48

Now the cool part: the "d" command allows you to write a reference-graph of the visited objects, in "DOT" format:

>> d graph.dot
wrote graph.dot

(see graph.dot)

You can then use the "dot" tool from AT&T's GraphViz project, to render it as PNG (or GIF, or JPG):

$ dot -Tpng -o graph.png graph.dot
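Emitting DOT is mostly string formatting. Here's a hypothetical Java sketch (not what hprofexplore.rb does internally), assuming the visited objects arrive as a map from object-ID to a label such as the type name, and the references as (from, field, to) edges. It keeps only edges whose two ends were both visited, which is an assumption about what's useful in the graph.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical DOT writer for a set of visited heap objects and the references between them. */
final class DotWriter {
    private DotWriter() {}

    /** A reference from one object to another through a named field or array slot. */
    static final class Edge {
        final String from, field, to;

        Edge(String from, String field, String to) {
            this.from = from;
            this.field = field;
            this.to = to;
        }
    }

    static String toDot(Map<String, String> visited, List<Edge> edges) {
        StringBuilder dot = new StringBuilder("digraph heap {\n");
        for (Map.Entry<String, String> node : visited.entrySet()) {
            // Quote IDs so hex strings are valid DOT identifiers; "\n" in a label is a DOT line break.
            dot.append(String.format("  \"%s\" [label=\"%s\\n%s\"];%n",
                    node.getKey(), node.getKey(), escape(node.getValue())));
        }
        for (Edge e : edges) {
            if (visited.containsKey(e.from) && visited.containsKey(e.to)) { // only visited objects
                dot.append(String.format("  \"%s\" -> \"%s\" [label=\"%s\"];%n",
                        e.from, e.to, escape(e.field)));
            }
        }
        return dot.append("}\n").toString();
    }

    /** Escapes backslashes and double quotes for use inside a quoted DOT string. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Writing the result to graph.dot (e.g. with `Files.writeString`) produces input for the `dot` command above.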

A useful toy, in any case.