Unresolved tension and TDD

A standard creative writing technique for avoiding writer’s block the next day is to stop in mid-sentence, ideally with your protagonist in peril. When you stop writing at

“The demo ended, and the lights were turned up. The project leader turned to the customer. The customer frowned for a moment, and said …”

you’re leaving yourself a no-friction starting point for the next morning. And your subconscious, which hates unresolved tension, is going to help by churning away all night thinking about what the customer is going to say, and how the project leader is going to recover. You might go to sleep with a good, definite idea of what happens next, and wake up with better ideas.

The same technique works when doing Test-Driven Development. Instead of stopping for the day with all tests passing, leave yourself a failing test, ideally a test for the next bit of code you plan to write. Your subconscious, taunted by the image of the JUnit red bar, is going to grind away on the problem until you sit down at the keyboard the next morning. Then, instead of spending time puzzling over what to do next (the coding equivalent of starting a new chapter), you have a very definite task before you: “Get to green”. And you have the benefit of whatever ideas your subconscious came up with.
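For instance, the last test of the day might look something like this (a minimal JUnit sketch; the Invoice class and its methods are hypothetical, and applyDiscount() deliberately doesn’t exist yet):

import junit.framework.TestCase;

// The last test written before going home. It describes the next bit of
// behavior and fails on purpose, because applyDiscount() hasn't been
// written yet. Tomorrow morning's task: "Get to green."
public class InvoiceTest extends TestCase {

    public void testTotalReflectsPercentageDiscount() {
        Invoice invoice = new Invoice();
        invoice.addLineItem("widget", 100.00);
        invoice.applyDiscount(0.10);   // not implemented yet

        assertEquals(90.00, invoice.getTotal(), 0.001);
    }
}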

Once you’ve resolved the conflict by making the failing test pass, you have momentum to move on to the next test.

This technique works even when you realize on the way home that you screwed up the test. It’s just a different form of unresolved tension.

Dinner, with Rhino

John Brewer led a hands-on JavaScript session at last night’s Silicon Valley Patterns meeting, using the Rhino JavaScript interpreter. I pair-programmed with Bill Humphries, which involved a “one types, the other eats” maneuver with his laptop to avoid getting dinner crumbs on the keyboard. We made it through the prototype-based inheritance exercises (minus the tricky extra-credit problem) just as they were shutting down the restaurant.

Rhino has been on my “check it out” list for a while now. I don’t have any projects in the queue that are likely to use server-side JavaScript, but who knows.

I’m a Dapper Drake Man

On a whim, and before the morning dose of caffeine raised the alert level to “wary”, I upgraded my primary Linux box to Ubuntu 6.06 (“Dapper Drake”). It took an hour; half of that was downloading, half was configuring. The upgrade process stopped once to ask if I wanted to overwrite an Apache configuration file that I’d modified. It even showed me a diff. Very nice touch. Poking around, I found only one thing the upgrade had broken (XChat2 went away), and that took two minutes to fix.

Ubuntu is getting very close to being a Linux desktop for non-technical people, though I’m probably the wrong person to judge that.

Date bloat

If you want to find performance hot spots in a legacy codebase, a good place to look is where time is being manipulated. Converting a string representation of a date into a number (say, of epoch days or seconds) is surprisingly expensive. I once profiled a system and found that 30% of runtime went to pulling dates out of a database, converting them to strings, then to numbers, and then back to strings for display. Some judicious caching and a smarter conversion shaved that down to less than 5%. The time hit can be worse if the system uses some general-purpose date conversion package that supports a number of formats. You may be using a single format, but you’re paying a performance hit for the generality. After all, a customer might want to type “next friday at 5pm” instead of having to reach for a calendar.
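The caching itself doesn’t have to be clever. Here’s a minimal sketch of the idea, assuming the hot spot is re-parsing the same handful of date strings over and over (the class name and format are made up):

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Cache the string-to-date conversion instead of re-parsing every time.
// Synchronized because SimpleDateFormat is not thread-safe.
public class DateCache {
    private final SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
    private final Map<String, Long> cache = new HashMap<String, Long>();

    public synchronized Date parse(String text) throws ParseException {
        Long millis = cache.get(text);
        if (millis == null) {
            millis = format.parse(text).getTime();
            cache.put(text, millis);
        }
        return new Date(millis);   // java.util.Date is mutable, so hand out copies
    }
}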

If performance isn’t an issue, there may still be a refactoring opportunity (i.e., code bloat). There’s something about time that seems to deaden the senses, causing otherwise thoughtful developers to write a “how many minutes ago was midnight” calculation without noticing that it’s already done in four other places. In the Java world, the problem was compounded when the Calendar class arrived, giving developers an even bigger bag of tools for determining how many hours until 8 AM next Monday.
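The refactoring is usually nothing fancier than giving the calculation a single home. A hypothetical sketch:

import java.util.Calendar;

// One place for the "minutes since midnight" math, instead of the same
// arithmetic scattered across the codebase.
public final class TimeOfDay {
    private TimeOfDay() {}

    public static int minutesSinceMidnight(Calendar when) {
        return when.get(Calendar.HOUR_OF_DAY) * 60 + when.get(Calendar.MINUTE);
    }
}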

Accidentally defeating MySQL’s query cache

I just learned some neat things about MySQL’s query optimizer, including a bit about how MySQL 4.0.1 (and later) caches queries and result sets. When the query cache is enabled (which it is by default on Fedora Core 3), and you present MySQL with a SELECT query, it consults the cache before preparing or analyzing the query. On finding a match, the cached result from the prior execution of the query is returned.

The cache is actually keyed by the MD5 hash of the full query, including whitespace and comments. In most applications, queries are effectively static. Even when queries are generated dynamically, an identical query string is regenerated each time one is requested. In these cases, hashing the full query isn’t a problem. Same query, same hash.

But consider the application that’s grown over time to the point where no single developer has a good grasp of its data access patterns. When slow performance demands attention, you will have to reconstruct a model of those patterns, because the story very likely changed while you were paying attention elsewhere. Histograms of query counts and execution times are useful. To account for the same query being issued from different parts of the application, it’s common to instrument the query by adding a comment. For example,

SELECT /* summary view */ COUNT(*) FROM widgets;

Comments have the additional benefit of being visible when you notice that the server is crawling and ask MySQL to show you which queries are currently executing.

But if you start to inject dynamic information (e.g., a timestamp or a process id) into the comments, the “same” query hashes differently every time, and the benefit of having a query cache is lost. For no readily apparent reason, things just run a bit slower.
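Here’s a hypothetical sketch of how that sneaks in when the query strings are built in Java:

// The static tag lets every execution match the cached entry.
// The dynamic comment makes each query string unique, so MySQL hashes it
// to a new key and never finds a cached result.
public class WidgetQueries {

    public String summaryCount() {
        return "SELECT /* summary view */ COUNT(*) FROM widgets";
    }

    public String summaryCountWithTimestamp() {
        return "SELECT /* summary view " + System.currentTimeMillis()
                + " */ COUNT(*) FROM widgets";
    }
}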

For more on query caching (including how to defeat it intentionally, and why), see the MySQL documentation, or chapter 5 in High Performance MySQL.

Test First, Really.

From the “mistakes I make that you can learn from” file:

When changing legacy code, find and run the tests before you start.

Otherwise you might find yourself burning up an hour or two trying to figure out how your perfectly innocuous changes managed to break some seemingly unrelated test, only to discover, after carefully backing out your changes, that the tests were failing before you started.

So get a baseline, even if you’re writing new tests.

For whatever reason, I have to relearn this lesson every few months. But at least I remember to run the tests before declaring ‘done’.