Most bug tracking systems accumulate feature requests along with defect reports. The next time I set up a tracking system, I’m going to add a “wouldn’t it be nice if” (WIBNI) category to distinguish between the nifty ideas that developers dream up and the features that customers actually ask for.

There are good reasons for implementing both types of features, but unless you keep the distinction visible (and the decisions about how much of each to do deliberate), you might find yourself accidentally putting out a release with nothing but internally generated features. There might be some customer value in that, but navel gazing can be a risky strategy.

A day of data munging

Today I tackled a data munging problem that had been festering on my To Do list for several weeks. The problem was blocking other, more pleasant work, and was contributing to my foul mood.

The problem involved reworking 121 HTML documents to hoist content out of a table and then discard the table structure along with some other content. Editing the documents by hand would have been both exhausting and error-prone (25 documents? Maybe. 121? No thanks). There were enough slight variations in the documents to make writing editor macros tricky. It could have been done in multiple passes, but that’s error-prone with that many documents.

This was essentially a tree rewriting problem, so XSLT would seem to be a good tool to reach for. Unfortunately, the documents were non-well-formed pre-XHTML. (Many of the files had passed through Adobe GoLive, which produces some… uh, interesting HTML.) Was it worth taking the time to convert to XHTML? Tempting, but not in this case.

I ended up using Perl and HTML::TokeParser to tokenize the documents. A simple “keep tokens until we see a <table> tag, discard it, and keep discarding until we see… and then keep tokens until…” state machine was quick to code up and test, and ran so fast that at first I thought there had to be a bug. Converting the tokens back to HTML let me clean up some of the damage that GoLive had done.
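
For the curious, here is a minimal sketch of that kind of keep/discard state machine. It is not the original script: it uses Python's standard html.parser in place of Perl and HTML::TokeParser, and the transition rules (keep everything outside a table, keep what sits inside td and th cells, drop the rest of the table markup) are invented for illustration. The class and function names are hypothetical, and it assumes tables are not nested.

```python
from html.parser import HTMLParser


class TableHoister(HTMLParser):
    """Token-level state machine: keep, discard, keep again.

    Illustrative rules only (assumptions, not the original script's):
      OUTSIDE  -> keep every token
      IN_TABLE -> discard tokens (table, tr, and anything between cells)
      IN_CELL  -> keep tokens (the content being hoisted out)
    Assumes tables are not nested.
    """

    OUTSIDE, IN_TABLE, IN_CELL = "outside", "in_table", "in_cell"
    CELL_TAGS = {"td", "th"}

    def __init__(self):
        # convert_charrefs=False so entities reach us as separate tokens
        # and can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.state = self.OUTSIDE
        self.out = []

    def _keeping(self):
        return self.state != self.IN_TABLE

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.state = self.IN_TABLE          # start discarding
        elif self.state != self.OUTSIDE and tag in self.CELL_TAGS:
            self.state = self.IN_CELL           # drop the tag, keep what follows
        elif self._keeping():
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through verbatim.
        if self._keeping():
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.state == self.IN_CELL and tag in self.CELL_TAGS:
            self.state = self.IN_TABLE          # back to discarding
        elif self.state != self.OUTSIDE and tag == "table":
            self.state = self.OUTSIDE           # table is gone, keep again
        elif self._keeping():
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._keeping():
            self.out.append(data)

    def handle_entityref(self, name):
        if self._keeping():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self._keeping():
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if self._keeping():
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        if self._keeping():
            self.out.append(f"<!{decl}>")


def hoist_tables(html):
    """Return html with table markup removed and cell content kept."""
    parser = TableHoister()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Working on tokens rather than a parsed tree is what makes this tolerant of non-well-formed input: nothing has to balance, and each token is either written back out or dropped. End tags are rebuilt from the tag name, which is also a natural place to normalize sloppy markup on the way out.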

Perl has long been my tool of choice for problems like this, though lately I’ve been spending most of my evening programming time with Ruby. It’ll be interesting to see if or when Ruby supplants Perl for problems like this one. My knowledge of Ruby’s libraries outside of those needed for Rails work is still meager, and my bag of Perl tricks is pretty big. I’m guessing it’ll be another year.

Rails and CVS aren’t the best of dance partners

Rails creates empty directories, CVS throws them away. Unhappiness ensues.

CVS likes to cull empty directories. Try it. Create a new Rails project, import it into CVS, and check it out again. Empty directories have gone away.

Unfortunately, rake stats gets really unhappy when directories like app/apis have gone missing.
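
One common workaround for version control systems that drop empty directories is to put a placeholder file in each one before importing. The following is a rough sketch of that idea in Python, not something taken from Rails or CVS. The function names and the ".keep" file name are hypothetical, and it assumes that a directory holding nothing but a CVS bookkeeping directory counts as empty.

```python
import os


def find_empty_dirs(root):
    """Return directories under root that hold no files and no subdirectories.

    A "CVS" bookkeeping subdirectory is ignored (an assumption), so a
    checked-out directory with nothing else in it still counts as empty.
    """
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune CVS directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d != "CVS"]
        if not dirnames and not filenames:
            empty.append(dirpath)
    return sorted(empty)


def add_placeholders(root, name=".keep"):
    """Create an empty placeholder file (hypothetical name) in each empty directory."""
    touched = find_empty_dirs(root)
    for directory in touched:
        with open(os.path.join(directory, name), "w"):
            pass
    return touched
```

Running find_empty_dirs on a freshly generated project before the import shows which directories are at risk; add_placeholders then gives each of them something to hold on to.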

Another nudge towards Subversion.