Finding Boundaries

November 14th, 2009

Re-reading Martin Fowler’s writeup on rake, Using the Rake Build Language, I found this buried gem:

Often when you come across something new it can be a good idea to overuse it in order to find out it’s boundaries. This is a quite reasonable learning strategy. It’s also why people always tend to overuse new technologies or techniques in the early days. People often criticize this but it’s a natural part of learning. If you don’t push something beyond its boundary of usefulness how do you find where that boundary is? The important thing is to do so in a relatively controlled environment so you can fix things when you find the boundary.

Two thoughts:

First, I’m not doing enough pushing of boundaries when playing with new things, tending instead to test-drive new technologies “on road” rather than off. Solving familiar problems in familiar ways but with a new tool is a good way to get a feeling for the tool, but it shortchanges learning. Perhaps I’ve let the set of problems I’m tackling become too narrow.

The second thought is that “a relatively controlled environment” is both key and often ignored. I’ve seen plenty of examples of people overusing new technologies in production code bases. If that’s you, please get yourself a side project to experiment in. Finding a science experiment buried in a code base can cause a major headache. There’s a time and a place for boundary-pushing experiments.

Study material for understanding distributed data stores

October 11th, 2009

Here’s a quick starting point to help people who’ve grown up with SQL databases sort out what the “SQL doesn’t scale! No More SQL!” noise is all about, and what the world of distributed data stores looks like. Part of this is from material I used for a discussion track at the Silicon Valley Patterns Group (a study group for serious techies).

A motivation behind the “No SQL” movement is the observation that relational databases don’t scale, that distributed data stores do, and that distributed data tables are a better fit for a large class of applications that have been straining to force their data into SQL schemas. You may have noticed a progression from selectively denormalizing relational schemas for performance, to actively avoiding JOINs for more performance, to using an in-memory key/value cache for even more performance, to sharding data for even more performance. A logical step along this path is ditching SQL in favor of distributed in-memory key/value caches (i.e., distributed hash tables, or DHTs). There’s an implied “where this makes sense”, but that’s a discussion for a later post.

The jumping off point in understanding distributed data stores is the ‘CAP Theorem’, which states that, when building distributed systems, one can choose no more than two of Consistency, Availability, and Partition Tolerance (the ability for the system to keep going when pieces can’t talk to one another). For large distributed systems, partitions are a given (due to temporary routing problems or longer fiber cuts), leaving a choice between Consistency and Availability. Systems with SQL back-ends (and designed with ACID compliance in mind) have typically chosen Consistency, and will suffer unavailability until consistency can be guaranteed. Large e-commerce systems often choose Availability, taking on the responsibility for coping with data that may become inconsistent.

Werner Vogels’ paper Eventually Consistent – Revisited is a good introduction to the CAP Theorem and Amazon’s approach to it. Vogels covers some of the same ground in a video presentation. Read the first or watch the second. A basic understanding of CAP is essential.

With CAP under your belt, I recommend watching Todd Lipcon’s Intro Session at the NoSQL conference (the first two video links). He packs a lot of useful information into an hour.

From there, it’s on to specifics. Two influential distributed data stores are Amazon’s Dynamo (a distributed hash table) and Google’s BigTable (a distributed data store with more structure than a simple hash table).

Werner Vogels has a good article on Dynamo that delves far enough into Dynamo’s implementation to give a general idea of how to build a distributed hash table.
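One of the building blocks the Dynamo material describes is consistent hashing: nodes and keys are hashed onto the same ring, and each key belongs to the first node found clockwise from it. What follows is only a toy sketch of that idea, not Dynamo’s implementation. The class name HashRing, the node names, the use of MD5, and the number of virtual nodes are all assumptions made for illustration, and replication and failure handling are left out.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring that maps keys to node names."""

    def __init__(self, nodes, vnodes=8):
        # Place each node on the ring at several points ("virtual nodes")
        # so that keys spread more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(text):
        # Any stable hash works here; MD5 is used only for its even spread.
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Find the first ring point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self._points, self._hash(key))
        return self._ring[index % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")  # one of the three node names
```

The property that makes this attractive for a distributed store is that removing a node only reassigns the keys that node owned; every other key stays where it was.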

Dynamo inspired Project Voldemort, an open source implementation started at LinkedIn.

BigTable inspired Project Cassandra, an open source implementation started at Facebook, designed by one of Dynamo’s authors. Follow up on this if you have data structuring needs beyond what can fit into simple hash tables.

The videos from the NoSQL conference cover a few other players in the distributed data store space.

There’s a lot more good material out there. This is just one way to get started.

Proof that TDD slows projects down

October 9th, 2009

A frequent argument between people who practice Test-Driven Development (TDD) and those who don’t is whether the overhead of all those extra tests slows a project down or speeds it up. Now there’s a study to cite.

Exploding Software Myths, an article from Microsoft Research, summarizes some recent research in software development, including a study, Realizing quality improvement through test driven development: results and experiences of four industrial teams, which compares development done using TDD to “normal” projects, using data gathered from projects at Microsoft and IBM. The upshot?

“What the research team found was that the TDD teams produced code that was 60 to 90 percent better in terms of defect density than non-TDD teams. They also discovered that TDD teams took longer to complete their projects—15 to 35 percent longer.”

So there it is. TDD is slower.

Now consider an alternate story, told using the same numbers:

By doing normal development instead of TDD, teams can complete projects 13 to 26 percent faster at the expense of defect densities that are 150 to 900 percent higher.
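For anyone who wants to check the conversion, here is a small sketch of the arithmetic. The function names are made up for this example, and results are rounded to the nearest whole percent.

```python
def pct_faster_without_tdd(tdd_extra_time_pct):
    # If TDD takes (1 + t) times as long, the non-TDD project
    # takes 1 / (1 + t) of the TDD project's time.
    ratio = 1 / (1 + tdd_extra_time_pct / 100)
    return round((1 - ratio) * 100)


def pct_more_defects_without_tdd(tdd_defect_reduction_pct):
    # If TDD code has (1 - d) times the defect density, non-TDD code
    # has 1 / (1 - d) times the defect density of the TDD code.
    ratio = 1 / (1 - tdd_defect_reduction_pct / 100)
    return round((ratio - 1) * 100)


print(pct_faster_without_tdd(15), pct_faster_without_tdd(35))              # 13 26
print(pct_more_defects_without_tdd(60), pct_more_defects_without_tdd(90))  # 150 900
```

The percentages change because the baseline changes: "35 percent longer" is measured against the non-TDD schedule, while "26 percent faster" is measured against the TDD schedule.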

H1N1

September 20th, 2009

Flu vaccinations that cover the H1N1 virus (AKA “Swine Flu”) are supposed to be available in a few weeks. This is a vaccination to seriously consider getting.

I won’t be needing it. H1N1 just made a pass through my family, taking us out like dominoes. I was out of action for six days. And I mean out. This is not a virus that’s going to let you catch up on work at home.

On a flu scale of 1 to 10, I give H1N1 a 7. Fever, chills, fatigue, massive sinus drainage, a lot of coughing, and a seriously trashed sleep schedule. On the upside, no vomiting.

So do yourself a favor on this one and get the shot. If you decide not to, consider stocking up on chicken soup, tea, throat lozenges, and a stack of light reading material.

A Brief Interlude, With Bees

July 5th, 2009

There’s a beehive in the oak near our front door. The bees are good neighbors, coming and going in small numbers and seldom straying into the house. One day a limb is going to come crashing down and we’ll get drenched in honey.

On Thursday, the bees swarmed. I was at work and missed it. Yesterday, they swarmed again.

Here’s what the swarm looked and sounded like at its peak.

Ten minutes later, the swarm was half this size; an hour later it was gone. Today, the hive has returned to its normal routine.

XSS Exploits in WordPress Themes

June 21st, 2009

I found a family of cross-site scripting (XSS) vulnerabilities while checking out WordPress themes. It’s the same vulnerability, copy/pasted from one theme to its “inspired by” children.

An XSS vulnerability is where a site allows a hacker to spit back JavaScript of the hacker’s choosing to some innocent third party, typically by way of some type of sneaky JavaScript injection that the site isn’t coded to protect against. The third party’s browser executes the JavaScript as if it came from the site, which gives the injected script access to any cookies that the third party has with the site. The hacker stealthily retrieves the cookies, and it’s all downhill from there. Lost cookies can lead to lost passwords.

One of my longer-overdue tasks has been to get this site off of the default WordPress theme. So I’ve slowly been checking out themes, taking notes of what I like, and kicking a few of them around on a private WP install on my laptop. Quite a few themes are fodder for XSS attacks, by way of incorrectly sanitized search pages.

Here’s how to tell if the theme you’re looking at (or using!) is vulnerable to the XSS exploit. Type this into the search box:

<script>alert('xss');</script>

If you get a pop-up dialog that says ‘xss’, you’ve found a vulnerable theme. Repair the vulnerability before using the theme, or find a safer one.

To pick on one theme in particular (because it’s a nice theme, and the author hasn’t fixed the exploit or responded to my email), try this on the DePo Clean theme. Scroll to the bottom to get to the search box.

Fortunately, repairing the vulnerability is straightforward. Find and open search.php in the editor of your choice, then replace

<?php echo $s; ?>

with

<?php echo strip_tags($s); ?>

This prevents a script entered via search from being echoed back to the user.
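Stripping tags removes the markup outright. The other common approach is to escape the search term on output, so that whatever the visitor typed is displayed as text rather than interpreted as HTML. Here is a minimal sketch of that idea in Python rather than PHP; the helper name render_search_heading and the heading text are made up for illustration and are not part of any theme. In PHP, htmlspecialchars plays a similar role.

```python
import html


def render_search_heading(search_term):
    # Hypothetical helper: builds the line a search page might show.
    # html.escape turns <, >, & and quotes into HTML entities, so the
    # browser shows the search term as text instead of running it.
    return "Search results for: " + html.escape(search_term)


print(render_search_heading("<script>alert('xss');</script>"))
# Search results for: &lt;script&gt;alert(&#x27;xss&#x27;);&lt;/script&gt;
```

Escaping keeps the visitor’s original search text visible on the page, which stripping does not.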

And consider sending a note to the theme’s author. Fixing a problem at the source is usually best.

And no, no new theme here yet. Working on it.