Archive for September, 2007

Throwing Replication into the mix

Sunday, September 16th, 2007

Bill de hÓra pointed me at a discussion about ad hoc querying that I’d forgotten I took part in. The page has a 2007 copyright, but the discussion actually happened several years earlier. A bit of context might help: Luke Hohmann was VP of Engineering at Aurigin; Chuck Rabb and I worked under him. We had a fully normalized (3NF) database of U.S. patents from 1970 on, fully text indexed, with images, and with some funky analytics built on top of it, all on 1997 technology (think RAID arrays built from 18GB drives, driven by smoking hot 200MHz processors). Much of what I know about physical database design comes from working with Chuck, scary as that might be to him.

In the discussion I advocated using a separate “data warehouse” for ad hoc queries. Time has passed, and I’ve picked up a few more tricks. If we were to have the discussion again today, I’d add another option: running ad hoc queries against a database replica. Replication is a lot more approachable now than it was a few years back, especially if you’re using MySQL or PostgreSQL. Replicas are a way of separating reads from writes, allowing you to do both reporting and ad hoc analysis against nearly live data without degrading performance on your primary database. If your primary schema is sufficient for your ad hoc query needs, replication can be quite cost-effective.
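
To make the read/write split concrete, here is a minimal sketch in Python of how an application might decide where each statement runs. Everything in it is an assumption for illustration: the ReadWriteRouter class, the server labels, and the patents table are hypothetical, and a real application would hand back actual database connections rather than names. It also only recognizes plain SELECT statements as reads; anything else, including statements it cannot classify, goes to the primary.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical helper: picks a server for each SQL statement."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through the replicas so read load is spread across them.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target_for(self, sql):
        """Return the label of the server that should run this statement."""
        words = sql.lstrip().split(None, 1)
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, and anything we cannot classify, go to the primary.
        return self.primary


# Server labels and table name are made up for this example.
router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT count(*) FROM patents"))
print(router.target_for("UPDATE patents SET title = 'x' WHERE id = 1"))
```

Because a replica is only nearly live, a read routed this way may not yet see a write that just landed on the primary. Reads that must see the latest data can simply be sent to the primary instead.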

If you’re architecting a web application that’s going to have to scale, a basic understanding of replication and its limits is now a must-have.