Scalability, and what limits it

Paul Hoffman of Joyent gave a very pragmatic talk on Scaling Ruby applications the other night at Google. The talk will be up on Google Video soon. I recommend watching for it.

Scalability is a fascinating topic, and is almost always approached from the technical side. When I start musing about scalability, it’s usually over technical details such as database partitioning or caching (memcached FTW!). Cal Henderson’s book (Building Scalable Web Sites) and his own presentations on scaling Flickr are almost entirely technical.

Paul came at it from a more pragmatic angle. Here’s his “Fundamental Limits of Scalability” list, with my interpretation and commentary.

At the top of Paul’s list…

1. Money

Now there’s a reality slap. And it’s so obvious. No money, no game.

2. Time

Second reality slap. If you don’t have the time to sort out scalability, you’re screwed. Given enough time and enough money, many things are possible. But there’s never enough time.

Time and money also join together as cash flow management. Many startups fail for lack of good cash flow management. (I’ve seen that up close, but that’s a story for another day.)

O.K., time and money. Got it. Time for something technical?

3. People

Damn, forgot people. If you’re going to scale an application, you’re going to need to scale Operations to manage all those servers, and you’ll need to scale support. Duh.

Got it. Money, time, and people. Something technical next?

4. Experience

Oh, right. If you can’t get the right mix of experience, you’re going to have to reinvent the wheel yourself, which is going to cost time and money, and risks burning out your people.

Got it. Money, time, and people with the right experience. Can we talk about event buses yet?

5. Power

Oh, that. 2,000 kW runs ~400 servers (Paul’s numbers). Getting more than that pulled into a standard commercial building may take some doing. Power is why Google builds server farms near hydroelectric plants.
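If you take those two figures at face value, they imply roughly 5 kW of provisioned power per server. Here’s a quick sanity-check sketch in Python; the per-server number is nothing more than what falls out of Paul’s figures, not a measured draw:

```python
# Quick sanity check on the power figures above. The per-server number is just
# what Paul's two figures imply; treat it as provisioned facility power
# (cooling and overhead included), not the draw of the box itself.

building_kw = 2_000
servers = 400
kw_per_server = building_kw / servers
print(f"~{kw_per_server:.1f} kW of provisioned power per server")

# Flipped around: how many servers does a given power budget buy at that rate?
def servers_for_budget(budget_kw, kw_per_server=kw_per_server):
    return int(budget_kw // kw_per_server)

print(servers_for_budget(500))  # a 500 kW allotment covers roughly 100 servers
```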

6. Bandwidth

Now we finally get technical, but in a back-of-the-envelope sort of way. Bandwidth requires pipes (the internet being basically a series of tubes), and pipes cost money. Fat pipes cost lots of money. You can calculate how much bandwidth to the outside world you’ll need using some assumptions about what a “standard page” looks like and a simple formula. Catch the video for Paul’s numbers. I jotted down “100 Mbps is good for 5.5M page views/day”, but he had more to say there.
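Here’s a rough sketch of that back-of-envelope formula. The ~100 KB “standard page” and the 2x peak-to-average factor are my own guesses, not Paul’s numbers, but they land in the same ballpark as the figure I jotted down:

```python
# Back-of-envelope bandwidth sizing. The ~100 KB "standard page" and the 2x
# peak-to-average factor are assumptions, but they roughly reproduce the
# "100 Mbps is good for 5.5M page views/day" figure.

def page_views_per_day(link_mbps, page_kb=100, peak_to_average=2.0):
    """Estimate sustainable page views per day for a given uplink."""
    usable_bits_per_sec = link_mbps * 1_000_000 / peak_to_average  # headroom for traffic peaks
    bits_per_page = page_kb * 1_000 * 8
    pages_per_sec = usable_bits_per_sec / bits_per_page
    return pages_per_sec * 86_400  # seconds per day

print(f"{page_views_per_day(100):,.0f} page views/day on 100 Mbps")
```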

Bandwidth also limits scalability at internal interconnect points. A GigE interconnect eventually limits how much database replication you can do. Finally, we’re at a limit that has direct architectural consequences, and we’ve only just started to talk about the technology stack.
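To make that concrete, here’s a rough sketch of the arithmetic; the write rate, write size, and replica count are invented for illustration:

```python
# Sketch of how an internal link caps replication fan-out. All the workload
# numbers here are invented for illustration.

GIGE_BYTES_PER_SEC = 1_000_000_000 / 8  # 1 Gbps ≈ 125 MB/s, ignoring protocol overhead

def link_utilization(writes_per_sec, bytes_per_write, replicas,
                     link_bytes_per_sec=GIGE_BYTES_PER_SEC):
    """Fraction of the link consumed by shipping every write to every replica."""
    replication_bytes_per_sec = writes_per_sec * bytes_per_write * replicas
    return replication_bytes_per_sec / link_bytes_per_sec

# e.g. 5,000 writes/s of 2 KB each, fanned out to 8 replicas over one GigE link
print(f"{link_utilization(5_000, 2_048, 8):.0%} of a GigE link")  # ~66%
```

Add a replica or two, or double the write rate, and the pipe is saturated before any single database server is.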

I think it’s fair to collapse power and bandwidth into the money bucket. That’s what outfits like Joyent and Engine Yard are for. But the next time I start to think seriously about building a big web application, I’ll try to remember time and people.