Ajax Security

A security podcast I listened to during today’s commute caused part of my anatomy to retract in between a few chuckles. In Technometria: Ajax Security, Phil Windley, past CTO of the state of Utah, interviews Billy Hoffman, co-author of Ajax Security, on the subject of Web 2.0 security. In a presentation that avoided the usual doom, gloom, and “buy my services” approach, Billy detailed several exploits, both white- and black-hat; some funny, and a few that weren’t. Among the “how could developers be so stupid” exploits (hint: don’t write your own encryption algorithms) were a few that now have me double-checking code that I was fairly convinced was secure.

If you’re deploying Web-2.0ish stuff, even if it’s only using Flash on the client side, this one may be worth a listen.

Bit in the ass by a typo

One of my common typos bit me in the ass tonight.

As part of a fiendish plan to learn new technologies by actually building stuff with them instead of just reading about building stuff with them, I’d set myself the weekend goal of putting up an internet-facing Django application using lighttpd and FastCGI instead of the usual (and recommended) Apache/mod_python combination. I’d tried out FastCGI once with Rails, and was using lighttpd elsewhere to serve up static content, but had never used them together. (And to mix things up some more, SQLite3 instead of MySQL on the back end, but that’s not part of tonight’s problem.)

Two hours in, after an install hiccup and a few minor problems, “hello world” was working. A short time later, simple dynamic content (from a single table) was being served up. Except the Django admin interface didn’t quite look right. It wasn’t getting styled. Firebug confirmed that the HTML looked O.K., but Firebug wasn’t seeing any CSS. Typing in the relative URL of the admin style sheet got me to the style sheet, so the path through FastCGI wasn’t the problem. But the page wasn’t getting styled. WTF!?! Clearing the browser cache didn’t help, nor did restarting Firefox. Uh… Uh…

In the middle of asking for help on the #django IRC channel, I spotted the problem.

While fleshing out lighttpd.conf, I’d speculatively added some missing MIME types. If I’d been pair programming, my pair might have asked “are you sure you need to do that yet?” and I’d have had to admit “no”, and would now be writing about something else. Instead, with nobody there to stop me, I’d added

  ".js" => "application/x-javascript",
  ".css" => "test/css",

and the door down the rabbit hole was opened.

Do you see the problem? I didn’t, even after staring at it a half-dozen times, because I knew what I’d meant to type. But what I’d actually typed was

  ".css" => "test/css",

which caused style sheets to be served up with an almost correct MIME type. The sad thing is that I’ve made this typo before in similar contexts. It may be that too much testing has warped my muscle memory.
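A typo like this can be caught mechanically. What follows is only a minimal sketch in Python, not part of the original setup: it assumes the extension-to-type mapping has been copied by hand into a dictionary (the names `configured` and `suspicious_entries` are made up for illustration) and simply flags any entry whose top-level type is not one of the common registered ones.

```python
# Hypothetical copy of the two entries from the config above, typo included.
configured = {
    ".js": "application/x-javascript",
    ".css": "test/css",
}

# Common registered top-level MIME types; not an exhaustive list.
KNOWN_TOP_LEVEL = {
    "application", "audio", "font", "image", "message",
    "model", "multipart", "text", "video",
}


def suspicious_entries(mapping):
    """Return (extension, mime_type) pairs whose top-level type looks wrong."""
    problems = []
    for ext, mime in mapping.items():
        # "test/css" splits into top-level type "test" and subtype "css".
        top_level = mime.split("/", 1)[0]
        if top_level not in KNOWN_TOP_LEVEL:
            problems.append((ext, mime))
    return problems


for ext, mime in suspicious_entries(configured):
    print("check the type for %s: %r" % (ext, mime))
```

This only checks the part before the slash, so it would miss a typo in the subtype, but it is enough to catch "test" standing in for "text".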

Reading, current and recent

When Programming Collective Intelligence first popped up on radar, I had the impression that it was about massively multi-player games, something about which I’m only vaguely interested. On learning that it was about recommendation systems and related machine learning techniques, it went to the head of the reading queue. Programming Collective Intelligence is turning out to be very, very good. Good enough to demonstrate several simple, working recommendation systems (by page 28!), with complete code (in Python). I’m in the middle of the chapter on clustering techniques, and am making notes on how to apply them to a pile of data I’ve been sitting on.

Programming Collective Intelligence may not be a “game changing” book, but I think it’s going to show people that some very interesting techniques are actually within their reach.

For the current “scratch an itch” project, I’ve been using Django, the web application framework for Python that seems to have the most mindshare at present. (Yes, yes, I know there are others, but I only have time for one, and Django is it.) Until The Definitive Guide to Django: Web Development Done Right appeared in print, working with Django meant reading documentation on the project website. Which leads to the first problem I ran into: The book is written against Django version 0.96 (the most recent “stable” release), while the documentation on the website reflects the latest code in the project’s Subversion repository. And post 0.96, there’s some really good, compelling stuff, such as Unicode support, functional testing support, and auto-escaping in templates, some of which isn’t backwards-compatible. (Fortunately, the Django team keeps a running list of incompatible changes on the project website.)

The other issue I have with the book is that (developer-level) testing gets mentioned only in passing, which seems to me to be a bizarre omission given the attention that’s given to testing by other frameworks and the general level of awareness of TDD and unit testing in the developer community. Fortunately, there’s a good testing chapter on the Django website.

That said, The Definitive Guide to Django is a well written, easy read. Any other issues I have with the book are really issues with Django itself, and how it compares (or doesn’t) to Rails.

On learning that Kent Beck was writing a book titled Implementation Patterns, I guessed that it would be about high-level construction patterns and pre-ordered it from Amazon. The book is actually a quite thoughtful discourse on how low-level implementation choices, such as choice of variable names and levels of abstraction in data types, communicate your intentions to later readers of your code, thus helping keep code bases viable. It’s also an excellent model for how to think and reflect about how you approach your work. I took away some better ways of explaining concerns I already had about coding, but lacked the words to articulate quite as clearly as Beck manages to.

Though primarily Java-centric, much of what Beck has to say about day-to-day coding maps to other languages (and the dynamic language side of me loved seeing Java referred to as “pessimistically typed”). There’s some material about Java collection performance at the end of the book that seems like filler, but if you’re working in Java it might be worthwhile.

This is a good book for mentors to hand out to “coders” who are making the transition to developer.

Throwing Replication into the mix

Bill de hOra pointed me at a discussion about ad hoc querying that I’d forgotten about being a part of. The page has a 2007 copyright, but the discussion actually happened several years before. A bit of context might help: Luke Hohmann was VP of Engineering at Aurigin; Chuck Rabb and I worked under him. We had a fully normalized (3NF) database of U.S. Patents from 1970 on, fully text indexed, with images, and with some funky analytics built on top of it, all on 1997 technology (think RAID arrays built from 18GB drives, driven by smoking hot 200MHz processors). Much of what I know about physical database design comes from working with Chuck, scary as that might be to him.

In the discussion I advocated using a separate “data warehouse” for ad hoc queries. Time has passed, and I’ve picked up a few more tricks. If we were to have the discussion again today, I’d add another option: running ad hoc queries against a database replica. Replication is a lot more approachable now than it was a few years back, especially if you’re using MySQL or PostgreSQL. Replicas are a way of separating reads from writes, allowing you to do both reporting and ad hoc analysis against nearly live data without degrading performance on your primary database. If your primary schema is sufficient for your ad hoc query needs, replication can be quite cost-effective.

If you’re architecting a web application that’s going to have to scale, a basic understanding of replication and its limits is now a must-have.
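At the application level, separating reads from writes can be as small as a routing decision. Here is a rough sketch in plain Python, under stated assumptions: the connection aliases "default" and "replica" are hypothetical, and the class is written in the shape of the database routers that later Django releases (those with multiple-database support, which postdate 0.96) accept. It is an illustration of the idea, not a tested production setup.

```python
import random


class ReplicaRouter:
    """Send reads to a replica and writes to the primary."""

    # Hypothetical connection aliases; in a real project they would have
    # to match whatever names the database settings define.
    primary = "default"
    replicas = ["replica"]

    def db_for_read(self, model, **hints):
        # Reporting and ad hoc queries land on a replica.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        # Every write goes to the primary, which feeds the replicas.
        return self.primary


router = ReplicaRouter()
print(router.db_for_read(None), router.db_for_write(None))
```

The main limit to keep in mind is replication lag: a row written to the primary may not be visible on the replica for a moment, so reads that must see their own writes still belong on the primary.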

Stuff I’m noticing

Ripping out old code is satisfying, but it’s a lot harder to get it out than it was to put it there in the first place, especially in a well-factored system with lots of small methods. Ripping out the code close to the “surface” (i.e., API or UI) is easy. Making sure that you’ve tracked down all of the newly-orphaned methods in deeper layers can involve a lot of tedious work.

I have a feeling that there’s a way to leverage TDD in reverse, but I’m not seeing it just yet.
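For a Python code base, a crude static pass can at least produce a list of candidates. The sketch below is an assumption-laden illustration rather than a real tool: the names `unreferenced_functions`, `entry_points`, and `SAMPLE` are invented for the example, it looks at a single source string, and it matches purely by name, so anything reached through reflection or called from another module will show up as a false positive.

```python
import ast


def unreferenced_functions(source, entry_points=()):
    """List functions defined in `source` that nothing in `source` refers to.

    `entry_points` names functions that are called from outside (API or UI)
    and so should not be reported.
    """
    tree = ast.parse(source)
    defined = set()
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            referenced.add(node.id)      # plain calls such as helper()
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)    # method calls such as obj.helper()
    return sorted(defined - referenced - set(entry_points))


SAMPLE = '''
def public_api():
    return helper()

def helper():
    return 42

def orphaned_helper():
    return 0
'''

print(unreferenced_functions(SAMPLE, entry_points={"public_api"}))
```

Removing a surface method and re-running the check exposes the next layer of orphans, which matches the tedious layer-by-layer work described above.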

Eyeing Guice

I’m doing a cram course in Java dependency injection (DI) frameworks, and am coming away tentatively impressed with Guice, a new framework by two guys at Google whom I know to be highly competent and, more importantly, tasteful Java developers. They’ve spent time grappling with bloat, and it shows in their design. Guice is a lot lighter weight than DI in Spring, and makes good use of Java 5 annotations. The caveat with Guice is that it’s still early days, and the (limited) documentation has some gaps that require educated guesses (and dives into the Guice Javadoc) to fill in.