Accidentally defeating MySQL’s query cache

I just learned some neat things about MySQL’s query optimizer, including a bit about how MySQL 4.0.1 (and later) caches queries and result sets. When the query cache is enabled (which it is by default on Fedora Core 3), and you present MySQL with a SELECT query, it consults the cache before preparing or analyzing the query. On finding a match, the cached result from the prior execution of the query is returned.

The cache is actually keyed by the MD5 hash of the full query, including whitespace and comments. In most applications, queries are effectively static. Even when queries are generated dynamically, an identical query string is regenerated each time one is requested. In these cases, hashing the full query isn’t a problem. Same query, same hash.

But consider the application that’s grown over time to where no one developer has a good grasp of data access patterns. When slow performance demands attention, you will have to reconstruct a model of data access patterns, because the story very likely changed while you were paying attention elsewhere. Histograms of query counts and executions times are useful. To account for the same query being issued from different parts of the application, it’s common to instrument the query by adding a comment. For example,

SELECT /* summary view */ COUNT(*) FROM widgets;

Comments have the additional benefit of being visible when you notice that the server is crawling and ask MySQL to show you which queries are currently executing.

But if you start to inject dynamic information (e.g., timestamps and process id) into the comments, the “same” query may hash differently, and the benefit of having a query cache is lost. For no readily apparent reason, things just run a bit slower.

For more on query caching (including how to defeat it intentionally, and why), see the MySQL documentation, or chapter 5 in High Performance MySQL.