Posted on

Don’t Fight the Query Cache

In the spirit of the Good Practice/Bad Practice series, this post is – apart from my first as contributing engineer at Open Query – a basic level application tip. More advanced application developers will already be aware of the issue.

It is easy to overlook certain “non-deterministic” functions in your queries which will prevent the query cache from caching its results. Common examples include NOW(), CURDATE(), and RAND() (here is the complete list).

Obviously when RAND() is used, you usually don’t expect the same result if the query is repeated, so this isn’t a concern. But other cases, particularly date/time related queries, can actually benefit from the cache, and there is a simple workaround. When the query cache is able to satisfy a query, the overhead (in latency and CPU) of parsing and planning the query is eliminated, and more importantly, no disk I/O on data and indexes is required – reducing latency even further and improving scalability on a busy server.
(Remember that the MySQL query cache is not always helpful. If your queries don’t repeat themselves often, the cost of maintaining the query cache (hashing query text, managing cached result sets, etc) is just unwanted overhead. As always, test with your own workload.)
I’ll use an example of a query that might be run many times, by different web site users. Imagine a table recording user login events:

CREATE TABLE logins (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id INT UNSIGNED,
tstamp TIMESTAMP NOT NULL
);

To find how many different users logged in today,

SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();

The first thing to know about this query is that it requires a full table scan, because there is no index on tstamp. Let’s add the index.

mysql> CREATE INDEX tstamp_idx ON logins (tstamp);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | logins | range | tstamp_idx    | tstamp_idx | 4       | NULL |    1 | Using where |
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+

Explain shows that the query will use the new index to select a range of rows. (In general, BTree indexes are ideal for selecting records where one or more column(s) are equal to constant(s), or a range where you want rows where column is <, <=, >, or >= a constant. This query is the latter kind.)

Because this query uses CURDATE(), it is not eligible for query caching. We can confirm this as follows:

mysql> insert into logins (user_id) values(100);
Query OK, 1 row affected (0.00 sec)

mysql> show global status like 'Qcache_inserts';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Qcache_inserts | 6     |
+----------------+-------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.01 sec)

mysql> show global status like 'Qcache_inserts';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Qcache_inserts | 6     |
+----------------+-------+

Notice that the Qcache_inserts counter did not increase; the result of running the query was not inserted into the query cache.

The workaround here is to eliminate the function call to CURDATE() by inserting today’s server date as a literal, in your application (for example, PHP):

$sql = "SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '".date('Y-m-d')."'";

The SQL then becomes something of the form:

SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';

We can now see that the result is being cached, observing Qcache_inserts:

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 5        |
| Qcache_inserts          | 6        |
...

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.01 sec)

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 5        |
| Qcache_inserts          | 7        |
...

We can also see from Qcache_hits that repeating the same query will use the cached result:

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 6        |
| Qcache_inserts          | 7        |
...

This optimisation may be worthwhile when you have queries which are frequently repeated – such as statistics that all users see.

Another query pattern is to aggregate rows according to some function of the current moment, NOW(). For example, “how many users logged in during the past hour?” To exploit the query cache you can make the result slightly approximate, e.g. in your application take the current time and round down to the last 5 minute interval, and compare with this calculated time. This slightly out of date result is cacheable until the next interval begins (and the query text changes).

(Remember that any modifications to the tables involved in the query will invalidate the cached result, so queries on tables which are very frequently changed are not going to benefit greatly from these techniques.)

For more information on the query cache, see the fine manual.

— Toby Thain, Open Query Engineer


In our good practice / bad practice series, we will provide you with byte/bite sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level of advancedness greatly varies. A topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to cater to all levels of expertise!

Posted on