Know your data – and your numeric types.

In this “Good Practice/Bad Practice” post I hope to give some guidelines for choosing between MySQL’s numeric types, using longitude and latitude as a modelling example. (Disclaimer: I am not a mathematician, and the generalisations here are meant to help with practical modelling questions rather than to be rigorously theoretical.)

Numeric types in MySQL fall into two main varieties:

  • “precise” types such as INTEGER and DECIMAL;
  • the IEEE-standard floating point types FLOAT and DOUBLE.

As a rule of thumb, the first group are for exact, or “counted” quantities. The INTEGER types represent whole numbers, and DECIMAL represents “fixed point” decimal, with a preset number of places after the decimal point.

Various widths of INTEGER are available in MySQL, from 8-bit TINYINT to 64-bit BIGINT. Calculations with integer types are fast, as they usually correspond to hardware register sizes.

DECIMAL is commonly used for quantities like decimal currency where the number of digits of precision is known and fixed – for example, counting pennies exactly with two decimal places. Computation with DECIMAL is slower than with the other types, but this is unlikely to impact most applications.
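
As a minimal sketch (the table and column names are invented for illustration), a currency amount would typically be declared with an explicit precision and scale:

CREATE TABLE account_entries (
  id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  amount DECIMAL(13,2) NOT NULL  -- exact to the cent, up to 11 digits before the point
);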

In the other category are FLOAT and DOUBLE, which are the 32 and 64-bit IEEE standard types, which are usually supported in hardware and are therefore fast and convenient for arithmetic. These are generally good choices for “measurements” – values with limited precision.

It is important to understand what is meant by “precision” (Wikipedia has a full discussion at http://en.wikipedia.org/wiki/Arithmetic_precision). For example, I can measure my height at 185.8 centimetres. Because of the way I make the measurement, which we know to be approximate, this figure is understood to have only one meaningful digit after the decimal point.

Precision is a property of all inexact real world “measurements” – such as position, length, weight, brightness; it is usually expressed as the number of “significant figures” or significant digits. (My height measurement has four significant digits.) This should be considered when values are displayed. It is somewhat misleading to represent my height as 185.8000 – common sense tells us that the value is not this accurate.

Serious problems can occur when we do not know the actual precision of measurements, if we wrongly assume a greater precision than exists. A typical example might be a GPS map display which uses measured position to locate the user relative to features such as rivers, roads, and railway tracks. The map display is high resolution and implies a great deal of precision to the viewer. Let us say that based on incoming data, it places our vehicle three metres east of a river. If the measurement has a true precision of, say, 20 metres, we cannot even know which side of the river we are on! So to allow users to draw safe conclusions, the presentation of data needs to take precision into account.

(While very frequently cited as examples of “precise” figures, not all monetary values are. For example, the value of the USA’s national deficit (http://twitter.com/nationaldeficit) was estimated today at $11,983,250,643,192.95. I am no economist, but rather few of these digits can actually be significant!)

Throwing away precision with inappropriate modelling is also a potential problem. Ideally measurements arrive with an implied or explicit precision. But even without that, we need to assure ourselves that we are safely storing them.

Keep in mind that we are always dealing with two kinds of precision:

  • Machine precision, which is defined by the chosen type; and
  • Data precision, which is a property of the values we are storing.

If I am storing my height in cm in a FLOAT column, the data precision is only 4 significant digits as discussed, but the machine precision of this column is always about 7 significant digits, no matter what we try to store. Clearly, to avoid throwing away significant parts of your input, the machine precision should be at least as great as your data precision.
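
To see how comfortably my height fits, here is a throwaway check (the table name is invented for the example):

CREATE TABLE height_check (cm FLOAT);
INSERT INTO height_check VALUES (185.8);
SELECT cm FROM height_check;  -- expect 185.8 back: 4 significant digits fit easily within FLOAT's ~7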

Consider a CSV file of latitude and longitude (http://geocoder.ibegin.com/downloads/canada_cities.zip), and notice that every value has 10 digits after the point:

 Afton Station,NS,45.6051050000,-61.6974950000
 Agassiz,BC,49.2421627750,-121.7496169988

This does not mean that every value has 13 digits of precision – if that were so, then these coordinates would be accurate to within about 0.01mm on the ground! This is clearly not possible.

Let’s say we merely want to stick markers on a web page showing a national map. One hundred metre resolution would be more than adequate. Given an Earth circumference of 40,075,020 metres, 100 metres is approximately 1/1113 of a degree. While three digits (0.001) can represent 1/1000ths of a degree, this is not quite precise enough, so let’s keep at least four fractional digits after the point. Allowing up to three digits before the point (longitudes run to 180), we are looking to represent about 7 significant figures, for example:

 Afton Station,NS,45.60511,-61.69750
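
(If you want to double-check the 1/1113 figure, MySQL itself makes a perfectly good calculator; this query is purely illustrative.)

SELECT 40075020/360 AS metres_per_degree,
       (40075020/360)/100 AS hundred_metre_steps_per_degree;
-- roughly 111319.5 and 1113.2: 100 metres is about 1/1113 of a degree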

Which of FLOAT or DOUBLE is the right type to use for such values? Let’s investigate.

A key difference between floating point types and fixed point representations (such as DECIMAL) is that while the overall binary precision is fixed, the precision of the fractional part of the value can vary! The available precision depends on the magnitude of the value. The highest precision is available for values closest to zero, and precision gets worse as numbers increase in magnitude (stepping down at every power of two).

To understand the effect of this, it is necessary to examine floating point representation in more detail. (I am going to handwave more esoteric features of the IEEE standard – see http://en.wikipedia.org/wiki/IEEE_754-1985 – and try to give a general picture. In particular I am not going to talk about rounding, biased exponents, or denormalised numbers.)

Floating point values have three parts:

  • a sign (+/-)
  • an “exponent” (binary scale factor)
  • the value itself (known as the fraction or “mantissa”).

These are analogous to decimal “scientific” or “exponential” notation that you may already be familiar with (e.g. 76.4935 = 7.64935 x 10^1 or 7.64935E+1, where 1 is the decimal exponent).

The combination of these fields precisely defines a rational number whose value is, ideally, “close enough” to the value you need to store (which was imprecise to begin with, so the approximation involved in converting to floating point representation is normally not a problem).

The overall precision available is determined by the bits allowed to store the fraction. For reference,

  • FLOAT allows 23 physical bits for the fraction (with hidden bit, effectively 24 bits of fraction)
  • DOUBLE allows 52 physical bits for the fraction (with hidden bit, effectively 53)

Respectively, this works out to roughly 7 and 15–16 decimal digits of precision in the whole figure (24 × log10 2 ≈ 7.2; 53 × log10 2 ≈ 15.95). So FLOAT probably meets our 7-digit requirement.

To make sure, let’s work backwards and confirm just how precisely a FLOAT value can represent a latitude. To do this I will show how an example value is converted into the FLOAT representation. None of the values in our table have latitudes greater than 77 degrees, so I will pick 76.4935. (Larger values have larger exponents and hence less available precision for the fractional part, so they are the safest test.)

First we need to determine the correct exponent for the value. Then we can work out the real-world “resolution” of the number, i.e. how much the actual value changes if we change the fraction by the smallest possible amount (i.e., in its least significant bit).

The exponent is found by choosing the smallest power of 2 that, used as a divisor, “normalises” the value into the range 0 to 1. Since our starting value is more than 1, we look for a divisor among the powers of 2 greater than one: 2, 4, 8, 16, 32, 64, 128, 256… A glance at this list shows that for 76, a divisor of 128 (2^7) is the right one (64 would leave us above 1); i.e., the floating point exponent is 7. The resulting fraction part is 76.4935 / 128, or in decimal, 0.59760546875.

(Normalisation works similarly for starting values less than one, but in the other direction. We multiply by the largest power of 2 that leaves the value less than one, and store the negative exponent. Negative values are simply dealt with by converting to positive before normalising, and noting a negative sign.)

Since the normalised fraction of any non-zero number always begins with a binary ‘1’, IEEE representation cleverly implies this “hidden” 1 bit, and doesn’t physically store it. This frees up one more bit for the fraction, giving a total of 24 bits of FLOAT precision. (Because of this, and “exponent biasing”, the binary sequence shown below isn’t the actual IEEE bit pattern representing this number.)

Assuming a fraction of 24 bits, let’s examine its “binary expansion”. This is effectively just the result of 0.59760546875 x 2^24, rounded to a whole number and written out in base 2:

1001100 . 01111110010101100  (total 24 bits including "hidden" bit)
^^^^^^^   ^^^^^^^^^^^^^^^^^
whole #   fraction part
=    76 . 4935 (approx)

Because the exponent is 7, the first 7 bits in my binary sequence above are the whole number part (= binary 1001100 = 76). I’ve put a gap where the “binary point” belongs. Written out like this, we can see that we have 17 bits to the right of this point. 2^17 = 131072; so, around this magnitude of 76 degrees (and up to 128, as at 128 the binary exponent increases to 8), we can resolve no worse than 1/131072th of a degree. This is enough bits for five decimal digits (which only requires 1/100000th).

So how precise on the ground will this be?

1/131072th of a degree of latitude is about 0.85 metres: much better than our hoped-for resolution of 100 metres. So we have shown that FLOAT is more than adequate for the job. This is not to say that FLOAT is the correct choice for all geographic uses; only that it is adequate for this use, where we decided 100m resolution was enough. On the other hand, source data for geocoding may need more precision than FLOAT can deliver. (Naturally the extra precision must be present in the source data. Simply using a more precise type cannot add any precision to the original measurement, of course 🙂)
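
If you would like to see this from inside MySQL, a throwaway comparison such as the following will do (table and column names are invented; the exact digits reported depend on server version and output formatting):

CREATE TABLE precision_check (lat_f FLOAT, lat_d DOUBLE);
INSERT INTO precision_check VALUES (76.4935, 76.4935);
-- the FLOAT column differs from the original value by a tiny amount, bounded by
-- the 1/131072-degree step worked out above; the DOUBLE column matches to far
-- more digits than any geographic application needs
SELECT lat_f - 76.4935 AS float_error, lat_d - 76.4935 AS double_error
  FROM precision_check;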

The analysis above can be done for DOUBLE, of course. For interest’s sake, the equivalent binary expansion is:

1001100 . 0111111001010110000001000001100010010011011101  (total 53 bits including "hidden" bit)

We have 29 more bits to play with, or a total of 46 fraction bits after the whole number part, at this magnitude. This is ridiculously precise, and can resolve no worse than 1/70368744177664th of a degree; or 0.0000015mm on the surface of the globe. (This is enough to represent 13 decimal digits after the point.)

This example has shown how knowing a little of how floating point works can help you be confident about issues of precision, when choosing types to represent approximate values, or measurements – rather than automatically falling back on DOUBLE or even DECIMAL as a “paranoid default”. The key is to know your data, and understand how much precision you have, and how much your application needs.


Will your production MySQL server survive a restart?

Do you know if your production MySQL servers will come back up when restarted? A recent support episode illustrates a number of best practices. The task looked trivial: update a production MySQL server (a replication master) with a configuration tuned and tested on a development server. Clean shutdown, change configuration, restart. Unfortunately, the MySQL daemon did not just ‘come back’, leaving two sites offline. Thus began an illuminating debugging story.
The first place to look is the daemon error log, which revealed that the server was segfaulting, seemingly at the end of or just after InnoDB recovery. Reverting to the previous configuration did not help, nor did changing the InnoDB recovery mode. Working with the client, we performed a failover to a replication slave, while I got a second opinion from a fellow engineer to work out what had gone wrong on the server.
Since debug symbols weren’t shown in the stack trace, we needed to generate a symbol file (the binary was unstripped) to use with the resolve_stack_dump utility. The procedure for obtaining this is detailed in the MySQL manual. With a good stack trace in hand, we were able (with assistance from an old friend – thanks Dean!) to narrow the crash down to bug 38856 (also see 37027). A little further investigation showed that the right conditions did exist to trigger this bug:
  • expire_logs_days = 14 # had been set in the my.cnf
  • the binlog.index file did not match the actual state of log files (i.e. some had been manually deleted, or deleted by a script)
So with this knowledge, it was possible to bring the MySQL server back up. It turned out that the expire_logs_days setting had perhaps been added to the configuration but not tested at the time (the server had not been restarted for 3 months). This had placed the system in a state, unbeknownst to the administrators, where it would not come back up after a restart. It was an interesting (if a tad stressful) incident as it shows the reasons for many best practices – which most of us know and follow – and they are worth recapping here:
  • even seemingly trivial maintenance can potentially trigger downtime
  • plan any production maintenance in the quiet zone, and be sure to allow enough time to deal with the unforeseen
  • don’t assume your live server will ‘just restart’
  • put my.cnf under revision control (check out “etckeeper”, a standard Ubuntu package; it can keep track of everything in /etc using bzr, svn or git)
  • do not make untested changes to config – test immediately, preferably on a dev or staging system
  • be ready to failover (test regularly like a fire drill); this is another reason why master-master setups are more convenient than mere master-slave
  • replication alone is NOT a backup
  • don’t remove binlogs or otherwise touch anything in the data dir behind MySQL’s back (see the sketch after this list)
  • have only 1 admin per server so you don’t step on each other’s toes (but share credentials with 2IC for emergencies only)
  • use a trusted origin for your binary packages; just building and passing the basic test-suite is not always sufficient
  • know how to get a good stack trace with symbols, to help find bug reports
  • be familiar with bugs.mysql.com, but it still helps to ask others as they might have just seen something similar and can help you quickly find what you’re looking for!
  • and last but very important: it really pays to find the root cause of a problem (and prevention requires it!), so a “postmortem” on a dead server is very important… if we had just wiped that server, the problem might have recurred on another server later.
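
On that binlog point: rather than deleting binlog files from the data directory by hand (which is exactly what leaves binlog.index out of step with reality), let the server expire them. A sketch – the 14-day retention is only an example and should match your backup and slave requirements:

SET GLOBAL expire_logs_days = 14;  -- and set it in my.cnf so it survives a restart
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 14 DAY;  -- immediate cleanup that keeps binlog.index consistent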

Does your host flush?

While investigating a friend’s report of ext3/InnoDB corruption in an Ubuntu VirtualBox guest after his OS X host machine crashed, I discovered that by default VirtualBox does not do anything about IDE Flush Cache commands. The natural implication of this, which I have been able to reproduce, is that your journaled and transactional guest systems cannot maintain integrity if the host crashes. (For longer reasoning see that forum post.) It’s as if you were running them on broken hardware.
This can corrupt InnoDB – which under normal circumstances recovers fine from crashes. You can imagine what it could do to MyISAM 🙂
So this post is just a heads-up: even if you’re only doing development in VirtualBox, unless you want to risk nasty surprises, I strongly recommend you disable the IgnoreFlush option. See “Responding to guest IDE flush requests” in the VirtualBox manual.
This issue may also affect other virtualisation hosts.

Don’t Fight the Query Cache

In the spirit of the Good Practice/Bad Practice series, this post – my first as a contributing engineer at Open Query – is a basic-level application tip. More advanced application developers will already be aware of the issue.

It is easy to overlook certain “non-deterministic” functions in your queries which will prevent the query cache from caching the results. Common examples include NOW(), CURDATE(), and RAND() (the MySQL manual has the complete list).

Obviously when RAND() is used, you usually don’t expect the same result if the query is repeated, so this isn’t a concern. But other cases, particularly date/time related queries, can actually benefit from the cache, and there is a simple workaround. When the query cache is able to satisfy a query, the overhead (in latency and CPU) of parsing and planning the query is eliminated, and more importantly, no disk I/O on data and indexes is required – reducing latency even further and improving scalability on a busy server.
(Remember that the MySQL query cache is not always helpful. If your queries don’t repeat themselves often, the cost of maintaining the query cache (hashing query text, managing cached result sets, etc) is just unwanted overhead. As always, test with your own workload.)
I’ll use an example of a query that might be run many times, by different web site users. Imagine a table recording user login events:

CREATE TABLE logins (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED,
  tstamp TIMESTAMP NOT NULL
);

To find how many different users logged in today,

SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();

The first thing to know about this query is that it requires a full table scan, because there is no index on tstamp. Let’s add the index.

mysql> CREATE INDEX tstamp_idx ON logins (tstamp);
Query OK, 0 rows affected (0.07 sec)
Records: 0  Duplicates: 0  Warnings: 0
mysql> explain SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table  | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | logins | range | tstamp_idx    | tstamp_idx | 4       | NULL |    1 | Using where |
+----+-------------+--------+-------+---------------+------------+---------+------+------+-------------+

Explain shows that the query will use the new index to select a range of rows. (In general, BTree indexes are ideal for selecting records where one or more columns are equal to constants, or for range conditions where you want rows whose column is <, <=, >, or >= a constant. This query is the latter kind.)

Because this query uses CURDATE(), it is not eligible for query caching. We can confirm this as follows:

mysql> insert into logins (user_id) values(100);
Query OK, 1 row affected (0.00 sec)

mysql> show global status like 'Qcache_inserts';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Qcache_inserts | 6     |
+----------------+-------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= CURDATE();
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.01 sec)

mysql> show global status like 'Qcache_inserts';
+----------------+-------+
| Variable_name  | Value |
+----------------+-------+
| Qcache_inserts | 6     |
+----------------+-------+

Notice that the Qcache_inserts counter did not increase; the result of running the query was not inserted into the query cache.

The workaround here is to eliminate the call to CURDATE() by having your application insert today’s date as a literal (for example, in PHP):

$sql = "SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '".date('Y-m-d')."'";

The SQL then becomes something of the form:

SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';

We can now see that the result is being cached, observing Qcache_inserts:

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 5        |
| Qcache_inserts          | 6        |
...

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.01 sec)

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 5        |
| Qcache_inserts          | 7        |
...

We can also see from Qcache_hits that repeating the same query will use the cached result:

mysql> SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '2009-07-12';
+-------------------------+
| COUNT(DISTINCT user_id) |
+-------------------------+
|                       1 |
+-------------------------+
1 row in set (0.00 sec)

mysql> show global status like 'Qcache_%';
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
...
| Qcache_hits             | 6        |
| Qcache_inserts          | 7        |
...

This optimisation may be worthwhile when you have queries which are frequently repeated – such as statistics that all users see.

Another query pattern is to aggregate rows according to some function of the current moment, NOW(). For example, “how many users logged in during the past hour?” To exploit the query cache you can make the result slightly approximate: in your application, take the current time, round it down to the last 5-minute boundary, and compare against that calculated time, as sketched below. This slightly out-of-date result is cacheable until the next interval begins (and the query text changes).
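
A minimal PHP sketch of the idea (the one-hour window and 5-minute interval are arbitrary choices for this example):

$interval = 5 * 60;                                  // seconds; cache-friendly granularity
$boundary = floor(time() / $interval) * $interval;   // last 5-minute boundary
$since    = date('Y-m-d H:i:s', $boundary - 3600);   // one hour before that boundary
$sql = "SELECT COUNT(DISTINCT user_id) FROM logins WHERE tstamp >= '" . $since . "'";
// The generated SQL text stays identical for up to 5 minutes, so repeated runs can be
// served from the query cache (until the tables change or the interval rolls over).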

(Remember that any modifications to the tables involved in the query will invalidate the cached result, so queries on tables which are very frequently changed are not going to benefit greatly from these techniques.)

For more information on the query cache, see the fine manual.

— Toby Thain, Open Query Engineer


In our good practice / bad practice series, we will provide you with byte/bite-sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level varies greatly: a topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to cater to all levels of expertise!
