Posted on

The actual range and storage size of an INT

What’s the difference between INT(2) and INT(20) ? Not a lot. It’s about output formatting, which you’ll never encounter when talking with the server through an API (like you do from most app languages).

The confusion stems from the fact that with CHAR(n) and VARCHAR(n), the (n) signifies the length or maximum length of that field. But for INT, the range and storage size is specified using different data types: TINYINT, SMALLINT, MEDIUMINT, INT (aka INTEGER), BIGINT.

At Open Query we tend to pick on things like INT(2) when reviewing a client’s schema, because chances are that the developers/DBAs are working under a mistaken assumption and this could cause trouble somewhere – even if not in the exact spot where we pick on it. So it’s a case of pattern recognition.

A very practical example of this comes from a client I worked with last week. I first spotted some harmless ones, we talked about it, and then we hit the jackpot: INT(22) or something, which in fact was storing a unix timestamp converted to int by the application, for the purpose of, wait for this, user’s birth date. There’s a number of things wrong with this, and the result is something that doesn’t work properly.

Currently, the unix epoc/timestamp when stored in binary is a 32 bit unsigned integer, with a range from 1970-01-01 to somewhere in 2037. Note the unsigned qualifier, otherwise it already wraps around 2004.

  • if using signed, you’d currently only find out with users younger than 7 or so. You may be “lucky” to not have any, but kids are tech savvy so websites and systems in general may well have entries with kids younger than that.
  • using a timestamp for date-of-birth tells me that the developers are young 😉 well that’s relative, but in this: younger than 40. I was born in 1969, so I am very aware that it’s impossible to represent my birthdate in a unix timestamp! What dates do you test with? Your own, and people around you. ‘nuf said.
  • finally, INT(22) is still an INT, which for MySQL means 32 bits (4 bytes) and it happened to be signed also.

So, all in all, this wasn’t going to work. Exactly what would fail where would be highly app code (and date) dependent, but you can tell it needs a quick redesign anyway.

I actually suggested checking the requirements whether having just a year would suffice for the intended use (can be stored in a YEAR(4) field), this reduces the amount of personal data stored and thus removes privacy concerns. Otherwise, a DATE field which can optionally be allowed to not have a day-of-month (i.e. only ask for year/month) as that again can be sufficient for the intended purpose.

Posted on