Posted on

RAM flakier than expected

Ref: Google: Computer memory flakier than expected (CNET DeepTech, Stephen Shankland)

Summary: According to tests at Google, it appears that today’s RAM modules have several thousand errors a year, which would be correctable if it weren’t for the fact that most of us aren’t using ECC RAM.

Previous research, such as some data from a 300-computer cluster, showed that memory modules had correctable error rates of 200 to 5,000 failures per billion hours of operation. Google, though, found the rate much higher: 25,000 to 75,000 failures per billion hours.

This is quite relevant for database servers because they write a lot rather than mainly read (desktop use). In the MySQL context, if a bit gets flipped in RAM, your data could get corrupted, or it’s ok on disk and you’re just reading corrupted data somehow. While using more RAM is good for performance, it also means a bigger RAM footprint for your data and thus more exposure to the issue.

In MySQL 5.0 and the general 5.1, the binary and relay logs do not have checksums on log events. If something gets corrupted anywhere on disk or on its way to disk, garbage will come out and we have seen instances where this happens. There are patches to add a checksum to the binlog structure (Google worked on this) and we’ll be pushing for this to be ported into MariaDB 5.1 urgently. It’s no use having it just in later versions. It does change the on-disk format, but so be it. This is very very important stuff.

FYI, InnoDB does use page checksums which are also stored on disk. There is an option to turn them off, but our general recommendation would be to not do that 😉 What about the iblog files though? Normally they just refer to pages which at some stage get flushed, but a) if through a glitch they refer to a different page that could lose some committed data and b) on recovery, it could directly affect data. Mind you I’m conjecturing here, more research necessary!

Naturally this does not just affect database systems, file systems too can easily suffer from RAM glitches – probably with the exception of ZFS, since it has checksums everywhere and keeps them separate from the data.

Anything that keeps data around in RAM, and/or is write intensive. Memcached! How do other database systems work in this respect?

Note: this post is not intended to be alarmist; I just think it’s good to be aware of things so they can be taken into account when designing systems. If you look closely at any system, there are things that can potentially be cause for concern. That doesn’t mean we shouldn’t use them, per-say.

Posted on

Tool of the Day: rsnapshot

rsnapshot is a filesystem snapshot utility for making backups of local and remote systems, based on rsync. Rather than just doing a complete copy every time, it uses hardlinks to create incrementals (which are from a local perspective a full backup also). You can specify how long to keep old backups, and all the other usual jazz. You’d generally have it connect over ssh. You’ll want/need to run it on a filesystem that supports hardlinks, so that precludes NTFS.

In the context of MySQL, you can’t just do a filesystem copy of your MySQL data/logs, that would be inconsistent and broken. (amazingly, I still see people insisting/arguing on this – but heck it’s your business/data to gamble with, right?)

Anyway, if you do a local mysqldump also, or for instance use XtraBackup to take a binary backup of your InnoDB tablespace/logs, then rsnapshot can be used to automate the transfer of those files to a different geographical location.

Two extra things you need to do:

  • Regularly test your backups. They can fail, and that can be fatal. For XtraBackup, run the prepare command and essentially start up a MySQL instance on it to make sure it’s all happy. Havint this already done also saves time if you need to restore.
  • For restore time, you need to include the time needed to transfer files back to the target server.
Posted on

Good Practice / Bad Practice: CREATE TABLE and the Storage Engine

When you write your create table statements, always make sure that you make them non-ambiguous. That way even though other servers might have different configurations, you make sure your table will be created in the same way.
Imagine for instance you are developing an application on a development server, nicely storing all the scripts you need to create the same database on your production server. If the same script creates a table differently on both servers, that might cause you a lot of headache later on. At Open Query, we strive to minimise (or preferrably eliminate) headaches.

One of the parts of the create table statement that has the largest impact is the storage engine specification. When you omit the storage engine from the create table statement, your table is automatically created with the default storage engine type configured for the server. Since the storage engine is a very important choice when designing your tables, you want to make sure that it is always the correct type.

Here’s an example: instead of writing CREATE TABLE city (city_id int, city_name varchar(100)) you should write: CREATE TABLE city (city_id int, city_name varchar(100)) ENGINE=InnoDB

It is a simple adjustment, but it will save you from possible problems if you just make a habit out of specifying the storage engine.

But wait, there one more thing! It’s also very important you have sql_mode=NO_ENGINE_SUBSTITUTION in your my.cnf, otherwise your table may still silently become a MyISAM table if your desired engine (for any reason) is disabled in the binary, configuration, or at runtime. With this setting error, such a situation will cause an error – so you know for sure.

In our good practice / bad practice series, we will provide you with byte/bite sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level of advancedness greatly varies. A topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to tender to all of you!

Posted on

How many files does InnoDB have open?

If you use innodb_file_per_table = 1 and innodb_open_files = X (whatever amount is suitable for your server) there’s no way internal to MySQL for finding out how many IBD files InnoDB actually has open. Neither SHOW GLOBAL STATUS LIKE ‘innodb%’ nor SHOW ENGINE INNODB STATUS provide this information.

Many sites do have a growing number of tables, so you’ll want to know when it’s time to up the number (and potentially also open-files-limit). Solution: sudo lsof | grep -c “\.ibd$”

Posted on

What to do with the Falcon engine?

Keep it. Make sure it gets correctly positioned in the coming months.

It appears that with the Oracle acquisition, the reason-to-exist for Falcon is regarded as gone (a non-Oracle-owned InnoDB replacement), previously seen as a strategic imperative – much delayed though.

But look, each engine has unique architectural aspects and thus a niche where it does particularly well. Given that Falcon exists, I’d suggest to not just “ditch it” but have it live as one of the pluggables. What Oracle will do to it is unknown, but Sun/MySQL can make sure of this positioning by making sure in the coming months that Falcon works in 5.1 as a pluggable engine, perhaps also creating a separate bzr project/tree for it on Launchpad.

Then the good work can find its way into the real world, now.

Posted on

InnoDB lock timeout before query execution

I found this yesterday while tracking down a locking issue for a client. They had a connection time out on a lock, but before it times out, SHOW PROCESSLIST had the status ‘statistics’ so it wasn’t actually executing the query yet. So, what was it doing and why did it time out there?

The answer actually was remarkably simple, but I did have to take a peek in the MySQL server source code (horay for Open Source!) The server sets the thd_proc_info to ‘statistics’ when calling the join optimiser, that’s the part of the optimiser that works out the best join order.

A lesser known feature of the join optimiser is that if it works out that only one row can match (lookup on primary or unique key field), it’ll try to retrieve the row. If there’s no match, there’s an “impossible WHERE clause” and essentially the entire query is optimised away and 0 rows returned.

If there is a match, all references to columns in that table can be replaced with the values that were just retrieved, turning them into constants. After all, there’s only one row matching! This can optimise away tables, removing the need to execute some or all joins. You don’t actually see this directly if you do EXPLAIN EXTENDED and then SHOW WARNINGS, because that output is from the parse tree which does not know about the join structure. But what you will see there is that the columns were replaced with constants, so from that you can deduce what’s going on.

For a query
EXPLAIN EXTENDED SELECT name from Country where code='AUS'
you would see access type const in the explain (which is the indicator for this optimisation), and SHOW WARNINGS brings up
select 'Australia' AS `name` from `world`.`country` where 1
so you see the WHERE clause is gone completely and the name column in the SELECT has been fed the value Australia so it’s all constants.

Back to the original issue… if another connection holds an exclusive (write) lock on that particular row, the join optimiser will be waiting for the lock to be released and that’s how you can actually get a lock timeout at this stage. It’s a perfectly normal thing to happen.