
MySQL data backup: going beyond mysqldump

A user on a Linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it’s of wider interest.

> […] tens of gigs of data in MySQL databases.
> Some in memory tables, some MyISAM, a fair bit InnoDB. According to my
> understanding, when one doesn’t have several hours to take a DB
> offline and do dbbackup, there was/is ibbackup from InnoBase.. but now
> that MySQL and InnoBase have both been ‘Oracle Enterprised’, said
> product is now restricted to MySQL Enterprise customers..
>
> Some quick searching has suggested Percona XtraBackup as a potential
> FOSS alternative.
> What backup techniques do people employ around these parts for backups
> of large mixed MySQL data sets where downtime *must* be minimised?
>
> Has your backup plan ever been put to the test?

You should put it to the test regularly, not just when it’s needed.
An untested backup is not really a backup, I think.

At Open Query we tend to use dual master setups with MMM, additional replication slaves, mysqldump, and XtraBackup or LVM snapshots. It’s not just about having backups, but also about general resilience, maintenance options, and scalability. I’ll clarify:

  • XtraBackup and LVM give you physical backups. That’s nice if you want to recover or clone a complete instance as-is. But if anything is wrong, it’ll all be stuffed (that is, you can sometimes recover InnoDB tablespaces and there are tools for it, but time may not be on your side). Note that LVM cannot snapshot consistently across multiple volumes, so if you have your InnoDB ibdata/ibd files and iblog files on separate spindles, LVM is not suitable.
  • mysqldump gives you logical (SQL) backups. Most if not all setups should have this. Even if the file(s) were to be corrupted, they’re still readable since it’s plain SQL. You can do partial restores, which is handy in some cases. It’ll be slower to load, so having *only* an SQL dump of a larger dataset is not a good idea. See the example command below this list.
  • Some of the above backups can and should *also* be copied off-site. That’s for extra safety, but in terms of recovery speed it may not be optimal and should not be relied upon.
  • Having dual masters allows easier maintenance without scheduled outages, as well as resilience when, for instance, hardware breaks (and it does).
  • Slaves. You can even delay a slave (Maatkit’s mk-slave-delay does this), so that would give you a live correct image even in case of a user error, provided you get to it in time. Also, you want enough slack in your infra to be able to initialise a new slave off an existing one. Scaling up at a time when high load is already occurring can become painful if your infra is not prepared for it.
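For the mysqldump bullet above, a minimal sketch of such a backup invocation (the option set is my assumption, not prescribed here; --master-data=2 additionally requires binary logging to be enabled on the server):

$ mysqldump --all-databases --single-transaction --routines --triggers --master-data=2 > backup.sql

Keep in mind that --single-transaction only yields a consistent snapshot for transactional (InnoDB) tables; MyISAM and in-memory tables would still need locking or a quiet period.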

A key issue to consider is this: if the dataset is sufficiently large, and the online requirements high enough, you can’t afford to just have backups. Why? Because: how quickly can you deploy new suitable hardware, install the OS, do the restore, validate the data, and put it back online?

In many cases one or more aspects of the above list simply take too long, so my summary would be “then you don’t really have a backup”. Clients tend to argue with me on that, but only fairly briefly, until they see the point: if a restore takes longer than you can afford, that backup mechanism is unsuitable.

So, we use a combination of tools and approaches depending on needs, but in general terms we aim to keep the overall environment online (individual machines can and will fail! relying on a magic box or SAN to not fail *will* get you bitten) to vastly reduce the instances where an actual restore is required. Into that picture also comes using separate test/staging servers, so that developers don’t stuff around on live servers (human error is an important cause of hassles).

In our training modules, we’ve combined the backups, recovery and replication topics as it’s clearly all intertwined and overlapping. Discussing backup techniques separate from replication and dual master setups makes no sense to us. It needs to be put in place with an overall vision.

Note that a SAN is not a backup strategy. And neither is replication on its own.


Cache pre-loading on mysqld startup

The following quirky dynamic SQL will scan each index of each table so that they’re loaded into the key_buffer (MyISAM) or innodb_buffer_pool (InnoDB). If you also use the PBXT engine which does have a row cache but no clustered primary key, you could also incorporate some full table scans.

To make mysqld execute this on startup, create /var/lib/mysql/initfile.sql and make it owned by mysql:mysql:

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(`',column_name,'`) FROM `',table_schema,'`.`',table_name,'` FORCE INDEX (`',index_name,'`)') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics WHERE table_schema NOT IN ('information_schema','mysql') AND seq_in_index = 1;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@global.group_concat_max_len;

and in my.cnf add a line to the [mysqld] block:

init-file = /var/lib/mysql/initfile.sql

That’s all. mysqld reads that file on startup and executes each statement. Since we can do the whole select in a single (admittedly quirky) query and then use dynamic SQL to execute the result, we don’t need to create a stored procedure.

Of course this kind of simplistic “get everything” only really makes sense if the entire dataset+indexes fit in memory; otherwise you’ll want to be more selective. Still, you could use the above as a basis, perhaps using another table to provide a list of tables/indexes to be excluded, or, if the schema is really stable, simply keep a list of tables/indexes to be included instead of dynamically using information_schema.
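A hypothetical sketch of that exclusion variant (the mydb.preload_exclude table and its columns are my assumption, not part of the original; statements stay one per line so the file still works as an init-file):

SET SESSION group_concat_max_len=100*1024*1024;
SELECT GROUP_CONCAT(CONCAT('SELECT COUNT(`',s.column_name,'`) FROM `',s.table_schema,'`.`',s.table_name,'` FORCE INDEX (`',s.index_name,'`)') SEPARATOR ' UNION ALL ') INTO @sql FROM information_schema.statistics s LEFT JOIN mydb.preload_exclude x ON x.table_schema=s.table_schema AND x.table_name=s.table_name WHERE s.table_schema NOT IN ('information_schema','mysql') AND s.seq_in_index=1 AND x.table_name IS NULL;
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET SESSION group_concat_max_len=@@global.group_concat_max_len;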

Practical (albeit niche) application:

In a system with multiple slaves, adding a new slave makes it start with cold caches, but since with load balancing it will pick up only some of the load, it often works out ok. However, some environments have dual masters but the application is not able to do read/write splits to utilise slaves. In that case all the reads also go to the active master. Consequently, the passive master will have relatively cold caches (only rows/indexes that have been updated will be in memory), so in case of a failover the amount of disk reads for the many concurrent SELECT queries will go through the roof, temporarily slowing the effective performance to a dismal crawl: each query takes longer with the required additional disk access, so depending on the setup the server may even run out of connections, which in turn upsets the application servers. It’d sort itself out, but a) it looks very bad on the frontend and b) it may take a number of minutes.

The above construct prevents that scenario, and as mentioned it can be used as a basis to deal with other situations. Not many people know about the init-file option, so this is a nice example.

If you want to know how the SQL works, read on. The original line is very long so I’ll reprint it below with some reformatting:

SELECT GROUP_CONCAT(CONCAT(
  'SELECT COUNT(`',column_name,'`)
          FROM `',table_schema,'`.`',table_name,
          '` FORCE INDEX (`',index_name,'`)'
       ) SEPARATOR ' UNION ALL ')
  INTO @sql
  FROM information_schema.statistics
  WHERE table_schema NOT IN ('information_schema','mysql')
  AND seq_in_index = 1;

The outer query grabs, for each regular database, every table/index/first-column combination on the server, writing out a SELECT query that counts all non-NULL values of the indexed column (so it must scan the index), forcing that specific index. We then abuse the versatile and flexible GROUP_CONCAT() function to glue all those SELECTs together, with “UNION ALL” in between. The result is a single very long string, so we need to raise the maximum allowed group_concat output beforehand to prevent truncation.


A day in the life of Datacenter Disasters

Open Query currently hosts a large part of our infrastructure at Linode. We are extremely happy with their performance, stability and support. Unfortunately, any chain is only as strong as its weakest link. This week there was a major thunderstorm near the Hurricane Electric datacenter in Fremont (anyone else think that name is funny in combination with the event at hand?) and through a massive power surge, most of HE’s datacenter lost power. Among the Linodes affected in our infrastructure were all of the machines involved in our MMM setup.

The masters came back up before the monitor, which is around the time I was alerted. Logging in, I noticed replication was broken on one of the masters, but the other master seemed healthy. Since the monitor was not up, and it seemed like it could potentially be hours before it would be, I decided it was time for manual action. Since our MMM setup doesn’t currently have slaves, I decided a good option would be to mimic MMM and move the virtual IP to the healthy server.

I executed the following manual commands to make the desired changes:

$ ip addr add <virtip> dev eth0           # bring the virtual IP up on this host
$ /usr/sbin/arping -I eth0 -c 5 <virtip>  # send ARPs so neighbours learn the IP's new location

That brought all our applications back online, which was the desired effect. I manually fixed replication by repositioning the masters. A while later, the monitor came up and automatically took over, bringing everything back to normal.
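Repositioning here means pointing the broken master at fresh coordinates on the other master; a hypothetical sketch (the log file name and position are placeholders, read from SHOW MASTER STATUS on the healthy master):

STOP SLAVE;
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=98;
START SLAVE;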

Everything went well, but it wasn’t until the next morning that I realised there was a possible flaw in my logic (it didn’t affect us, but I wanted to blog about it to make others aware): when replication stopped, master A was active. My commands above made master B the active master. Now, in theory it is possible that writes were sent to master A after replication broke, and queries subsequently sent to master B would presume those writes existed there, which they did not, since replication never executed them. This is one of those niche occasions where data drift can occur without anyone noticing.

My recommendation is to not do what I did unless you are very certain your setup doesn’t suffer from this potential problem. If you do decide to use this trick, however, make sure to run the Maatkit tools mk-table-checksum and mk-table-sync when all is well again, to check for (and correct!) data drift.
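A hedged sketch of that check (the host DSNs are placeholders; consult your Maatkit version’s documentation for the exact options):

$ mk-table-checksum h=masterA h=masterB
$ mk-table-sync --execute h=masterA h=masterB

The first command checksums tables on both masters so differences show up; the second brings the second host in sync with the first, ideally only after a dry run with --print instead of --execute to review what it would change.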


Quest for Resilience: Multi-DC Masters

This is a Request for Input. Dual MySQL masters with MMM in a single datacentre are in common use, and other setups like DRBD and of course VM/SAN based failover solutions are conceptually straightforward too. Thus, achieving various forms of resilience within a single datacentre is doable and not costly.

Doing the same across multiple datacentres (let’s, for simplicity’s sake, limit it to two) is another matter. MySQL replication works well across longer links, and it can use MySQL’s built-in SSL or tools like stunnel. Of course it needs to be monitored, as usual, but since it’s asynchronous, the latency between the datacentres is not a big issue (apart from the fact that the second server becomes up-to-date a little bit later).
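For illustration, a hypothetical stunnel sketch for encrypting the replication link (host names, ports and the cert path are my assumptions, not from this post):

; on the master (TLS server side), in stunnel.conf:
cert = /etc/stunnel/stunnel.pem
[mysql-repl]
accept = 3308
connect = 127.0.0.1:3306

; on the slave (TLS client side):
client = yes
[mysql-repl]
accept = 127.0.0.1:3307
connect = master.example.com:3308

The slave’s CHANGE MASTER TO then points at 127.0.0.1 port 3307, and stunnel carries the traffic to the master’s port 3306 over TLS.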

But as those who have tried will know, having a client (application server) connect to a MySQL instance in a remote datacentre is a whole other matter: latency becomes a big issue there and is generally very noticeable on the front-end. One solution for that is to have application servers only connect to their “local” MySQL server.

So the question to you is: do you now have (or have you had in the past) a setup with MySQL masters in different datacentres? What did that setup look like (which additional tools and infra did you use for it), and what were your experiences (good and bad, solutions to issues, etc.)? I’m trying to gather additional expertise that might already be out there, which can help us all. Please add your input! Thanks.


relay-log-space-limit

We don’t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up.

A slave’s IO_Thread reads events from the master and appends them to the relay log; the slave’s SQL_Thread reads from the relay log and executes the queries. If/when replication “breaks”, unless it’s connection related, it tends to happen during execution of a query. In that case the IO_Thread will keep running (receiving master events and storing them in the relay log). Beyond some point, that doesn’t make sense.

The reason for having two separate replication threads (introduced in MySQL 4.0) is that long-running queries don’t delay receiving more data. That’s good. But receiving data is generally pretty fast, so as long as that basic issue is handled, it’s not necessary (for performance) to have the IO_Thread run ahead that far.

So you can set something like relay-log-space-limit=256M. Once that limit is reached, the IO_Thread simply waits until the SQL_Thread has executed enough of the relay log to free up space. This prevents slave disk space from getting gobbled up in some replication failure scenarios. The data will still be available in the logs on the master (provided of course the log expiration there isn’t too short; replication monitoring is still important!).
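In my.cnf that looks like the following (256M is just an example figure, size it to your own workload; the option is not dynamic, so it needs a server restart):

[mysqld]
relay-log-space-limit = 256M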

Conclusion: treat the relay log as a cache. Don’t leave it at “unlimited”; that’s an inefficient (and potentially problematic) use of resources. If you do run out of disk space, the relay log can get corrupted, and then you have to reposition, which will re-read the data from the master anyway.


Trivia: identify this replication failure

We got good responses to the “identify this query profile” question. Indeed it indicates an SQL injection attack. Obviously a code problem, but you must also think about “what can we do right now to stop this”. See the responses and my last note on it below the original post.

Got a new one for you!

You find a system with broken replication; it could be a slave or one in a dual master setup. The IO thread is still running, but the SQL thread is not, and the last error is (yes, the error string is exactly this, very long; sorry, I did not paste this string into the original post, it was added in a later update):

“Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.”

In other similar cases the error message is about something else, but the query shown with it makes no sense. To me, that essentially says the same as the above.

The server appears to have been restarted recently.

What’s wrong, and what’s your quickest way to get replication going again given this state?


MySQL University session Oct 22: Dual Master Setups With MMM

This Thursday (October 22nd, 13:00 UTC), Walter Heck (of Open Query) will present Dual Master Setups With MMM. MMM (Multi-Master Replication Manager for MySQL) is a set of flexible scripts that perform monitoring, failover and management of MySQL master-master replication configurations (with only one node writable at any time). Session slides (PDF).

The toolset also has the ability to read-balance standard master/slave configurations with any number of slaves, so you can use it to move virtual IP addresses around a group of servers depending on whether they are behind in replication. For more information, see mysql-mmm.org.

For MySQL University sessions you point your browser here. You need a browser with a working Flash plugin. You may register for a Dimdim account, but you don’t have to.


New Open Query training days in Australia

Below are November/December 2009 dates in Brisbane, Sydney, Canberra and Melbourne for the favourite Open Query course modules, as well as reworked and brand new ones. You can register for days/modules individually, to suit your time, budget and current needs. Your trainers are Sean, Ray and Arjen (see OQ people).

For the Canberra and Melbourne days, which cover the DBA/HA series, those registering for all of the modules in a series before 15 October will receive a copy of the “High Performance MySQL” book (normal bookstore price is AUD 105).

Canberra

Sydney

Brisbane

  • Thu 19 Nov: MySQL Query Performance Optimisation and Tuning
  • Fri 20 Nov: MySQL Server Performance Optimisation and Tuning

Melbourne


Getting ready for FrOScon 2009

I arrived yesterday in St. Augustin, near Bonn in Germany. After a good day of hitchhiking (weather is beautiful here) I stayed with my Pakistani Couchsurfing host and we had an extremely interesting evening talking about the gigantic cultural differences between western civilization and Pakistani civilization. It beats staying in a hotel by about a million points 🙂

This morning I headed to the FrOScon HQ at the Fachhochschule to help out with whatever was needed. Turns out that was a bit premature (a misunderstanding on my part), so I have had some time to catch up on mail and give some more attention to my talk on Saturday. I’ll be helping out with things throughout the rest of today and the whole day tomorrow, though.

I’ll be talking about MySQL MMM, a project that I have invested quite a bit of time in getting to know. My talk will outline what MMM is, what it’s not, and walk through an example of our setup at Open Query. It’s a full hour long, so it should be very interesting to be able to go into that much detail.

If you are near St. Augustin, make sure to come by for FrOScon, as its schedule has some very interesting talks and you’ll also have a good chance to meet fellow MySQL geeks in the OpenSQLCamp dev-room.


Replication fail with “There is no ‘username’@’host’ registered”

This post doesn’t contain a tip; I’m asking the question and I don’t know the answer yet. It could just be a logic error on my part, in which case it was merely a personal mystery, and I thank you for helping!

MySQL error 1449: “There is no 'username'@'host' registered”

You can get a replication fail with this error if you have triggers, views or stored routines that run as their creator (which is usual) but the slave does not have that user. So far so good.

We just encountered a replication failure on multiple slaves (i.e. not a single isolated mishap) with this error, on an UPDATE statement accessing a single base table (i.e. no view), with no triggers defined on it and no routines called in the query. Indeed the slave did not have the reported user (intentionally), but that shouldn’t have mattered. Creating the user allowed replication to continue successfully; that’s no surprise either.

The question is, what condition actually caused this error?

For reference, the original error message, obfuscated but structurally unchanged:

[ERROR] Slave: Error 'There is no 'someuser'@'%' registered' on query. Default database: 'somedb'. Query: 'UPDATE `t1` SET `name` = 'some text string', `description` = 'another text string', `date_taken` = '', `place_taken` = '', `modified` = '2009-06-24 12:06:16', `updated_by` = 1, `modified_on` = 1245773176 WHERE `t1`.`id` = 361', Error_code: 1449

UPDATE (2009-06-26): OK, it was an Arjen fail. On a recheck of all the possible situations where permissions are checked (views, stored procs, triggers), I found that there is one trigger in the entire system, and it happens to be on this table. So, no bug; my mistake. Issue resolved! Thanks to all those who helped!
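For reference, a quick way to spot such triggers (the schema/table names below are placeholders matching the obfuscated error above):

SELECT trigger_schema, trigger_name, event_object_table, definer
  FROM information_schema.triggers
  WHERE event_object_schema = 'somedb' AND event_object_table = 't1';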
