Posted on

Green HDs and RAID Arrays

Some so-called “Green” harddisks don’t like being in a RAID array. These are primarily SATA drives, and they gain their green credentials by being able reduce their RPM when not in use, as well as other aggressive power management trickery. That’s all cool and in a way desirable – we want our hardware to use less power whenever possible! – but the time it takes some drives to “wake up” again is longer than a RAID setup is willing to tolerate.

First of all, you may wonder why I bother with SATA disks at all for RAID. I’ve written about this before, but they simply deliver plenty for much less money. Higher RPM doesn’t necessarily help you for a db-related (random access) workload, and for tasks like backups which do have a lot of speed may not be a primary concern. SATA disks have a shorter command queue than SAS, so that means they might need to seek more – however a smart RAID controller would already arrange its I/O in such a way as to optimise that.

The particular application where I tripped over Green disks was a backup array using software RAID10. Yep, a cheap setup – the objective is to have lots of diskspace with resilience, and access speed is not a requirement.

Not all Green HDs are the same. Western Digital ones allow their settings to be changed, although that does need a DOS tool (just a bit of a pest using a USB stick with FreeDOS and the WD tool, but it’s doable), whereas Seagate has decided to restrict their Green models such that they don’t accept any APM commands and can’t change their configuration.

I’ve now replaced Seagates with (non-Green) Hitachi drives, and I’m told that Samsung disks are also ok.

So this is something to keep in mind when looking at SATA RAID arrays. I also think it might be a topic that the Linux software RAID code could address – if it were “Green HD aware” it could a) make sure that they don’t go to a state that is unacceptable, and b) be tolerant with their response time – this could be configurable. Obviously, some applications of RAID have higher demands than others, not all are the same.

Posted on

MySQL data backup: going beyond mysqldump

A user on a linux user group mailing list asked about this, and I was one of the people replying. Re-posting here as I reckon it’s of wider interest.

> […] tens of gigs of data in MySQL databases.
> Some in memory tables, some MyISAM, a fair bit InnoDB. According to my
> understanding, when one doesn’t have several hours to take a DB
> offline and do dbbackup, there was/is ibbackup from InnoBase.. but now
> that MySQL and InnoBase have both been ‘Oracle Enterprised’, said
> product is now restricted to MySQL Enterprise customers..
> Some quick searching has suggested Percona XtraBackup as a potential
> FOSS alternative.
> What backup techniques do people employ around these parts for backups
> of large mixed MySQL data sets where downtime *must* be minimised?
> Has your backup plan ever been put to the test?

You should put it to the test regularly, not just when it’s needed.
An untested backup is not really a backup, I think.

At Open Query we tend to use dual master setups with MMM, other replication slaves, mysqldump, and XtracBackup or LVM snapshots. It’s not just about having backups, but also about general resilience, maintenance options, and scalability. I’ll clarify:

  • XtraBackup and LVM give you physical backups. that’s nice if you want to recover or clone a complete instance as-is. But if anything is wrong, it’ll be all stuffed (that is, you can sometimes recover InnoDB tablespaces and there are tools for it, but time may not be on your side). Note that LVM cannot snapshot between multiple volumes consistently, so if you have your InnoDB ibdata/IBD files and iblog files on separate spindles, using LVM is not suitable.
  • mysqldump for logical (SQL) backups. Most if not all setups should have this. Even if the file(s) were to be corrupted, they’re still readable since it’s plain SQL. You can do partial restores, which is handy in some cases. It’ll be slower to load so having *only* an SQL dump of a larger dataset is not a good idea.
  • some of the above backups can and should *also* be copied off-site. that’s for extra safety, but in terms of recovery speed it may not be optimal and should not be relied upon.
  • having dual masters is for easier maintenance without scheduled outages, as well as resilience when for instance hardware breaks (and it does).
  • slaves. You can even delay a slave (Maatkit has a tool for this), so that would give you a live correct image even in case of a user error, provided you get to it in time. Also, you want enough slack in your infra to be able to initialise a new slave off an existing one. Scaling up at a time when high load is already occurring can become painful if your infra is not prepared for it.

A key issue to consider is this… if the dataset is sufficiently large, and the online requirements high enough, you can’t afford to just have backups. Why? Because, how quickly can you deploy new suitable hardware, install OS, do restore, validate, put back online?

In many cases one or more aspects of the above list simply take too long, so my summary would be “then you don’t really have a backup”. Clients tend to argue with me on that, but only fairly briefly, until they see the point: if a restore takes longer than you can afford, that backup mechanism is unsuitable.

So, we use a combination of tools and approaches depending on needs, but in general terms we aim for keeping the overall environment online (individual machines can and will fail! relying on a magic box or SAN to not fail *will* get you bitten) to vastly reduce the instances where an actual restore is required.
Into that picture also comes using separate test/staging servers to not have developers stuff around on live servers (human error is an important cause of hassles).

In our training modules, we’ve combined the backups, recovery and replication topics as it’s clearly all intertwined and overlapping. Discussing backup techniques separate from replication and dual master setups makes no sense to us. It needs to be put in place with an overall vision.

Note that a SAN is not a backup strategy. And neither is replication on its own.

Posted on

Tool of the Day: rsnapshot

rsnapshot is a filesystem snapshot utility for making backups of local and remote systems, based on rsync. Rather than just doing a complete copy every time, it uses hardlinks to create incrementals (which are from a local perspective a full backup also). You can specify how long to keep old backups, and all the other usual jazz. You’d generally have it connect over ssh. You’ll want/need to run it on a filesystem that supports hardlinks, so that precludes NTFS.

In the context of MySQL, you can’t just do a filesystem copy of your MySQL data/logs, that would be inconsistent and broken. (amazingly, I still see people insisting/arguing on this – but heck it’s your business/data to gamble with, right?)

Anyway, if you do a local mysqldump also, or for instance use XtraBackup to take a binary backup of your InnoDB tablespace/logs, then rsnapshot can be used to automate the transfer of those files to a different geographical location.

Two extra things you need to do:

  • Regularly test your backups. They can fail, and that can be fatal. For XtraBackup, run the prepare command and essentially start up a MySQL instance on it to make sure it’s all happy. Havint this already done also saves time if you need to restore.
  • For restore time, you need to include the time needed to transfer files back to the target server.
Posted on

Will your production MySQL server survive a restart?

Do you know if your production MySQL servers will come back up when restarted? A recent support episode illustrates a number of best practices. The task looked trivial: Update a production MySQL server (replication master) with a configuration tuned and tested on a development server. Clean shutdown, change configuration, restart. Unfortunately, the MySQL daemon did not just ‘come back’, leaving 2 sites offline. Thus begins an illuminating debugging story.
First place to look is the daemon error log, which revealed that the server was segfaulting, seemingly at the end of or just after InnoDB recovery. Reverting to the previous configuration did not help, nor did changing the InnoDB recovery mode. Working with the client, we performed a failover to a replication slave, while I got a second opinion from a fellow engineer to work out what had gone wrong on the server.
Since debug symbols weren’t shown in the stack trace, we needed to generate a symbol file (binary was unstripped) to use with the resolve_stack_dump utility. The procedure for obtaining this is detailed in the MySQL manual. With a good stack trace in hand, we were able (with assistance from an old friend, thanks Dean!) to narrow the crash down to bug 38856 (also see 37027). A little further investigation showed that the right conditions did exist to trigger this bug:
  • expire_logs_days = 14 # had been set in the my.cnf
  • the binlog.index file did not match the actual state of log files (i.e. some had been manually deleted, or deleted by a script)
So with this knowledge, it was possible to bring the MySQL server back up. It turned out that the expire_logs_days had perhaps been added to the configuration but not tested at the time (the server had not been restarted for 3 months). This had placed the system in a state, unbeknownst to the administrators, where it would not come back up after a restart. It was an interesting (if a tad stressful) incident as it shows the reasons for many best practices – which most of us know and follow – but worth re-capping here.
  • even seemingly trivial maintenance can potentially trigger downtime
  • plan any production maintenance in the quiet zone, and be sure to allow enough time to deal with the unforeseen
  • don’t assume your live server will ‘just restart’
  • put my.cnf under revision control (check out “etckeeper”, a standard Ubuntu package; it can keep track of everything in /etc using bzr, svn or git)
  • do not make un-tested changes to config, test immediately, preferably on dev or staging system
  • be ready to failover (test regularly like a fire drill); this is another reason why master-master setups are more convenient than mere master-slave
  • replication alone is NOT a backup
  • don’t remove binlogs or otherwise touch anything in data dir behind mysql’s back
  • have only 1 admin per server so you don’t step on each other’s toes (but share credentials with 2IC for emergencies only)
  • use a trusted origin for your binary packages, just building and passing the basis test-suite is not always sufficient
  • know how to get a good stack trace with symbols, to help find bug reports
  • be familiar with, but it still helps to ask others as they might have just seen something similar and can help you quickly find what you’re looking for!
  • and last but very important: it really pays to find the root cause to a problem (and prevention requires it!), so a “postmortum” on a dead server is very important… if we had just wiped that server, the problem might have reoccurred with another server later.
Posted on

Good Practice / Bad Practice: Off-site Backups

In today’s gp/bp an open door will be kicked in: take your backups offsite!
I was actually tempted to create a poll to see how many of you do not have proper backups, and how many of you do not take those backups offsite. It is a simple piece of advice and relatively simple to set up. Offsite in this case would ideally be physically offsite: to a different server in a different building in a different location. A start however is to take them to a different server. And don’t make the mistake of thinking a different VPS on the same physical server is good enough. True, that will protect you from operating system failure of the guest, but it will likely not protect you from hardware failure, or operating system failure on the host OS.

Also, take good care of how you are getting your backups offsite. A normal FTP connection might do the job, but it is hardly secure. Ideally, use SFTP or rsync over ssh to stream your backups offsite.
Some people still take their backups offsite by physically moving a cd, dvd or tape from one location to another. It’s a good start, but in this age of cheap broadband, you might want to think about doing this online. A cron-job is much less likely to not run than it is for you to forget to move that tape.

In our good practice / bad practice series, we will provide you with byte/bite sized pieces of advice on what we consider good (or bad) practice in MySQL-land. The topics can be just about anything, so expect random things to come up. Also, the level of advancedness greatly varies. A topic might be a no-brainer for some, a reminder for others and a revelation for a third person. We strive to tender to all of you!

Posted on

When a backup is not, and when a restore is a failure

Since writing and speaking a bit more about the “relax! a failure is not an emergency” concept, more and more people approach me with interesting horror stories. I’m scribbling a few backup-related ones here for your enjoyment – and naturally there are important lessons.

Story 1: A place makes backups that get shipped off-site, interstate even. One day a couple of files are lost, and so someone files a request to retrieve said files from the archive. Well, apparently that’s something that should be done as it creates some very stressed responses and a quoted timeline of a few weeks. In the end the issue is resolved through other means and the request stopped – unfortunate, since it would have been very interesting if the requested files would actually ever arrive… clearly a retrieval was not part of the expected process. One also wonders how long a full dataset retrieval would take, or if  it’s even possible!

Story 2: A place has data recovery/failover infrastructure. However, it’s in use as a test environment and consequently an actual event that requires use of this hardware would first need re-imaging of the boxes to get them set up to even receive the data they need to contain. Estimated timeline: days to weeks.

The aboves are extreme, but I think they make the point pretty well. I hear and see lots of cases where there are processes in place for making backups, and off-site transfer. Lovely. However, they do often appear to “forget” the objective of which these processes are merely a part, which is not just shippping out a bucket of bits but potentially using it for failover or recovery!

So you want to test the recovery process, that is the entire trail from the storage all the way to having an actual new functioning copy. And, you want to see how much time that takes.

In Open Query parleance, if you haven’t tested this fully, or the time required is more than the downtime your business can afford, then we say you technically have no backup. Simple. No sense being fuzzy about this, right?

It’s one of the things we help our clients with as part of our service. This is not really a topic that should be regarded as optional.