Posted on 4 Comments

snafu with MySQL relay log path – the why and the fix

As described in Launchpad Bug #119271 and MySQL Bug#28850, MySQL installations that act as a replication slave get bitten after an upgrade. However, the actual root cause is not the upgrade.

If you simply set up, say, Ubuntu Feisty, you’ll encounter the same problem. If you set it up as a slave, the server uses a relay log. In the affected versions, it’s put under /var/run. That’s a serious snafu, because /var/run is generally on tmpfs, which a) is very small, and b) gets wiped on a restart. Only runtime foo like .pid files should be under /var/run (as per LSB, Linux Standard Base).
Anyway, the “gets wiped on restart” is where new installations get bitten, although the error is of course the same as on an upgrade where the path changes from /var/lib/mysql: the server simply can’t find the relay logs it thought it had.
If you have a perfectly running replication slave, shut it down and restart the machine, you’ll have broken replication anyway. One of my students encountered this in the MySQL Replication Workshop last week, just after another student had brought up an error they’d spotted in the logs of their production system.
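If you want to check whether a given directory really sits on tmpfs on your box, something along these lines should do. It's a minimal sketch, assuming a Linux system that exposes /proc/mounts; the function name fs_type_for is my own invention for illustration, not part of any MySQL tooling.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    device, mount point, filesystem type, options, ...
    """
    path = os.path.normpath(path)
    best_len, best_type = -1, None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare with a trailing slash so /var/run does not match /var/runner
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= best_len:
            best_len, best_type = len(mount_point), fstype
    return best_type


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        mounts = fh.read()
    # realpath resolves symlinks, in case /var/run is a link elsewhere
    for candidate in ("/var/run/mysqld", "/var/lib/mysql"):
        print(candidate, "->", fs_type_for(os.path.realpath(candidate), mounts))
```

If the directory holding your relay logs reports tmpfs, those logs will not survive a reboot.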

The issue has only recently been resolved in Ubuntu (5.0.51a) and upstream (MySQL has it pushed for 5.0.54), and the problem appears to be fairly widespread among the various distros. CentOS also has something on this. From the comments in all of these, it appears that the basic problem, although simple, is not actually that well understood. The MySQL relay logs should NOT be deleted (or vanish through other means) on a restart. That’s all. Simple.

So how did it happen? I first thought the problem was restricted to Debian (from which Ubuntu is derived), but the Debian/Ubuntu my.cnf files have not changed and actually don’t contain a relay-log entry. The problem originated upstream at MySQL, where the default path for the PID file was changed from the datadir to /var/run/mysqld. That in itself was a correct move (again, for LSB compliance), but elsewhere in the code the base path for the relay log got derived from the path of the PID file, and that’s where the trouble originates.
The change of the PID default path was not really noticed anywhere, as most distros already put it in /var/run/mysqld through their default my.cnf file. But since none of them explicitly specifies a relay-log path, it gets put where the compiled-in defaults tell it to go. Kaboom.

So, coding error. The path should not have been derived from the PID base, but programmers are human 😉
I do wonder why MySQL’s QA didn’t spot this; they’re a fairly thorough bunch.

Another thing to note is that if you have an affected version (any distro, or direct from MySQL), the quick fix is to just put some extra lines in your my.cnf:

# We're fixing up the paths for the relay log infrastructure (repl.slaves)
# 2008-05-21 by a r j e n (at) o p e n q u e r y (dot) c o m (dot) a u
# NOTE 1: adapt the filenames to whatever they currently are on disk!
# The filenames may depend on your hostname or distro specifics.
# NOTE 2: If you built your MySQL server from source,
# or if you installed from the binary tarball,
# your data path will be different from /var/lib/mysql,
# such as /usr/local/mysql/data. Check and adapt.
relay-log = /var/lib/mysql/relay-bin
# let's do these too, just in case
relay-log-index = /var/lib/mysql/relay-bin.index
relay-log-info-file = /var/lib/mysql/relay-bin.info

Then your replication slave universe should return to operating within normal parameters. Essentially, it un-breaks replication 😉
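To see at a glance whether a my.cnf already pins these three options somewhere safe, a small checker can help. This is only a rough sketch under my own assumptions: it reads just the [mysqld] section, ignores !include directives, inline comments and quoted values, and the function names relay_settings and problems are made up for this example.

```python
RELAY_OPTIONS = ("relay-log", "relay-log-index", "relay-log-info-file")


def relay_settings(cnf_text):
    """Collect the relay log options from the [mysqld] section of my.cnf text."""
    found = {}
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and !include / !includedir directives
        if not line or line.startswith(("#", ";", "!")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "mysqld" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # MySQL accepts relay_log and relay-log interchangeably
        key = key.strip().replace("_", "-")
        if key in RELAY_OPTIONS:
            found[key] = value.strip()
    return found


def problems(settings, volatile_prefix="/var/run"):
    """List options that are unset or that point into the volatile directory."""
    issues = []
    for opt in RELAY_OPTIONS:
        value = settings.get(opt)
        if value is None:
            issues.append(f"{opt} is not set; the compiled-in default applies")
        elif value == volatile_prefix or value.startswith(volatile_prefix + "/"):
            issues.append(f"{opt} points into {volatile_prefix}: {value}")
    return issues
```

Feed it the contents of your config, e.g. problems(relay_settings(open("/etc/mysql/my.cnf").read())), adapting the path to wherever your distro keeps the file. An empty list means all three options are explicit and outside /var/run.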

Actually I think that relay-log-info-file probably used the datadir or the binary log base as its default path already, and that would be why the server would think that the relay logs are somewhere when they’re not: the relay-log.info file would contain the current active log filename and position. With the log files disappearing… you understand. But just in case, make it all explicit and thus prevent problems of this nature.
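You can verify that mismatch yourself by comparing what the info file claims with what is actually on disk. Here is a hedged sketch, assuming the 5.0-era layout of relay-log.info (first line the relay log file name, second line the position; newer server versions use a different layout) and assuming that a relative file name is relative to the directory holding the info file. Both function names are hypothetical.

```python
import os


def active_relay_log(info_text):
    """Return (relay_log_name, position) from relay-log.info contents.

    Assumes line 1 is the relay log file name and line 2 the position.
    """
    lines = info_text.splitlines()
    if len(lines) < 2:
        raise ValueError("relay-log.info is too short to parse")
    return lines[0].strip(), int(lines[1].strip())


def relay_log_missing(info_path):
    """True if the relay log named in the info file does not exist on disk."""
    with open(info_path) as fh:
        relay_log, _position = active_relay_log(fh.read())
    if not os.path.isabs(relay_log):
        # Assumption: resolve relative names against the info file's directory
        relay_log = os.path.join(os.path.dirname(info_path), relay_log)
    return not os.path.exists(relay_log)
```

Calling relay_log_missing("/var/lib/mysql/relay-log.info") (adapt the path and file name to your setup) and getting True back after a reboot is exactly the situation described above: the server remembers a relay log that no longer exists.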


4 thoughts on “snafu with MySQL relay log path – the why and the fix”

  1. Lots of hair pulling here as we set up a load-balanced proof of concept setup in our offices.. all worked a treat, but as you described, a reboot would result in needing to do a sync and reset of the replication.

    Thanks for publishing this info!

  2. Adding the two lines to my.cnf that you suggested

    relay-log-index = /var/lib/mysql/relay-bin.index
    relay-log-info-file = /var/lib/mysql/relay-bin.info

    solved my problem. Thank you!

  3. After 2 days of figuring out what went wrong with my brand new deployment of MySQL replication… I was going insane.

    Thank you for your post! It saved my sanity.

  4. Thank you very much for posting this! I ran into this after an upgrade, adding those 3 lines and restarting mysql solved the problem and picked up all the changes from the master correctly.
