Posted on 5 Comments

Ladies and gentlemen, check your assumptions

I spent some time earlier this week trying to debug a permissions problem in Drupal.

After a lot of head-scratching, it turned out that Drupal assumes that when you run INSERT queries sequentially on a table with an auto_increment integer column, the values that are assigned to this column will also be sequential, ie: 1, 2, 3, …

This might be a valid assumption when you are the only user doing inserts on a single MySQL server, but unfortunately that is not always the situation in which an application runs.

I run MySQL in a dual-master setup, which means that two sequential INSERT statements will never return sequential integers.  The value will always be determined by the  auto_increment_increment and auto_increment_offset settings in the configuration file.

In my case, one master will only assign even numbers, the other only uneven ones.

My patch was accepted, so this problem is now fixed in the Drupal 7 (and hopefully soon in 6 as well) codebase.

The moral of the story is that your application should never make such assumptions about auto_increment columns.  A user may run the application on a completely different architecture, and it may break in interesting and subtle ways.

If you want to use defined integers like Drupal does, make sure you explicitly insert them. Otherwise, you can retrieve the assigned number via the mysql_insert_id() function in PHP or via SELECT LAST_INSERT_ID() in MySQL itself.

Have you checked your code today?

Posted on 5 Comments

5 thoughts on “Ladies and gentlemen, check your assumptions

  1. I’d argue that the master-master scenario you describe is an edge-case and no normal application should be expected to work with it.

    In such a scenario, simple queries could easily break replication; it’s naturally fragile and you will have little chance of keeping the same data in the two servers.

    What you’re talking about is having two servers which execute approximately the same queries in not-necessarily-the-same order, and hoping that their data doesn’t differ.

    The trick with the auto_increment in twos only gets over the most immediately obvious problem with this scenario, it doesn’t solve any of the others.

    Unless you are absolutely sure that only one master is written to at once (with a window in between to allow the other to catch up before switching) then this scenario is inherently weak.

    The only other possibility is where you have a carefully written custom application (i.e. not Drupal) where every update query has been carefully constructed to avoid problems – but that is not easy either.

    1. @MarkR From your comment, I’m deducing (please correct me if I’m wrong) that you may not be aware of how a master-master replication setup in MySQL works. The two masters replicate from eachother. The auto-increment settings merely guarantee that no conflicting auto-inc IDs will ever be generated.
      Generally, only a single master is written to; after all, every master still needs to do all the writes even if they originated on the other one, so it’s not a load balancing system for writes. It allows for easier failover when there’s a problem with the active master, and it also allows for easy maintenance operations including adding/removing indexes and other schema changes. These can be done on the non-active master without replicating, then you flip the active master, and do the same again. The users won’t even notice.

  2. @Arjen –
    first, thanks for the info and for patching it up!
    I wonder: what about row DELETion? AUTO_INCREMENT gap can occur in this case, as well. Could it have been overlooked for so long?

    @Mark,
    I think the issue here is not if M-M setup is good for you or not. It’s about naive application assumptions.
    I’ve seen similar assumptions: like an application which inserts a row, then tests from the MAX(id) (id beign AUTO_INCREMENT) to “learn” what the newly created id was. Obviously, two concurrent or immediate INSERTs will cause a problem.
    It is therefore that applications need to be very careful about DB assumptions. How well would Drupal do on Oracle? Does it have any MySQL-specific assumptions?
    I think Arjen’s point was very good.

  3. @Mark,
    The setup works fine for me; one master is sitting in a datacenter and the other one never does simultaneous writes. I basically have it set up this way so I can tweak or import small amount of data locally if I need to and have it magically replicate to the production server. The main issue is as Shlomi says the assumption on the part of the application. It shouldn’t make this assumption, no matter the MySQL architecture.

    @Shlomi,
    Thankyou 🙂

    Drupal 6 supports MySQL and Postgres out of the box, so the bootstrap process supports both of these and should (in theory) not make any MySQL-specific assumptions. Drupal 7 will in fact support multiple SQL servers, but only one master, so it shouldn’t make this problem worse.

  4. @cafuego

    Whooops! Assumed post was by Arjen… Sorry 😀
    Edit my post at your convenience…

Comments are closed.