Posted on 8 Comments

Quiz: Enabling an application for MySQL Replication

A little challenge for you… given an existing app that does not know about separate master/slave connections, and you want to enable working in a replicated infrastructure. Simply redirecting all SELECTs to the slave connection will not work. Why?

Hint: there are at least two reasons, depending on other factors. There may be more.

Comments are set to be moderated so providing answers will not spoil it for others. I’ll leave it run for a bit and then approve all comments.

Posted on 8 Comments

8 thoughts on “Quiz: Enabling an application for MySQL Replication

  1. Well, for starters, a SELECT statement may follow other DML which set session information such as LAST_INSERT_ID which is needed by subsequent DML statements. In such an application, if the subsequent DML statements are SELECT statements, LAST_INSERT_ID() will not work as expected and the application breaks.

    Next are transactions. If you are using a transactional storage engine, then rows changed by statements must be visible to the session making the changes. This problem is similar to the previous one, except perhaps more insidious, because the application may appear to work when it really doesn’t.

    Then you have the situation where a query absolutely must use the most recent version of the information from the master. Replication delay will break this application when this is the case.

    TEMPORARY tables are another problem. If you send all commands but SELECT to the master, but only SELECT to the slave, then any temporary tables which are created are not accessible and this breaks the applicaition.

    There are probably a few other things, but that covers the biggies, I think.

  2. The first issue that comes to mind is the use of temporary tables. Since the inserts and creates of the temporary table are sessions specific the slave won’t be able to see temporary tables manipulated from the master…
    Brain churning for more problems…..

  3. 1) Any query that uses server specific data i.e. date functions
    2) Requesting the latest insert id (primary key generated) will be invalid since it will hit the the slave which doesn’t have a record of that session
    3) Temporary tables will be created on the master but not visible on the slave since they wont’ be in the same session

  4. 1. If the select is with in a transaction block .
    2. If it is a select for update

  5. First there are transactions, like SELECT … FOR UPDATE/LOCK IN SHARE MODE, if you issue that on the slave, but the UPDATE on the master, good times.
    Second woudl be SELECT INTO? But that might work, depends on the further actions…
    Third are the stored procedures, SELECT PROCEDURE xxx, in case that PROCEDURE writes stuff…

  6. Transactions would break if selects were done within the transaction and were confined to reading committed data.

  7. Correction from previous post: any select done in the context of a transaction, in a system where dirty reads are prohibited, wouldn’t see results from writes done within the supposed transaction, because they would be two different sessions.

  8. You’ll at least have to handle the problem of stale data on the slave (if there’s enough server load, if this is important for you, you should measure the slave lag). This implies more modifications to the app than just redirecting SELECTs to the slave.

    Depending on the type of queries, some SELECTs will bring data that will be useless to the user if it’s read from the slave (i.e., let’s assume you have an online résumé managing app, every time the user makes a change, you should read the data from the master, and then presumably cache this to save on disk access 🙂 )

Comments are closed.