Tag Archives: Galera

Tracing down a problem, finding sloppy code

Daniel was tracking down what appeared to be a networking problem….

  • server reported 113 (No route to host)
  • However, an strace did not reveal the networking stack ever returning that.
  • On the other side, IP packets were actually received.
  • When confronted with mysteries like this, I get suspicious – mainly of (fellow) programmers.
  • I suggested a grep through the source code, which revealed  return -EHOSTUNREACH;
  • Mystery solved, which allowed us to find what was actually going on.


  1. Don’t just believe or presume the supposed origin of an error.
  2. Programmers often take shortcuts that cause grief later. I fully appreciate how the above code came about, but I still think it was wrong. Mapping a “similar” situation onto an existing error code is convenient. But when an error occurs, the most important thing is for people to be able to track down what the root cause is. Reporting this error outside of its original context (error code reported by network stack) is clearly unhelpful, it actually misdirects and requires people to essentially waste time to track it down (as above).
  3. Horay once again for Open Source, which makes it so much easier to figure these things out. While possibly briefly embarrassing for the programmer, more eyes allows code to improve better and faster – and, perhaps, also entices towards better coding practices from the outset (I can hope!).

What do you think?

Munin graphing of MySQL

While there are many graphing tools out there and we’ve used Munin for a while now.

The MySQL plugin for Munin had fallen out of date and the show engine innodb status output changed in 5.5 making some bits of the plugin simply not work any more. Also the show global status has some extra variables so there was a need to create new graphs.

All of these are now in the 2.1.8+ development releases of Munin.

Here are samples of the new/updated graphs.

Munin table definations
Table Definitions

Innodb Buffer Pool Activity
Innodb Buffer Pool Internal Breakdown
Innodb Insert Buffer
Innodb Insert Buffer
Innodb Buffer Pool
Innodb Semaphores
Innodb Semaphores
Innodb Master Thread
Innodb Master Thread
Innodb Adaptive Hash Index
Innodb Queries and Transactions
Innodb Queries and Transactions
Innodb Read Views
Innodb Read Views
Innodb Descriptors
Innodb Descriptors
Performance Schema Losses
Performance Schema Losses
Query Cache
Query Cache
Maximum Memory of MySQL
Maximum Memory of MySQL
Handler Read
Handler Read
Handler Transaction
Handler Transaction
Handler Write
Handler Write
Handler Temporary Write/Updates
Handler Temporary Write/Updates
Execution (triggers and events)
Index Condition Pushdown
Multi Range Read Optimizations
Multi Range Read Optimizations

Some of these above graphs may miss a variable or two with MariaDB-10 because of variable name changes. These will be corrected when we get to those. In MariaDB-10 there is useful transition to information schema tables for status information which will make it significantly easier to parse.

Individual buffer pool information also has been parsed out however we haven’t worked out how to graphing this correctly. Also not yet merged is a bunch of Galera graphs which are currently waiting on some Galera provider changes.

We’ll continue to work with the Munin developers to keep this MySQL plugin up to date and useful.

There’s other graphs in the MySQL Munin plugins that we haven’t changed so aren’t included here.

innodb_flush_logs_on_trx_commit and Galera Cluster

We deploy Galera Cluster (in MariaDB) for some clients, and innodb_flush_logs_on_trx_commit is one of the settings we’ve been playing with. The options according to the manual:

  • =0 don’t write or flush at commit, write and flush once per second
  • =1 write and flush at trx commit
  • =2 write log, but only flush once per second

The flush (fsync) refers to the mechanism the filesystem uses to try and guarantee that written data is actually on the physical medium/device and not just in a buffer (of course cached RAID controllers, SANs and other devices use some different logic there, but it’s definitely written beyond the OS space).

In a non-cluster setup, you’d always want it to be =1 in order to be ACID compliant and that’s also InnoDB’s default. So far so good. For cluster setups, you could be more lenient with this as you require ACID on the cluster as a whole, not each individual machine – after all, if one machine drops out at any point, you don’t lose any data.

Codership docu recommended =2, so that’s what Open Query engineer Peter Lock initially used for some tests that he was conducting. However, performance wasn’t particularly shiny – actually not much higher than =1. That in itself is interesting, because typically we regard the # of fsyncs/second a storage system can deal with as a key indicator of performance capacity. That is, as our HD Latency tool shows when you run it on a storage device (even your local laptop harddisk), the most prominent aspect of what limits the # of writes you can do per second appears to be the fsyncs.

I then happened to chat with Oli Sennhauser (former colleague from MySQL AB) who now runs the FromDual MySQL/MariaDB consulting firm in Switzerland, and he’s been working with Galera for quite a long time. He recognised the pattern and said that he too had that experience, and he thought =0 might be the better option.

I delved into the InnoDB source code to see what was actually happening, and the code indeed concurs with what’s described in the manual (that hasn’t always been the case ;-). I also verified this with Jeremy Cole whom we may happily regard as guru on “how InnoDB actually works”. The once-per-second flush (and optional preceding write) is performed by the InnoDB master thread. Take a peek in log/log0log.c and trx/trx0trx.c, specifically trx_commit_off_kernel() and srv_sync_log_buffer_in_background().

In conclusion:

  1. Even with =0, the log does get written and flushed once per second. This is done in the background so connection threads don’t have to wait for it.
  2. There is no setting where there is never a flush/fsync.
  3. With =2, the writing of the log takes place in the connection thread and this appears to incur a significant overhead, at least relative to =0. Aside from the writing of the log at transaction commit, there doesn’t appear to be a difference.
  4. Based on the preceding points, I would say that if you don’t want =1, you might as well set =0 in order to get the performance you’re after. There is of course a slight practical difference between =0 and =2. With =2 the log is immediately written. If the mysqld process were to crash within a second after that, the OS would close the file and have that log write stored. With =0 that log data wouldn’t have been written. If the OS or machine fails, that log write is lost either way.

In production environments, we tend to mainly want to mitigate trouble from system failures, so =0 appears to be a suitable/appropriate option – for a Galera cluster environment.

What remains is the question of why the log write operation appears to reduce transaction commit performance so much, in a way more so than the flush/fsync. Something to investigate further!
Your thoughts?

Galera pre-deployment check

One of the first things we do when preparing a client’s infrastructure for Galera deployment is see whether their schema is suitable.

  • Avoiding quirks and edge cases, we can say that Galera simply requires all tables to be InnoDB and also have a PRIMARY KEY (obviously having a PK in InnoDB is important anyway, for InnoDB-internal reasons).
  • We want to know about FULLTEXT indexes. With recent InnoDB versions also supporting FULLTEXT we need to check not just whether a table has such an index, but actually which engine it is.
  • Spatial indexes. While both InnoDB and MyISAM can deal with spatial datatypes (POINT, GEOMETRY, etc), only MyISAM has the spatial indexes.

Naturally, checking a schema in the server is more effective than going through other sources and possibly missing bits. On the downside, the only viable way to get this info out of MariaDB is INFORMATION_SCHEMA, but because of the way it’s implemented queries tend to be slow and resource intensive. So essentially we do need to ask I_S, but do it as efficiently as possible (we’re dealing with production systems). We have multiple separate questions to ask, which normally we’d ask in separate queries, but in case of I_S that’s really something to avoid. So that’s why it’s all integrated into the single query below, catching every permutation of “not InnoDB”, “lacks primary key”, “has fulltext or spatial index”. We skip the system databases and any VIEWs.

We use the lesser known mysql client command ‘tee’ to output the data into a file, and close it after the query.

We publish the query not as a work of art – I don’t think it’s that pretty! We’d like you to see because we don’t care for secrets and also because if there is any way you can reach the same objective using a less resource intensive approach, we’d love to hear about it! This is one of the very few cases where we care only about efficiency, not how pretty the query looks. That said, of course I’d prefer it to be easily readable.

If you regard it purely as a query to be used for Galera, then you can presume it’ll be run on MariaDB 5.5 or later – since 5.3 and above has optimised subqueries, perhaps you can do something with that.

If you spot any other flaw or missing bit, please comment on that too. thanks!

-- snip
tee galeracheck.txt

       CONCAT(t.table_schema,'.',t.table_name) as tbl,
       IF(ISNULL(c.constraint_name),'NOPK','') AS nopk,
       IF(s.index_type = 'FULLTEXT','FULLTEXT','') as ftidx,
       IF(s.index_type = 'SPATIAL','SPATIAL','') as gisidx
  FROM information_schema.tables AS t
  LEFT JOIN information_schema.key_column_usage AS c
    ON (t.table_schema = c.constraint_schema AND t.table_name = c.table_name
        AND c.constraint_name = 'PRIMARY')
  LEFT JOIN information_schema.statistics AS s
    ON (t.table_schema = s.table_schema AND t.table_name = s.table_name
        AND s.index_type IN ('FULLTEXT','SPATIAL'))
  WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql')
    AND t.table_type = 'BASE TABLE'
    AND (t.engine <> 'InnoDB' OR c.constraint_name IS NULL OR s.index_type IN ('FULLTEXT','SPATIAL'))
  ORDER BY t.table_schema,t.table_name;

-- snap

Credit: the basis of the “find tables without a PK” is based on SQL by Sheeri Cabral and Giuseppe Maxia.

MariaDB Foundation

You may have already seen the announcement MariaDB Foundation to Safeguard Leading Open Source Database. We at Open Query wholeheartedly support this (r)evolution of the MySQL ecosystem, which appears to be increasingly necessary as Oracle Corp is seriously dropping the ball with security updates and actually just general development and innovation. Oracle has actually done some very good work, I happily acknowledge that – but security issues are critical, having crashing bugs and incorrect query results in a .28 of a GA release is uncool, and not incorporating awesome development efforts by the community is just astonishing.

MariaDB is where the Sphinx fulltext search and OQGraph engines are integrated, the community developed virtual columns and Galera synchronous replication are added, and the fabulous developers at Monty Program have taken the time to fix up the previously ghastly performance of subqueries so that they now rock. And that’s apart from the many little bugs Monty Program has fixed along the way. Awesome work, guys!

MariaDB still merges from stock MySQL, but it’s important to not be dependent on that – the MariaDB Foundation can help support that already existing effort and raise the profile of MariaDB to – as far as I’m concerned – move on from MySQL.

From our end, apart from our continued active involvement in the community, Open Query will be adjusting and augmenting its fee structure so that our clients co-fund MariaDB Foundation and contributing companies such as Codership (Galera). MySQL and MariaDB are mission critical for businesses. It’s not (and never has been) a case of volunteers and mere charity. We all p(l)ay our part.

It’s time to upgrade.