One of the first things we do when preparing a client’s infrastructure for a Galera deployment is to check whether their schema is suitable.
- Leaving aside quirks and edge cases, Galera simply requires all tables to be InnoDB and to have a PRIMARY KEY (having a PK in InnoDB is important anyway, since InnoDB stores each table’s rows in primary key order as its clustered index).
- We want to know about FULLTEXT indexes. Since recent InnoDB versions also support FULLTEXT, we need to check not just whether a table has such an index, but also which engine the table uses.
- Spatial indexes. While both InnoDB and MyISAM can handle spatial datatypes (POINT, GEOMETRY, etc.), only MyISAM supports spatial indexes (InnoDB gained SPATIAL indexes only later, in MySQL 5.7 and MariaDB 10.2).
Naturally, checking the schema on the live server is more reliable than going through other sources and possibly missing bits. The downside is that the only viable way to get this information out of MariaDB is INFORMATION_SCHEMA, and because of the way I_S is implemented (the server builds its tables on the fly, opening each table’s definition to do so), queries against it tend to be slow and resource-intensive. So we do need to ask I_S, but as efficiently as possible, since we’re dealing with production systems. We have several separate questions, which we’d normally ask in separate queries, but with I_S that is exactly what we want to avoid. That’s why everything is folded into the single query below, which catches every combination of “not InnoDB”, “lacks a primary key” and “has a FULLTEXT or SPATIAL index”. We skip the system databases and any VIEWs.
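One server setting may also make the I_S side cheaper. With innodb_stats_on_metadata enabled (the default on MariaDB 5.5), InnoDB recalculates index statistics whenever INFORMATION_SCHEMA.TABLES or INFORMATION_SCHEMA.STATISTICS is read, and the check query reads both for every table. A minimal sketch of switching it off around the check, assuming you are allowed to change global variables on that node and that you restore the original value afterwards:

```sql
-- Note the current value first, so it can be restored afterwards.
SHOW GLOBAL VARIABLES LIKE 'innodb_stats_on_metadata';

-- Stop InnoDB from re-sampling index statistics on every I_S.TABLES /
-- I_S.STATISTICS access (changing a global variable needs SUPER on 5.5).
SET GLOBAL innodb_stats_on_metadata = OFF;

-- ... run the Galera check query here ...

-- Restore the setting, but only if SHOW reported ON above.
SET GLOBAL innodb_stats_on_metadata = ON;
```

How much this helps depends on how many InnoDB tables the server holds, so it is worth timing the check query both ways before relying on it.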
We use the lesser-known mysql client command ‘tee’ to copy everything printed on screen into a file, and ‘notee’ to close that file again after the query.
We don’t publish the query as a work of art (I don’t think it’s that pretty!). We want you to see it because we don’t believe in keeping secrets, and also because if there is any way you can reach the same objective with a less resource-intensive approach, we’d love to hear about it! This is one of the very few cases where we care only about efficiency, not about how pretty the query looks. That said, of course I’d prefer it to be easily readable.
If you regard it purely as a query to be used for Galera, you can presume it’ll be run on MariaDB 5.5 or later. Since MariaDB 5.3 and above have optimised subqueries, perhaps you can do something with that.
If you spot any other flaw or missing bit, please comment on that too. Thanks!
-- snip
tee galeracheck.txt
SELECT DISTINCT CONCAT(t.table_schema,'.',t.table_name) AS tbl,
       t.engine,
       IF(ISNULL(c.constraint_name),'NOPK','') AS nopk,
       IF(s.index_type = 'FULLTEXT','FULLTEXT','') AS ftidx,
       IF(s.index_type = 'SPATIAL','SPATIAL','') AS gisidx
  FROM information_schema.tables AS t
  -- PK columns, if any (one row per column of a composite PK; DISTINCT folds them)
  LEFT JOIN information_schema.key_column_usage AS c
    ON (t.table_schema = c.constraint_schema
        AND t.table_name = c.table_name
        AND c.constraint_name = 'PRIMARY')
  -- FULLTEXT / SPATIAL index columns, if any
  LEFT JOIN information_schema.statistics AS s
    ON (t.table_schema = s.table_schema
        AND t.table_name = s.table_name
        AND s.index_type IN ('FULLTEXT','SPATIAL'))
 WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql')
   AND t.table_type = 'BASE TABLE'   -- skip VIEWs
   AND (t.engine <> 'InnoDB'
        OR c.constraint_name IS NULL
        OR s.index_type IN ('FULLTEXT','SPATIAL'))
 ORDER BY t.table_schema, t.table_name;
notee
-- snap
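One small wrinkle in the output: because the join on I_S.STATISTICS yields a row per matching index, a (MyISAM) table that has both a FULLTEXT and a SPATIAL index comes out as two rows, one per flag, even with DISTINCT. If you’d rather get exactly one row per table, a GROUP BY with MAX() can fold those together. This is an untested sketch that keeps the joins and filters of the query above unchanged; it is not expected to be any cheaper on I_S, it only tidies the result.

```sql
SELECT CONCAT(t.table_schema,'.',t.table_name) AS tbl,
       t.engine,
       -- MAX() collapses the per-column / per-index join rows into one value per table;
       -- a NULL result (no matching row) makes IF() fall through to ''.
       IF(MAX(c.constraint_name) IS NULL,'NOPK','') AS nopk,
       IF(MAX(s.index_type = 'FULLTEXT'),'FULLTEXT','') AS ftidx,
       IF(MAX(s.index_type = 'SPATIAL'),'SPATIAL','') AS gisidx
  FROM information_schema.tables AS t
  LEFT JOIN information_schema.key_column_usage AS c
    ON (t.table_schema = c.constraint_schema
        AND t.table_name = c.table_name
        AND c.constraint_name = 'PRIMARY')
  LEFT JOIN information_schema.statistics AS s
    ON (t.table_schema = s.table_schema
        AND t.table_name = s.table_name
        AND s.index_type IN ('FULLTEXT','SPATIAL'))
 WHERE t.table_schema NOT IN ('information_schema','performance_schema','mysql')
   AND t.table_type = 'BASE TABLE'
   AND (t.engine <> 'InnoDB'
        OR c.constraint_name IS NULL
        OR s.index_type IN ('FULLTEXT','SPATIAL'))
 -- every selected non-aggregate column is listed here
 GROUP BY t.table_schema, t.table_name, t.engine
 ORDER BY t.table_schema, t.table_name;
```

The WHERE clause can stay as a row-level filter because, for any given table, all of its joined rows share the same engine and the same PK match (or lack of one), and either all or none of them carry a FULLTEXT/SPATIAL index type.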
Credit: the “find tables without a PK” part is based on SQL by Sheeri Cabral and Giuseppe Maxia.