Posted on

My oldest still-open MySQL bug

a bugThis morning I received an update on a MySQL bug, someone added a comment on an issue I filed in November 2003, originally for MySQL 4.0.12 and 4.1.0. It’s MySQL bug#1956 (note the very low number!), “Parser allows naming of PRIMARY KEY”:

[25 Nov 2003 16:23] Arjen Lentz
Description:
When specifying a PRIMARY KEY, the parser still accepts a name for the index even though the MySQL server will always name it PRIMARY.
So, the parser should NOT accept this, otherwise a user can get into a confusing situation as his input is ignored/changed.

How to repeat:
CREATE TABLE pk (i INT NOT NULL);
ALTER TABLE pk ADD PRIMARY KEY bla (i);

'bla' after PRIMARY KEY should not be accepted.

Suggested fix:
Fix grammar in parser.

Most likely we found it during a MySQL training session, training days have always been a great bug catching source as students would be encouraged to try lots of quirky things and explore.

It’s not a critical issue, but one from the “era of sloppiness” as I think we may now call it, with the benefit of hindsight.  At the time, it was just regarded as lenient: the parser would silently ignore things it couldn’t process in a particular context.  Like creating a foreign key on a table using an engine that didn’t support foreign keys would see the foreign keys silently disappear, rather than reporting an error.  Many  if not most of those quirks have been cleaned up over the years.  Some are very difficult to get rid of, as people use them in code and thus essentially we’re stuck with old bad behaviour as otherwise we’d break to many applications.  I think that one neat example of that is auto-casting:

SELECT "123 apples" + 1;
+------------------+
| "123 apples" + 1 |
+------------------+
|              124 |
+------------------+
1 row in set, 1 warning (0.000 sec)

SHOW WARNINGS;
+---------+------+-----------------------------------------------+
| Level   | Code | Message                                       |
+---------+------+-----------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123 hello' |
+---------+------+-----------------------------------------------+

At least that one chucks a warning now, which is good as it allows developers to catch mistakes and thus prevent trouble.  A lenient parser (or grammar) can be convenient, but it tends to also enable application developers to be sloppy, which is not really a good thing.  Something that creates a warning may be harmless, or an indication of a nasty mistake: the parser can’t tell.  So it’s best to ensure that code is clean, free even of warnings.

Going back to the named PRIMARY KEY issue… effectively, a PRIMARY KEY has the name ‘PRIMARY’ internally.  So it can’t really have another name, and the parser should not accept an attempt to name it.  The name silently disappears so when you check back in SHOW CREATE TABLE or SHOW INDEXES, you won’t ever see it.
CREATE TABLE pkx (i INT PRIMARY KEY x) reports a syntax error. So far so good.
But, CREATE TABLE pkx (i INT, PRIMARY KEY x (i)) is accepted, as is ALTER TABLE pkx ADD PRIMARY KEY x (i).

MySQL 8.0 still allows these constructs, as does MariaDB 10.3.9 !

The bug is somewhere in the parser, so that’s sql/sql_yacc.yy.  I’ve tweaked and added things in there before, but it’s been a while and particularly in parser grammar you have to be very careful that changes don’t have other side-effects.  I do think it’s worthwhile fixing even these minor issues.

Posted on
Posted on

MariaDB 10.3 use case: Hidden PRIMARY KEY column for closed/legacy applications

Database symbolI really like MariaDB 10.3, it has a lot of interesting new features.  Invisible (hidden) columns, for instance.  Here is a practical use case:

You’re dealing with a legacy application, possibly you don’t even have the source code (or modifying is costly), and this app doesn’t have a PRIMARY KEY on each table. Consequences:

  • Even though InnoDB would have an internal hidden ID in this case (you can’t access that column at all), it affects InnoDB’s locking strategy – I don’t think this aspect is explicitly documented, but we’ve seen table-level locks in this scenario where we’d otherwise see more granular locking;
  • Galera cluster really wants PKs;
  • Asynchronous row-based replication can work without PKs, but it is more efficient with;

So, in a nutshell, your choices for performance and scaling are restricted.

On the other hand, you can’t just add a new ID column, because the application may:

  • use SELECT * and not be able to handle seeing the extra column, and/or
  • use INSERT without an explicit list of columns and then the server would chuck an error for the missing column.

The old way

Typically, what we used to do for cases like this is hunt for UNIQUE indexes that can be converted to PK. That’d be a clean operation with no effect on the application. We use a query like the following to help us find out if this is feasible:


SELECT
CONCAT(s.TABLE_SCHEMA,'.',s.TABLE_NAME) AS tblname,
INDEX_NAME AS idxname,
GROUP_CONCAT(s.COLUMN_NAME) AS cols,
GROUP_CONCAT(IF(s.NULLABLE='YES','X','-')) AS nullable
FROM INFORMATION_SCHEMA.STATISTICS s
WHERE s.table_schema NOT IN ('information_schema','performance_schema','mysql')
AND s.NON_UNIQUE=0
GROUP BY tblname,idxname
HAVING nullable LIKE '%X%'
ORDER BY tblname;

The output is like this:


+----------+---------+------+----------+
| tblname  | idxname | cols | nullable |
+----------+---------+------+----------+
| test.foo | a       | a,b  | -,X      |
+----------+---------+------+----------+

We see that table test.foo has a UNIQUE index over columns (a,b), but column b is NULLable.  A PK may not contain any NULLable columns, so it would not be a viable candidate.

All is not yet lost though, we can further check whether the column data actually contains NULLs. If it doesn’t, we could change the column to NOT NULL and thus solve that problem.  But strictly speaking, that’s more risky as we may not know for certain that the application will never use a NULL in that column. So that’s not ideal.

IF all tables without a PK have existing (or possible) UNIQUE indexes without any NULLable columns, we can resolve this issue. But as you can appreciate, that won’t always be the case.

With MariaDB 10.3 and INVISIBLE columns

Now for Plan B (or actually, our new plan A as it’s much nicer).  Let’s take a closer look at the foo table from the above example:


CREATE TABLE foo (
a int(11) NOT NULL,
b int(11) DEFAULT NULL,
UNIQUE KEY uidx (a,b)
) ENGINE=InnoDB

Our new solution:

ALTER TABLE foo ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY INVISIBLE FIRST

So we’re adding a AUTO_INCREMENT new column named ‘id’, technically first in the table, but we also flag it as invisible.

Normally, you won’t notice, see how we can use INSERT without an explicit (a,b) column list:


MariaDB [test]> INSERT INTO  foo VALUES (2,24);
Query OK, 1 row affected (0.007 sec)

MariaDB [test]> SELECT * FROM foo;
+---+------+
| a | b    |
+---+------+
| 1 |   20 |
| 2 |   24 |
+---+------+

So it won’t even show up with SELECT *.  Fabulous.  We can see the column only if we explicitly ask for it:


MariaDB [test]> SELECT id,a,b FROM foo;
+----+---+------+
| id | a |    b |
+----+---+------+
|  1 | 1 |   20 |
|  2 | 2 |   24 |
+----+---+------+

We solved our table structural issue by introducing a clean AUTO_INCREMENT id column covered by a PRIMARY KEY, while the application remains perfectly happy with what it can SELECT and INSERT. Total win!

Posted on