The MySQL/MariaDB optimiser likes to know things like the cardinality of an index – that is, the number of distinct values the index holds. For a PRIMARY KEY, which only has unique values, the number is the same as the number of rows. For an indexed column that is boolean (such as yes/no) the cardinality would be 2.
There’s more to it than that, but the point is that the optimiser needs some statistics from indexes in order to try and make somewhat sane decisions about which index to use for a particular query. The statistics also need to be updated when a significant number of rows have been added, deleted, or modified.
In MyISAM, ANALYZE TABLE does a tablescan where everything is tallied, and the index stats are updated. InnoDB, on the other hand, has always done “index dives”, looking at a small sample and deriving from that. That can be ok as a methodology, but unfortunately the history is awkward. The number used to be a constant in the code (4), and that is inadequate for larger tables. Later the number was made a server variable innodb_stats_sample_pages and its default is now 8 – but that’s still really not enough for big(ger) tables.
We recently encountered this issue again with a client, and this time it really needed addressing as no workarounds were effective across the number of servers and of course over time. Open Query engineer Daniel filed https://mariadb.atlassian.net/browse/MDEV-7084 which was picked up by MariaDB developed Jan Lindström.
Why not just set the innodb_stats_sample_pages much higher? Well, every operation takes time, so setting the number appropriate for your biggest table means that the sampling would take unnecessarily long for all the other (smaller, or even much smaller) tables. And that’s just unfortunate.
So why doesn’t InnoDB just scale the sample size along with the table size? Because, historically, it didn’t know the table size: InnoDB does not maintain a row count (this has to do with its multi-versioned architecture and other practicalities – as with everything, it’s a trade-off). However, these days we have persistent stats tables – rather than redoing the stats the first time a table is opened after server restart, they’re stored in a table. Good improvement. As part of that information, InnoDB now also knows how many index pages (and leaf nodes in its B+Tree) it has for each table. And while that’s not the same as a row count (rows have a variable length so there’s no fixed number of rows per index page), at least it grows along with the table. So now we have something to work with! The historical situation is no longer a hindrance.
In order to scale the sample size sanely, that is not have either too large a number for small tables, or a number for big tables that’s over the top, we’ll want some kind of logarithmic scale. For instance, log2(16 thousand) = 14, and log2(1 billion) = 30. That’s small enough to be workable. The new code as I suggested:
n_sample_pages = max(min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size) * srv_stats_sample_pages);
This is a shorter construct (using min/max instead of ifs) of what was already there, combined with the logarithmic sample basis. For very small tables, either the innodb_stats_sample_pages number if used or the actual number of pages, whichever is smaller – for bigger tables, the log2 of the #indexpages is used, multiplied by the dynamic system variable innodb_stats_sample_pages. So we can still scale and thus influence the system in case we want more samples. Simple, but it seems effective – and it any case we get decent results in many more cases than before, so it’s a worthwhile improvement. Obviously, since it’s a statistical sample, it could still be wrong for an individual case.
Jan reckons that just like MyISAM, InnoDB should do a table scan and work things out properly – I agree, this makes sense now that we have persistent stats. So the above is a good fix for 5.5 and 10.0, and the more significant change to comprehensive stats can be in a near future major release. So then we have done away with the sampling altogether, instead basing the info on the full dataset. Excellent.
Another issue that needed to be dealt with is when InnoDB recalculates the statistics. You don’t want to do it on every change, but regularly if there has been some change is good as it might affect which indexes should be chosen for optimal query execution. The hardcoded rule was 1/16th of the table or 2 billion rows, whichever comes first. Again that’s unfortunate, because for a bigger table 1/16th still amounts to a very significant number. I’d tend towards setting the upper bound to say 100,000. Jan put in a new dynamic system variable for this, stat_modified_counter. It essentially replaces the old static value of 2 billion, providing us with a configurable upper bound. That should do nicely!
Once again horay for open source, but in particular responsive and open development. If someone reports a bug and there is no interaction between developer and the outside world until something is released, the discussion and iterative improvement process to come to a good solution cannot occur. The original code definitely worked as designed, but it was no longer suitable in the context of today’s usage/needs. The user-dev interaction allowed for a much better conclusion.