A useful overview of options, syntax and tools that have been deprecated or removed for the upcoming MySQL 5.7 release.
This is a little quiz (could be a discussion). I know what we tend to prefer (and why), but we’re interested in hearing additional and other opinions!
Given the way MySQL/MariaDB is architected, what would you prefer to see in a new server, more cores or higher clock speed? (presuming other factors such as CPU caches and memory access speed are identical).
For example, you might have a choice between
- 2x 2.4GHz 6 core, or
- 2x 3.0GHz 4 core
which option would you pick for a (dedicated) MySQL/MariaDB server, and why?
And, do you regard the “total speed” (N cores * GHz) as relevant in the decision process? If so, when and to what degree?
Open Query is supporting the mysql-cli Kickstarter project (for MySQL and MariaDB) by Amjith Ramanujam who already successfully completed a similar tool for PostgreSQL.
It is a new MySQL client with Auto-Completion and Syntax Highlighting. From the info provided, it’s Python based, thus portable, and can be installed without root access. Could be a very useful tool. The good old mysql command line client does lack some things, yet a relatively low-level command line client is often useful for remote tasks (as opposed to graphical tools) so we reckon it’s good that this realm gets a bit of attention!
Because interactivity with the trainer (our classes are not dry lectures) and discussions are an important and intrinsic part of our teaching approach, we’ve long tracked development of technologies for online training, but previously were not satisfied.
High costs of various corporate offerings would negatively impact our pricing, given the relatively small scale use and our purposely small classes. The student system requirements would often be problematic – obviously students use different operating systems (Windows, Mac, Linux) and we cannot prescribe that people use a particular OS.
Big Blue Button has long looked like it had the right potential, and it’s now developed to a point where were happy with it. For more tech and practical details, see our Interactive Online Training page.
After our successful trial runs, we have the following course modules scheduled in the next few weeks, others to follow soon:
- 25-26 Mar 2015: MariaDB 10.0 & MySQL 5.6 Upgrade and Features (AUD 195 + GST)
- 30 Mar – 2 Apr 2015: MySQL/MariaDB Backup, Recovery and Replication (incl Galera) Workshop (AUD 595 + GST)
The date ranges may appear a tad odd at first, but what we do is run each original day-module across two sessions over two days, in this case noon-3.30pm Sydney time. An online session has three 10 minute breaks.
As you can also see the pricing is pretty neat – we can do that since we control the infrastructure and obviously don’t have trainer travel, venue and catering to worry about. No travel hassles for you, either! You should find a quiet spot and try and not get interrupted.
All the interactivity, discussion and hands-on work is there as normal, Open Query provides the VMs – and students (and trainer) can access each other’s session, where needed. We’re pretty pleased with the set up.
Naturally we can also do custom training in this format – we do still offer on-site training as well.
For bookings, or if you’d like more information or have other questions, contact us today!
Maish Saidel-Keesing writes:
VMware announced last week the launch of VIO and there are a number of things that I think people are missing and should be pointed out.
Building on the original kSplice idea and combining the efforts of the work done at Red Hat and SuSE, common infrastructure is now ready to be put into the Linux 3.20 mainline kernel – Red Hat and SuSE have already committed to using this.
I still reckon it’s freaky trickery, but heck – it works, and it’s great for server environments that have no redundancy (I prefer to fix that issue!) and can’t afford any downtime.
The MySQL/MariaDB optimiser likes to know things like the cardinality of an index – that is, the number of distinct values the index holds. For a PRIMARY KEY, which only has unique values, the number is the same as the number of rows. For an indexed column that is boolean (such as yes/no) the cardinality would be 2.
There’s more to it than that, but the point is that the optimiser needs some statistics from indexes in order to try and make somewhat sane decisions about which index to use for a particular query. The statistics also need to be updated when a significant number of rows have been added, deleted, or modified.
In MyISAM, ANALYZE TABLE does a tablescan where everything is tallied, and the index stats are updated. InnoDB, on the other hand, has always done “index dives”, looking at a small sample and deriving from that. That can be ok as a methodology, but unfortunately the history is awkward. The number used to be a constant in the code (4), and that is inadequate for larger tables. Later the number was made a server variable innodb_stats_sample_pages and its default is now 8 – but that’s still really not enough for big(ger) tables.
We recently encountered this issue again with a client, and this time it really needed addressing as no workarounds were effective across the number of servers and of course over time. Open Query engineer Daniel filed https://mariadb.atlassian.net/browse/MDEV-7084 which was picked up by MariaDB developed Jan Lindström.
Why not just set the innodb_stats_sample_pages much higher? Well, every operation takes time, so setting the number appropriate for your biggest table means that the sampling would take unnecessarily long for all the other (smaller, or even much smaller) tables. And that’s just unfortunate.
So why doesn’t InnoDB just scale the sample size along with the table size? Because, historically, it didn’t know the table size: InnoDB does not maintain a row count (this has to do with its multi-versioned architecture and other practicalities – as with everything, it’s a trade-off). However, these days we have persistent stats tables – rather than redoing the stats the first time a table is opened after server restart, they’re stored in a table. Good improvement. As part of that information, InnoDB now also knows how many index pages (and leaf nodes in its B+Tree) it has for each table. And while that’s not the same as a row count (rows have a variable length so there’s no fixed number of rows per index page), at least it grows along with the table. So now we have something to work with! The historical situation is no longer a hindrance.
In order to scale the sample size sanely, that is not have either too large a number for small tables, or a number for big tables that’s over the top, we’ll want some kind of logarithmic scale. For instance, log2(16 thousand) = 14, and log2(1 billion) = 30. That’s small enough to be workable. The new code as I suggested:
n_sample_pages = max(min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size) * srv_stats_sample_pages);
This is a shorter construct (using min/max instead of ifs) of what was already there, combined with the logarithmic sample basis. For very small tables, either the innodb_stats_sample_pages number if used or the actual number of pages, whichever is smaller – for bigger tables, the log2 of the #indexpages is used, multiplied by the dynamic system variable innodb_stats_sample_pages. So we can still scale and thus influence the system in case we want more samples. Simple, but it seems effective – and it any case we get decent results in many more cases than before, so it’s a worthwhile improvement. Obviously, since it’s a statistical sample, it could still be wrong for an individual case.
Jan reckons that just like MyISAM, InnoDB should do a table scan and work things out properly – I agree, this makes sense now that we have persistent stats. So the above is a good fix for 5.5 and 10.0, and the more significant change to comprehensive stats can be in a near future major release. So then we have done away with the sampling altogether, instead basing the info on the full dataset. Excellent.
Another issue that needed to be dealt with is when InnoDB recalculates the statistics. You don’t want to do it on every change, but regularly if there has been some change is good as it might affect which indexes should be chosen for optimal query execution. The hardcoded rule was 1/16th of the table or 2 billion rows, whichever comes first. Again that’s unfortunate, because for a bigger table 1/16th still amounts to a very significant number. I’d tend towards setting the upper bound to say 100,000. Jan put in a new dynamic system variable for this, stat_modified_counter. It essentially replaces the old static value of 2 billion, providing us with a configurable upper bound. That should do nicely!
Once again horay for open source, but in particular responsive and open development. If someone reports a bug and there is no interaction between developer and the outside world until something is released, the discussion and iterative improvement process to come to a good solution cannot occur. The original code definitely worked as designed, but it was no longer suitable in the context of today’s usage/needs. The user-dev interaction allowed for a much better conclusion.