Posted on

Web Security: SHA1 SSL Deprecated

You may not be aware that the mechanism used to fingerprint the SSL certificates that keep your access to websites encrypted and secure is changing. The old method, known as SHA1, is being deprecated – meaning it will no longer be supported. As of January 2016, various vendors will no longer issue certificates signed with SHA1, and browsers show warnings when they encounter an old SHA1 certificate. As of January 2017, browsers will reject old certificates.

The new signing method, known as SHA2, has been available for some time. Users have had a choice of signing methods up until now, but there are still many sites using old certificates out there. You may want to check the security on any SSL websites you own or run!

To ensure your users’ security and privacy, force https across your entire website, not just e-commerce or other sections. You may have noticed this move on major websites over the last few years.
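As an illustration of what "forcing https" means at the application level, here is a minimal sketch of a WSGI middleware that redirects every plain-HTTP request to its HTTPS equivalent. The name `force_https` is made up for this example, and the sketch assumes the application sees the real request scheme in `wsgi.url_scheme` (which may not hold behind a proxy that terminates SSL). In practice the redirect is often configured in the web server instead.

```python
from urllib.parse import quote


def force_https(app):
    """Wrap a WSGI app so that plain-HTTP requests are redirected to HTTPS."""

    def middleware(environ, start_response):
        # Requests that already arrived over HTTPS pass straight through.
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)

        # Rebuild the requested URL with the https scheme.
        # Simplification: the Host header is reused as-is, so a
        # non-default port would need extra handling.
        host = environ.get("HTTP_HOST") or environ["SERVER_NAME"]
        path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
        query = environ.get("QUERY_STRING", "")
        location = "https://" + host + path + ("?" + query if query else "")

        start_response("301 Moved Permanently",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]

    return middleware
```

Because the wrapper sits in front of the whole application, it covers every page rather than only the checkout or login sections.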

For more information on the change from SHA1 to SHA2 you can read:

To test if your website is using a SHA1 or SHA2 certificate you can use one of the following tools:
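A certificate can also be checked locally. The sketch below assumes you already have the text dump of a certificate, for example from `openssl x509 -noout -text`, and simply classifies its "Signature Algorithm" line. The function name `signature_family` is made up for this example, and it only looks at the one certificate you feed it, not at the rest of the chain.

```python
import re

# Matches lines such as "Signature Algorithm: sha256WithRSAEncryption"
# in the text output of `openssl x509 -noout -text`.
_SIG_ALG = re.compile(r"Signature Algorithm:\s*(\S+)")


def signature_family(cert_text):
    """Return "SHA1", "SHA2" or "unknown" for a certificate text dump."""
    match = _SIG_ALG.search(cert_text)
    if match is None:
        return "unknown"
    name = match.group(1).lower()
    if "sha1" in name:
        return "SHA1"
    # SHA2 is a family: SHA-224, SHA-256, SHA-384 and SHA-512.
    if any(tag in name for tag in ("sha224", "sha256", "sha384", "sha512")):
        return "SHA2"
    return "unknown"
```

A result of "SHA1" means the certificate should be reissued with a SHA2 signature.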

Open Query also offers a Security Review package, in which we check on a broad range of issues in your system’s front-end and back-end and provide you with an assessment and recommendations. This is most useful if you are looking at a form of security certification.

Posted on

Motivation to Migrate RDBMS

http://www.itnews.com/article/3004953/use-oracles-database-watch-out-for-this-dec-1-deadline.html

Companies that use a standard edition of Oracle’s database software should be aware that a rapidly approaching deadline could mean increased licensing costs.

Speaking from experience (at both MySQL AB and Open Query), typically, licensing/pricing changes such as these act as a motivator for migrations.

Migrations are a nuisance (doesn’t matter from/to what platform) and are best avoided as they’re intrinsically painful, costly and time-consuming. Smart companies know this.

When asked in generic terms, we generally recommend against migrations (even to MySQL/MariaDB) for the above-mentioned practical and business reasons. There are also technical reasons. I’ll list a few:

  • application, query and schema design tend to be tuned to a particular RDBMS, usually the one the main developer(s) are familiar with. Features are used in a certain way, and the original target platform (even if it was not chosen deliberately) is likely to execute them most efficiently;
  • RDBMS choice drives hardware/network architecture. A migration should also include a re-think of this, to make optimal use of the database platform;
  • it’s quite rare (but not unheard of!) for an application to perform better on another platform, without putting a lot of extra work in. If extra work is on the table, then the original DB platform should also be considered as a valid option;
  • related to the other points: a desire to migrate might be based on employees’ expertise with a particular platform rather than on the application’s intrinsic suitability to that platform. While that can be a valid reason, it should be recognised as the actual reason, because migration has obvious cost and effort implications, and other options such as training can be considered.
Nevertheless, a company that’s really annoyed by a vendor’s attitude can opt for the migration route, as they may decide it’s the path of less pain (and lower cost) in the long(er) term.

We do occasionally guide and assist with migrations, if after review it looks like a viable and sensible direction to take.

Posted on 5 Comments

Slow Query Log Rotation

Some time ago, Peter Boros at Percona wrote this post: Rotating MySQL slow logs safely. It contains good info, such as that one should use the rename method for rotation (rather than copytruncate), and then connect to mysqld and issue a FLUSH LOGS (rather than send a SIGHUP signal).
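The rename-then-flush sequence can be sketched as follows. This is a minimal illustration rather than a replacement for logrotate: `rotate_slow_log` and its `flush_logs` parameter are names made up for this example, and the caller supplies whatever code connects to mysqld and issues the FLUSH LOGS.

```python
import os


def rotate_slow_log(log_path, flush_logs):
    """Rotate a slow query log by rename, then ask mysqld to reopen it.

    flush_logs is a caller-supplied callable (hypothetical) that connects
    to mysqld and issues FLUSH LOGS.
    """
    rotated_path = log_path + ".1"
    # Rename rather than copy-and-truncate: mysqld keeps writing to the
    # same open file (now under the new name) until it is told to reopen.
    os.replace(log_path, rotated_path)
    # After the flush, mysqld closes the old file and opens a fresh one
    # at log_path.
    flush_logs()
    return rotated_path
```

One way to provide `flush_logs` would be a small wrapper that runs `mysqladmin flush-logs`. Nothing is lost between the rename and the flush, since the server still holds the renamed file open.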

So far so good. What I do not agree with is the additional construct to prevent slow queries from being written during log rotation. The author’s rationale is that if too many items get written while the rotation is in progress, this can block threads. I understand this, but let’s review what actually happens.

Indeed, if one were to do lots of writes to the slow query log in a short space of time, a write could block while waiting.

Is the risk of this occurring greater during a logrotate operation? I doubt it. A FLUSH LOGS has to close and open the file. While no file is open, no writes can occur anyhow; they may be held in an internal buffer of the low-level MySQL code that handles this.

In any case, if there is such a high write rate, that is an issue in itself: it is not useful to have the slow query log write that fast. Instead, you’d up the long_query_time and min_examined_row_limit variables to reduce the effective “flow rate”. It’s always best to resolve an underlying issue rather than its symptom(s).

Posted on

Using Persistent Memory in RDBMS

People at Intel started the pmem library project some time ago. It’s open to the broader community at GitHub, and other developers, including Linux kernel devs, are actively involved.

While the library does allow interaction with an SSD using a good-old-filesystem, we know that addressing SSD through SATA or SAS is very inefficient. That said, the type of storage architecture that SSD uses does require significant management for write levelling and verifying so that the device as a whole actually lasts, and your data is kept safe: in theory you could write to an NVRAM chip, and not know when it didn’t actually store your data properly.

But there are other technologies, such as Memristor (RRAM) and Phase Change Memory (PCM, PRAM). Numonyx (founded by Intel and others, since acquired by Micron) was one of the companies developing PCM some years ago, to the point of some commercial applications. Somewhat oddly (in my opinion), Micron ditched their PCM line in 2014 focusing more on 3D NAND technology. In 2015, Intel and Micron announced that they were working on something called 3D XPoint but Micron denies that it’s based on PCM.

I like the concept of PCM because it has a lot of advantages over NAND technology. It’s very stable, doesn’t “bleed” to adjacent memory cells, if it writes correctly it’s stored correctly, and it’s fast. Not as fast as ordinary RAM, but it’s persistent! What I’ve been holding out for is just a small amount of PCM or similar storage in computers, phones, tablets and e-book readers.

In small mobile devices the advantage would be vastly reduced power consumption. ARM processors are able to put entire sections of the processor in standby to save power, but RAM needs to be powered and refreshed regularly. So with persistent memory, a device could maintain state while using hardly any power.

For RDBMS such as MySQL and MariaDB, persistent memory could be used for the InnoDB log files and other relatively small state information that needs to be persistently kept. So this storage would behave like memory and be addressed as such (pmem uses mmap), but be persistent. So you could commit a transaction, your fsync is very quick, and the transactional information has been stored in a durable fashion. Very shiny, right?
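To make the "addressed as memory, yet persistent" idea concrete, here is a minimal sketch using an ordinary memory-mapped file. It does not use the pmem library or real persistent memory, and the name `write_durably` and the 4096-byte region size are assumptions for the example. It only shows the programming model: a write is a plain memory store, followed by a flush.

```python
import mmap
import os


def write_durably(path, offset, payload, size=4096):
    """Write payload into a memory-mapped file and flush it to storage."""
    # Create the backing file at a fixed size if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as region:
            # The store itself is a plain memory write...
            region[offset:offset + len(payload)] = payload
            # ...and flush() asks the OS to write the mapped pages back.
            region.flush()
    finally:
        os.close(fd)
```

With a conventional disk behind the file, the flush is the slow part. The appeal of persistent memory is that this step would become very cheap.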

It doesn’t need to be large: something like 512MB would be ample for RDBMS, and possibly much less for mobile devices.

I still reckon persistent memory space has huge potential – and I mention the mobile devices because that’s obviously a larger market. Previously Micron did work with Nokia on using NVM in phones, but as we all know Nokia was acquired and the Micron focus changed. I find the current state of it all quite disappointing, but I do hope the various players in this field will soon focus on this again properly and get the tech out there to be used!

If you happen to know of any current developments and activities, I’d like to hear about them!

Posted on

on ORDER BY optimization | Domas Mituzas

http://dom.as/2015/07/30/on-order-by-optimization/

An insightful exploration by Domas (Facebook) on how some of the MySQL optimiser’s decision logic is sometimes naive, in this case regarding ORDER BY optimisation.

Quite often, “simple” logic can work better than complex logic as chasing all the corner cases can just make things worse – but sometimes, logic can be too simple.

Everything must be made as simple as possible, but no simpler.
— Albert Einstein / Roger Sessions

Posted on 4 Comments

More Cores or Higher Clock Speed?

This is a little quiz (could be a discussion). I know what we tend to prefer (and why), but we’re interested in hearing additional and other opinions!

Given the way MySQL/MariaDB is architected, what would you prefer to see in a new server, more cores or higher clock speed? (presuming other factors such as CPU caches and memory access speed are identical).

For example, you might have a choice between

  • 2x 2.4GHz 6 core, or
  • 2x 3.0GHz 4 core

Which option would you pick for a (dedicated) MySQL/MariaDB server, and why?

And, do you regard the “total speed” (N cores * GHz) as relevant in the decision process? If so, when and to what degree?
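For reference, the “total speed” figure for the two example configurations works out as below. This is only the arithmetic, using clock speeds in MHz to keep the numbers whole. The function name `aggregate_mhz` is made up for this sketch, and the figure by itself says nothing about how well a workload can actually spread across cores.

```python
def aggregate_mhz(sockets, cores_per_socket, mhz_per_core):
    """Return the "total speed" of a server as sockets * cores * clock."""
    return sockets * cores_per_socket * mhz_per_core


options = {
    "2x 2.4GHz 6 core": aggregate_mhz(2, 6, 2400),  # 28800 MHz in total
    "2x 3.0GHz 4 core": aggregate_mhz(2, 4, 3000),  # 24000 MHz in total
}
```

So the first option has 20% more aggregate clock, while the second runs each individual core 25% faster.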

Posted on

mysql-cli on Kickstarter

Open Query is supporting the mysql-cli Kickstarter project (for MySQL and MariaDB) by Amjith Ramanujam, who already successfully completed a similar tool for PostgreSQL.

It is a new MySQL client with Auto-Completion and Syntax Highlighting. From the info provided, it’s Python based, thus portable, and can be installed without root access. Could be a very useful tool. The good old mysql command line client does lack some things, yet a relatively low-level command line client is often useful for remote tasks (as opposed to graphical tools) so we reckon it’s good that this realm gets a bit of attention!

Posted on

Interactive Online Training for MySQL and MariaDB

Because interactivity with the trainer (our classes are not dry lectures) and discussions are an important and intrinsic part of our teaching approach, we’ve long tracked development of technologies for online training, but previously were not satisfied.

High costs of various corporate offerings would negatively impact our pricing, given the relatively small scale use and our purposely small classes. The student system requirements would often be problematic – obviously students use different operating systems (Windows, Mac, Linux) and we cannot prescribe that people use a particular OS.

Big Blue Button has long looked like it had the right potential, and it’s now developed to a point where we’re happy with it. For more tech and practical details, see our Interactive Online Training page.

After our successful trial runs, we have the following course modules scheduled in the next few weeks, others to follow soon:

The date ranges may appear a tad odd at first, but what we do is run each original day-module across two sessions over two days, in this case noon-3.30pm Sydney time. An online session has three 10-minute breaks.

As you can also see, the pricing is pretty neat – we can do that since we control the infrastructure and obviously don’t have trainer travel, venue and catering to worry about. No travel hassles for you, either! You should find a quiet spot and try not to get interrupted.

All the interactivity, discussion and hands-on work is there as normal. Open Query provides the VMs, and students (and trainer) can access each other’s session where needed. We’re pretty pleased with the setup.

Naturally we can also do custom training in this format – we do still offer on-site training as well.

For bookings, or if you’d like more information or have other questions, contact us today!

Posted on

LKML: Live patching for 3.20

https://lkml.org/lkml/2015/2/9/534

Building on the original kSplice idea and combining the efforts of the work done at Red Hat and SuSE, common infrastructure is now ready to be put into the Linux 3.20 mainline kernel – Red Hat and SuSE have already committed to using this.

I still reckon it’s freaky trickery, but heck – it works, and it’s great for server environments that have no redundancy (I prefer to fix that issue!) and can’t afford any downtime.

Posted on 3 Comments

Improving InnoDB index statistics

The MySQL/MariaDB optimiser likes to know things like the cardinality of an index – that is, the number of distinct values the index holds. For a PRIMARY KEY, which only has unique values, the number is the same as the number of rows.  For an indexed column that is boolean (such as yes/no) the cardinality would be 2.
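As a tiny illustration of cardinality, using made-up column data rather than anything from a real table:

```python
def cardinality(values):
    """Number of distinct values in an indexed column."""
    return len(set(values))


# Hypothetical six-row table.
primary_key = [1, 2, 3, 4, 5, 6]                       # unique per row
is_active = ["yes", "no", "yes", "yes", "no", "yes"]   # boolean-style column

pk_cardinality = cardinality(primary_key)   # equals the row count
flag_cardinality = cardinality(is_active)   # only two distinct values
```

An index with cardinality close to the row count is very selective. One with cardinality 2 narrows a lookup down to roughly half the table at best.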

There’s more to it than that, but the point is that the optimiser needs some statistics from indexes in order to try and make somewhat sane decisions about which index to use for a particular query. The statistics also need to be updated when a significant number of rows have been added, deleted, or modified.

In MyISAM, ANALYZE TABLE does a tablescan where everything is tallied, and the index stats are updated. InnoDB, on the other hand, has always done “index dives”, looking at a small sample and deriving from that. That can be ok as a methodology, but unfortunately the history is awkward. The number used to be a constant in the code (4), and that is inadequate for larger tables. Later the number was made a server variable innodb_stats_sample_pages and its default is now 8 – but that’s still really not enough for big(ger) tables.

We recently encountered this issue again with a client, and this time it really needed addressing as no workarounds were effective across the number of servers and of course over time. Open Query engineer Daniel filed https://mariadb.atlassian.net/browse/MDEV-7084 which was picked up by MariaDB developer Jan Lindström.

Why not just set the innodb_stats_sample_pages much higher? Well, every operation takes time, so setting the number appropriate for your biggest table means that the sampling would take unnecessarily long for all the other (smaller, or even much smaller) tables. And that’s just unfortunate.

So why doesn’t InnoDB just scale the sample size along with the table size? Because, historically, it didn’t know the table size: InnoDB does not maintain a row count (this has to do with its multi-versioned architecture and other practicalities – as with everything, it’s a trade-off). However, these days we have persistent stats tables – rather than redoing the stats the first time a table is opened after server restart, they’re stored in a table. Good improvement. As part of that information, InnoDB now also knows how many index pages (and leaf nodes in its B+Tree) it has for each table. And while that’s not the same as a row count (rows have a variable length so there’s no fixed number of rows per index page), at least it grows along with the table. So now we have something to work with! The historical situation is no longer a hindrance.

In order to scale the sample size sanely, that is, to have neither too large a number for small tables nor an over-the-top number for big tables, we’ll want some kind of logarithmic scale. For instance, log2(16 thousand) ≈ 14, and log2(1 billion) ≈ 30. That’s small enough to be workable. The new code as I suggested:

n_sample_pages = max(min(srv_stats_sample_pages, index->stat_index_size), log2(index->stat_index_size) * srv_stats_sample_pages);

This is a shorter construct (using min/max instead of ifs) of what was already there, combined with the logarithmic sample basis. For very small tables, either the innodb_stats_sample_pages number is used or the actual number of pages, whichever is smaller – for bigger tables, the log2 of the number of index pages is used, multiplied by the dynamic system variable innodb_stats_sample_pages. So we can still scale and thus influence the system in case we want more samples. Simple, but it seems effective – and in any case we get decent results in many more cases than before, so it’s a worthwhile improvement. Obviously, since it’s a statistical sample, it could still be wrong for an individual case.
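To get a feel for the numbers, the suggested expression can be evaluated with a small sketch. This is a Python transcription for illustration, not the InnoDB code itself. The truncation to an integer and the guard for an empty index are assumptions, and 8 is simply the innodb_stats_sample_pages default mentioned earlier.

```python
import math


def n_sample_pages(stat_index_size, srv_stats_sample_pages=8):
    """Evaluate the suggested sample-size expression for one index."""
    small = min(srv_stats_sample_pages, stat_index_size)
    # Assumption: log2 is undefined for 0, so an empty index yields 0 here.
    scaled = (math.log2(stat_index_size) * srv_stats_sample_pages
              if stat_index_size > 0 else 0)
    # Assumption: the result is truncated to a whole number of pages.
    return int(max(small, scaled))


# Index sizes in pages: one page, about 16 thousand, about 1 billion.
samples = {size: n_sample_pages(size) for size in (1, 2**14, 2**30)}
```

With the default of 8, an index of 2^14 pages gets 112 sample pages and one of 2^30 pages gets 240, so the sample grows slowly with the index. Evaluated literally, the logarithmic term already dominates for indexes of only a few pages. The sketch does not cap it at the page count, since the text does not say whether the real code does.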

Jan reckons that just like MyISAM, InnoDB should do a table scan and work things out properly – I agree, this makes sense now that we have persistent stats. So the above is a good fix for 5.5 and 10.0, and the more significant change to comprehensive stats can be in a near future major release. So then we have done away with the sampling altogether, instead basing the info on the full dataset. Excellent.

Another issue that needed to be dealt with is when InnoDB recalculates the statistics. You don’t want to do it on every change, but regularly if there has been some change is good as it might affect which indexes should be chosen for optimal query execution. The hardcoded rule was 1/16th of the table or 2 billion rows, whichever comes first. Again that’s unfortunate, because for a bigger table 1/16th still amounts to a very significant number. I’d tend towards setting the upper bound to say 100,000. Jan put in a new dynamic system variable for this, stat_modified_counter. It essentially replaces the old static value of 2 billion, providing us with a configurable upper bound. That should do nicely!
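The recalculation rule can be sketched like this. It is a simplified model of the logic as described here, not the actual InnoDB code: the function name `stats_recalc_due`, the strict comparison, and the default of 100,000 (the upper bound suggested above, not necessarily the server default) are all assumptions for the example.

```python
def stats_recalc_due(rows_modified, table_rows, stat_modified_counter=100_000):
    """True once enough rows have changed to warrant recalculating stats."""
    # 1/16th of the table or the configured upper bound, whichever is lower.
    threshold = min(table_rows // 16, stat_modified_counter)
    return rows_modified > threshold
```

For a table of 1.6 billion rows, 1/16th is 100 million modified rows. With the old fixed bound of 2 billion that is what it took to trigger a recalculation, whereas an upper bound of 100,000 triggers it far sooner.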

Once again hooray for open source, but in particular responsive and open development. If someone reports a bug and there is no interaction between developer and the outside world until something is released, the discussion and iterative improvement process to come to a good solution cannot occur. The original code definitely worked as designed, but it was no longer suitable in the context of today’s usage/needs. The user-dev interaction allowed for a much better conclusion.
