Press Release 2018-09-11: Open Query acquired by Catalyst IT Australia Pty Limited

We are pleased to announce that Open Query, a Queensland-based provider of MySQL, MariaDB and related services which just celebrated its 11th anniversary, has been acquired by Catalyst IT Australia.

Founded in New Zealand in 1997, Catalyst is an experienced and respected Open Source integrator.  Catalyst is looking forward to the opportunity to work with the current Open Query clients as well as with new prospects. Catalyst offers a broad suite of Enterprise services, including support and custom development for Drupal CMS, SilverStripe CMS, Moodle LMS, Samba and other software, as well as fully managed hosting on AWS and other platforms.

“Catalyst’s core values are very much aligned with those of Open Query, which is why we are particularly pleased with this outcome”, notes Arjen Lentz, Founder and Exec.Director of Open Query.

Catalyst IT Australia has offices in Sydney, Melbourne and Brisbane.

Contacts

For Open Query Pty Ltd

Arjen Lentz, Exec.Director
https://openquery.com.au

For Catalyst IT Australia Pty Limited

Andrew Boag, Managing Director
https://www.catalyst-au.net/
Phone (02) 8203 9777


My oldest still-open MySQL bug

This morning I received an update on a MySQL bug: someone added a comment to an issue I filed in November 2003, originally against MySQL 4.0.12 and 4.1.0. It’s MySQL bug #1956 (note the very low number!), “Parser allows naming of PRIMARY KEY”:

[25 Nov 2003 16:23] Arjen Lentz
Description:
When specifying a PRIMARY KEY, the parser still accepts a name for the index even though the MySQL server will always name it PRIMARY.
So, the parser should NOT accept this, otherwise a user can get into a confusing situation as his input is ignored/changed.

How to repeat:
CREATE TABLE pk (i INT NOT NULL);
ALTER TABLE pk ADD PRIMARY KEY bla (i);

'bla' after PRIMARY KEY should not be accepted.

Suggested fix:
Fix grammar in parser.

Most likely we found it during a MySQL training session; training days have always been a great source of bugs, as students would be encouraged to try lots of quirky things and explore.

It’s not a critical issue, but one from the “era of sloppiness” as I think we may now call it, with the benefit of hindsight.  At the time, it was just regarded as lenient: the parser would silently ignore things it couldn’t process in a particular context.  For instance, creating a foreign key on a table using an engine that didn’t support foreign keys would see the foreign key definition silently disappear, rather than reporting an error.  Many, if not most, of those quirks have been cleaned up over the years.  Some are very difficult to get rid of, as people use them in code, so we’re essentially stuck with the old bad behaviour as otherwise we’d break too many applications.  I think one neat example of that is auto-casting:

SELECT "123 apples" + 1;
+------------------+
| "123 apples" + 1 |
+------------------+
|              124 |
+------------------+
1 row in set, 1 warning (0.000 sec)

SHOW WARNINGS;
+---------+------+------------------------------------------------+
| Level   | Code | Message                                        |
+---------+------+------------------------------------------------+
| Warning | 1292 | Truncated incorrect DOUBLE value: '123 apples' |
+---------+------+------------------------------------------------+

At least that one chucks a warning now, which is good as it allows developers to catch mistakes and thus prevent trouble.  A lenient parser (or grammar) can be convenient, but it tends to also enable application developers to be sloppy, which is not really a good thing.  Something that creates a warning may be harmless, or an indication of a nasty mistake: the parser can’t tell.  So it’s best to ensure that code is clean, free even of warnings.

Going back to the named PRIMARY KEY issue… effectively, a PRIMARY KEY has the name ‘PRIMARY’ internally.  So it can’t really have another name, and the parser should not accept an attempt to name it.  The name silently disappears so when you check back in SHOW CREATE TABLE or SHOW INDEXES, you won’t ever see it.
CREATE TABLE pkx (i INT PRIMARY KEY x) reports a syntax error. So far so good.
But, CREATE TABLE pkx (i INT, PRIMARY KEY x (i)) is accepted, as is ALTER TABLE pkx ADD PRIMARY KEY x (i).
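
A quick way to see this for yourself, along the lines described above (behaves the same in recent MySQL and MariaDB):

-- The parser accepts the name 'x', but silently discards it:
CREATE TABLE pkx (i INT NOT NULL, PRIMARY KEY x (i));

-- Neither of these will show the name 'x' anywhere;
-- the key is simply called 'PRIMARY':
SHOW CREATE TABLE pkx;
SHOW INDEXES FROM pkx;

DROP TABLE pkx;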

MySQL 8.0 still allows these constructs, as does MariaDB 10.3.9!

The bug is somewhere in the parser, so that’s sql/sql_yacc.yy.  I’ve tweaked and added things in there before, but it’s been a while, and particularly in a parser grammar you have to be very careful that changes don’t have unintended side-effects.  I do think it’s worthwhile fixing even these minor issues.


INSERT IGNORE … ON DUPLICATE KEY UPDATE … allowed

Not serious, but just interesting: MDEV-16925: INSERT IGNORE … ON DUPLICATE KEY UPDATE … allowed

  • INSERT IGNORE allows an insert to come back with “ok” even if the row already exists.
  • INSERT … ON DUPLICATE KEY UPDATE … intends the UPDATE to be executed in case the row already exists.

So combining the two makes no sense, and while harmless, perhaps it would be better if the parser were to throw a syntax error for it.
Currently, it appears the server executes the ON DUPLICATE KEY, thus disregarding the IGNORE:

MariaDB [test]> create table dup (a int, b int, unique (a,b));
Query OK, 0 rows affected (0.019 sec)

MariaDB [test]> insert into dup values (1,20),(2,24);
Query OK, 2 rows affected (0.013 sec)
Records: 2  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from dup;
+------+------+
| a    | b    |
+------+------+
|    1 |   20 |
|    2 |   24 |
+------+------+
2 rows in set (0.000 sec)

MariaDB [test]> insert ignore dup values (2,24) on duplicate key update b=23;
Query OK, 2 rows affected (0.007 sec)

MariaDB [test]> select * from dup;
+------+------+
| a    | b    |
+------+------+
|    1 |   20 |
|    2 |   23 |
+------+------+
2 rows in set (0.001 sec)

It has been noted that INSERT can chuck errors for reasons other than a duplicate key.
As the manual states:

“IGNORE has a similar effect on inserts into partitioned tables where no partition matching a given value is found. Without IGNORE, such INSERT statements are aborted with an error. When INSERT IGNORE is used, the insert operation fails silently for rows containing the unmatched value, but inserts rows that are matched.”

One example would be a foreign key check failing; in this case IGNORE would see the INSERT return without error (but without inserting a row, of course).

However, combining it with ON DUPLICATE KEY UPDATE may create an ambiguity. If an FK check fails but there is also a duplicate key conflict, which one takes precedence? That is, will the ON DUPLICATE KEY UPDATE clause be executed in that case if IGNORE is also specified, or will the server just return without error because of the failed FK check?
If we intend to allow this syntax, then the exact behaviour for each of the possible combinations of errors should be documented.
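
To illustrate, here’s a sketch of the kind of scenario where that question arises (hypothetical tables, InnoDB with foreign keys):

CREATE TABLE parent (id INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child (
  id INT PRIMARY KEY,
  parent_id INT,
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO child VALUES (1, NULL);

-- id=1 is a duplicate key, and parent_id=999 has no matching parent row.
-- With both IGNORE and ON DUPLICATE KEY UPDATE specified, which error
-- path takes precedence is exactly the undocumented part:
INSERT IGNORE INTO child VALUES (1, 999)
  ON DUPLICATE KEY UPDATE parent_id = 999;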


Australian government encryption folly

In IDG’s CIO magazine (17 July 2018): Wickr, Linux Australia, Twilio sign open letter against govt’s encryption crackdown ‘mistake’.  Not just those few, though: 76 companies and organisations signed that letter.

Learning Lessons

Encryption is critical to the whole “online thing” working, both for individuals as well as companies.  Let’s look back at history:

  • In most countries’ postal legislation, there’s a law against the Post Office opening letters and packages.
  • Similarly, a bit later, telephone lines couldn’t be tapped unless there was a very specific court order.

This was not just a nice gesture or convenience.  It was critical for

  1. trusting these new communication systems and their facilitating companies, and
  2. enabling people to communicate at greater distances and do business that way, or arrange other matters.

Those things don’t work if there are third parties looking or listening in, and it really doesn’t matter who or what the third party is. Today’s online environment is really not that different.

Various governments have tried to nobble encryption at various points over the past few decades: from trying to ban encryption outright, to requiring master keys to be held in escrow, to exploiting (and possibly creating) weaknesses in encryption mechanisms.  The latter in particular is very damaging, because it’s so arrogant: the whole premise becomes that “your people” are the only ones bright enough to figure out and exploit the weakness. Such presumptions are always wrong, even if there is no public proof. A business or criminal organisation that figures it out can make good use of it for their own purposes, provided they keep quiet about it.  And companies are, ironically, much better than governments at keeping secrets.

Best Practice

Apps and web sites, and facilitating infrastructures, should live by the following basic rules:

  • First and foremost, get people on staff or externally who really understand encryption and privacy. Yes, geeks.
  • Use proper and current encryption and signature mechanisms, without shortcuts.  A good algorithm incorrectly utilised is not secure.
  • Only ask for and store data you really need, nothing else.
  • Be selective with collecting metadata (including logging, third party site plugins, advertising, etc). It can easily be a privacy intrusion and also have security consequences.
  • Only retain data as per your actual needs (and legal requirements), no longer.
  • When acting as an intermediary, design for end-to-end encryption with servers merely passing on the encrypted data.

Most of these aspects interact.  For example: if you just pass on encrypted data, but meanwhile collect and store an excess of metadata, you’re not actually delivering a secure environment. On the other hand, by not having data, or keys, or metadata, you’ll neither be a target for criminals, nor have anything to hand over to a government.

See also our earlier post on Keeping Data Secure.

But what about criminals?

Would you, when following these guidelines, enable criminals? Hardly. The proper technology is available anyway, and criminal elements who are not technically capable can hire that knowledge and skill to assist them. Fact is, some smart people “go to the dark side” (for reasons of money, ego, or whatever else).  You don’t have to presume that your service or app is the one thing that enables these people. It’s just not.  Which is another reason why these government “initiatives” are folly: they’ll fail at their intended objective, while at the same time impairing and limiting general use of essential security mechanisms.  Governments themselves could do much better by hiring and listening to people who understand these matters.


MariaDB Galera cluster and GTID

In MariaDB 10.2.12, these two don’t yet work together. GTID = Global Transaction ID.  In the master-slave asynchronous replication realm, this means that you can reconnect a slave to another server (change its master) and it’ll happily continue replicating from the correct point.  No more fussing with filenames and offsets (which of course will both differ on different machines).

So in concept the GTID is “globally” unique – that means it’s consistent across an entire infrastructure: a binlogged write transaction will have the same GTID no matter on which machine you look at it.

  • OK: if you are transitioning from async replication to Galera cluster, and have a cluster as slave of the old infra, then GTID will work fine.
  • PROBLEM: if you want to run an async slave in a Galera cluster, GTID will currently not work. At least not reliably.

The overview issue is MDEV-10715, the specific problem is documented in MDEV-14153 with some comments from me from late last week. MDEV-14153 documents cases where the GTID is not in fact consistent – and the way in which it isn’t is most disturbing.

The issue appears as “drift”. A GTID is made up of R-S-#, where R is the replication domain (0 unless set by an app), S is the server-id of the server where the write was originally done, and # is a sequence number. The only required rule for the # is that each next event has to have a higher number than the previous one.  In principle there could be #s missing; that’s ok.

In certain scenarios, the # part of the GTID falls behind on the “other nodes” in the Galera cluster: there is the node where the statement was first issued, and then there are the other nodes which pick up the change through the Galera (wsrep) replication mechanism.  It is on those other nodes that the # falls behind.  So at that point, different nodes in the cluster have different GTIDs for the same query. Not so great.

To me, this looked like a giant red flag though: if a GTID is assigned on a commit, and then replicated through the cluster as part of that commit, it can’t change. Not drift, or any other change. So the only possible conclusion must be that it is in fact not passed through the cluster, but “reinvented” by a receiving cluster node, which simply assumes that the current event from a particular server-id is previous-event id + 1.  That assumption is false, because as I mentioned above it’s ok for gaps to exist.  As long as the number keeps going up, it’s fine.
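
You can observe this drift directly. For instance, run the following on each node (assuming binary logging is enabled on the nodes) and compare the results:

SELECT @@global.gtid_binlog_pos, @@global.gtid_current_pos;

-- If the # component differs between nodes after the same workload,
-- that's the drift described above.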

Here is one of the simplest examples of breakage (extract from a binlog, with obfuscated table names):

# at 12533795
#180704 5:00:22 server id 1717 end_log_pos 12533837 CRC32 0x878fe96e GTID 0-1717-1672559880 ddl
/*!100001 SET @@session.gtid_seq_no=1672559880*//*!*/;
# at 12533837
#180704 5:00:22 server id 1717 end_log_pos 12534024 CRC32 0xc6f21314 Query thread_id=4468 exec_time=0 error_code=0
SET TIMESTAMP=1530644422/*!*/;
SET @@session.time_zone='SYSTEM'/*!*/;
DROP TEMPORARY TABLE IF EXISTS `qqq`.`tmp_foobar` /* generated by server */
/*!*/;

Fact: temporary tables are not replicated (imagine restarting a slave: it wouldn’t have whatever temporary tables were supposed to exist). So while this event is stored in the binary log (to ensure that replaying the binlog on a machine correctly drops the temporary table after creating and using it), it won’t go through the cluster.  Remember that Galera cluster is essentially a ROW-based replication scheme; if there are changes in non-temporary tables, of course they get replicated just fine.  So if an app creates a temporary table, does some calculations, and then inserts the result of that into a regular table, the data of that last bit will get replicated. As it should. In a nutshell, as far as data consistency goes, we’re all fine.
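
To make that concrete, this is the sort of application pattern involved (table and column names made up):

CREATE TEMPORARY TABLE tmp_totals (customer_id INT, total DECIMAL(12,2));

INSERT INTO tmp_totals
  SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

-- This change to a regular table replicates through Galera just fine:
INSERT INTO monthly_report SELECT * FROM tmp_totals;

-- This statement ends up in the local binlog, but does not go through
-- the cluster:
DROP TEMPORARY TABLE tmp_totals;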

But the fact that we have an event that doesn’t really get replicated creates the “fun” in the “let’s assume the next event is just the previous + 1” logic. This is where the drift comes in. Sigh.

In any case, this issue needs to be fixed by, let’s say, being re-implemented: the MariaDB GTID needs to be propagated through the Galera cluster, so it’s the same on every server, as it should be. Doing anything else is always going to go wrong somewhere, so trying to catch more cases like the above example is not really the correct way to go.

If you are affected by this or related problems, please do vote on the relevant MDEV issues. That is important!  If you need help tracking down problems, feel free to ask.  If you have more information on the matter, please comment too!  I’m sure this and related bugs will be fixed, there are very capable developers at MariaDB Corp and Codership Oy (the Galera company). But the more information we can provide, the better. It often helps with tracking down problems and creating reproducible test cases.


A reminder for Symantec certificate users

Using Chrome or Chromium?  Browse to PayPal, right-click while on the page, select Inspect, and click on the Console tab. Bit of an exercise, but it’ll let you see the following notice:

The SSL certificate used to load resources from https://www.paypal.com will be distrusted in M70.
Once distrusted, users will be prevented from loading these resources.
See https://g.co/chrome/symantecpkicerts for more information.

Google hasn’t really been hiding this; they have been talking about it publicly since last year.

“Symantec’s PKI business, which operates a series of Certificate Authorities under various brand names, including Thawte, VeriSign, Equifax, GeoTrust, and RapidSSL, had issued numerous certificates that did not comply with the industry-developed CA/Browser Forum Baseline Requirements.”

The M70 release of Chrome will be in beta from September, and GA in October 2018.  Yes, that’s quite soon!

All of this would be fine, if sites and companies would get a move on and deploy new certificates that don’t originate from this problematic source.  One would expect PayPal to have done this months ago, but apparently not.  It’s important to be aware of this, because come October it’ll be much more than a little inconvenience, particularly for e-commerce sites: online payments via PayPal will start failing – unless and until PayPal and others update their certificates.


EFF STARTTLS Everywhere project: safer hops for email

Safe and secure online infrastructure is a broad topic, covering databases, privacy, web applications, and much more, and over the years we’ve specifically addressed many of these issues with information and recommendations.

The Electronic Frontier Foundation (EFF) announced the launch of STARTTLS Everywhere, their initiative to improve the security of the email ecosystem. Thanks to previous EFF efforts like Let’s Encrypt (which we’ve written about earlier on the Open Query blog) and the Certbot tool, as well as help from the major web browsers, there have been significant wins in encrypting the web. Now EFF wants to do for email what they’ve done for web browsing: make it simple and easy for everyone to help ensure their communications aren’t vulnerable to mass surveillance.

STARTTLS is an addition to SMTP, which allows one email server to say to the other, “I want to deliver this email to you over an encrypted communications channel.” The recipient email server can then say “Sure! Let’s negotiate an encrypted communications channel.” The two servers then set up the channel and the email is delivered securely, so that anybody listening in on their traffic only sees encrypted data. In other words, network observers gobbling up worldwide information from Internet backbone access points (like the NSA or other governments) won’t be able to see the contents of messages while they’re in transit, and will need to use more targeted, low-volume methods.
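
In protocol terms, the exchange looks roughly like this (hostnames made up; C is the sending server, S the receiving server):

S: 220 mx.example.org ESMTP
C: EHLO mail.example.com
S: 250-mx.example.org
S: 250 STARTTLS
C: STARTTLS
S: 220 2.0.0 Ready to start TLS
   (TLS handshake; the SMTP dialogue then restarts with a new EHLO and
   the mail is delivered over the encrypted channel)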

STARTTLS Everywhere provides software that a sysadmin can run on an email server to automatically get a valid certificate from Let’s Encrypt. This software can also configure their email server software so that it uses STARTTLS, and presents the valid certificate to other email servers. Finally, STARTTLS Everywhere includes a “preload list” of email servers that have promised to support STARTTLS, which can help detect downgrade attacks.

The net result: more secure email, and less mass surveillance.

This article is based on the announcement in EFFector, EFF’s newsletter.


MariaDB 10.3 use case: Hidden PRIMARY KEY column for closed/legacy applications

I really like MariaDB 10.3; it has a lot of interesting new features.  Invisible (hidden) columns, for instance.  Here is a practical use case:

You’re dealing with a legacy application; possibly you don’t even have the source code (or modifying it is costly), and this app doesn’t have a PRIMARY KEY on each table. Consequences:

  • Even though InnoDB uses an internal hidden ID in this case (you can’t access that column at all), the lack of an explicit PK affects InnoDB’s locking strategy – I don’t think this aspect is explicitly documented, but we’ve seen table-level locks in this scenario where we’d otherwise see more granular locking;
  • Galera cluster really wants PKs;
  • Asynchronous row-based replication can work without PKs, but it is more efficient with them;

So, in a nutshell, your choices for performance and scaling are restricted.

On the other hand, you can’t just add a new ID column, because the application may:

  • use SELECT * and not be able to handle seeing the extra column, and/or
  • use INSERT without an explicit list of columns, in which case the server would chuck an error because of the column count mismatch.

The old way

Typically, what we used to do for cases like this is hunt for UNIQUE indexes that can be converted to PK. That’d be a clean operation with no effect on the application. We use a query like the following to help us find out if this is feasible:


SELECT
  CONCAT(s.TABLE_SCHEMA,'.',s.TABLE_NAME) AS tblname,
  INDEX_NAME AS idxname,
  GROUP_CONCAT(s.COLUMN_NAME) AS cols,
  GROUP_CONCAT(IF(s.NULLABLE='YES','X','-')) AS nullable
FROM INFORMATION_SCHEMA.STATISTICS s
WHERE s.table_schema NOT IN ('information_schema','performance_schema','mysql')
  AND s.NON_UNIQUE=0
GROUP BY tblname, idxname
HAVING nullable LIKE '%X%'
ORDER BY tblname;

The output is like this:


+----------+---------+------+----------+
| tblname  | idxname | cols | nullable |
+----------+---------+------+----------+
| test.foo | uidx    | a,b  | -,X      |
+----------+---------+------+----------+

We see that table test.foo has a UNIQUE index over columns (a,b), but column b is NULLable.  A PK may not contain any NULLable columns, so it would not be a viable candidate.

All is not yet lost though: we can further check whether the column data actually contains NULLs. If it doesn’t, we could change the column to NOT NULL and thus solve that problem.  But strictly speaking that’s more risky, as we may not know for certain that the application will never use a NULL in that column. So that’s not ideal.
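
The check itself is straightforward; for our example table it could look like this:

-- Does column b actually contain any NULLs right now?
SELECT COUNT(*) FROM test.foo WHERE b IS NULL;

-- Only if that returns 0, and we accept the risk described above:
ALTER TABLE test.foo MODIFY b INT NOT NULL;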

If all tables without a PK have existing (or possible) UNIQUE indexes without any NULLable columns, we can resolve this issue. But as you can appreciate, that won’t always be the case.

With MariaDB 10.3 and INVISIBLE columns

Now for Plan B (or actually, our new plan A as it’s much nicer).  Let’s take a closer look at the foo table from the above example:


CREATE TABLE foo (
  a int(11) NOT NULL,
  b int(11) DEFAULT NULL,
  UNIQUE KEY uidx (a,b)
) ENGINE=InnoDB

Our new solution:

ALTER TABLE foo ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY INVISIBLE FIRST;

So we’re adding a new AUTO_INCREMENT column named ‘id’, technically first in the table, but flagged as invisible.

Normally you won’t notice it; see how we can use INSERT without an explicit (a,b) column list (the table already contained the row (1,20)):


MariaDB [test]> INSERT INTO foo VALUES (2,24);
Query OK, 1 row affected (0.007 sec)

MariaDB [test]> SELECT * FROM foo;
+---+------+
| a | b    |
+---+------+
| 1 |   20 |
| 2 |   24 |
+---+------+

So it won’t even show up with SELECT *.  Fabulous.  We can see the column only if we explicitly ask for it:


MariaDB [test]> SELECT id,a,b FROM foo;
+----+---+------+
| id | a | b    |
+----+---+------+
|  1 | 1 |   20 |
|  2 | 2 |   24 |
+----+---+------+

We solved our table structural issue by introducing a clean AUTO_INCREMENT id column covered by a PRIMARY KEY, while the application remains perfectly happy with what it can SELECT and INSERT. Total win!


Using FastCGI to separate web frontend from application space

FastCGI has many advantages, and it’s our preferred interface when it’s available for the required script language (such as PHP).  However, we generally see environments where the php-fpm processes (for instance) are run on the same system as the web server, even though that’s not necessary.  In FastCGI space, the web server (say nginx) passes a request through a socket or TCP/IP address:port, and the application delivers back a response (web page, web service JSON result, etc).  Obviously sockets only work locally, but the port can be on another machine.

So the question is, how do you arrange your virtual server rack?  While splitting things out like that tends to add a few ms of latency, the advantages tend to justify this approach:

  • Not running PHP on your web-server, and only having connectivity web-server -> application, is good for security:
    • should the web server get compromised, it can still only make application requests, not reconfigure the application server in any way;
    • should the application get compromised, it will not be able to gain control over the web-server as there is no connectivity path in that direction.
  • nginx is exceedingly good at “smart proxy”-style tasks, and that’s essentially already what it’s doing with FastCGI anyway;
    • you could have multiple application servers, rather than just one.  Optionally scaled dynamically as traffic needs change;
    • this effectively turns nginx into a kind of load-balancer for the application server back-ends (see the config sketch after this list); the way you specify this is through
      1. a single name that resolves to multiple addresses, which are then used in a round-robin fashion (not necessarily optimal, as some requests may take longer than others), or
      2. a server group as provided by the ngx_http_upstream_module module.
    • nginx can detect when a back-end is unresponsive, and connect to an alternative;
    • thinking back to the fact that some requests take longer than others, that’s usually URL-specific.  That is, it’s relevant for queries in certain areas (such as reporting, or drill-down type searches) of a web interface.  Now, you can tell nginx to use a different FastCGI destination for those, and thus separate (insulate, really) the handling of that application traffic from the rest of the application.
  • This all fits very neatly with containers, should your infrastructure use those or if you’re considering moving towards containers and micro-services.
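
To illustrate, here is a minimal sketch of the nginx side of such a setup; the pool names, addresses and paths are made-up examples:

# General pool of PHP-FPM application servers (ngx_http_upstream_module)
upstream app_pool {
    server 10.0.1.11:9000;
    server 10.0.1.12:9000;
}

# Separate pool so slow reporting requests can't starve the rest
upstream report_pool {
    server 10.0.1.21:9000;
}

server {
    listen 80;
    server_name app.example.com;

    # Long-running report URLs get their own back-end pool
    location /reports/ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /srv/app/public$fastcgi_script_name;
        fastcgi_pass report_pool;
    }

    # Everything else goes to the general pool
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME /srv/app/public$fastcgi_script_name;
        fastcgi_pass app_pool;
    }
}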

The possibilities are quite extensive, but naturally the details and available options will depend on your needs and what you already have in place.  We regularly help our clients with such Solutions Architecture questions, both for evolving existing environments as well as for greenfield projects.


GitHub acquired by Microsoft

Microsoft has just acquired GitHub for $7.5bn.  Good or bad?

Injected VC capital was $350m, so ROI for the VCs = 21.4x = very happy VCs.

Microsoft has done excellent work on OSS software in recent years, including on the Linux kernel, PHP, and many others.  Just like Oracle continues to put very good effort into MySQL after the Sun Microsystems acquisition many years ago.

But Microsoft is not an Open Source software company. The open source development model is not something they have built into their business “DNA” and processes (actually, many companies that only do OSS haven’t got that either). So why GitHub? Combine it with LinkedIn (acquired by Microsoft earlier), and you have developers’ resumes. That’s valuable. It’s a strategically smart move, for Microsoft.

Will GitHub users benefit, and if so, how?

Well, I expect there’ll be more hoovering up of “useful” (meta)data by a corporation.  Some LinkedIn users will find that handy, but I think it’s mainly beneficial to Microsoft rather than to users, and this type of gathering and combining of data is fundamentally incompatible with basic privacy.  It will bite, at some point down the line.  It always does.

Fundamentally, GitHub and its use is self-contradictory.  Git explicitly enables distributed source code control and truly distributed development, whereas GitHub is very much centralised.  Don’t just walk away to something else now; that won’t address the actual problem.  Solving it properly will include having bug tracking as part of a repository, and by design not relying on a single central location or company.  The developer community (and companies) must resolve this external dependency.
