Tag Archives: mysql

Mixing databases usually not optimal

Dan McKinley (Etsy) wrote an [IMHO] insightful article Why MongoDB Never Worked at Etsy.

First off, it’s important to realise that it’s not a snipe at MongoDB – it’s a fine tool.

The lessons are related to mixing multiple databases in a deployment (administration and monitoring overhead) and the acknowledgement that issues of schema design, scalability and maintenance need attention regardless of which brand or technology you pick for your database. That comes back to the old insight that migrations are rarely worth it (regardless of what you migrate to what).

I think these are indeed important considerations as they have a major impact on the ongoing costs of your entire environment (production as well as development and testing) – these days we encounter the “we’re doing this part of our application using MongoDB” approach quite often, so it’s useful to read about and learn from other people’s experience.

With MongoDB there is a particular extra issue to consider, and Dan McKinley also mentions it in his post. NoSQL databases are often also schema-less. However, to keep your data manageable when it grows to significance, you do need to structure it somehow – that is, you need to make sure that (and I’ll just use generic terminology here) in a specific set of records each record contains the required fields. If you don’t, at some point things become unmanageable (or your data ends up as a pile of unusable bits).

Thus, you’re dealing with some form of schema, whether you call it that or not. And you might deal with it in application logic or through some toolkit, rather than in the database itself, but it can’t just be ignored or disregarded. And that’s critical, as often going to a schema-less database is presented as a “then you don’t need to worry about that” change. You do need to “worry” about it: you can pick where the most suitable place is for your needs. If you look at it in that way, you can make an appropriate choice for the particular application at hand.

Luxbet, MariaDB and Melbourne Cup

Yesterday was Melbourne Cup day in Australia – the biggest annual horse race event in the country, and in the state of Victoria it’s even a public holiday.

Open Query does work for Luxbet (part of Tabcorp), and Melbourne Cup day is by far their biggest day of the year in terms of traffic. It’s not just a big spike, there’s orders of magnitude difference so you can really say that the rest of the year is downright quiet (in relative terms). So, a very interesting load pattern.

Since last year Luxbet has upgraded from stock MySQL to MariaDB, and with our input made some other infrastructure modifications including moving to a pure solid state storage (FusionIO) solution as a SAN just won’t deliver the resilience and performance required. This may seem odd, but remember that a) a SAN is also a single point of failure (so when the SAN fails, multiple db servers will be “out” – not desirable even though a failover to another datacenter is possible), and b) MariaDB/XtraDB (InnoDB) already have all recent data and indexes in RAM, so whatever I/O is required won’t benefit from a SAN cache. Thus, the SAN will have to actually do a physical disk seek and read to get what is needed, and we all know seeks are slow. A write or fsync also incurs some latency, regardless of the storage array speed.

So those are the reasons for the local storage solution. While there are aspects of RAID and other redundancy in that setup, the main resilience in the infrastructure comes from having more machines, rather than necessarily having more redundancy in each machine.

Grant is working on a more comprehensive version of this story.

MySQL Connector/Arduino

Chuck Bell, one of my former colleague from MySQL AB, has created a connector for Arduino to MySQL. So this allows Arduino code to be a direct client of a MySQL or MariaDB server, with Ethernet and WiFi shields supported.

With Arduino boards being used more and more, this can come in really handy – not only for retrieving (for instance) centralised configuration data, but also for logging. Useful stuff. Thanks Chuck!

Links

 Introducing MySQL Connector/Arduino 1.0.0 beta

Hint of the day: noatime and relatime in fstab

It’s been written about everywhere, but since we keep spotting installations in the wild where people don’t know about it, it probably deserves another mention.

By default, Linux uses the atime option on a disk mount, which means it writes a timestamp (e.g. a write to the drive) every time it reads anything. So in this case, reads cause writes – and also disk seeks, because a read from a file will then trigger having to write to the directory that contains the file. This even occurs if a file is read from the file system’s page cache (reading from the machine’s memory rather than the drive).

Unless you require an audit trail of users reading files, you generally you don’t want this. Thus, you want to add the noatime option to the disk mount in /etc/fstab. If you have just the defaults in there, you just make it defaults,noatime. It’ll doesn’t necesarily require a reboot as you can use umount/mount, but that gets tricky when dealing with the root filesystem so a reboot is generally easier. Setting these options is one of the first things we do when configuring a server.

Some user applications, such as Mutt (mail reader) do use the read access time. In that case, you can use the relatime option instead, which only writes a timestamp when a file or directory is written to. This is just for completeness of this story, as it’s still sub-optimal for a database server.

If you require read details for auditing (security) of the operating system, make sure all database-related files (database directories, InnoDB log files, binary logs, etc) are on a separate mount where you can use noatime.

Using noatime also makes a lot of sense on a web server, as it does a lot of reads. Remember, the fact that most files are in the filesystem cache doesn’t make a difference. As a general guide, it makes sense to set on most server installations. Quick win.

LEVENSHTEIN MySQL stored function

At Open Query we steer clear of code development for clients. We sometimes advise on code, but as a company we don’t want to be in the programmer role. Naturally we do write scripts and other necessities to do our job.

Assisting with an Open Source project, I encountered three old UDFs. User Defined Functions are native functions that are compiled and then loaded by the server similar to a plugin. As with plugins, compiling can be a pest as it requires some of the server MySQL header files and matching build switches to the server it’s going to be loaded in. Consequentially, binaries cannot be considered safely portable and that means that you don’t really want to have a project rely on UDFs as it can hinder adoption quite severely.

Since MySQL 5.0 we can also use SQL stored functions and procedures. Slower, of course, but functional and portable. By the way, there’s one thing you can do with UDFs that you (at least currently) can’t do with stored functions, and that’s create a new aggregate function (like SUM or COUNT).

The other two functions were very specific to the app, but the one was a basic levenshtein implementation. A quick google showed that there were existing SQL and even MySQL stored function implementations, most derived from a single origin which was actually broken (and the link is now dead, as well). I grabbed one that appeared functional, and reformatted it for readability then cleaned it up a bit as it was doing some things in a convoluted way. Given that the stored function is going to be much slower than a native function anyway, doing things inefficiently inside loops can really hurt.

The result is below. Feel free to use, and if you spot a bug or can improve the code further, please let me know!
Given the speed issue, I’m actually thinking this should perhaps be added as a native function in MariaDB. What do you think?

-- core levenshtein function adapted from
-- function by Jason Rust (http://sushiduy.plesk3.freepgs.com/levenshtein.sql)
-- originally from http://codejanitor.com/wp/2007/02/10/levenshtein-distance-as-a-mysql-stored-function/
-- rewritten by Arjen Lentz for utf8, code/logic cleanup and removing HEX()/UNHEX() in favour of ORD()/CHAR()
-- Levenshtein reference: http://en.wikipedia.org/wiki/Levenshtein_distance

-- Arjen note: because the levenshtein value is encoded in a byte array, distance cannot exceed 255;
-- thus the maximum string length this implementation can handle is also limited to 255 characters.

DELIMITER $$
DROP FUNCTION IF EXISTS LEVENSHTEIN $$
CREATE FUNCTION LEVENSHTEIN(s1 VARCHAR(255) CHARACTER SET utf8, s2 VARCHAR(255) CHARACTER SET utf8)
  RETURNS INT
  DETERMINISTIC
  BEGIN
    DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT;
    DECLARE s1_char CHAR CHARACTER SET utf8;
    -- max strlen=255 for this function
    DECLARE cv0, cv1 VARBINARY(256);

    SET s1_len = CHAR_LENGTH(s1),
        s2_len = CHAR_LENGTH(s2),
        cv1 = 0x00,
        j = 1,
        i = 1,
        c = 0;

    IF (s1 = s2) THEN
      RETURN (0);
    ELSEIF (s1_len = 0) THEN
      RETURN (s2_len);
    ELSEIF (s2_len = 0) THEN
      RETURN (s1_len);
    END IF;

    WHILE (j <= s2_len) DO
      SET cv1 = CONCAT(cv1, CHAR(j)),
          j = j + 1;
    END WHILE;

    WHILE (i <= s1_len) DO
      SET s1_char = SUBSTRING(s1, i, 1),
          c = i,
          cv0 = CHAR(i),
          j = 1;

      WHILE (j <= s2_len) DO
        SET c = c + 1,
            cost = IF(s1_char = SUBSTRING(s2, j, 1), 0, 1);

        SET c_temp = ORD(SUBSTRING(cv1, j, 1)) + cost;
        IF (c > c_temp) THEN
          SET c = c_temp;
        END IF;

        SET c_temp = ORD(SUBSTRING(cv1, j+1, 1)) + 1;
        IF (c > c_temp) THEN
          SET c = c_temp;
        END IF;

        SET cv0 = CONCAT(cv0, CHAR(c)),
            j = j + 1;
      END WHILE;

      SET cv1 = cv0,
          i = i + 1;
    END WHILE;

    RETURN (c);
  END $$

DELIMITER ;

Storage caching options in Linux 3.9 kernel

dm-cache is (albeit still classified “experimental”) is in the just released Linux 3.9 kernel. It deals with generic block devices and uses the device mapper framework. While there have been a few other similar tools flying around, since this one has been adopted into the kernel it looks like this will be the one that you’ll be seeing the most in to the future. It saves sysadmins the hassle of compiling extra stuff for a system.

A typical use is for an SSD to cache a HDD. Similar to a battery backed RAID controller, the objective is to insulate the application from latency caused by the mechanical device, the most laggy part of which is seek time (measured in milliseconds). Giventhe  relatively high storage capacity of an SSD (in the hundreds of GBs), this allows you to mostly disregard the mechanical latency for writes and that’s very useful for database systems such as MariaDB.

That covers writes (for the moment), but what about reads? Can MariaDB benefit from the read-caching? For the MyISAM storage engine, yes (as it relies on filesystem caching for speeding up row data access). For InnoDB, much less so. But let’s explore this, because it’s not quite a yes/no story – it depends. For typical systems with a correctly dimensioned system and InnoDB buffer pool, most of the active dataset will reside in RAM. For a system using a cached RAID controller that means that an actual disk read is not likely to be in the cache. With an SSD cache you might get lucky as it’s bigger – so stuff that has been read or written in some recent past may still be there. What we have found from testing with hdlatency (on actual client/hosting infra) is that SANs typically don’t have enough cache to pull that off – they too may have SSD caches now, but remember they get accessed by many more users with different data needs as well. The result of SSD filesystem caching for reads is actually similar to InnoDB tweaks that implement a secondary buffer pool on SSD storage, it creates a relatively large and cheap space for “lukewarm” pages (ones that haven’t been recently accessed).

So why does it depend? Because your active dataset might be too large, and/or your combined reads/writes are still more than the physical disks can handle. It’s very important to consider the latter: write caching insulates you from the seeks and allows an intermediate layer to re-order writes to optimise the head movement, but the writes still need to be done and thus ultimately you remain bound by an upper end physical limit. Insulation is not complete separation. If your active dataset is larger than RAM+SSD, then the reads also also need to be taken into account for seek capacity.

So right now you could say that at decent prices, if your active dataset is in the range of a few hundred GB to even a few TB, RAM with the optional addition of SSD caching can all work out nicely – what can still make it go sour is the rate of writes. Conclusion: this type of setup provides you with more headroom than a battery backed RAID controller, should you need that.

Separating reporting to distinct database servers (typically slaves, configured for relatively few connections and large queries) actually still helps quite a bit as it really changes what’s in the buffer pool and other caches. Or, differently put, looking at the access patterns of the different parts of your application is important – there are numerous variation on this basic pattern. It’s a form of functional sharding.

You’ll have noticed I didn’t mention any benchmarks when discussing all this (and most other topics). Many if not most benchmarks have artificial aspects to them, which makes them problematic when dealing with the real world. As shown above, applying background knowledge of the systems and structures, logic, and maths gets you a very long way (either independently or in consultation with us). It can get you through important decision processes quicker. Testing can still play an important part, but then it’s either part of or very close to your real world environment, not a lab activity. It will be specific to you. Don’t get trapped having to deliver on numbers from benchmarks.

MariaDB Foundation

You may have already seen the announcement MariaDB Foundation to Safeguard Leading Open Source Database. We at Open Query wholeheartedly support this (r)evolution of the MySQL ecosystem, which appears to be increasingly necessary as Oracle Corp is seriously dropping the ball with security updates and actually just general development and innovation. Oracle has actually done some very good work, I happily acknowledge that – but security issues are critical, having crashing bugs and incorrect query results in a .28 of a GA release is uncool, and not incorporating awesome development efforts by the community is just astonishing.

MariaDB is where the Sphinx fulltext search and OQGraph engines are integrated, the community developed virtual columns and Galera synchronous replication are added, and the fabulous developers at Monty Program have taken the time to fix up the previously ghastly performance of subqueries so that they now rock. And that’s apart from the many little bugs Monty Program has fixed along the way. Awesome work, guys!

MariaDB still merges from stock MySQL, but it’s important to not be dependent on that – the MariaDB Foundation can help support that already existing effort and raise the profile of MariaDB to – as far as I’m concerned – move on from MySQL.

From our end, apart from our continued active involvement in the community, Open Query will be adjusting and augmenting its fee structure so that our clients co-fund MariaDB Foundation and contributing companies such as Codership (Galera). MySQL and MariaDB are mission critical for businesses. It’s not (and never has been) a case of volunteers and mere charity. We all p(l)ay our part.

It’s time to upgrade.

MariaDB security updates

Important Security Fix for a Buffer Overflow Bug: MariaDB 5.5.28a, 5.3.11, 5.2.13 and 5.1.66 include a fix for CVE-2012-5579, a vulnerability that allowed an authenticated user to crash MariaDB server or to execute arbitrary code with the privileges of the mysqld process. This is a serious security issue. We recommend upgrading from older versions as soon as possible.

MariaDB 5.5.28a, 5.3.11, 5.2.13 and 5.1.66 (GA) binaries, packages, and source tarballs are now available for download from http://downloads.mariadb.org. So you can upgrade within your own major series.

Note that while this fix has just been published, some other vulnerabilities have been noted over the weekend also. Below a summary of these other CVEs as documented by Red Hat Security Response Team, with annotations by Sergei Gulubchik who is the Security Coordinator for MariaDB.

See http://seclists.org/oss-sec/2012/q4/388 for Sergei’s full response.

Note that stock MySQL is also affected – in this post we’re just referring to the specific MariaDB fixes/releases/responses. It appears that Oracle has not yet made any releases for this security issue, which is unfortunate as the issues have been published and can therefore be easily exploited by malicious users. In the same thread referenced above it is stated that Oracle has been made aware of the issues so a fix should be forthcoming for people who use stock MySQL also.

Naturally these security advisories also affect anyone still running a 5.0 OurDelta or early 5.1 OurDelta version. Please upgrade urgently to the latest MariaDB 5.1 (5.1.66) or above. If you require any assistance with this, please contact Open Query. This advisory is also noted on the front page of ourdelta.org.

MariaDB C client libraries and the end of dual-licensing

Finally there is an LGPL C client library for MariaDB, and thus also for MySQL. Monty Program and SkySQL have been working on this for some time. Admittedly there was already the BSD licensed Drizzle client library which was also able to talk to a MySQL/MariaDB server, however its API is different. The C client library for MariaDB has exactly the same API existing applications are used to, so you can just re-link and keep going! There is also a new LGPL Java client library for MariaDB.

In case you don’t quite realise: this is actually a major thing.

At MySQL AB, the client library was made GPL and this flowed through to Sun Microsystems and then Oracle Corp. This licensing choice for the client library was the basis of the infamous “dual licensing” model: if you developed a proprietary application to distribute (not just internal or used in a website), you could obviously not just link against the GPL client library was its license explicitly states that whomever receives that code is entitled to ask for the source code. Some describe that as the “viral nature” of GPL – if you want to use such licensed code it prescribes what kind of license you can use for the rest of the code it’s linked to – but in any case that’s how the license operates, following the objective of GPL to promote free as in freedom (for recipients of the programs).

So the way this worked in practice is that MySQL Sales could sell you a non-GPL license for this code, thus enabling proprietary use. It’s legal leverage and not intrinsically bad, but IMHO the way the sales people played this was. Using FUD (fear-uncertainty-doubt) and sometimes misdirection they coaxed clients in to costly licensing arrangements, including in cases where really the GPL license would have sufficed. While this behaviour was by no means corporate policy, the sales people got a lot of leeway – and being paid on commission, these were the consequences.

When the MySQL Network subscription offering (now called MySQL Enterprise) for support was introduced around 2004, the intent was that this new revenue stream would make the dual licensing irrelevant. But that’s not what happened (hindsight gives insight). Since selling non-GPL licenses was a major revenue stream for the sales people, they kept on pushing it. Clayton Christensen and others at Harvard Business School observed years ago that you can’t disrupt yourself, and this might be an interesting example. While the subscription model did take off nicely, the licensing sales were still making lots of money, and thus it did not diminish in importance. Note that this sales model exists until today, see Commercial License for OEMs, ISVs and VARs (Oracle Corp) .

The MySQL Enterprise subscription (which is still sold by Oracle Corp) has had other consequences also. The tools that clients get via that subscription are nice, they provide information on how the server is running and how queries are handled. But it does so at a cost: for some tasks it needs a proxy (sits inbetween app and db server) which of course adds overhead. One simple example is the case where you want to know whether a particular query uses a tmp table or a sort operation, and whether those operations happened in memory or on disk. But the server already knows, so why is a proxy required? Because if the code were in the server, everybody would have it and not just the subscribers. That’s a valid business decision of course, but as you see it has significant technical consequences and someone’s business model should not concern us. In the MySQL 5.0 OurDelta enhanced builds we already applied a patch (an amalgamation of work by several people) to enhance the slow query log, Monty ported it for 5.1 and it’s now a standard part of MariaDB.

For everybody’s sake it would have been great if the dual licensing model had gone the way of the dodo. It had its time (until around 2004-2005, arguably), after which it became a hindrance, actually hurting the MySQL ecosystem – and that situation pretty much persisted until now. That’s why this replacement library is so important, it makes that whole case obsolete. I say horay, and I hope distributions will soon use this library (as apart from anything else, the licensing for it is clear rather than messy).

To summarise, as talks about licensing often confuse (particularly with the amount of FUD and old info around on the net):

  1. GPL v2 only “kicks in” when software is distributed (beyond the boundary of your organisation).
  2. Internal use (which includes the code on running a website) is not distribution.
  3. Whether or not you sell something is irrelevant for GPL: it’s purely about distribution of code, not whether money is involved.
  4. Using MySQL or MariaDB Server is almost always fine under GPL, since you’re not linking against anything else.
  5. The old MySQL client library is GPL, so if you link your app against it you’re subject to GPL, that is, the rest of the app would need to be licensed under something compatible with GPL (typically, GPL itself). Again, this is only relevant if you distribute that app.
  6. The new MariaDB client library is LGPL v2.1, so you are allowed to link it with proprietary application code and then distribute. The recipients must be able to re-link the proprietary code with another version (modified or not) of the client library (easily satisfied if you link dynamically rather than statically).
  7. If someone tells you that use of a wire protocol constitutes linking and that therefore a proprietary app is in essence linked to MySQL/MariaDB Server (which are GPL licensed), this can easily be disputed using the following example (there are many): Microsoft Internet Explorer often talks to an Apache web server using the HTTP protocol, but noone would suggest that these two codebases are in any way linked or co-dependent in a way that would be relevant to their licensing.
  8. While I was one of the people at MySQL AB who dealt with licensing questions and I strongly encourage you to reject FUD (the above info given is no different from what I used to tell while employed by MySQL), I am not a lawyer. Get legal advice if/when you need it, and get it from someone with a serious clue about GPL. Check their public credentials, not just advertising claims.

Also see:

On a side-note, perhaps someone can now link the LibreOffice Base code and the native MySQL SDBC driver (written by Georg Richter years ago) with the MariaDB client library, as they’re all LGPL. This would resolve a long-standing issue that Sun Microsystems (when they owned both OpenOffice and MySQL) sadly did not fix. See OpenOffice.org, MySQL… aren’t they both owned by Sun? And there may be other software projects that can now be “fixed up” in a similar way. If you know of any, please let me know!

Serving Clients Rather than Falling Over

Dawnstar Australis (yes, nickname – but I know him personally – he speaks with knowledge and authority) updates on The Real Victims Of The Click Frenzy Fail: The Australian Consumer after his earlier post from a few months ago.

Colourful language aside, I believe he rightfully points out the failings of the organising company and the big Australian retailers. From the Open Query perspective we can just review the situation where sites fall over under load. Contrary to what they say, that’s not a cool indication of popularity. Let’s compare with the real world:

  1. Brick & Mortar store does something that turns out popular and we see a huge queue outside, people need to wait for hours. The people in the queue can chat, and overall the situation can be regarded as positive: it shows passers-by that there’s something special going on, and that’s cool. If you don’t want to be in the crowd, you’ll come back later.
  2. Website is unresponsive/inaccessible. There’s nothing cool or positive about this, as the cause is not only unknown, but in fact irrelevant in the context. Each potential client is on their own. Things fail, so they go elsewhere (if there are substitutes) or potentially away completely (concert, it’ll sell out). The bad taste sticks, so if there are alternatives they will not only move there, but be quite vocal about it so others move also.

So you see, you really don’t want your site to go down because of popularity, or for any other reason. Slashdot years ago created a “degrade gracefully” mechanism, where parts of the site would go static. So where normally users would be able to comment and rate posts, they’d just be able to read. In the worst case, only the front page would remain active. On Sept 11 2001, Slashdot was one of the few big sites that actually remained accessible and provided regular news that people could then read even though the topic was not really in its normal scope. The point is, they proved the approach multiple times.

Contrarily, companies like Ticketek have surely got Enterprise Design architecture, however their site has been seen to fall over with events such as The Wiggles. They might be able to get away with this since they’re essentially a monopoly provider: if you want a ticket for this particular event, you need to go to them. But it’s not good. Generally they acted surprised, even though the huge load was entirely predictable. Is that just naive, or a hope to mislead the public, or negligent? You decide.

It’s really a failure in design of sorts. As to where exactly, only an architectural review would show, and it’ll be different for different sites. However, the real lesson is that it’s not about “Enterprise Design” at all, nor about using any particular high-profile hosting provider or involvement of other buzzwords. It’s about proper architecture and deployment and the database is only one aspects of this. It doesn’t have to end up particularly expensive either, it just has to be done right and there’s no single magical approach – each case is unique. Looking at this is best done early on (it tends to also work our better and cheaper), but we’ve helped clients out at much later stages also.  Ideally, we do like to help before there’s a raging fire.