
Motivation to Migrate RDBMS

http://www.itnews.com/article/3004953/use-oracles-database-watch-out-for-this-dec-1-deadline.html

Companies that use a standard edition of Oracle’s database software should be aware that a rapidly approaching deadline could mean increased licensing costs.

Speaking from experience (at both MySQL AB and Open Query), typically, licensing/pricing changes such as these act as a motivator for migrations.

Migrations are a nuisance (doesn’t matter from/to what platform) and are best avoided as they’re intrinsically painful, costly and time-consuming. Smart companies know this.

When asked in generic terms, we generally recommend against migrations (even to MySQL/MariaDB) for the above-mentioned practical and business reasons. There are also technical reasons. I’ll list a few:

  • application, query and schema design tends to be most closely tuned to a particular RDBMS, usually the one the main developer(s) are familiar with. Features are used in a certain way, and the original target platform (even if not deliberately chosen) is likely to execute them most efficiently;
  • RDBMS choice drives hardware/network architecture. A migration should also include a re-think of this, to make optimal use of the database platform;
  • it’s quite rare (but not unheard of!) for an application to perform better on another platform, without putting a lot of extra work in. If extra work is on the table, then the original DB platform should also be considered as a valid option;
  • related to other points: a desire to migrate might be based on employees’ expertise with a particular platform rather than on this particular application’s intrinsic suitability to that platform. While that can be a valid reason, it should be recognised as the actual reason, since it carries obvious cost/effort implications for the migration, and other options, such as training, can then also be considered.
Nevertheless, a company that’s really annoyed by a vendor’s attitude can opt for the migration route, as they may decide it’s the path of less pain (and lower cost) in the long(er) term.

We do occasionally guide and assist with migrations, if after review it looks like a viable and sensible direction to take.


Slow Query Log Rotation

Some time ago, Peter Boros at Percona wrote this post: Rotating MySQL slow logs safely. It contains good info, such as that one should use the rename method for rotation (rather than copytruncate), and then connect to mysqld and issue a FLUSH LOGS (rather than send a SIGHUP signal).
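
For reference, the rename-then-flush sequence boils down to something like this (a minimal sketch; the file path is just an example):

-- first rename the slow log file at the filesystem level, e.g.
--   mv /var/log/mysql/slow.log /var/log/mysql/slow.log.1
-- then tell mysqld to close and reopen its log files:
FLUSH LOGS;
-- MySQL 5.5+ and MariaDB can limit this to the slow query log only:
FLUSH SLOW LOGS;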

So far so good. What I do not agree with is the additional construct to prevent slow queries from being written during log rotation. The author’s rationale is that if too many items get written while the rotation is in progress, this can block threads. I understand this, but let’s review what actually happens.

Indeed, if one were to do lots of writes to the slow query log in a short space of time, a write could block while waiting for earlier writes to complete.

Is the risk of this occurring greater during a logrotate operation? I doubt it. A FLUSH LOGS has to close and reopen the file. While the file is not open, no writes can occur anyhow; they may be held in the internal buffer of the low-level MySQL code that handles this.

In any case, if there is such a high write rate, that is an issue in itself: it is not useful to have the slow query log written to that fast. Instead, you’d up the long_query_time and min_examined_row_limit variables to reduce the effective “flow rate”. It’s always best to resolve an underlying issue rather than its symptom(s).
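
For example (values are made up; pick thresholds that suit your workload, and mirror them in my.cnf so they survive a restart):

SET GLOBAL long_query_time = 2;
SET GLOBAL min_examined_row_limit = 1000;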


Hint of the day: Warning level in Error Log to see Aborted Connections

log_warnings = 2

This yields useful information in the MariaDB or MySQL error log file (or syslog on Debian/Ubuntu) that you don’t want to miss out on.

You will know about aborted connections, which are otherwise only visible through the global status counters Aborted_connects (connection lost before authentication completed) and Aborted_clients (fully authenticated connection cut off).

A log entry looks like this:

130523 2:14:05 [Warning] Aborted connection 173629 to db: 'unconnected' user: 'someapp' host: '10.2.0.50' (Unknown error)

You will know when it happened and where from, and if for instance a wrong password was used, you’ll see the username. Basically, you get as much info as the server has available at that point. Useful.
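
To check those counters and switch the warning level on without a restart, something like the following works (a sketch; SET GLOBAL does not persist across restarts, so keep the my.cnf setting above as well):

SHOW GLOBAL STATUS LIKE 'Aborted%';
SET GLOBAL log_warnings = 2;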



InnoDB without PRIMARY KEY

Having an InnoDB table without a PRIMARY KEY is not good. Many have known this for years, but exact opinions as to why have differed. From observation, it was clear to me that it impacted performance.

InnoDB stores its row data in the leaf nodes of the primary key B+tree structure, which means it can’t work without one… so if you don’t specify a PK, it makes one up. That seems pretty innocuous, and it shouldn’t actually perform any worse than an auto-inc field. Except that in reality the performance can be much, much worse. Annoying. Naturally we recommend that clients always have a PK (auto-inc, a composite of foreign keys, or if need be a natural key), but production systems cannot always be changed quickly; depending on the app code, adding a column is not something you can just do at the DBA level.
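
Where the application does allow the schema change, adding a surrogate key is a single statement (a sketch with made-up names; note that on a large table this ALTER rebuilds the whole table, so plan for that):

ALTER TABLE mytable
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (id);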

Recently my good friend and former colleague Jeremy Cole, who has been delving into the depths of InnoDB, asked me if I had any open questions on the topic. So I mentioned the above, and after a brief look at the relevant code he was definitely interested in exploring the issue. The result is this blog post: http://blog.jcole.us/2013/05/02/how-does-innodb-behave-without-a-primary-key/ for your enlightenment and enjoyment.

Now we know what goes on internally. And it’s clear that performance is negatively affected, and why. Useful.

Thanks Jeremy!


Query pattern: OR across different tables

When a query uses a construct like

SELECT ... FROM a JOIN b ON (...) WHERE a.c1 = X OR b.c2 = Y

execution will inevitably degrade as the dataset grows.

The optimiser can choose to use an index merge when dealing with two relevant indexes over a single table, but that’s obviously of no use in this scenario as the optimiser has to choose which table to access first. And regardless of which table is accessed first, the other one might yield a result. Thus the query will never be efficient.

The real answer is that the query construct is wrong: a JOIN is used inappropriately. The correct approach for this type of query is to use a UNION:

SELECT ... FROM a WHERE a.c1 = X
UNION [ALL]
SELECT ... FROM b WHERE b.c2 = Y
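
To make that concrete, here’s a sketch with made-up tables, assuming indexes on customers.name and orders.reference:

SELECT o.order_id, o.total
  FROM orders o JOIN customers c ON (o.customer_id = c.customer_id)
 WHERE c.name = 'Acme'
UNION
SELECT o.order_id, o.total
  FROM orders o JOIN customers c ON (o.customer_id = c.customer_id)
 WHERE o.reference = 'INV-2042';

Each branch now has a single indexed condition, so the optimiser can pick the appropriate driving table and index for it. UNION removes any row that happens to satisfy both conditions; UNION ALL skips that deduplication and is fine when duplicates are acceptable or cannot occur.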

This mistake occurs relatively often: people try to reduce the number of queries needed to achieve their objective, and while they are (somewhat) familiar with JOINs, they either don’t know about UNION or use it so rarely that they don’t think of it when the question calls for it. So, this is your reminder 😉


Serving Clients Rather than Falling Over

Dawnstar Australis (yes, nickname – but I know him personally – he speaks with knowledge and authority) updates on The Real Victims Of The Click Frenzy Fail: The Australian Consumer after his earlier post from a few months ago.

Colourful language aside, I believe he rightfully points out the failings of the organising company and the big Australian retailers. From the Open Query perspective we can just review the situation where sites fall over under load. Contrary to what they say, that’s not a cool indication of popularity. Let’s compare with the real world:

  1. A brick & mortar store does something that turns out to be popular, and we see a huge queue outside; people need to wait for hours. The people in the queue can chat, and overall the situation can be regarded as positive: it shows passers-by that there’s something special going on, and that’s cool. If you don’t want to be in the crowd, you’ll come back later.
  2. A website is unresponsive/inaccessible. There’s nothing cool or positive about this, as the cause is not only unknown, but in fact irrelevant in the context. Each potential client is on their own. Things fail, so they go elsewhere (if there are substitutes) or potentially give up completely (a concert will simply sell out). The bad taste sticks, so if there are alternatives they will not only move there, but be quite vocal about it so others move also.

So you see, you really don’t want your site to go down because of popularity, or for any other reason. Slashdot years ago created a “degrade gracefully” mechanism, where parts of the site would go static. So where normally users would be able to comment and rate posts, they’d just be able to read. In the worst case, only the front page would remain active. On Sept 11 2001, Slashdot was one of the few big sites that actually remained accessible and provided regular news that people could then read even though the topic was not really in its normal scope. The point is, they proved the approach multiple times.

By contrast, companies like Ticketek have surely got Enterprise Design architecture, yet their site has been seen to fall over with events such as The Wiggles. They might be able to get away with this since they’re essentially a monopoly provider: if you want a ticket for this particular event, you need to go to them. But it’s not good. Generally they acted surprised, even though the huge load was entirely predictable. Is that just naive, a hope to mislead the public, or negligent? You decide.

It’s really a failure in design of sorts. As to where exactly, only an architectural review would show, and it’ll be different for different sites. However, the real lesson is that it’s not about “Enterprise Design” at all, nor about using any particular high-profile hosting provider or other buzzwords. It’s about proper architecture and deployment, and the database is only one aspect of this. It doesn’t have to end up particularly expensive either; it just has to be done right, and there’s no single magical approach – each case is unique. Looking at this is best done early on (it tends to also work out better and cheaper), but we’ve helped clients out at much later stages also. Ideally, we do like to help before there’s a raging fire.


One-way Password Crypting Flaws

I was talking with a client and the topic of password crypting came up. From my background as a C coder, I have a few criteria for regarding a mechanism as safe. In this case we’ll just discuss things from the perspective of secure storage, and validation in an application.

  1. use a digital fingerprint algorithm, not a hash or CRC. A hash is by nature lossy (generates evenly distributed duplicates) and a CRC is intended to identify bit errors in transmitted data, not compare potentially different data.
  2. Store/use all of the fingerprint, not just part (otherwise it’s lossy again).
  3. SHA1 and its siblings are not ideal for this purpose, but ok. MD5 and that family of “message digests” was proven flawed long ago; they can be “freaked” to create a desired outcome, so it is possible to manufacture a source string that generates a desired MD5.
  4. Add a salt of reasonable length (an extra string added to the password), otherwise dictionary attacks are way too easy. In addition, not using a salt means that two users who have the same password end up with the same encrypted password, which is another case of “too much info” for people. The salt should of course be different for each user. Also iterate the hashing, so that brute-force attempts become much more expensive.
  5. Even if someone were to capture your user/pwd table, they should not be able to decode the passwords within a reasonable amount of time. Flaws in any of the above issues can make such attacks easy.

The below code, used in a variety of ecommerce packages (osCommerce prior to v2.3.0, Zen Cart, CRE Loaded / Loaded Commerce, and other derivatives and descendants of osCommerce), on the surface appears to do something quite smart. Note that this code does not use an external salt (such as the username or another separate field) but instead generates it and adds it to the encrypted password. This enables it to be used in applications where no username or other login constant other than the password is available, although I’d consider that quite rare.

function validateAdminPassword($plain, $encrypted) {
  if (!$plain && !$encrypted) {
    return false;
  }

  $stack = explode(':', $encrypted);
  if (sizeof($stack) != 2)
    return false;

  if (md5($stack[1] . $plain) == $stack[0]) {
    return true;
  }

  return false;
}

function encryptAdminPassword($plain) {
  $password = '';

  for ($i=0; $i<10; $i++) {
    $password .= rand();
  }

  // arjen comment: so the 2 is what you need to increase,
  // as well as the length of the relevant database column.
  $salt = substr(md5($password), 0, 2);

  $password = md5($salt . $plain) . ':' . $salt;

  return $password;
}

This code is flawed. Apart from being confusing (using the $password variable name when calculating the salt), the main problem is that the salt ends up too short. The code concatenates 10 pseudo-random numbers (PHP tends to initialise the random generator from time, so it can be somewhat predictable, which is a potential attack vector – for instance when the creation time of the user record is also stored), but that string is then run through MD5(), after which only the first two characters of the resulting message digest are used for the actual salt. Since the MD5 comes out as hex digits, the range of each of the two characters is [0-9a-f], so the total number of possibilities for the salt string is 16 × 16 = 256. That’s not a lot!

The effort involved in pre-calculating the MD5s (including all salt permutations) is not that high, it’s merely 256 times the size of the dictionary used. Wouldn’t take that much disk space. Since this code is used by lots of sites, the potential for a successful attack is rather high in that sense also. Combined with the lack of iteration, this just makes an attack all too easy.

Finally, if the user table were captured from a site with a large number of users, the chance of finding colliding encrypted passwords is quite a bit higher than it should be. But the above mentioned approach already has sufficient potential for a damaging security breach.

If this code is active on your site, a quick patch would be to increase the length of the salt by changing the substr() call, and make it do iterations. Obviously you’ll also need to similarly increase the length of the password storage column in your database. You then get old and new crypted passwords in your table and you can work out which version by checking the length of the crypted password string. On login you can replace old for new for each user as you’ll have their plain password at that point (since they just filled it in and sent it to your app). That way you can create a clean transition. I grant you it’s not perfect but it’s at least improving an otherwise very insecure situation.

If, rather than pragmatically fixing up an existing environment that you didn’t write, you want to do all this properly, read the Password Storage Cheat Sheet from OWASP (the Open Web Application Security Project). It lists a range of considerations (and reasoning) beyond my basic pragmatic list. If you’re going to code from scratch, please do it right.

Edit 2012-07-17: added the maximum affected osCommerce version, thanks to Harald Ponce de Leon. osCommerce as of v2.3.0 uses phpass, like WordPress.


What a Hosting Provider did Today

I found Dennis the Menace: he now has a job as system administrator for a hosting company. Scenario: a client has a problem with a server becoming unavailable (cause unknown) and has it restarted. MySQL had some page corruption in the InnoDB tablespace.

The hosting provider, being really helpful, goes in as root and first deletes ib_logfile*, then ib*, in /var/lib/mysql. He later says “I am sorry if I deleted it. I thought I deleted the log only. Sorry again.” Now this may appear nice, but people who know what they’re doing with MySQL will realise that deleting the ib_logfiles actually destroys data as well. MySQL of course screams loudly that while it has FRM files it can’t find the tables. No kidding!

Then, while he has been told not to touch anything any more, and I’m trying to see if I can recover the deleted files on the ext3 filesystem (yes, there are tools for that), he goes in again and puts an ibdata1 file back. No, not the logfiles – but he had those somewhere else too. The files get restored and turn out to be two months old (no info on how they were made in the first place, but that’s a minor detail in this grand scheme). All the extra write activity on the partition would’ve also made potential deleted-file recovery more difficult or impossible.

This story will still get a “happy” ending, using a recent mysqldump to load a new server at a different hosting provider. Really – some helpfulness is not what you want. Secondary lesson: pick your hosting provider with care. Feel free to ask us for recommendations as we know some excellent providers and have encountered plenty of poor ones.


When Clever Goes Wrong & How Etsy Overcame – Arstechnica

In 2007, Etsy made a big bet on homegrown middleware to help with the site’s scalability. Half a year after it went live, the company decided to abandon it. As a senior software engineer at Etsy put it, “if you’re doing something ‘clever’, you’re probably doing it wrong.”

Read the full article at Arstechnica.com

I want to focus on the important lessons from this article: using middleware and stored procedures in this fashion for a public web application creates unscalable design complexity (smart and “proper” according to the old enterprise design teachings…), causing infrastructure, development and maintenance hassles.

In the process they did replace PostgreSQL with MySQL but that’s not the critical change that made all the difference. PostgreSQL is a fine database system also.


On Password Strength

XKCD (as usual) makes a very good point – this time about password strength, and I reckon it’s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that’s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really improve security, and definitely makes life more difficult for users.

So basically, the functions that perform an “is this a strong password” check should seriously reconsider their approach, particularly if they’re used to have the app decide whether to accept the password as “good enough” at all.

Update: Jeff Preshing has written an xkcd password generator. Users probably should choose their own four words, but it’s a nice example and a similar method could be used by an app to give “password suggestions” that are still safe.
