
One-way Password Crypting Flaws

I was talking with a client and the topic of password crypting came up. From my background as a C coder, I have a few criteria that a mechanism needs to meet before I regard it as safe. In this case we’ll just discuss things from the perspective of secure storage and validation in an application.

  1. Use a digital fingerprint algorithm, not a hash or CRC. A hash is by nature lossy (it generates evenly distributed duplicates) and a CRC is intended to identify bit errors in transmitted data, not to compare potentially different data.
  2. Store/use all of the fingerprint, not just part (otherwise it’s lossy again).
  3. SHA1 and its siblings are not ideal for this purpose, but ok. MD5 and that family of “message digests” were proven flawed long ago: they can be “freaked” to create a desired outcome, so it is possible to manufacture a source string that produces a given MD5.
  4. Add a salt of reasonable length (an extra string added to the password), otherwise dictionary attacks are way too easy. In addition, not using a salt means that two users who have the same password end up with the same encrypted password, which is another case of “too much info” for people. The salt should of course be different for each user. Also iterate: apply the fingerprint function many times over, to slow down brute-force attempts.
  5. Even if someone were to capture your user/pwd table, they should not be able to decode the passwords within a reasonable amount of time. Flaws in any of the above issues can make such attacks easy.

The below code, used in a variety of ecommerce packages (osCommerce prior to v2.3.0, ZenCart, CRE Loaded / Loaded Commerce, and other derivatives and descendants of osCommerce), on the surface appears to do something quite smart. Note that this code does not use an external salt (such as the username or another separate field) but instead generates one and appends it to the encrypted password. This enables it to be used in applications where no username or other login constant other than the password is available, although I’d consider that quite rare.

function validateAdminPassword($plain, $encrypted) {
  if (!$plain && !$encrypted) {
    return false;
  }

  $stack = explode(':', $encrypted);
  if (sizeof($stack) != 2)
    return false;

  if (md5($stack[1] . $plain) == $stack[0]) {
    return true;
  }

  return false;
}

function encryptAdminPassword($plain) {
  $password = '';

  for ($i=0; $i<10; $i++) {
    $password .= rand();
  }

  // arjen comment: so the 2 is what you need to increase,
  // as well as the length of the relevant database column.
  $salt = substr(md5($password), 0, 2);

  $password = md5($salt . $plain) . ':' . $salt;

  return $password;
}

This code is flawed. Apart from being confusing (reusing the $password variable name when calculating the salt), the main problem is that the salt ends up far too short. The code concatenates 10 pseudo-random numbers (PHP tends to initialise the random generator from the time, so it can be somewhat predictable, which is a potential attack vector – for instance when the creation time of the user record is also stored), but that string is then run through MD5() and only the first two characters of the resulting message digest are used as the actual salt. Since the MD5 comes out as hex digits, the range of each of the two characters is [0-9a-f], so the total number of possible salt strings is 16 × 16 = 256. That’s not a lot!

The effort involved in pre-calculating the MD5s (including all salt permutations) is not that high, it’s merely 256 times the size of the dictionary used. Wouldn’t take that much disk space. Since this code is used by lots of sites, the potential for a successful attack is rather high in that sense also. Combined with the lack of iteration, this just makes an attack all too easy.

Finally, if the user table were captured from a site with a large number of users, the chance of finding colliding encrypted passwords is quite a bit higher than it should be. But the above mentioned approach already has sufficient potential for a damaging security breach.

If this code is active on your site, a quick patch would be to increase the length of the salt by changing the substr() call, and to add iterations. Obviously you’ll also need to similarly increase the length of the password storage column in your database. You then get old and new crypted passwords in your table, and you can work out which version each one is by checking the length of the crypted password string. On login you can replace old with new for each user, as you have their plain password at that point (they just filled it in and sent it to your app). That way you can create a clean transition. I grant you it’s not perfect, but it’s at least improving an otherwise very insecure situation.
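Purely as an illustration of that patch (my own sketch, not code from any of the affected packages), the two functions might become something like the below: a 16-character salt, an iterated digest, and a fallback so old 2-character-salt values still validate until they are re-crypted on login. The V2 function names and the iteration count are arbitrary choices on my part.

function encryptAdminPasswordV2($plain) {
  // 16 hex characters of salt instead of 2; a better entropy source
  // than rand() would be preferable where available.
  $salt = substr(md5(uniqid(mt_rand(), true)), 0, 16);

  // Iterate the digest to slow down brute-force/pre-calculation attacks.
  $hash = $salt . $plain;
  for ($i = 0; $i < 10000; $i++) {
    $hash = md5($salt . $hash);
  }

  return $hash . ':' . $salt;
}

function validateAdminPasswordV2($plain, $encrypted) {
  if (!$plain || !$encrypted) {
    return false;
  }

  $stack = explode(':', $encrypted);
  if (sizeof($stack) != 2) {
    return false;
  }

  // Old-format entries have a 2-character salt: check those the old way,
  // so the caller can re-crypt the password on successful login.
  if (strlen($stack[1]) == 2) {
    return (md5($stack[1] . $plain) == $stack[0]);
  }

  $hash = $stack[1] . $plain;
  for ($i = 0; $i < 10000; $i++) {
    $hash = md5($stack[1] . $hash);
  }

  return ($hash == $stack[0]);
}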

If, rather than pragmatically fixing up an existing environment that you didn’t write, you want to do all this properly, read the Password Storage Cheat Sheet from OWASP (the Open Web Application Security Project). It lists a range of considerations (and the reasoning behind them) beyond my basic pragmatic list. If you’re going to code from scratch, please do it right.

Edit 2012-07-17: added maximum affected osCommerce version thanks to Harald Ponce de Leon. osCommerce as of v2.3.0 uses phpass, like WordPress.


Understanding SHOW VARIABLES: DISABLED and NO values

When you use SHOW VARIABLES LIKE 'have_%' to see whether a particular feature is enabled, you will note the value of NO for some, and DISABLED for others. These values are not intrinsically clear to the casual onlooker, and often cause confusion. Typically this happens with SSL and InnoDB. So, here is a quick clarification!

  • NO means that the feature was not enabled (or was actively disabled) in the build. This means the code and any required libraries are not present in the binary.
  • DISABLED means that the feature is built in and capable of working in the binary, but is disabled due to relevant my.cnf settings.
  • YES means the feature is available, and configured in my.cnf.

SSL tends to show up as DISABLED until you configure the appropriate settings to use it in my.cnf (check with SHOW VARIABLES LIKE 'ssl_%'). From then on it will show up as YES.
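For example, a minimal my.cnf snippet to enable SSL might look like the following (the certificate paths are placeholders for your own files):

[mysqld]
ssl-ca   = /etc/mysql/ca-cert.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key  = /etc/mysql/server-key.pem

After a restart, SHOW VARIABLES LIKE 'have_ssl' should report YES, and SHOW VARIABLES LIKE 'ssl_%' will show the configured paths.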

Depending on your MySQL version and distro build, InnoDB can be disabled via the “skip-innodb” option. Obviously that’s not recommended as InnoDB should generally be your primary engine of choice!

However, InnoDB can also show up as DISABLED if the plugin fails to load due to configuration or other errors on startup. When this happens, review the error log (often redirected to syslog/messages) to identify the problem.

If InnoDB is configured as the default storage engine, failed initialisation of the plugin should now result in mysqld not starting, rather than starting with InnoDB disabled, as obviously InnoDB is required in that case.


The 2012 Leap Second on Linux

Sheeri K. Cabral at the Mozilla Foundation wrote about an issue with the June 30th 2012 leap second affecting at least MySQL, Java and Minecraft servers. It now appears that the underlying cause is a Linux kernel bug, as noted by John Stultz (IBM) on the Linux Kernel mailing list. The team Sheeri is part of deserves due credit for doing awesome pattern recognition and being the first to bring the issue to public attention, enabling people to quickly correlate their own experience with that of others, find a practical solution, and help figure out the cause.

Sheeri’s original post MySQL and the Leap Second, High CPU and the Fix describes how MySQL servers would suddenly exhibit high CPU usage during a period of low load. From her analysis, this happened from the exact moment that, in UTC, the date rolled over from June 30th to July 1st – the point at which this year’s leap second (23:59:60) was inserted.

A quick fix is

$ sudo date -s "`date`"

Obviously a system reboot works as well, but that’s rather crude. Some sysadmins roll out some form of quickfix to their servers via Puppet.

It’s important to note that merely restarting MySQL Server (or another affected service) does not resolve the problem – not surprising, since they’re all victims of the problem rather than the cause. There is a MySQL bug report for it, with the kernel list reference as its last comment.

(post updated with Sheeri’s feedback – see comment below)

Update 2012-07-04

Several Heise Online articles provide additional information on the issue.

The kernel bug means that the [high resolution timer] code fails to set the system time when the leap second is added. The result is that the hrtimer representation of the time taken from the kernel is a second ahead of the system time. If an application then calls a kernel function with a timeout of less than a second, the kernel assumes that the timeout has elapsed immediately after setting the timer, and so returns to the program code immediately. In the event of a timeout, many programs simply repeat the requested operation and immediately set a new timer. This results in an endless loop, leading to 100% CPU utilisation.

Other tidbits:

  • The issue is not related to the 2009 leap second problem, so it’s not a regression.
  • A number of kernel developers had been performing testing in recent months to see whether the 2012 leap second insertion was likely to cause problems, finding and fixing several bugs in the process.
  • The problem appears to affect all kernel versions from 2.6.26 up to and including 3.3.
  • Google’s way of handling leap seconds – inserting fractions of the extra second during the day prior to the event – is interesting, as their method completely avoids the leap second insert. Since leap seconds (and days) always require special handling in software – code that is only exercised on those rare occasions – it makes sense to avoid them altogether where possible. Obviously the Google method cannot be applied to leap days, but the issues with those are of a different nature to leap second insertion. See Time, technology and leaping seconds.
  • The report from the Hetzner hosting service about the issue causing a 1MW spike in electricity usage deserves consideration. With the proliferation of servers, desktop computers and embedded devices such as wireless routers, time-based bugs have the potential to cause major disruption, in this case to an electricity grid. If systems controlling the environment (like the grid) are affected also, the consequences can be even more significant.

From Open Query’s own explorations (this includes some conjecture):

  • From our own client realm it appears that many Red Hat and CentOS systems were not affected, whereas those running Debian or Ubuntu kernels were. Since distros roll their own kernels with numerous patches, this is entirely possible. As any software developer knows, even a patch serving a different purpose could somehow affect the timer behaviour and thus avoid the problem. There’s also the real possibility that it’s a (partial) correlation rather than causation.
  • Some people don’t run the NTP service. That’s not something I’d really like to recommend, as having proper system time definitely prevents more issues than it causes, but in this particular case it may have “saved” some systems from experiencing the issue.
  • The NTP service has many settings, some of which can also affect the behaviour for this case.

In a nutshell… the real world is complex and an event involves a combination of different factors resulting in a certain behaviour. While it’s sometimes easy to identify a cause for a particular environment (one client, in our case), getting a complete picture across more clients is more than a tad harder. If you simply put the information from different clients together, the evidence can appear to be rather contradictory.


Server Ownership Legalities

As I reported via Twitter late last week, we encountered an issue that got some of our mail delivery delayed by about a day and a half. I’ll explain more about what happened as I believe in openness on these matters, and also the experience has educational content for others.

Our mail server doesn’t have direct external interaction; it’s shielded by two relays that handle both the inbound MX and the outbound queue. This setup works remarkably well in terms of exposure to spam and other malicious activity. As previously discussed, it appears to be difficult to make mail server infrastructure more resilient without spending a lot more time, effort and money on it. Just because of the way the common tools for mail delivery and IMAP are built, having two or more of each in a semi-active setup gets quite complex. Complexity is in itself a risk, so it has to be weighed against the costs and risks of the alternatives.

When our mail server becomes unavailable, incoming mail is queued, and we have backups so no mail is actually lost. The cost is the time and effort involved in getting a full replacement server up and running from a backup. That can be optimised/prepared to a point, but mail is still a lot more data than most other web infrastructure, so shuffling that data around just takes a while. Some outbound queues from our online services (for instance our client services system Redmine) go straight to the relays, so there is less impact there. Apart from backups elsewhere, we also have redundancy for the mailserver: an identical instance on a server in the same DC (those servers are our own).

So what happened last week? Our servers resided in a rack which was leased from the DC by another company, through which we “sublet” the rack space, connection and bandwidth. This is a common scenario, as small businesses don’t generally need a full rack and datacentres prefer dealing with fewer/bigger clients and set their pricing accordingly. The intermediate company became unavailable, which put our servers in a temporary legal limbo. The DC only gives access to the primary lessee of the rack, so it wasn’t straightforward for us to ask for access to move our servers. Of course we had documentation to back up our assertion as to which equipment was ours, but as you can imagine that legal avenue takes longer to resolve – fortunately the owner of the intermediate company communicated well with the operations manager at the DC, and that’s how we were able to retrieve our gear relatively quickly.

We’re still in the same DC, but are now a direct client of the DC in a shared rack. That may appear odd in the context of what I wrote before, but since we first moved there several years ago the DC has improved their infrastructure management to the point where servicing smaller clients is not a resource drain, and thus they have sensible plans available. That’s brilliant given the market, but it’s actually quite unusual – commonly companies aim for bigger clients rather than recognising an opportunity to serve small clients.

While this was going on we were of course working on a separate replacement mailserver, built from the backups. Compared with our normal situation, where a replacement server would already be set up, the “build from scratch using backups” approach is a slower path. As it turned out, we got our servers back online around the same time we had our replacement ready, and for various reasons it was easier to just use the original servers at that point.

From this story you can work out several useful lessons, remembering that it’s always a trade-off. At some point the cost of being able to mitigate a particular scenario is so high that it’s not worthwhile. You just have to plan for several most common possibilities, with a slower recovery from backup as the last resort.

There’s also another piece of information which is highly relevant for Australian businesses, and that’s the Australian Personal Property Securities Register (PPS). Legislation for this system was enacted in 2009, the scheme has only been operating since January 2012, and there’s a two-year transitional period. Remember how “possession is 9/10ths of the law”? Well, if you ignore PPS it’s now 10/10ths. It is the primary and only register and reference for ownership of items (and data!) that are in the care of another legal entity.

So: we own some servers that reside in a rack of another company in a DC. We register ourselves, then our servers (short description, serial numbers and such) and the associated data content with PPS, against both the intermediate company (which had legal charge over the rack they reside in) and the hosting company (where the items physically reside). This way we have a documented claim that the gear is indeed ours, and since the PPS is the only register, we also ensure that no one else (inadvertently or even maliciously) can claim to own something that’s actually ours.

If you have a similar situation (and remember that data is as important as physical items!) you will want to register it with PPS. The registration process is somewhat convoluted, but it is free – searches cost money. Remember IANAL (I am not a lawyer), so do your own research and get appropriate legal advice. If you’re not in Australia, other similar legislation may apply and you’ll want to check to make sure you’re safe.

MariaDB User Feedback plugin

MariaDB includes a User Feedback plugin. When enabled, the plugin submits basic, completely anonymous MariaDB usage information. This information is used by the developers to track trends in MariaDB usage to better guide development efforts.

To help make MariaDB better, simply add “plugin-load=feedback.so” to your my.cnf file! On Windows, add “feedback=ON” to your my.ini file, or click the checkbox during the installation of the MSI package.
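For instance, on Linux the relevant my.cnf section would simply contain (a minimal sketch):

[mysqld]
plugin-load = feedback.so

You can then check whether it’s active with SHOW PLUGINS – look for a row named FEEDBACK with status ACTIVE.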

See http://kb.askmonty.org/en/user-feedback-plugin for more information.


Open Query training at Drupal DownUnder 2012

DrupalDownUnder 2012 will be held in Melbourne, Australia, 13-15 January. It’s a great event; I’ve been to several of its predecessors. People there don’t care an awful lot for databases, but they do realise that sometimes it’s important to either learn more about them or talk to someone who specialises in that field. And when discussing general infrastructure, resilience is quite relevant: clients want a site to remain up, while keeping costs low.

I will be teaching two half-day pre-conference training sessions on the Friday at DDU.

The material is tailored specifically to Drupal developers and users. The query design skills, for instance, will help you with module development and designing Drupal Views. The two half-days can also be booked as a MySQL Training Pack for $395.

On Saturday afternoon in the main conference, I have a session Scaling out your Drupal and Database Infrastructure, Affordably covering the topics of resilience, ease of maintenance, and scaling.

I’m honoured to have been selected to do these sessions; I know there were plenty of submissions from excellent speakers. As with all Drupal conferences, attendees also vote on which submissions they would like to see.

After DDU I’m travelling on to Ballarat for LinuxConfAU 2012, where I’m not speaking in the main program this year, but will have sessions in the “High Availability and Storage” and “Business of Open Source” miniconfs. I’ll do another post on the former – the latter is not related to Open Query.


SQL Locking and Transactions – OSDC 2011 video

This recent session at OSDC 2011 Canberra is based on part of an Open Query training day, delivered (due to time constraints) without much of the usual interactivity, exercises and further MySQL-specific detail. People liked it anyway, which is nice! The info as presented is not MySQL specific; it provides general insight into how databases implement concurrency and what trade-offs they make.

See http://2011.osdc.com.au/SQLL for the talk abstract.


Green HDs and RAID Arrays

Some so-called “Green” harddisks don’t like being in a RAID array. These are primarily SATA drives, and they gain their green credentials by being able to reduce their RPM when not in use, as well as other aggressive power management trickery. That’s all cool and in a way desirable – we want our hardware to use less power whenever possible! – but the time it takes some drives to “wake up” again is longer than a RAID setup is willing to tolerate.

First of all, you may wonder why I bother with SATA disks at all for RAID. I’ve written about this before: they simply deliver plenty for much less money. Higher RPM doesn’t necessarily help you for a db-related (random access) workload, and for tasks like backups, speed may not be a primary concern anyway. SATA disks have a shorter command queue than SAS, which means they might need to seek more – however, a smart RAID controller would already arrange its I/O in such a way as to optimise that.

The particular application where I tripped over Green disks was a backup array using software RAID10. Yep, a cheap setup – the objective is to have lots of diskspace with resilience, and access speed is not a requirement.

Not all Green HDs are the same. Western Digital ones allow their settings to be changed, although that does need a DOS tool (just a bit of a pest using a USB stick with FreeDOS and the WD tool, but it’s doable), whereas Seagate has decided to restrict their Green models such that they don’t accept any APM commands and can’t change their configuration.

I’ve now replaced Seagates with (non-Green) Hitachi drives, and I’m told that Samsung disks are also ok.

So this is something to keep in mind when looking at SATA RAID arrays. I also think it might be a topic that the Linux software RAID code could address – if it were “Green HD aware” it could a) make sure that the drives don’t drop into a state that is unacceptable, and b) be more tolerant of their response time; this could be configurable. Obviously, some applications of RAID have higher demands than others – not all are the same.


Open Query looking for new colleagues!

My colleagues and I are looking for extra talent – is that you?

What we do: help clients prevent problems (rather than being the fire department). We work on a subscription basis, although we also do some ad-hoc consulting and training. Apart from MySQL/MariaDB query and DBA work, we do quite a bit of system administration – mainly on Red Hat and Debian based distros – and you can expect to see replication and the MySQL-MMM multi-master system. You’d work from home, wherever that might be, so you will need to be self-motivating (but we do keep in touch online).

What we’re not: a full-time employer. With us, you make a life rather than a living. Everybody is contracted part-time. You can make enough to live comfortably, but that has nothing to do with hours. If you’re stressed about not filling all hours in your week with work-work-work, we’re not the company for you… there’s more to life than work, and we feel that’s really important.

Haven’t scared you off yet? Groovy. Take a peek at our jobs page for additional detail and contact info. Hope to hear from you!


On Password Strength

XKCD (as usual) makes a very good point – this time about password strength, and I reckon it’s something app developers need to consider urgently. Geeks can debate the exact amount of entropy, but that’s not really the issue: insisting on mixed upper/lower and/or non-alpha and/or numerical components to a user password does not really improve security, and definitely makes life more difficult for users.

So basically, the functions that do an “is this a strong password” check should have their approach seriously reconsidered, particularly if they’re used to have the app decide whether to accept the password as “good enough” at all.

Update: Jeff Preshing has written an xkcd password generator. Users probably should choose their own four words, but it’s a nice example and a similar method could be used by an app to give “password suggestions” that are still safe.
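As a simple illustration of that idea (a sketch of my own, not Jeff Preshing’s code), an app could offer a suggestion by picking a few random words from a large list of common words:

// Sketch: suggest an xkcd-style passphrase of four random common words.
// $words here is a tiny placeholder; a real implementation would load a
// list of a few thousand common words.
function suggestPassphrase($words, $count = 4) {
  $suggestion = array();
  for ($i = 0; $i < $count; $i++) {
    $suggestion[] = $words[mt_rand(0, count($words) - 1)];
  }
  return implode(' ', $suggestion);
}

$words = array('correct', 'horse', 'battery', 'staple', 'cloud', 'ferry', 'monkey', 'ladder');
echo suggestPassphrase($words);  // e.g. "ferry staple cloud monkey"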
