MariaDB Foundation

You may have already seen the announcement MariaDB Foundation to Safeguard Leading Open Source Database. We at Open Query wholeheartedly support this (r)evolution of the MySQL ecosystem, which appears to be increasingly necessary as Oracle Corp is seriously dropping the ball with security updates and with general development and innovation. Oracle has done some very good work, I happily acknowledge that – but security issues are critical, having crashing bugs and incorrect query results in a .28 release of a GA series is uncool, and not incorporating awesome development efforts by the community is just astonishing.

MariaDB is where the Sphinx fulltext search and OQGraph engines are integrated, where community-developed virtual columns and Galera synchronous replication are added, and where the fabulous developers at Monty Program have taken the time to fix up the previously ghastly performance of subqueries so that they now rock. And that's apart from the many little bugs Monty Program has fixed along the way. Awesome work, guys!

MariaDB still merges from stock MySQL, but it's important not to be dependent on that – the MariaDB Foundation can help support the already existing effort and raise the profile of MariaDB so that we can – as far as I'm concerned – move on from MySQL.

From our end, apart from our continued active involvement in the community, Open Query will be adjusting and augmenting its fee structure so that our clients co-fund the MariaDB Foundation and contributing companies such as Codership (Galera). MySQL and MariaDB are mission critical for businesses. It's not (and never has been) a case of volunteers and mere charity. We all p(l)ay our part.

It’s time to upgrade.

MariaDB security updates

Important Security Fix for a Buffer Overflow Bug: MariaDB 5.5.28a, 5.3.11, 5.2.13 and 5.1.66 include a fix for CVE-2012-5579, a vulnerability that allowed an authenticated user to crash MariaDB server or to execute arbitrary code with the privileges of the mysqld process. This is a serious security issue. We recommend upgrading from older versions as soon as possible.

MariaDB 5.5.28a, 5.3.11, 5.2.13 and 5.1.66 (GA) binaries, packages, and source tarballs are now available for download from http://downloads.mariadb.org. So you can upgrade within your own major series.

Note that while this fix has just been published, some other vulnerabilities have also been noted over the weekend. Below is a summary of these other CVEs as documented by the Red Hat Security Response Team, with annotations by Sergei Golubchik, the Security Coordinator for MariaDB.

See http://seclists.org/oss-sec/2012/q4/388 for Sergei’s full response.

Note that stock MySQL is also affected – in this post we’re just referring to the specific MariaDB fixes/releases/responses. It appears that Oracle has not yet made any releases for this security issue, which is unfortunate as the issues have been published and can therefore be easily exploited by malicious users. In the same thread referenced above it is stated that Oracle has been made aware of the issues so a fix should be forthcoming for people who use stock MySQL also.

Naturally these security advisories also affect anyone still running a 5.0 OurDelta or early 5.1 OurDelta version. Please upgrade urgently to the latest MariaDB 5.1 (5.1.66) or above. If you require any assistance with this, please contact Open Query. This advisory is also noted on the front page of ourdelta.org.

MariaDB C client libraries and the end of dual-licensing

Finally there is an LGPL C client library for MariaDB, and thus also for MySQL. Monty Program and SkySQL have been working on this for some time. Admittedly, there was already the BSD-licensed Drizzle client library, which could also talk to a MySQL/MariaDB server; however, its API is different. The C client library for MariaDB has exactly the same API existing applications are used to, so you can just re-link and keep going! There is also a new LGPL Java client library for MariaDB.

In case you don’t quite realise: this is actually a major thing.

At MySQL AB, the client library was made GPL and this flowed through to Sun Microsystems and then Oracle Corp. This licensing choice for the client library was the basis of the infamous "dual licensing" model: if you developed a proprietary application to distribute (not just internal or used in a website), you could obviously not just link against the GPL client library, as its license explicitly states that whoever receives that code is entitled to ask for the source code. Some describe that as the "viral nature" of GPL – if you want to use such licensed code, it prescribes what kind of license you can use for the rest of the code it's linked to – but in any case that's how the license operates, following the objective of GPL to promote free as in freedom (for recipients of the programs).

So the way this worked in practice is that MySQL Sales could sell you a non-GPL license for this code, thus enabling proprietary use. It's legal leverage and not intrinsically bad, but IMHO the way the sales people played this was. Using FUD (fear, uncertainty, doubt) and sometimes misdirection, they coaxed clients into costly licensing arrangements, including in cases where the GPL license would really have sufficed. While this behaviour was by no means corporate policy, the sales people got a lot of leeway – and since they were paid on commission, these were the consequences.

When the MySQL Network subscription offering (now called MySQL Enterprise) for support was introduced around 2004, the intent was that this new revenue stream would make the dual licensing irrelevant. But that's not what happened (hindsight gives insight). Since selling non-GPL licenses was a major revenue stream for the sales people, they kept on pushing it. Clayton Christensen and others at Harvard Business School observed years ago that you can't disrupt yourself, and this might be an interesting example. While the subscription model did take off nicely, the licensing sales were still making lots of money, and thus did not diminish in importance. Note that this sales model exists to this day; see Commercial License for OEMs, ISVs and VARs (Oracle Corp).

The MySQL Enterprise subscription (which is still sold by Oracle Corp) has had other consequences also. The tools that clients get via that subscription are nice; they provide information on how the server is running and how queries are handled. But this comes at a cost: for some tasks it needs a proxy (which sits in between app and db server), and that of course adds overhead. One simple example is the case where you want to know whether a particular query uses a tmp table or a sort operation, and whether those operations happened in memory or on disk. The server already knows, so why is a proxy required? Because if the code were in the server, everybody would have it and not just the subscribers. That's a valid business decision of course, but as you can see it has significant technical consequences – and someone's business model should not be our concern. In the MySQL 5.0 OurDelta enhanced builds we already applied a patch (an amalgamation of work by several people) to enhance the slow query log; Monty ported it for 5.1 and it's now a standard part of MariaDB.

For everybody's sake it would have been great if the dual licensing model had gone the way of the dodo. It had its time (until around 2004-2005, arguably), after which it became a hindrance, actually hurting the MySQL ecosystem – and that situation pretty much persisted until now. That's why this replacement library is so important: it makes that whole case obsolete. I say hooray, and I hope distributions will soon use this library (apart from anything else, its licensing is clear rather than messy).

To summarise, since talk about licensing often confuses (particularly with the amount of FUD and old info around on the net):

  1. GPL v2 only “kicks in” when software is distributed (beyond the boundary of your organisation).
  2. Internal use (which includes code running a website) is not distribution.
  3. Whether or not you sell something is irrelevant for GPL: it’s purely about distribution of code, not whether money is involved.
  4. Using MySQL or MariaDB Server is almost always fine under GPL, since you’re not linking against anything else.
  5. The old MySQL client library is GPL, so if you link your app against it you’re subject to GPL, that is, the rest of the app would need to be licensed under something compatible with GPL (typically, GPL itself). Again, this is only relevant if you distribute that app.
  6. The new MariaDB client library is LGPL v2.1, so you are allowed to link it with proprietary application code and then distribute. The recipients must be able to re-link the proprietary code with another version (modified or not) of the client library – easily satisfied if you link dynamically rather than statically, as the sketch after this list shows.
  7. If someone tells you that use of a wire protocol constitutes linking, and that therefore a proprietary app is in essence linked to MySQL/MariaDB Server (which are GPL licensed), this can easily be disputed using the following example (there are many): Microsoft Internet Explorer often talks to an Apache web server using the HTTP protocol, but no one would suggest that these two codebases are in any way linked or co-dependent in a way that would be relevant to their licensing.
  8. While I was one of the people at MySQL AB who dealt with licensing questions and I strongly encourage you to reject FUD (the info given above is no different from what I used to tell people while employed by MySQL), I am not a lawyer. Get legal advice if/when you need it, and get it from someone with a serious clue about GPL. Check their public credentials, not just advertising claims.
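
As a concrete illustration of point 6, here is a minimal sketch of the difference between dynamic and static linking against the client library. The library and file names (-lmariadb, /usr/lib/libmariadb.a) are assumptions for illustration; check your distribution for the actual names and paths.

# dynamic linking: the client library remains a separate .so that a
# recipient can swap out, which satisfies the LGPL re-link requirement
gcc -o myapp myapp.c -lmariadb

# static linking: the library is embedded in the binary, so under LGPL
# you must also give recipients your object files so they can re-link
gcc -o myapp myapp.c /usr/lib/libmariadb.a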


On a side-note, perhaps someone can now link the LibreOffice Base code and the native MySQL SDBC driver (written by Georg Richter years ago) with the MariaDB client library, as they’re all LGPL. This would resolve a long-standing issue that Sun Microsystems (when they owned both OpenOffice and MySQL) sadly did not fix. See OpenOffice.org, MySQL… aren’t they both owned by Sun? And there may be other software projects that can now be “fixed up” in a similar way. If you know of any, please let me know!

Serving Clients Rather than Falling Over

Dawnstar Australis (yes, nickname – but I know him personally – he speaks with knowledge and authority) updates on The Real Victims Of The Click Frenzy Fail: The Australian Consumer after his earlier post from a few months ago.

Colourful language aside, I believe he rightfully points out the failings of the organising company and the big Australian retailers. From the Open Query perspective we can just review the situation where sites fall over under load. Contrary to what they say, that’s not a cool indication of popularity. Let’s compare with the real world:

  1. A brick & mortar store does something that turns out to be popular, and we see a huge queue outside; people need to wait for hours. The people in the queue can chat, and overall the situation can be regarded as positive: it shows passers-by that there's something special going on, and that's cool. If you don't want to be in the crowd, you'll come back later.
  2. A website is unresponsive/inaccessible. There's nothing cool or positive about this, as the cause is not only unknown, but in fact irrelevant in the context. Each potential client is on their own. Things fail, so they go elsewhere (if there are substitutes) or potentially give up completely (a concert will sell out anyway). The bad taste sticks, so if there are alternatives they will not only move there, but be quite vocal about it so others move also.

So you see, you really don’t want your site to go down because of popularity, or for any other reason. Slashdot years ago created a “degrade gracefully” mechanism, where parts of the site would go static. So where normally users would be able to comment and rate posts, they’d just be able to read. In the worst case, only the front page would remain active. On Sept 11 2001, Slashdot was one of the few big sites that actually remained accessible and provided regular news that people could then read even though the topic was not really in its normal scope. The point is, they proved the approach multiple times.

By contrast, companies like Ticketek surely have "Enterprise Design" architecture; however, their site has been seen to fall over for events such as The Wiggles. They might be able to get away with this since they're essentially a monopoly provider: if you want a ticket for this particular event, you need to go to them. But it's not good. Generally they acted surprised, even though the huge load was entirely predictable. Is that just naive, a hope to mislead the public, or negligent? You decide.

It's really a failure in design of sorts. As to where exactly, only an architectural review would show, and it'll be different for different sites. However, the real lesson is that it's not about "Enterprise Design" at all, nor about using any particular high-profile hosting provider or any other buzzword. It's about proper architecture and deployment, and the database is only one aspect of this. It doesn't have to end up particularly expensive either; it just has to be done right, and there's no single magical approach – each case is unique. Looking at this is best done early on (it tends to also work out better and cheaper), but we've helped clients out at much later stages also. Ideally, we do like to help before there's a raging fire.

Web Logs and Statistics

I've previously covered the user's perspective, and how browser-to-web-server latency is a dominant cause of web server performance issues.

Assuming you've resolved those as best you can, how do you measure server performance at a per-request level? Web servers allow you to put their processing time into the logs; however, it's not there by default.

Using Nginx you can define a custom format by adding $request_time:

log_format extended '$remote_addr - $remote_user [$time_local]  '
                    '"$request" $status $request_time $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

access_log [current access log file] extended;

In Apache HTTPd you can do a similar thing using %D (note that %D records the time taken in microseconds, whereas Nginx's $request_time is in seconds):

LogFormat "%h %l %u %t \"%r\" %>s %D %b \"%{Referer}i\" \"%{User-agent}i\"" extended

CustomLog [current access log file] extended

After you've collected a meaningful sample of web activity you're ready to analyse. First, copy the logs off all web servers to a common directory.

Second, combine the logs. Assuming you haven't played with the time format in the log, logmerge with --apache-access is a good way to get a single log, even with the Nginx format above.

logmerge --apache-access *access.log > weball.log

Assuming your time is in the 10th column as in the example, the following will put the slowest responses at the top.

sort -n -r -k 10 --parallel=8 weball.log > weball-sort.log

Take a look through it and see what's there.

If you consider a response of more than 4 seconds to be slow, let's move all those requests to their own file. The grep here matches the first 200 response with a time of 0, 1, 2 or 3 seconds in the sorted log.

grep -n -m 1 'HTTP/1\.[01]" 200 [0123]\.' weball-sort.log

42435:127.0.0.1 - - [02/Oct/2012:22:24:33 -0400] "POST /search ...

This prints the first line that has a response time beginning with 0, 1, 2 or 3. Because the log is sorted, all lines before it are bigger, so let's put those in their own file.

head -n {linenumber - 1} weball-sort.log > weball-sort-more-than-4-seconds.log

To group these by URL path and get some frequency counts:

cut -f 8 -d ' ' weball-sort-more-than-4-seconds.log | sort | uniq -c | sort -n -r

6377 /new-releases
3753 /userrecommendations/index
2160 /home
1073 /bestsellers/reviewed
1048 /myfantasicproduct
563 /category/cool/10
464 /category/fun/17
395 /index.php?do=/blog/add/

....

From this you should start to recognise some URL patterns that need attention.

Take note of these, and we'll use statistics to see how these requests really behave across all logs. I didn't find a good simple number cruncher quickly enough, so I'm using the modified one below. Compile with gcc stat.c -o stat -lm; it should compile with any ANSI C compiler. It reads numbers from standard input and prints its results when EOF is reached.

/* stat.c
 * This program reads numbers from standard input and outputs
 * sum, mean and standard deviation.
 *
 * Rewritten by Daniel Black of Open Query
 *
 * To build:
 * gcc stat.c -o stat -lm
 *
 * Originally based on:
 * http://www.dreamincode.net/forums/topic/138734-sum-mean-median-max-min-standard-deviation-in-c/
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>

int main(void)
{
    long unsigned count = 0;
    double sum_sqrs = 0, total = 0, max = 0, min = 0, mean = 0, std_dev = 0, var = 0;
    int res = 0;
    float tmp;

    /* Read the first number from standard input */
    res = scanf("%f", &tmp);

    if (res == EOF) return 0;
    count = 1;
    min = max = total = tmp;
    sum_sqrs = tmp * tmp;

    /* Accumulate count, min, max, sum and sum of squares */
    while (1)
    {
        res = scanf("%f", &tmp);
        if (res == EOF) break;
        count += res;

        if (tmp < min) min = tmp;
        if (tmp > max) max = tmp;

        total += tmp;
        sum_sqrs += tmp * tmp;
    }

    printf("Count: %lu\n", count);
    printf("Sum of All Values: %.2f: \n", total);

    mean = total / count;
    printf("Mean of Values: %.2f: \n", mean);

    printf("Minimum value entered: %.2f: \n", min);
    printf("Maximum value entered: %.2f: \n", max);

    /* Variance via E[x^2] - mean^2, then standard deviation */
    var = sum_sqrs / count - mean * mean;
    std_dev = sqrt(var);

    printf("Standard Deviation: %.2f\n", std_dev);

    return 0;
}

So:

fgrep "GET /home" weball.log | cut -f 11 -d ' ' | ./stat
Count: 47438
Sum of All Values: 69947.67:
Mean of Values: 1.47:
Minimum value entered: 0.00:
Maximum value entered: 200.00:
Standard Deviation: 1.83

Assuming this is a normal-like distribution, about 97% of responses will lie below mean + 2 * standard deviation, i.e. 1.47 + 2 * 1.83 = 5.13 seconds. Acceptable? Up to you.

There's still a 200 second maximum in there. While there shouldn't be many outlying entries, we'll need to look at them to see their significance.

How many took more than 10 seconds?

fgrep 'GET /home' weball-sort-more-than-4-seconds.log | egrep -v 'HTTP/1\.[01]" [0-9]{3} [0-9]\.' | wc -l

61

So 61 requests out of 47438. Using "more" instead of "wc -l" will let you see some of the actual responses. Sometimes it's a property of the dataset that causes these particular pages to be slow, so even a fairly low number here may be worth examining.

Repeating the above steps for other URLs is a pretty good way of identifying problem pages: examine the frequency, mean, standard deviation and the business value of the page. Then you just need to dive into the mechanics of the page and work out what's slow.

An aside, if you're interested in graphing:

If you want to do XY graphs using the time as an X column, I suggest using logmerge's --tempfile option, then:

sed -i -e 's/^\(..............\)_\([^ ]*\)\(.*\)/\1\3/g' logfile

After this, the first column will be a YYYYMMDDHHMMSS timestamp and thus usable on an X axis. I was having trouble using LibreOffice Calc for this on a 2-million-line log file, though giving the file a .csv extension helped.

Optimising Web Servers

I was lucky enough to attend PyCon-AU recently and one talk in particular highlighted the process of web server optimisation.

Graham Dumpleton's talk Web Server Bottlenecks And Performance Tuning is available on YouTube (along with the majority of PyCon-AU talks).

The first big note at the beginning is that the majority of the delay in users' perception of a website is caused by the browser rendering the page. Though not covered in the talk: for those who haven't used them, YSlow (for Firefox and Chrome) and Google's Developer Tools (Ctrl+Shift+I in Chrome) will give you pretty much identical recommendations on how to configure page generation and server caching/compression settings to maximise the ease with which a web browser renders the page. These recommendations will also minimise the second most dominant effect on web page display: network latency and bandwidth. Once you have completed this, making web pages faster on the web server begins to have a measurable effect for the end user.

The majority of the talk, however, is about web server configuration. The constraints you may hit at the web server are memory, CPU and I/O, depending on your application.

Measuring an application's use of memory and multiplying it by the number of concurrently running processes will give you an idea of how much memory is needed. Remember always that spare memory is disk cache on Linux-based systems, and this is significant in reducing I/O read time for things like static content serving. Memory use can be reduced by front-end proxying, as described in the question at offset 19:40, relating back to the earlier description of threads and processes. In short, the buffering provided by Nginx in a single process on the input side ensures that the application code isn't running until a large amount of input is ready, and output is buffered in Nginx so that the application process can finish sooner while Nginx trickles the web page out to the client at whatever speed the network allows. This reduction in the running time of the application enables the server to support better concurrency and hence use memory better. This is why we at Open Query like to use Nginx as the web server for the larger websites of our clients.
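
To make that buffering concrete, here is a minimal sketch of an Nginx front-end proxy configuration; the upstream address and buffer sizes are illustrative assumptions, not tuned recommendations.

upstream app_backend {
    # the application server process(es)
    server 127.0.0.1:8000;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_backend;

        # buffer the request body so the application process only runs
        # once the input is ready
        client_body_buffer_size 128k;

        # buffer the response so the application process can finish early
        # while Nginx trickles the page out to slow clients
        proxy_buffering on;
        proxy_buffers 16 16k;
    }
}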

The database is effectively an I/O constraint from the web server perspective, as it should be on a different server if you run anything more than a simple blog or an application with very low database utilisation. A database query that requires input from the web request and takes a long time to run adds significantly to the time taken to render the page. Taking note of which queries are slow, for instance by enabling the slow query log, is the first step to identifying the problem. Significant gains can usually be made by using indexes and by letting the database, rather than the application, do joins, iterations and sorting. Of course much more optimisation of server and queries is possible, and Open Query is happy to help.
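
As a minimal sketch, the slow query log can be enabled in my.cnf; the threshold and path here are illustrative, not recommendations:

[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
# log queries taking longer than this many seconds
long_query_time = 1
# also log queries that use no index at all
log_queries_not_using_indexes = 1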

Thanks again to PyCon speakers, organisers, sponsors and delegates. I had a great time.

The Optimiser Conundrum

We've been helping a long-term client who runs some fairly complex queries (covering lots of tables and logic on a respectably big but mainly volatile dataset). We tend to look first at query structure and table design, as fixing problems there tends to have the most impact. This is in contrast to just tossing more hardware at the problem, which is merely expensive.

As subqueries are used (and necessary in this case), MariaDB 5.3 was already a great help with its subquery optimisations. Once again thanks, Monty and the Monty Program optimiser team (Igor, Sergey, Timour, and possibly others) – all former colleagues and they’re absolutely awesome. Together, they know the MySQL optimiser like no other.

Because the queries are generated indirectly from an exposed API (just for paying clients, but still), the load is more unpredictable than having merely a local front-end. Maintaining spare capacity with slaves addresses this, but naturally it’s still important to ensure that each query takes as little time as possible. Queries use appropriate indexes and each bit is fairly optimal by now.

Then the client reported that while overall things were going really well, there was this one (type of) query that was really misbehaving. The report was that it appeared to be “locking up the server”. Now I generally don’t believe in that, because I know how the server works: it doesn’t just hang, but it might just take a long time to do exactly what it was told to do. Computers are like that.

Typically that kind of thing would happen with, for instance, a really bad join, but in this case there was nothing like that. Rather than running the query, which was sure not to go well (or quickly) regardless of what it was doing, the proper thing to try was an EXPLAIN (on a slave) – and after 10 minutes of that I got fed up and killed it. Since EXPLAIN only performs the query analysis/optimisation phase without executing the query, this showed that the server was spending all that time figuring out the optimal order of the tables and which access method to use for each table.

I know that pattern too; we used to see MySQL 4 do this. Obviously, spending ages finding the optimal execution plan for a tiny query makes no sense – so back then the optimiser team (the same people as noted above!) implemented a limitation and pruning algorithm, with configurable settings: optimizer_search_depth (default: 62) and optimizer_prune_level (default: 1). After this algorithm was implemented, we no longer saw the problem with clients, so adjusting the search depth was not even required.

There are exceptions. Another former colleague, Max Mether (now at SkySQL), wrote about this a year ago: Setting optimizer search depth in MySQL, based on an experience with one of their clients. While staying with EXPLAIN rather than actually trying to run the query, I did SET SESSION optimizer_search_depth=1 and validated the hypothesis that this had indeed been where the server was spending its time. Then I played with the setting a bit, and found that for my 28-table query the curve was much steeper than what Max had found: at depth=15 I already had to wait 15 seconds just for the analysis (Max had 5 seconds at that point). While any of us old hands can look at a query and figure out reasonably well what the optimiser would do with it, that exercise becomes a bit tedious when dealing with 28 tables and subqueries.
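
For reference, the experiment looked roughly like this (the actual query text is elided; the timings are the ones from the case described above):

SET SESSION optimizer_search_depth = 1;
EXPLAIN SELECT ... ;  -- returns almost instantly

SET SESSION optimizer_search_depth = 15;
EXPLAIN SELECT ... ;  -- about 15 seconds just for the analysis

SET SESSION optimizer_search_depth = 62;  -- the default
EXPLAIN SELECT ... ;  -- killed after 10 minutes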

As a side-note, running the actual query (with a low depth setting) takes hardly any extra time, which is typical – it’s the analysis/optimisation phase that’s eating all the time.

For now, we have a workaround… the application can set the option before running queries like this.

SET @save_optimizer_search_depth = @@optimizer_search_depth;
SET SESSION optimizer_search_depth=1;
...query...
SET SESSION optimizer_search_depth=@save_optimizer_search_depth;

But I’m not satisfied: I don’t believe the server should be acting in this way. The issue appears similar to the MySQL 4 scenario, and just like then we should come up with a way for the optimiser to decide when to call it quits and just execute. Perhaps we need to set a cap on the total amount of time allowed (configurable), but that seems rather crude. Other things have changed in the optimiser since that time, so it could even be some kind of regression – hopefully the optimiser team will figure this out when they look at it.

I'm working with the client to isolate a dataset sufficient for reproducing this issue, so that we can give it to the team at Monty Program. I'll file a bug report when I have that set. If you happen to have something similar, please contribute your insights (and data/query) also! For now, if you can, please comment on this post; I'm interested to learn whether it's a more common occurrence now. Thanks!

The Data Charmer: Is Oracle really killing MySQL?

http://datacharmer.blogspot.it/2012/08/is-oracle-really-killing-mysql.html

An insightful post from my former (MySQL AB) colleague Giuseppe Maxia about how Oracle's actions affect the MySQL landscape.

My own comment exploring why it's happening (from an Upstarta perspective) is on his blog post rather than here. From Open Query's business perspective, we generally deploy MariaDB unless the client prefers distro stock. We get the features we need in MariaDB, we see the bug fixing and the development process, and we have an open dialogue with the developers.

While the current new code coming from Oracle definitely has interesting components, MariaDB has solved some real problems (such as subqueries), and integrated useful engines such as Sphinx, FederatedX, and our own OQGraph.

As long as Oracle does useful things for MySQL, MariaDB will keep picking up those changes also. If/when it doesn't, MariaDB is still viable. Its team has the necessary expertise, experience and vision. So that has been and will remain our approach to this matter. I don't see the landscape now as different from when we made that decision.

One-way Password Crypting Flaws

I was talking with a client and the topic of password crypting came up. From my background as a C coder, I have a few criteria for regarding a mechanism as safe. In this case we'll just discuss things from the perspective of secure storage and validation in an application.

  1. Use a digital fingerprint algorithm, not a hash or CRC. A hash is by nature lossy (it generates evenly distributed duplicates) and a CRC is intended to identify bit errors in transmitted data, not to compare potentially different data.
  2. Store/use all of the fingerprint, not just part (otherwise it’s lossy again).
  3. SHA1 and its siblings are not ideal for this purpose, but ok. MD5 and that family of "message digests" was proven flawed long ago: they can be manipulated to create a desired outcome, so it is possible to manufacture a source string that generates a given MD5.
  4. Add a salt of reasonable length (an extra string added to the password), otherwise dictionary attacks are way too easy. In addition, not using a salt means that two users who have the same password end up with the same encrypted password, which is another case of "too much info". The salt should of course be different for each user. Also iterate: applying the digest multiple times makes brute-force attacks slower.
  5. Even if someone were to capture your user/pwd table, they should not be able to decode the passwords within a reasonable amount of time. Flaws in any of the above issues can make such attacks easy.

The code below, used in a variety of ecommerce packages (osCommerce prior to v2.3.0, Zen Cart, CRE Loaded / Loaded Commerce, and other derivatives and descendants of osCommerce), on the surface appears to do something quite smart. Note that this code does not use an external salt (such as the username or another separate field) but instead generates it and appends it to the encrypted password. This enables it to be used in applications where no username or other login constant apart from the password is available, although I'd consider that quite rare.

function validateAdminPassword($plain, $encrypted) {
  if (!$plain && !$encrypted) {
    return false;
  }

  $stack = explode(':', $encrypted);
  if (sizeof($stack) != 2)
    return false;

  if (md5($stack[1] . $plain) == $stack[0]) {
    return true;
  }

  return false;
}

function encryptAdminPassword($plain) {
  $password = '';

  for ($i=0; $i<10; $i++) {
    $password .= rand();
  }

  // arjen comment: so the 2 is what you need to increase,
  // as well as the length of the relevant database column.
  $salt = substr(md5($password), 0, 2);

  $password = md5($salt . $plain) . ':' . $salt;

  return $password;
}

This code is flawed. Apart from being confusing (re-using the $password variable name when calculating the salt), the main problem is that the salt ends up too short. The code concatenates 10 pseudo-random numbers (PHP tends to initialise the random generator from the time, so it can be somewhat predictable, which is a potential attack vector – for instance when the creation time of the user record is also stored) but then runs the result through MD5(), after which only the first two characters of the resulting message digest are used for the actual salt. Since the MD5 comes out as hex digits, the range of each of the two characters is [0-9a-f], so the total number of possibilities for the salt string is 16 × 16 = 256. That's not a lot!

The effort involved in pre-calculating the MD5s (including all salt permutations) is not that high: it's merely 256 times the size of the dictionary used, which wouldn't take that much disk space. Since this code is used by lots of sites, the potential for a successful attack is rather high in that sense also. Combined with the lack of iteration, this just makes an attack all too easy.

Finally, if the user table were captured from a site with a large number of users, the chance of finding colliding encrypted passwords is quite a bit higher than it should be. But the above mentioned approach already has sufficient potential for a damaging security breach.

If this code is active on your site, a quick patch would be to increase the length of the salt by changing the substr() call, and to make it do iterations, as sketched below. Obviously you'll also need to similarly increase the length of the password storage column in your database. You then get old and new crypted passwords in your table, and you can work out which version a row uses by checking the length of the crypted password string. On login you can replace old with new for each user, as you'll have their plain password at that point (since they just filled it in and sent it to your app). That way you create a clean transition. I grant you it's not perfect, but it at least improves an otherwise very insecure situation.
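
Here is a minimal sketch of such a patch, keeping the same "hash:salt" storage format; the salt length, iteration count and function name are illustrative assumptions:

function encryptAdminPasswordV2($plain) {
  $entropy = '';
  for ($i = 0; $i < 10; $i++) {
    $entropy .= rand();
  }

  // keep 16 hex characters of salt rather than just 2
  $salt = substr(md5($entropy), 0, 16);

  // iterate the digest to make brute-force attacks slower
  $hash = md5($salt . $plain);
  for ($i = 0; $i < 1000; $i++) {
    $hash = md5($salt . $hash);
  }

  // the total length differs from the old format, so validation code
  // can tell old and new crypted passwords apart
  return $hash . ':' . $salt;
}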

If, rather than pragmatically fixing up an existing environment that you didn't write, you want to do all this properly, read the Password Storage Cheat Sheet from OWASP (the Open Web Application Security Project). It lists a range of considerations (and reasoning) beyond my basic pragmatic list. If you're going to code from scratch, please do it right.

Edit 2012-07-17: added the maximum affected osCommerce version, thanks to Harald Ponce de Leon. osCommerce as of v2.3.0 uses phpass, like WordPress.

Understanding SHOW VARIABLES: DISABLED and NO values

When you use SHOW VARIABLES LIKE 'have_%' to see whether a particular feature is enabled, you will note the value of NO for some, and DISABLED for others. These values are not intrinsically clear to the casual onlooker, and often cause confusion. Typically, this happens with SSL and InnoDB. So, here is a quick clarification!

  • NO means that the feature was not enabled (or was actively disabled) in the build. This means the code and any required libraries are not present in the binary.
  • DISABLED means that the feature is built in and capable of working in the binary, but is disabled due to relevant my.cnf settings.
  • YES means the feature is available, and configured in my.cnf.
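
For example, on a binary with SSL compiled in but not configured, you might see something like this (output is illustrative):

SHOW VARIABLES LIKE 'have_ssl';
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_ssl      | DISABLED |
+---------------+----------+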

SSL tends to show up as DISABLED until you configure the appropriate settings to use it in my.cnf (check SHOW VARIABLES LIKE 'ssl_%'). From then on it will show up as YES.
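
A minimal sketch of the relevant my.cnf settings (the paths are illustrative):

[mysqld]
ssl-ca = /etc/mysql/ca-cert.pem
ssl-cert = /etc/mysql/server-cert.pem
ssl-key = /etc/mysql/server-key.pem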

Depending on your MySQL version and distro build, InnoDB can be disabled via the “skip-innodb” option. Obviously that’s not recommended as InnoDB should generally be your primary engine of choice!

However, InnoDB can also show up as DISABLED if the plugin fails to load due to configuration or other errors on startup. When this happens, review the error log (often redirected to syslog/messages) to identify the problem.

If InnoDB is configured as the default storage engine, failed initialisation of the plugin should now result in mysqld not starting, rather than starting with InnoDB disabled, as obviously InnoDB is required in that case.