Posted on 1 Comment

Contributing to popular frameworks for scalability

Right now we’re volunteering some engineering time to assisting the WordPress and WooCommerce people with scalability issues. In the past we’ve put similar efforts into Drupal.

There are many opinions on these systems out there, most of them sadly negative. We take a different view. Each of these frameworks obviously has their advantages and disadvantages, but the key question is why people use them. When we understand that, we can assess that reasoning, and assist further.

Obviously writing your own code all the way is going to potentially create the most optimal result for your site. A custom tool is going to be less code and more optimal for your situation. However, it also requires you to put in quite a bit of development effort both to create and to maintain that system, including security issues. When you’re big enough (as a site/company) this can be worthwhile, but more and more organisations actually appear to be moving towards utilising a CMS and then extending that to their needs using the usual module/plugin/extension model.

This also means that bigger and bigger sites use these systems, and thus we need to look at the scalability. CMS frameworks tend to be “db heavy”, caused by being highly configurable combined with a modular/abstraction architecture that sometimes goes for “code architectural correctness” without taking scaling into account. Naturally most such code works fine on a dev box and even on a modest live server. But add sufficient content, and things rapidly go downhill. Just like with other database related tuning issues, there is no gradual degradation in performance – when certain boundaries are reached, performance plummets to cause nasty page load times or even complete page load failures.

Are these systems inherently dreadful? Actually, no. They do a darn good job and even in security terms they’re not bad at all. Naturally one always has to be careful with modules/plugins and whether they are (still) maintained.

So, with that assessment out of the way – we can look at the actual issues. It makes sense for us to contribute to these systems as it directly benefits our clients, ourselves, and many many others.

 

Just no the option structure in WordPress has drawn our interest. It’s abstracted, and so a plugin will request the value of an individual option item (by name). Typically it causes a db query. WordPress has an ‘autoload’ mechanism particularly for its core settings, which it loads in one bigger query and caches – that makes sense.

We’ve just commented on an existing issue regarding the indexing of the autoload column, with people arguing that indexing a boolean field is not beneficial (not necessarily true and easily tested for a specific case – the outcomes is that in this case an index IS beneficial) and that having more indexes slows down writes (true of course, but that’s never an argument against proper indexing – also this is mostly a read table, so a bit of overhead on writes is fairly immaterial). Lastly there were comments re MyISAM which has a slightly different performance profile to InnoDB when testing this. But InnoDB has been the default engine for quite a few years now – any installation in recent years, and any installation where the owner needs to care for more performance and other factors, will already be using InnoDB. Taking MyISAM into account is not beneficial. We hope the index will shortly be added. In the mean time you can add it yourself:

ALTER TABLE wp_options ADD INDEX (autoload)

Using autoload for everything would not be efficient, there can be tens of thousands of options in a decently sized site. So we see plugins retrieve half a dozen options, one by one using the appropriate framework function, and each triggers a database query. Again we have to appreciate why this architecture is the way it is, and not waste time on arguing with that. There are sensible reasons for having such an option architecture. What we need to address is the inefficiency of triggering many queries.

Our proposal is to extend the get_option() function with an extra parameter for a plugin name (call it a domain), then the function can issue a single db query to retrieve all the options for that plugin and cache it in a single object. When another option is requested it can be retrieved from there rather than triggering another query. This reduces the number of db queries from N per plugin to 1, which is significant. Mind you, each of these queries is tiny and doesn’t require much lookup time. But running any query has intrinsic overhead so we want to reduce the number of queries whenever possible – not running superfluous queries, combining queries, and so on.

There are variations on the proposal as there are many aspects to consider, but it’s important to not leave it “as is” as currently it affects performance and scalability.

So how do we catch potential items of interest? On the database end, the slow query log is still very useful with the proper settings. log_queries_not_using indexes, min_examined_row_limit=1000 (tune appropriate for the environment), log_slow_filter and log_slow_verbosity to give as much info as possible, and long_query_time to something sub-second. Naturally, this presumes you’ve already taken care of real nasties otherwise you’ll get very big slow log files very quickly. You can analyse the slow query log using Percona Tools, but the ‘mysqldumpslow’ tool which is part of the server package is pretty useful also. You’re looking for queries that either take a long time, or that happen often, or just look odd/inefficient/unnecessary. Then hunt where they come from in the code, and figure out the why. Often reasons aren’t necessarily good or current (historical scenarios linger), but there will be reasons and understanding them will help you talk with developers more effectively – seeing where they come from.

WordPress has a nice plugin called Query Monitor. As admin you get details from a page load in your top admin bar, and you can delve into specific aspects. Things to look out for are an excessive number of queries for a page (a hundred is sadly common, but when you see into the thousands it’s worthwhile investigating – of course, start with the biggest problems first and work your way down), duplicated queries (indicating framework inefficiencies) and things like that. Have a good browse around, there’s much there that’s useful.

If you want, feel free to bring suspected issues to our attention. We’ll look at them, perhaps write about them here, and in any case follow up with the developers in their bug tracking system. It’s worthwhile.

Posted on 1 Comment

1 thought on “Contributing to popular frameworks for scalability

  1. On a very suspect query and structure: https://jira.mariadb.org/browse/MDEV-12343. Just waiting on author to identify origin. Can this EAV structure be moved to just columns? Does the wordpress abstraction may help do this? With Innodb and ALTER TABLE online there is there any operational reason to a EAV table. This should be workable from an environment point of view and reduce such a query to a single join and be a lot faster.

Comments are closed.