Posted on

On Open Source and Business Choices

Open Source is a whole-of-process approach to development that can produce high-quality products better tailored to users’ real-world needs.  A key reason for this is the early feedback cycle built into that complete process.

Simply publishing something under an Open Source license (while not applying Open Source development processes) does not yield the same quality and other benefits.  So, not all Open Source is the same.

Publishing the source of a product “later” (for instance, when the monetary benefit has diminished for the company) is meaningless.  In this scenario, there is no “Open Source benefit” to users whatsoever; it’s simply a proprietary product. There is no opportunity for the client to make custom modifications or improvements, or to ask a third party to work on such matters, nor is there any opportunity for a third party to verify and validate either code quality or security.

Open Source is not a marketing gimmick.  Labels such as “Open Source” or “Enterprise” on their own produce no more of a positive outcome than the label “healthy” does on a greasy hamburger.  If a company “believes” in Open Source software, it will use the Open Source development model for its software development.

And now we see things like this: Uproar: MariaDB Corp. veers away from open source (by Simon Phipps, InfoWorld, August 2016)

So what does it mean when a company publishes some of its software under an Open Source license, and releases related products under a proprietary license?  To me, it’s generally a strong indication that the company either doesn’t believe in that model, or doesn’t understand it.  And we’ve seen it before.

It also reminds me of an interaction I had many years ago.  A Marketing VP asked me “How can we leverage our [Open Source] community?”  I answered in the only possible way: “One does not ‘leverage’ the community, that’s not how it works.”  Of course that wasn’t the answer the VP wanted to hear, but that doesn’t make it less true.  They saw the community as an asset to use, rather than work with.  People don’t like getting used, and in the Open Source space that’s even more true.

Companies that have turned their back on their earlier Open Source work and have devised some other model to (arguably) make more money have all discovered that this fundamentally changes their market.  They’ll lose some of their users, customers and supporters, and gain some new, different clients.  It’s a different market.  Whether and how that pans out in terms of commercial success is never certain.  Given that we know the Open Source development process yields benefits in terms of quality and features users want, we can say that non-OSS products lack (some of) those benefits. So, to put it bluntly, it’ll be a different product, of possibly lower quality, and the feature set is likely to differ as well.

Naturally, we cannot ascertain code quality directly: we can’t review closed code, the bug systems of proprietary software tend to be closed, and changelogs are condensed for marketing purposes.  However, independent studies going back a decade and a half have worked out “lines of code per software flaw”, and the results came out significantly in favour of Open Source software, with proportionally far fewer bugs.  Bugs also tend to get fixed more quickly in Open Source software.  None of this is new(s); see for instance Open-source vs. proprietary software bugs: Which get squashed fastest? (CNET, 2007)

For complete products (libraries are a slightly different beast) with a relatively large market scope, source code being available does not in any way diminish a company’s ability to make money.  Having the core developers, tech writers and support people gives them a significant edge in the open market, and that’s a business asset you can leverage.  You do that by focusing on those aspects in your communications; that’s basic marketing: you draw attention to the positive aspects that make your company/product stand out from the rest.  Clearly, this objective cannot be achieved by force, as you don’t make a (potential) client like or trust you by denying them choice or transparency.

There is one other known option aside from not believing or not understanding, and that’s fear. But fear is an awkward business driver; it makes for very bad decisions.

MariaDB Corp in part uses the Open Source development model, in part acts as an Open Source publisher (in-house work that’s only made available at a later stage in the development process), and now a proprietary product has been added to the mix (actually new versions of an existing product).  Looking at this, I am rather unclear about what they believe in.  Of course companies can make business choices as they see fit, but they never operate in a vacuum.  In the end it doesn’t matter much what I personally believe; the market will do what it will, and historically it has responded in the various ways described above.  We’ll see how it pans out.

Open Query does not recommend (or resell at all) proprietary tools, as it just doesn’t make sense for us or our clients.  We often do bug fixes and improvements, which we contribute upstream; with proprietary tools we can’t do that, so they become a hindrance for us and our clients.  On a specific practical level, we’ve actually never used MaxScale (the product that MariaDB Corp will now sell under different conditions for future versions), and this stems from our experience with its effective predecessor, MySQL Proxy.  Having a complex set of scripted logic in a proxy slows down applications and introduces a rather large extra (single) point of potential failure into the infrastructure.  So, while Simon refers to MaxScale as an essential tool for scalable environments, we know from experience that there are other ways of achieving that objective, without the downsides.

Rather than promoting a single tool for many wildly different jobs, we utilise a few different tools depending on the needs of particular client infrastructure.  We still have a couple of (now legacy) MySQL-MMM deployments, but also quite a few Galera clusters, and other setups to suit our clients’ needs.  The key is not only to make the infrastructure convenient to use for applications, but also not to introduce any more single points of failure.  We build resilience into the client’s server infrastructure, without adding significant overhead in either performance or maintenance requirements.

We believe that that’s what clients want, and since potential clients come to us asking exactly for that (and note our approach with relief) we think that we’re doing the right thing by our clients.  We’ve used this approach for over 9 years, and we’ll just keep on doing that – our basic approach doesn’t change even when our tools do.  If you’d like to talk with us about helping you with your infra, using our approach and way of working, contact us today!


Serving Clients Rather than Falling Over

Dawnstar Australis (yes, nickname – but I know him personally – he speaks with knowledge and authority) updates on The Real Victims Of The Click Frenzy Fail: The Australian Consumer after his earlier post from a few months ago.

Colourful language aside, I believe he rightly points out the failings of the organising company and the big Australian retailers. From the Open Query perspective, we’ll just look at the situation where sites fall over under load. Contrary to what they say, that’s not a cool indication of popularity. Let’s compare with the real world:

  1. A bricks-and-mortar store does something that turns out to be popular, and we see a huge queue outside; people need to wait for hours. The people in the queue can chat, and overall the situation can be regarded as positive: it shows passers-by that there’s something special going on, and that’s cool. If you don’t want to be in the crowd, you’ll come back later.
  2. A website is unresponsive or inaccessible. There’s nothing cool or positive about this, as the cause is not only unknown but in fact irrelevant in the context. Each potential client is on their own. Things fail, so they go elsewhere (if there are substitutes) or potentially away completely (for a concert, tickets will sell out). The bad taste sticks, so if there are alternatives they will not only move there, but be quite vocal about it so others move too.

So you see, you really don’t want your site to go down because of popularity, or for any other reason. Years ago, Slashdot created a “degrade gracefully” mechanism, where parts of the site would go static. So where normally users would be able to comment and rate posts, they’d just be able to read. In the worst case, only the front page would remain active. On 11 September 2001, Slashdot was one of the few big sites that actually remained accessible, providing regular news that people could read even though the topic was not really within its normal scope. The point is, they proved the approach multiple times.
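To make the idea concrete, here is a minimal Python sketch of a load-based degradation switch, modelled loosely on the Slashdot behaviour described above. It is not Slashdot’s actual implementation: the mode names, feature names, the 0.0 to 1.0 `load` figure and the threshold values are all hypothetical.

```python
from enum import IntEnum


class Mode(IntEnum):
    """Degradation levels, from fully dynamic to minimal (hypothetical)."""
    FULL = 0             # everything dynamic: reading, commenting, rating
    READ_ONLY = 1        # article pages still served, no comment/rating writes
    FRONT_PAGE_ONLY = 2  # worst case: only the front page stays up


# Features each mode still offers (hypothetical feature names).
FEATURES = {
    Mode.FULL: {"front_page", "articles", "comment", "rate"},
    Mode.READ_ONLY: {"front_page", "articles"},
    Mode.FRONT_PAGE_ONLY: {"front_page"},
}


def choose_mode(load: float, thresholds=(0.75, 0.90)) -> Mode:
    """Pick the most degraded mode whose load threshold has been reached.

    `load` is an assumed utilisation figure between 0.0 and 1.0;
    thresholds[0] triggers READ_ONLY, thresholds[1] FRONT_PAGE_ONLY.
    """
    mode = Mode.FULL
    for level, limit in enumerate(thresholds, start=1):
        if load >= limit:
            mode = Mode(level)
    return mode


def is_allowed(feature: str, load: float) -> bool:
    """True if `feature` should still be served at the current load."""
    return feature in FEATURES[choose_mode(load)]


print(choose_mode(0.8).name)  # READ_ONLY
```

Each step down removes the write-heavy, dynamic features first, so reading keeps working for as long as possible instead of the whole site falling over.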

Contrarily, companies like Ticketek have surely got Enterprise Design architecture; however, their site has been seen to fall over during events such as The Wiggles. They might be able to get away with this since they’re essentially a monopoly provider: if you want a ticket for this particular event, you need to go to them. But it’s not good. Generally they acted surprised, even though the huge load was entirely predictable. Is that naivety, an attempt to mislead the public, or negligence? You decide.

It’s really a design failure of sorts. Exactly where, only an architectural review would show, and it’ll differ from site to site. However, the real lesson is that it’s not about “Enterprise Design” at all, nor about using any particular high-profile hosting provider or any other buzzword. It’s about proper architecture and deployment, and the database is only one aspect of this. It doesn’t have to end up particularly expensive either; it just has to be done right, and there’s no single magical approach, as each case is unique. This is best looked at early on (it also tends to work out better and cheaper), but we’ve helped clients out at much later stages too.  Ideally, we like to help before there’s a raging fire.


Jetpants: a toolkit for huge MySQL topologies

From a Tumblr engineering blog post:

Tumblr is one of the largest users of MySQL on the web. At present, our data set consists of over 60 billion relational rows, adding up to 21 terabytes of unique relational data. Managing over 200 dedicated database servers can be a bit of a handful, so naturally we engineered some creative solutions to help automate our common processes.

Today, we’re happy to announce the open source release of Jetpants, Tumblr’s in-house toolchain for managing huge MySQL database topologies. Jetpants offers a command suite for easily cloning replicas, rebalancing shards, and performing master promotions. It’s also a full Ruby library for use in developing custom billion-row migration scripts, automating database manipulations, and copying huge files quickly to multiple remote destinations.

Dynamically resizable range-based sharding allows you to scale MySQL horizontally in a robust manner, without any need for a central lookup service or massive pre-allocation of tiny shards. Jetpants supports this range-based model by providing a fast way to split shards that are approaching capacity or I/O limitations. On our hardware, we can split a 750GB, billion-row pool in half in under six hours.

Jetpants can be obtained via GitHub or RubyGems.

Good work Tumblr, excellent move to open up your tools: you’re bound to get good feedback and bug catches/fixes from users in other environments now, making your toolset even better!
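Range-based sharding, as described in the quote, comes down to a small routing table of contiguous ID ranges that the application can consult directly, with no central lookup service. The Python sketch below is a hypothetical illustration of that routing and of a shard split; it is not Jetpants’ API (Jetpants itself is a Ruby library), and the `Shard`/`RangeShardMap` names and pool names are made up.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Shard:
    min_id: int  # inclusive lower bound of the key range
    max_id: int  # inclusive upper bound of the key range
    pool: str    # hypothetical name of the MySQL pool serving this range


class RangeShardMap:
    """Routes keys to shards by contiguous, non-overlapping ID ranges."""

    def __init__(self, shards):
        self.shards = sorted(shards, key=lambda s: s.min_id)
        self._reindex()

    def _reindex(self):
        self._starts = [s.min_id for s in self.shards]

    def lookup(self, key: int) -> Shard:
        # The candidate is the last shard whose range starts at or below key.
        i = bisect_right(self._starts, key) - 1
        if i < 0 or key > self.shards[i].max_id:
            raise KeyError(key)
        return self.shards[i]

    def split(self, pool: str, left_pool: str, right_pool: str) -> None:
        """Replace one shard's range with two halves on new pools.

        Only the routing metadata changes here; copying the data to the
        new pools is the hard part that a tool like Jetpants automates.
        """
        for idx, old in enumerate(self.shards):
            if old.pool == pool:
                mid = (old.min_id + old.max_id) // 2
                self.shards[idx:idx + 1] = [
                    Shard(old.min_id, mid, left_pool),
                    Shard(mid + 1, old.max_id, right_pool),
                ]
                self._reindex()
                return
        raise KeyError(pool)


shard_map = RangeShardMap([Shard(1, 1000, "pool-a"), Shard(1001, 2000, "pool-b")])
shard_map.split("pool-a", "pool-a1", "pool-a2")
print(shard_map.lookup(750).pool)  # pool-a2
```

A split only turns one routing entry into two; every other range keeps routing unchanged, which is why there’s no need to pre-allocate lots of tiny shards up front.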


MySQL Cluster on Raspberry Pi

Earlier this week, Andrew Morgan wrote a piece on running MySQL Cluster on Raspberry Pi. Since the term “Cluster” is hideously overloaded, I’ll note that we’re talking about the NDB cluster storage engine here, a very specific architecture originally acquired by MySQL AB from Ericsson (telco).

Raspberry Pi is a new single-board computer based on the ARM processor series (same stuff that powers most mobile phones these days), and it can run Linux without any fuss. Interfaces include Ethernet, USB, and HDMI video, and the cost is $25-50. I’m looking to use one for the front end of a MythTV setup (digital video recorder and TV system); I can just strap the Raspberry Pi to the back of a TV or monitor to do its job.

As Andrew already notes, in practical terms you’re not likely to use Raspberry Pi for a cluster – perhaps for development and certain testing, and it’d make a neat solid-state management server. Primarily, it’s “techie cool”.

Knowing the NDB architecture, one of the key issues is that all nodes need to communicate with each other (NxN) so the system is very network intensive, and network latency significantly affects performance. So commonly, a cluster would have at least separate interfaces for direct connections to its siblings (no switch), and possibly Dolphin Interconnect cards to provide a link with much less latency than regular Ethernet offers. And you can’t do either with Raspberry Pi.
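A quick back-of-the-envelope calculation shows why an all-to-all design is so network-hungry. The Python sketch assumes, as simplified in the text, that every node talks to every other node; a real NDB cluster distinguishes between node types, so treat these numbers as an upper-bound illustration. Each node needs `n - 1` links (or direct interfaces, if siblings are wired together without a switch), and the total grows quadratically.

```python
def links_per_node(nodes: int) -> int:
    """Connections each individual node has to maintain in a full mesh."""
    return nodes - 1


def full_mesh_links(nodes: int) -> int:
    """Number of distinct node pairs when every node talks to every other."""
    return nodes * (nodes - 1) // 2


for n in (2, 4, 8, 16):
    print(f"{n:>2} nodes: {links_per_node(n):>2} links each, "
          f"{full_mesh_links(n):>3} in total")
```

Going from 4 to 16 nodes multiplies the total number of links by 20 (6 to 120), and every one of them is latency-sensitive.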

However, there are important positive lessons in this setup:

  • Thanks to the open source nature of the software, it can be used in a new environment with only minimal tweaks. Not everybody needs or wants to tweak, but the ability to do so is critical to innovation.
  • Overall, scaling out rather than up makes sense. There are cost, power-efficiency and other factors involved. More numerous, cheap, relatively low-powered systems can deliver a system architecture that would otherwise be unaffordable (and the expensive construct might not scale anyway).
  • Affordable resilience (redundancy).

What if you needed lots of MySQL slaves with a fairly small dataset? Raspberry Pi could well be the solution. Not everybody is “big” or “high performance” in the same way.


When Clever Goes Wrong & How Etsy Overcame – Arstechnica

In 2007, Etsy made a big bet on homegrown middleware to help with the site’s scalability. A half-year after it was taken live, the company decided to abandon it. As a senior software engineer at Etsy put it, “if you’re doing something ‘clever,’ you’re probably doing it wrong.”

Read the full article at Arstechnica.com

I want to focus on the important lessons from this article about middleware, and about using stored procedures in this fashion for a public web application: it creates unscalable design complexity (smart and “proper” according to the old enterprise design teachings…), causing infrastructure, development and maintenance hassles.

In the process they did replace PostgreSQL with MySQL, but that’s not the critical change that made all the difference. PostgreSQL is a fine database system too.


Ladies and gentlemen, check your assumptions

I spent some time earlier this week trying to debug a permissions problem in Drupal.

After a lot of head-scratching, it turned out that Drupal assumes that when you run INSERT queries sequentially on a table with an auto_increment integer column, the values that are assigned to this column will also be sequential, i.e. 1, 2, 3, …

This might be a valid assumption when you are the only user doing inserts on a single MySQL server, but unfortunately that is not always the situation in which an application runs.

I run MySQL in a dual-master setup, which means that two sequential INSERT statements on the same master will never return sequential integers.  The value will always be determined by the auto_increment_increment and auto_increment_offset settings in the configuration file.

In my case, one master will only assign even numbers, the other only odd ones.
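The Python sketch below models how the two settings interact, following the rule described in the MySQL manual: each new value is roughly the smallest member of the series `auto_increment_offset + k × auto_increment_increment` that is greater than the column’s current maximum. It is a simplified model (it assumes `1 <= offset <= increment` and looks at one server in isolation, ignoring replicated rows), not MySQL’s own code.

```python
def next_auto_increment(current_max: int, increment: int, offset: int) -> int:
    """Smallest value of the form offset + k * increment (k >= 0) that is
    greater than current_max. Simplified model: assumes 1 <= offset <= increment.
    """
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


def simulate_inserts(count: int, increment: int, offset: int, current_max: int = 0):
    """IDs handed out by `count` consecutive INSERTs on a single server."""
    ids = []
    for _ in range(count):
        current_max = next_auto_increment(current_max, increment, offset)
        ids.append(current_max)
    return ids


print(simulate_inserts(3, increment=1, offset=1))  # [1, 2, 3]  single-server defaults
print(simulate_inserts(3, increment=2, offset=1))  # [1, 3, 5]  one master: odd IDs
print(simulate_inserts(3, increment=2, offset=2))  # [2, 4, 6]  other master: even IDs
```

Only the first line matches the 1, 2, 3 pattern the application assumed; the dual-master settings break it on both servers.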

My patch was accepted, so this problem is now fixed in the Drupal 7 (and hopefully soon in 6 as well) codebase.

The moral of the story is that your application should never make such assumptions about auto_increment columns.  A user may run the application on a completely different architecture, and it may break in interesting and subtle ways.

If you want to use specific, predefined integer values like Drupal does, make sure you explicitly insert them. Otherwise, you can retrieve the assigned number via the mysql_insert_id() function in PHP or via SELECT LAST_INSERT_ID() in MySQL itself.
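Both safe patterns look like this in a small Python sketch. It uses the standard library’s `sqlite3` module with an in-memory database as a stand-in for MySQL, so `cursor.lastrowid` plays the role of `mysql_insert_id()` / `LAST_INSERT_ID()`; the `role` table, its columns and the role names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the `role` table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE role (rid INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL)"
)


def insert_role_with_id(conn: sqlite3.Connection, rid: int, name: str) -> None:
    """When the application needs a well-known ID, insert it explicitly."""
    conn.execute("INSERT INTO role (rid, name) VALUES (?, ?)", (rid, name))


def insert_role(conn: sqlite3.Connection, name: str) -> int:
    """Insert a row and return the ID the database actually assigned."""
    cur = conn.execute("INSERT INTO role (name) VALUES (?)", (name,))
    return cur.lastrowid  # analogous to LAST_INSERT_ID() / mysql_insert_id()


insert_role_with_id(conn, 1, "anonymous user")
insert_role_with_id(conn, 2, "authenticated user")
editor_rid = insert_role(conn, "editor")
row = conn.execute("SELECT name FROM role WHERE rid = ?", (editor_rid,)).fetchone()
print(editor_rid, row[0])
```

The application never computes an ID itself: either it states the value up front, or it asks the database afterwards which value was assigned, so it keeps working whatever the increment and offset settings are.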

Have you checked your code today?


libmemcached packages

Ronald Bradford last week posted about memcached not being multi-threaded on Ubuntu, something he discovered via some small utilities that are bundled with libmemcached, written by Brian Aker.

When I noticed there were no Ubuntu packages for libmemcached (or the CLI tools) I decided to create some.

For your enjoyment: http://ubuntu.cafuego.net/dists/jaunty-cafuego/memcached/ (Source debs are included)

The repository also contains a memcached that has been re-compiled with multithreading enabled.


Predictive caching in a MySQL-backed infrastructure

Sounds a bit far-fetched (pun intended ;-), but we’re doing it. This is not inside the MySQL server, but rather in the overall application design. Let me run you through the logic…

Some key aspects to scaling are: not doing unnecessary queries, and caching what you can. Just a quick baseline. The fastest query is the one you don’t do, or the one you’ve already done before – the latter being caching.

A simple yet brilliant example of this is the YouTube trick, where a script reads the relay log, converting updates into appropriate selects and running them so that the InnoDB cache will have the blocks in memory when the slave SQL thread executes the actual update. Maatkit now has a tool for this, so it’s publicly available. It’s not quite predictive, but it’s a neat trick anyway that sometimes comes in handy. Search engines use similar tricks.
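The core of the relay-log trick is rewriting each write into a read that touches the same rows. Here is a minimal Python sketch of just that rewrite step. It is not the Maatkit tool: it handles only simple single-table `UPDATE`/`DELETE` statements with a `WHERE` clause, and reading the relay log and running the SELECTs against the slave are left out. The function name `to_prefetch_select` is made up for illustration.

```python
import re

# Simplified patterns: single-table UPDATE/DELETE with a WHERE clause.
# Multi-table statements, subqueries, ORDER BY/LIMIT etc. are not handled.
UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(?P<table>\S+)\s+SET\s+.+?\s+WHERE\s+(?P<where>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(?P<table>\S+)\s+WHERE\s+(?P<where>.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def to_prefetch_select(statement: str):
    """Rewrite a write statement into a read that touches the same rows.

    Running the SELECT ahead of the slave SQL thread pulls the relevant
    pages into memory; returns None if the statement is not convertible.
    """
    for pattern in (UPDATE_RE, DELETE_RE):
        match = pattern.match(statement)
        if match:
            return f"SELECT * FROM {match.group('table')} WHERE {match.group('where')}"
    return None


print(to_prefetch_select("UPDATE users SET last_seen = NOW() WHERE id = 42;"))
# SELECT * FROM users WHERE id = 42
```

Statements the patterns don’t recognise are simply skipped, which is a safe default for a prefetcher: a missed prefetch only costs the normal cache miss.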

Building on this, with certain applications you can actually tell what is likely to happen next, sometimes for a particular user and often for many users. Individual user behaviour may sometimes appear random, but as a group it can be highly predictable. The analysis needs to be done properly, though; otherwise averaging will make certain interesting behavioural patterns disappear.

Anyway, if you can identify these patterns, you can take appropriate measures, such as running some queries so their results get cached, and/or scheduling other relevant actions (so it’s more than just caching, but it’s a reasonably suitable name anyway). This allows the app to deal with higher peak load, as well as improving response time for individual users.
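As a hypothetical illustration of spotting such patterns, the Python sketch below counts which page users tend to visit next (a simple first-order transition model) and returns the likely next pages, whose queries could then be run ahead of time to warm the caches. The page names, the session data and the `min_share` cut-off are made up, and a real analysis would segment users rather than pool everyone together, for the averaging reason mentioned above.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count, for every page, which page users visited next."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page, min_share=0.5):
    """Pages that followed `page` in at least `min_share` of observed cases."""
    counts = transitions.get(page)
    if not counts:
        return []
    total = sum(counts.values())
    return [nxt for nxt, n in counts.most_common() if n / total >= min_share]


# Hypothetical click paths, one list per user session.
sessions = [
    ["home", "search", "item"],
    ["home", "search", "cart"],
    ["home", "about"],
    ["search", "item"],
]
transitions = build_transitions(sessions)
print(predict_next(transitions, "home"))    # ['search']
print(predict_next(transitions, "search"))  # ['item']
```

Raising `min_share` makes the prefetching more conservative (fewer wasted queries); lowering it warms more of the cache at the cost of extra load.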

I might do a talk or article on the predictive caching concept some time, as I appreciate that the short description may appear a bit abstract or obscure. But I assure you it’s entirely practical and real.

It’s one example of how Open Query helps its clients scale well, by design. We focus on preventing emergencies, which includes not just scenarios where stuff fails (and does a safe failover), but also the “oh dear we suddenly have so many more users than a minute ago” type of happening, which should actually be an occasion to enjoy, not stress about.