Posted on

Dogfood: making our systems more resilient

This is a “dogfood” type story (see below for explanation of the term)… Open Query has ideas on resilient architecture which it teaches (training) and recommends (consulting, support) to clients and the general public (blog, conferences, user group talks). Like many other businesses, when we first started we set up our infrastructure quickly and on the cheap, and it’s grown since. That’s how things grow naturally, and is as always a trade-off between keeping your business running and developing while also improving infrastructure (business processes and technical).

Quite a few months ago we also started investing (mostly time) in the technical infrastructure, and slowly moving the various systems across to new servers and splitting things up along the way. Around the same time, the main webserver frequently became unresponsive. I’ll spare you the details, we know what the problem was and it was predictable, but since it wasn’t our system there was only so much we could do. However, systems get dependencies over time and thus it was actually quite complicated to move. In fact, apart from our mail, the public website was the last thing we moved, and that was through necessity not desire.

Of course it’s best for a company when their public website works, it’s quite likely you have noticed some glitches in ours over time. Now running on the new infra, I happened to take a quick peek at our Google Analytics data, and noticed an increase in average traffic numbers of about 40%. Great big auch.

And I’m telling this, because I think it’s educational and the world is generally not served by companies keeping problems and mishaps secret. Nasties grow organically and without malicious intent, improvements are a step-wise process, all that… but in the end, the net results of improvements can be more amazing than just general peace of mind! And of course it’s very important to not just see things happen, but to actively work on those incremental improvements, ongoing.

Our new infra has dual master MySQL servers (no surprise there 😉 but based in separate data centres so that makes the setup a bit more complicated (MMM doesn’t deal with that setup). Other “new” components we use are lighttpd, haproxy, and Zimbra (new in the sense that our old external infra used different tech). Most systems (not all, yet) are redundant/expendable and run on a mix of Linode instances and our own machines. Doing these things for your own infra is particularly educational, it provides extra perspective. The result is, I believe, pretty decent. Failures generally won’t cause major disruption any more, if at all. Of course, it’s still work in progress.

Running costs of this “farm”? I’ll tell later, as I think it’s a good topic for a poll and I’m curious: how much do you spend on server infrastructure per month?

Background for non-Anglophones: “eating your own dogfood” refers to a company doing themselves what they’re recommending to their clients and in general. Also known as “leading by example”, but I think it’s also about trust and credibility. On the other hand, there’s the “dentist’s tooth-ache” which refers to the fact that doctors are their own worst patients 😉

Posted on