Posted on

Amazon vs Linode in case of hardware failure

One of our clients received an email like this:

Dear Amazon EC2 Customer,

We have important news about your account (…). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: …) in the […] region. Due to this degradation, your instance could already be unreachable. After […] UTC your instance, which has an EBS volume as the root device, will be stopped.

You can see more information on your instances that are scheduled for retirement in the AWS Management Console

  • How does this affect you?
    Your instance’s root device is an EBS volume and the instance will be stopped after the specified retirement date. You can start it again at any time. Note that if you have EC2 instance store volumes attached to the instance, any data on these volumes will be lost when the instance is stopped or terminated as these volumes are physically attached to the host computer
  • What do you need to do?
    You may still be able to access the instance. We recommend that you replace the instance by creating an AMI of your instance and launch a new instance from the AMI.
  • Why retirement?AWS may schedule instances for retirement in cases where there is an unrecoverable issue with the underlying hardware.

Great to be notified of such things. That’s not the problem. What I find curious is that Amazon tosses the problem resolution entirely at their clients. Time and effort (cost) is required, even just create an AMI (if you don’t already have one) and restart elsewhere from that.
Could it be done differently? I think so, because it has been done for years. At Linode (for example). If something like this happens on Linode, they’ll migrate the entire VM+data to another host – quickly and at no cost. Just a bit of downtime (often <30mins). They’ll even do this on request if you (the client) suspect there is some problem on the host and would just like to make sure by going to another host.
So… considering how much automation and user-convenience Amazon produces, I would just expect better.

Of course it’s nice to have everything scripted so that new nodes can be spun up quickly. In that case you can just destroy an old instance and start a new one, which might then be very low effort. But some systems and individual instances are (for whatever reason) not set up like that, and then a migration like the one that Linode does is eminently sensible and very convenient….

Posted on

Tom Eastman on File Uploads

The awesome Tom Eastman presented a session at PyCon Australia (Melbourne) 2016 entitled

“The dangerous, exquisite art of safely handing user-uploaded files”.

Every web application has an attack surface — the exposed points of interaction where a malicious or mischievous user can commit malice, or mischief (respectively). Possibly nowhere, however, is more vulnerable than places a user is allowed to upload arbitrary files.
The scope for abuse is eye-widening: The contents of the file, the type of the file, the size and encoding of the file, even the *name* of the file can be a potent vector for attacking your system.
The scariest part? Even the best and most secure web-frameworks can’t protect you from all of it.

In this talk, Tom shows you every scary thing he knows about that can be done with a file upload, and how to protect yourself from — hopefully — most of them.

Do watch it and pick up any hints you can.  This is important stuff.

How do your web applications handle file uploads?

Posted on

On Open Source and Business Choices

Open Source is a whole-of-process approach to development that can produce high-quality products better tailored to users’ real world needs.  A key reason for this is the early feedback cycle built into that complete process.

Simply publishing something under an Open Source license (while not applying Open Source development processes) does not yield the same quality and other benefits.  So, not all Open Source is the same.

Publishing source of a product “later” (for instance when the monetary benefit has diminished for the company) is meaningless.  In this scenario, there is no “Open Source benefit” to users whatsoever, it’s simply a proprietary product. There is no opportunity for the client to make custom modifications or improvements, or ask a third party to work on such matters – neither is there any third party opportunity to verify and validate either code quality or security.

Open Source is not a marketing gimmick.  Labels such as “Open Source”, or “Enterprise”, on their own, do not have any more positive outcome than a greasy hamburger labeled with “healthy”.  If a company “believes” in Open Source software, they’ll use the open source development model for their software development.

And now we see things like this: Uproar: MariaDB Corp. veers away from open source (by Simon Phipps, InfoWorld, August 2016)

So what does it mean when a company publishes some of their software under an open source license, and does some related products under a proprietary license?  To me, it’s generally a strong indication that the company either doesn’t believe in that model, or doesn’t understand it.  And we’ve seen it before.

It also reminds me of an interaction I had many years ago.  A Marketing VP asked me “How can we leverage our [Open Source] community?”  I answered the only possible way: “One does not ‘leverage’ the community, that’s not how it works.”  Of course that wasn’t the answer the VP wanted to hear, but that doesn’t make it less true.  They saw the community as an asset to use, rather than work with.  People don’t like getting used, and in the Open Source space that’s even more true.

Companies that have turned their back on their earlier Open Source work and who have devised some other model to (arguably) make more money, have all discovered that this fundamentally changes their market.  They’ll lose some of their users, customers and supporters, and gain some new different clients.  It’s a different market.  Whether and how that pans out in terms of commercial success is never certain.  Given that we know that the Open Source development process yields benefits in terms of quality and features users want, we can say that non-OSS products lack (some of) those benefits, so to put it bluntly, it’ll be a different product of possibly less quality and the feature set is likely to differ as well.

Naturally we cannot ascertain code quality directly as we can’t review closed code directly, bug systems of proprietary software tends to be closed, changelogs are condensed for marketing purposes, but as far back as a decade and a half there have been independent studies that worked out “lines of code per software flaw” and it came out significantly in favour of Open Source software, having proportionally much fewer bugs.  Bugs also tend to get fixed quicker in Open Source software.  None of this is new(s). see for instance Open-source vs. proprietary software bugs: Which get squashed fastest? (CNET, 2007)

For complete products (libraries are a slightly different beast) with a relatively large market scope, source code being available does not in any way diminish a company’s ability to make money.  Having the core developers, tech writers and support people gives them a significant edge in the open market, and that’s a business asset you can leverage.  You do that by focusing on those aspects in your communications – that’s basic marketing, you draw attention to the positive aspects that make your company/product stand out from the rest.  Clearly, this objective cannot not achieved by force, as you don’t make a (potential) client like or trust you by denying them choice or transparency.

There is one other known option aside from not believing or not understanding, and that’s fear. But fear is an awkward business driver, it makes for very bad decisions.

MariaDB Corp in part uses the Open Source development model, in part they’re an Open Source publisher (in-house work that’s only made available at a later stage in the development process), and now some proprietary product has been added to the mix (actually new versions of an existing product).  Looking at this I am rather unclear about what they believe in.  Of course companies can make business choices as they see fit – but they never operate in a vacuum.  In the end it doesn’t matter much what I believe personally, the market will do what it will – historically, it responds in the various ways as described above.  We’ll see how it pans out.

Open Query does not recommend (or re-sell at all) proprietary tools, as it just doesn’t make sense for us or our clients.  We often do bugfixes and improvements which we contribute upstream – for proprietary tools we can’t do that and thus it becomes a hindrance for us and our clients.  On the specific practical level, we’ve actually never used MaxScale (the product that MariaDB Corp will now sell under different conditions for future versions), and this stems from our experience with its effective predecessor MySQL Proxy.  Having a complex set of scripted logic in a proxy slows down applications and introduces a rather large extra (single) point of potential failure in to infrastructure.   So, while Simon refers to MaxScale as an essential tool for scale-able environments, we know from experience that there are other ways of achieving that desired objective, and without the downsides.

Rather than promoting a single tool for many wildly different jobs, we utilise a few different tools depending on the needs of particular client infrastructure.  We still have a couple of (now legacy) MySQL-MMM deployments, but also quite a few Galera clusters, and other setups as suit our clients’ needs.  Key is to not only make the infrastructure convenient to use for applications, but also to not introduce any more single points of failure.  We build resilience into the client’s server infrastructure, without adding significant overhead in either performance or maintenance requirements.

We believe that that’s what clients want, and since potential clients come to us asking exactly for that (and note our approach with relief) we think that we’re doing the right thing by our clients.  We’ve used this approach for over 9 years, and we’ll just keep on doing that – our basic approach doesn’t change even when our tools do.  If you’d like to talk with us about helping you with your infra, using our approach and way of working, contact us today!

Posted on

The Australian Online Census 2016 Example of How-Not-To

error crossOne of the key problems with the 2016 online census was the architecture, but also the how the whole thing was organised and who was contracted for the job.

IBM, for the $9.6mln it got paid for the job, built something very clunky. They used Java, which is not bad per-se but the system also required Java on the client (browser) side which is just daft. The number of systems that either don’t have or can’t run client side Java is huge, and for the rest you get into version conflict mayhem. And it’s clunky, it’s a lot of code and heaviness to shuffle around which is not a great approach to build a scalable site.

If you think of the census form, the total amount of data gathered is not actually that big. It doesn’t require any particularly complicated database or storage setup.
Serving forms to clients is very light on web servers – if you then use Javascript logic to control the flow through the forms you can actually run most of the work on the client side, including intermediate local saving for the “just in case”. Then you produce a single submit with confirmation, and a transaction with a number of inserts into the database. The language used on the server end is not that important as its job is minimal. Most of the content served can be static, and might even be handled through a CDN.

The scale of the online census task is quite small, relative to many websites. Not only Twitter/Facebook/etc but many e-commerce sites have a vastly more complicated situation where they have to serve many different pages of which many are dynamic, lots of writes and shopping carts that get updated in chunks, then the whole checkout process…. and all that can work fine too. So the census is not a big or complicated problem, really. It just needs to be done right.

The fact that IBM, for $9.6mln, completely stuffed it, is a very serious indicator of where the relevant skills and innovation capability lies. For this type of job, not with IBM. Going with a big company does not guarantee good results. If you reckon this is a one-off, ask Queensland Health about their payroll debacle (SAP implemented by… IBM). Similarly, very expensive is not necessarily better. It can be just very costly, in so many respects.

ABS/IBM also declined the NextDC offer for datacenter level firewalling and DoS protection. Another serious mistake. But application architecture too affects security. When I googled for Census 2016 on census night, the first link that came up was a Census staff login. That’s just beyond astonishing. That should not be public at all. It doesn’t need to be on a public domain, and probably should be only accessible via a VPN.

The company that did the online Census 2016 load testing for another half million $ and bragged before census night about how well their team worked together with the ABS and IBM people, should also be seriously embarrassed about the shoddy job they delivered. From their own site:

“Revolution IT worked in a highly collaborative manner, and their subject knowledge, expertise and advice were key to achieve our project goals and objectives. We were impressed with how well they engaged with our e-Census solution provider (another private company). [IBM]”

Success is not defined by how well your team worked, it’s very simply proven by how well the system deals with the real world. In this case, it didn’t. At all. So, total process fail. It would have been very wise to wait with the bragging until after census night. If it holds up well, you can brag. Otherwise, you hush and no public embarrassment at least on that front. PR fail.

Their public statement (after census night) is at where they explain that the Census site was taken offline due to security concerns, and since security was not part of their brief, their performance was all ok and successful.  But come on now, how is security not part of any practical testing?  It is by nature an integral part of how things work online!  Implementation of security may impact performance, and obviously security aspects always impact availability – and without availability you have no performance at all.

All in all, Census 2016 is a brilliant example of “how not to” in modern online architecture.

And to prove all this again, two students at QUT in Brisbane just built the same in a few days and for about $500 which I understand was mostly pizza costs.

Read that story at (that write-up is rather populist simplistic, but the fact that a few students can very well design a site like this, and properly, is absolutely correct).

Posted on

Choosing Whether to Migrate to Another Database: Uber

Posted on

Found data leak of a company while giving a college lecture | Sijmen Ruwhof

Sijmen writes:

A few weeks ago I gave a guest lecture at the Windesheim University of Applied Sciences in The Netherlands. I graduated there and over the years I kept in contact with some of my teachers since then. One of the teachers told me recently that a lot of students wanted to learn more about IT security and hacking and asked me to give a lecture about it. Of course! And to keep it a bit juicy, I built in a hacking demonstration in my lecture.

Read the full story at

For any server that’s connected to the Internet (and these days, that’s most servers), security is very important.

Mind that as a fundamental, you have to regard any web server as compromised. Not that they necessarily are, but it’s a very useful baseline to use as these are the most visible servers and thus potentially the easiest targets. What information is present on the web server itself, and what information is on there that can be used to access other systems (and to what extent). Scary? Perhaps. But that’s no reason to not review and put sensible practices in place.

If you’d like to discuss ways to secure your online environment, or would like to see how your current setup holds up to the various security benchmarks, have a chat with us: Open Query offers a security review (ad-hoc consulting) package, and we also offer regular security check-ups for our subscription clients.

Posted on

Web Security: SHA1 SSL Deprecated

You may not be aware that the mechanism used to fingerprint the SSL certificates that  keep your access to websites encrypted and secure is changing. The old method, known as SHA1 is being deprecated – meaning it will no longer be supported. As per January 2016 various vendors will no longer support creating certificates with SHA1, and browsers show warnings when they encounter an old SHA1 certificate. Per January 2017 browsers will reject old certificates.

The new signing method, known as SHA2, has been available for some time. Users have had a choice of signing methods up until now, but there are still many sites using old certificates out there. You may want to check the security on any SSL websites you own or run!

To ensure your users’ security and privacy, force https across your entire website, not just e-commerce or other sections. You may have noticed this move on major websites over the last few years.

For more information on the change from SHA1 to SHA2 you can read:

To test if your website is using a SHA1 or SHA2 certificate you can use one of the following tools:

Open Query also offers a Security Review package, in which we check on a broad range of issues in your system’s front-end and back-end and provide you with an assessment and recommendations. This is most useful if you are looking at a form of security certification.

Posted on

Motivation to Migrate RDBMS

Companies that use a standard edition of Oracle’s database software should be aware that a rapidly approaching deadline could mean increased licensing costs.

Speaking from experience (at both MySQL AB and Open Query), typically, licensing/pricing changes such as these act as a motivator for migrations.

Migrations are a nuisance (doesn’t matter from/to what platform) and are best avoided as they’re intrinsically painful, costly and time-consuming. Smart companies know this.

When asked in generic terms, we generally recommend against migrations (even to MySQL/MariaDB) for the above-mentioned practical and business reasons. There are also technical reasons. I’ll list a few:

  • application, query and schema design tends to be most tuned to a particular RDBMS, usually the one the main developer(s) are familiar with. Features are used in a certain way, and the original target platform (even if non deliberate) is likely to execute most efficiently;
  • RDBMS choice drives hardware/network architecture. A migration should also include a re-think of this, to make optimal use of the database platform;
  • it’s quite rare (but not unheard of!) for an application to perform better on another platform, without putting a lot of extra work in. If extra work is on the table, then the original DB platform should also be considered as a valid option;
  • related to other points: a desire to migrate might be based on employees’ expertise with a particular platform rather than this particular application’s intrinsic suitability to that platform. While that can be a valid reason, it should be recognised as the actual reason as there are obviously cost/effort implications in terms of migration cost and other options such as training can be considered.
Nevertheless, a company that’s really annoyed by a vendor’s attitude can opt for the migration route, as they may decide it’s the path of less pain (and lower cost) in the long(er) term.

We do occasionally guide and assist with migrations, if after review it looks like a viable and sensible direction to take.

Posted on

Slow Query Log Rotation

Some time ago, Peter Boros at Percona wrote this post: Rotating MySQL slow logs safely. It contains good info, such as that one should use the rename method for rotation (rather than copytruncate), and then connect to mysqld and issue a FLUSH LOGS (rather than send a SIGHUP signal).

So far so good. What I do not agree with is the additional construct to prevent slow queries from being written during log rotation. The author’s rationale is that if too many items get written while the rotation is in process, this can block threads. I understand this, but let’s review what actually happens.

Indeed, if one were to do lots of writes to the slow query log in a short space of time, a write could block while waiting.

Is the risk of this occurring greater during a logrotate operation? I doubt it. A FLUSH LOGS has to close and open the file. While there is no file open, no writes can occur anyhow and they may be stored in the internal buffer of the lowlevel MySQL code for this.

In any case, if there is such a high write rate, that is an issue in itself: it is not useful to have the slow query log write that fast. Instead, you’d up the long_query_time and min_examined_rows variables to reduce the effectively “flow rate”. It’s always best to resolve an underlying issue rather than its symptom(s).

Posted on

Using Persistent Memory in RDBMS

People at Intel started the pmem library project some time ago, it’s open to the broader community at GitHub and  other developers, including Linux kernel devs, are actively involved.

While the library does allow interaction with an SSD using a good-old-filesystem, we know that addressing SSD through SATA or SAS is very inefficient. That said, the type of storage architecture that SSD uses does require significant management for write levelling and verifying so that the device as a whole actually lasts, and your data is kept safe: in theory you could write to an NVRAM chip, and not know when it didn’t actually store your data properly.

But there are other technologies, such as Memristor (RRAM) and Phase Change Memory (PCM, PRAM). Numonyx (founded by Intel and others, since acquired by Micron) was one of the companies developing PCM some years ago, to the point of some commercial applications. Somewhat oddly (in my opinion), Micron ditched their PCM line in 2014 focusing more on 3D NAND technology. In 2015, Intel and Micron announced that they were working on something called 3D XPoint but Micron denies that it’s based on PCM.

I like the concept of PCM because it has a lot of advantages over NAND technology. It’s very stable, doesn’t “bleed” to adjacent memory cells, if it writes correctly it’s stored correctly, and it’s fast. Not as fast as ordinary RAM, but it’s persistent! What I’ve been holding out for is just a small amount of PCM or similar storage in computers, phones, tablets and e-book readers.

In small mobile devices the advantage would be vastly reduced power consumption. ARM processors are able to put entire sections of the processor in standby to save power, but RAM needs to be powered and refreshed regularly. So with persistent memory, a device could maintain state while using hardly any power.

For RDBMS such as MySQL and MariaDB, persistent memory could be used for the InnoDB log files and other relatively small state information that needs to be persistently kept. So this storage would behave likely memory and be addressed as such (pmem uses mmap), but be persistent. So you could commit a transaction, your fsync is very quick, and the transactional information has been stored in a durable fashion. Very shiny, right?

It doesn’t need to be large, something like 512MB would be ample for RDBMS, and possibly much less for mobile devices.

I still reckon persistent memory space has huge potential – and I mention the mobile devices because that’s obviously a larger market. Previously Micron did work with Nokia on using NVM in phones, but as we all know Nokia was acquired and the Micron focus changed. I find the current state of it all quite disappointing, but I do hope the various players in this field will soon focus on this again properly and get the tech out there to be used!

If you happen to know of any current developments and activities, I’d like to hear about it!