Your opinion on EC2 and other cloud/hosting options

EC2 is nifty, but it doesn’t appear suitable for all needs, and that’s what this post is about.

For instance, a machine can just “disappear”. You can set things up to automatically start a new instance to replace it, but if you just committed a transaction it’s likely to be lost. Each way of protecting against that has a drawback: MySQL replication is asynchronous, committing your transactions to EBS is slower, and EBS snapshots are only periodic (you’d have to add extra handling on the application end). This adds complexity, and thus the question arises whether EC2 is the best solution for systems where this is a concern.
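
To make the trade-off concrete, here is a back-of-the-envelope sketch of how many commits are exposed under each option. The commit rate, the replication lag and the snapshot interval are made-up numbers for illustration, not measurements, and the EBS case assumes the volume itself survives the instance failure.

```python
# Rough model: commits at risk = commit rate x exposure window.
# Every number here is a hypothetical assumption, not a measurement.

def transactions_at_risk(commits_per_second: float, exposure_seconds: float) -> float:
    """Commits made during the window that exist only on the lost instance."""
    return commits_per_second * exposure_seconds

COMMITS_PER_SECOND = 50  # hypothetical workload

# Hypothetical exposure windows, in seconds, for each option.
exposure = {
    "async MySQL replication (slave 2 s behind)": 2,
    "periodic EBS snapshot (every 15 min)": 15 * 60,
    "commit directly on EBS (volume assumed to survive)": 0,
}

at_risk = {
    name: transactions_at_risk(COMMITS_PER_SECOND, seconds)
    for name, seconds in exposure.items()
}

for name, lost in at_risk.items():
    print(f"{name}: up to {lost:.0f} commits at risk")
```

The point of the sketch is only that the exposure window, not the failure itself, determines how much is lost; plug in your own numbers.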

When pondering this, there are two important factors to consider: a database server needs cores, RAM and reasonably low-latency disk access, and application servers should be near their database server. This means you shouldn’t split app and db servers across different hosting/cloud providers.

We’d like to hear your thoughts on EC2 in this context, as well as options for other hosting providers – and their quirks. Thanks!

6 Responses to “Your opinion on EC2 and other cloud/hosting options”

  1. I’ve done a little bit of work with EC2. I helped one company migrate their app from physical servers onto EC2 instances, and I’ve also done a bit of testing with a MySQL server of my own.

    First, while an instance *can* disappear entirely, the EBS volume should not. It’s possible (the same as losing an entire disk array is possible), but generally speaking failure of an instance won’t kill the EBS volume. In this case, recovery from the instance failure means attaching the EBS volume to a new instance, at which point you go through normal recovery as though you’d had an unexpected power failure, kernel panic or similar. Of course, this only works if you have all of the required data on the one EBS volume!

    That said, my tests with a “small” instance and a single EBS volume showed that the EC2 instance couldn’t keep up with my replication load, let alone serve any actual queries. I do roughly 1GB of binlog traffic a day – I’m guessing that’s not small, but not enormous either.

    The EC2 instance was constantly behind in replication, usually by several minutes at a time. IO latency on the EBS volume *seemed* to be the cause, but I ran out of time to investigate further. Possible fixes (which I didn’t try) include using local disk (which requires bigger instances and has the risk of losing the disk if the instance goes down), or striping across multiple EBS volumes (which complicates things and would make getting a consistent snapshot much harder). Of course, the possible fixes increase the complexity and cost of the solution.

    I *love* EC2 – it’s incredibly flexible and I think it’s on the right track – but you certainly need to be careful when migrating existing applications across. I’m not convinced it’s a good fit for traditional RDBMS setups. You might be able to make it fit, but you’ll have to use a shoehorn.

  2. You have to realise that all hardware is susceptible to disappearing without warning and the EC2 hardware is no more or less likely to ‘disappear’ than any other hardware.

    BTW John, what instance size are you using? I’ve read a lot on running MySQL on EC2 and the minimum recommended instance size for reasonable performance in a production environment is Large, otherwise you’re going to run into contention issues with the CPU (you’re only getting approx. 1GHz virtual) and you have very low network bandwidth (around 250Mbps and this includes all traffic to EBS as well). With large, this should increase to approximately 1 whole CPU core and about 750-1000Mbps network bandwidth.

  3. Thanks Wayne, that’s interesting. Network bandwidth shouldn’t have been a problem, and it wasn’t the slave’s IO thread that was falling behind. I did see reasonably high %steal at times though, which could be a factor.

    I could repeat the experiment when I have some time to spare, perhaps using a distribution master with a blackhole storage engine inside EC2, and a number of slaves of different instance types replicating from that. The answers may already be out there though – I’ve stopped paying attention to the MySQL on EC2 stuff for now after seeing that it wasn’t a quick fix for *my* requirements. :)

  4. There are many definitions of cloud computing, and there are many different providers that offer different types of implementations.

    If you want cloud capabilities on more known physical hardware then AppNexus is worth looking at.

    Also Rackspace Cloud (formerly Mosso) appears to indicate they offer more robustness on the physical H/W side.

    You might like to also read Enterprise Class Cloud
    http://cloudpundit.com/2009/06/16/enterprise-class-cloud/

  5. I second Ronald’s suggestion to look at http://www.appnexus.com and http://www.mosso.com

    Nobody has true cloud computing just yet — where the computing just happens, regardless of hardware, such as with a Beowulf cluster.

  6. Well, Google AppEngine perhaps… there you don’t have machines as such, and apps don’t live on a particular machine. You just have an app which lives in the cloud.
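
The bandwidth question raised in responses 2 and 3 can be checked with rough arithmetic. This is a minimal sketch using the figures quoted in the thread (about 1GB of binlog per day, about 250Mbps for a small instance); it assumes 1GB means 10**9 bytes and that traffic is spread evenly over the day, which real workloads are not.

```python
# Rough check: is ~1 GB of binlog per day anywhere near ~250 Mbps?
# Assumes 1 GB = 10**9 bytes and perfectly even traffic; real traffic is
# bursty, so peaks will be higher than this average.

BINLOG_BYTES_PER_DAY = 10**9
SECONDS_PER_DAY = 24 * 60 * 60
LINK_MBPS = 250  # small-instance figure quoted in response 2

avg_bytes_per_second = BINLOG_BYTES_PER_DAY / SECONDS_PER_DAY
avg_mbps = avg_bytes_per_second * 8 / 10**6  # bytes/s -> megabits/s
share_of_link = avg_mbps / LINK_MBPS

print(f"average: {avg_bytes_per_second / 1000:.1f} kB/s = {avg_mbps:.3f} Mbps")
print(f"share of a {LINK_MBPS} Mbps link: {share_of_link:.4%}")
```

On these assumptions the average works out to about 0.09 Mbps, a tiny fraction of the quoted link speed, which is consistent with the observation in response 3 that network bandwidth was not the bottleneck.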
