Posted on

Amazon vs Linode in case of hardware failure

One of our clients received an email like this:

Dear Amazon EC2 Customer,

We have important news about your account (…). EC2 has detected degradation of the underlying hardware hosting your Amazon EC2 instance (instance-ID: …) in the […] region. Due to this degradation, your instance could already be unreachable. After […] UTC your instance, which has an EBS volume as the root device, will be stopped.

You can see more information on your instances that are scheduled for retirement in the AWS Management Console

  • How does this affect you?
    Your instance’s root device is an EBS volume and the instance will be stopped after the specified retirement date. You can start it again at any time. Note that if you have EC2 instance store volumes attached to the instance, any data on these volumes will be lost when the instance is stopped or terminated as these volumes are physically attached to the host computer
  • What do you need to do?
    You may still be able to access the instance. We recommend that you replace the instance by creating an AMI of your instance and launch a new instance from the AMI.
  • Why retirement?AWS may schedule instances for retirement in cases where there is an unrecoverable issue with the underlying hardware.

Great to be notified of such things. That’s not the problem. What I find curious is that Amazon tosses the problem resolution entirely at their clients. Time and effort (cost) is required, even just create an AMI (if you don’t already have one) and restart elsewhere from that.
Could it be done differently? I think so, because it has been done for years. At Linode (for example). If something like this happens on Linode, they’ll migrate the entire VM+data to another host – quickly and at no cost. Just a bit of downtime (often <30mins). They’ll even do this on request if you (the client) suspect there is some problem on the host and would just like to make sure by going to another host.
So… considering how much automation and user-convenience Amazon produces, I would just expect better.

Of course it’s nice to have everything scripted so that new nodes can be spun up quickly. In that case you can just destroy an old instance and start a new one, which might then be very low effort. But some systems and individual instances are (for whatever reason) not set up like that, and then a migration like the one that Linode does is eminently sensible and very convenient….

Posted on