We just had a booboo in one of our internal systems, causing it to not come up properly on reboot. The actual mishap occurred several weeks ago (simple case of human error) and was in itself a valid change so monitoring didn’t raise any concerns. So, as always, it’s interesting and useful to think about such events and see what we can learn.
Years ago (and for some people still today), one objective was to see long uptime on a server, sometimes measured in years. It meant the sysadmin was doing everything right, and so some serious pride was attached to that number. But as described only last week in Modern Uptime on the Standalone Sysadmin blog, security patches are a serious issue these days, so (unless you're using Ksplice) sysadmin quality has become more a question of when you last applied security updates (which may have required a reboot) than of the uptime number.
But I think the aforementioned booboo illustrates an additional aspect: it might be quite sensible to reboot a system every so many weeks (we can debate the interval, and it may differ per system and situation). In the end the system will get rebooted some time anyway, and that may expose trouble at an inconvenient moment. Better to test and find out while you're there.
Of course this also has consequences for either your external uptime (scheduled maintenance slots with outages), or thinking about your architecture differently. Can you take out any individual system in your infrastructure without some service getting interrupted? It’s doable, but not necessarily with some traditional approaches or equipment that carries the “enterprise” label.
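For the "take out any individual system" question, one common building block is client-side failover: if one server in a pool is down (for a reboot or otherwise), clients simply move on to the next. Here's a minimal sketch in Python; the hostnames and the `connect` callable are hypothetical stand-ins for whatever your stack actually uses:

```python
def connect_with_failover(hosts, connect):
    """Try each host in turn and return the first successful connection.

    `connect` is any callable that opens a connection to a host and
    raises ConnectionError on failure.
    """
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err
    # Every host failed; surface the last error to the caller.
    raise last_error

# Demo with a stub: db1 is "down for maintenance", db2 answers.
def fake_connect(host):
    if host == "db1.example.com":
        raise ConnectionError(f"{host} is down")
    return f"connection to {host}"

print(connect_with_failover(["db1.example.com", "db2.example.com"], fake_connect))
```

With something like this in place (or a load balancer doing the equivalent), rebooting one box during a maintenance window interrupts nothing, provided the remaining boxes can carry the load.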
Food for thought! As always, comments welcome.
Last night my residential area lost power for about 2 hours, between 2 and 4 am. This reminded me of something, and there are analogies to MySQL infrastructure. Power companies have invested a lot of money in recent years in making the supply more reliable. But it still fails occasionally.
From my perspective, the question becomes: is it worth the additional investment for the power companies? Those extra few decimal points in reliability come at a very high cost, and still things can go wrong. So a household (or business) that relies on continuity has to put other measures in place anyway. If the power company has an obligation to deliver to certain standards, it might be more economical for them to provide suitable equipment (UPS, small generator) to these households and businesses (for free!), and the resulting setup would provide actual continuity rather than merely higher reliability with occasional failures. Everybody wins.
As a general architectural consideration, new houses can be designed with low-voltage circuits (12-24V) in most areas and 110/240V only in the kitchen and laundry. Why? Because most stuff around your house actually runs on low voltage anyway, but uses inefficient heat-generating transformer blocks (power adapters) to get it. Cut out the middleman! It saves money, looks better, is safer (no transformers in the ceiling for halogen lighting, etc.), and there are fewer points of failure. The circuit would be fed from central batteries, charged by solar if you wish, plus a single transformer (basically like a car battery charger) from mains power. Also think about LED lighting for some places and uses; it's very cheap to run. Apart from all the aforementioned advantages, it again delivers higher uptime, since a power failure will then not affect your lights and other stuff running off the low-voltage circuit: its direct power source is a battery, essentially a UPS for most of your house's electricity needs.
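To put a rough number on what those always-on adapter blocks cost, here's a back-of-the-envelope calculation. All figures below (adapter count, idle loss per adapter, electricity price) are illustrative assumptions, not measurements:

```python
# Rough estimate of energy wasted by always-plugged-in power adapters.
# All constants below are illustrative assumptions, not measured values.

ADAPTERS = 15          # assumed number of plug-in adapters around the house
IDLE_LOSS_W = 2.0      # assumed standby/conversion loss per adapter, watts
HOURS_PER_YEAR = 24 * 365
PRICE_PER_KWH = 0.25   # assumed electricity price per kWh

def yearly_waste_kwh(adapters, idle_loss_w):
    """kWh burned per year just keeping the adapters warm."""
    return adapters * idle_loss_w * HOURS_PER_YEAR / 1000.0

waste = yearly_waste_kwh(ADAPTERS, IDLE_LOSS_W)
print(f"{waste:.0f} kWh/year, costing about {waste * PRICE_PER_KWH:.0f}")
```

Even with these modest assumed numbers it's a few hundred kWh a year lost as heat, which is part of why a single efficient central transformer looks attractive.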
Mind you, I didn’t invent any of this. It’s been done. All it takes is builders with vision, and/or home owners with initiative. Applying existing technology.