Geek On The Hill

There’s Never a Good Time for Planned Downtime

A human hand holding an Ethernet plug that is not inserted in the network switch

That’s one thing I learned early on in this business: No matter how carefully you try to plan server maintenance and upgrades, there’s always someone who needs the service up at any given time because they have something they absolutely must do during that window, lest civilization as we know it cease to exist.

Having clients in different time zones invariably means that what’s a good (or at least less-bad) maintenance window for one is going to be a horrid maintenance window for another. That’s unavoidable. I try to put all the clients in adjacent time zones on the same servers and schedule maintenance during off-peak hours; but because automated tasks like backups, invoice mailings, file syncs, and so forth happen during off-hours, the robots become unhappy and complain to their masters when they can’t connect.

It’s not easy being me. I even have to keep robots happy.

Don’t get me wrong: Most of my clients understand that scheduled downtime helps prevent unscheduled downtime. And if the downtime is for an upgrade that will improve their experience, then they also understand that the reward in terms of better performance will be well worth the short inconvenience. But there are always a few who can’t be pleased.

You have to understand that this isn’t a frequent thing. Linux servers don’t need rebooting very often. We measure uptime in months and years. But every two or three years, on average, some upgrade or maintenance event requires some downtime of more than a few minutes. No machine runs 100 percent of the time — forever — without needing some occasional TLC.

In between, I maintain 100 percent uptime most months and better than 99.95 percent most of the rest. Most outages are very brief and are for things like reboots after kernel updates; so we’re only talking a few minutes, and I try to schedule them during the middle of the night. But invariably, some client raises a stink because they have some task that they believe absolutely, positively, must be done at that particular moment in time, lest their entire world collapse around them.
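For a sense of scale, here is a minimal sketch of what an uptime percentage allows in actual minutes. The function name and the 30-day month are assumptions made for illustration, not figures from any hosting agreement.

```python
def downtime_budget_minutes(uptime_percent, days=30):
    """Minutes of downtime permitted by an uptime percentage over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return (100.0 - uptime_percent) / 100.0 * total_minutes


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {budget:.1f} minutes of downtime per 30 days")
```

Under those assumptions, 99.95 percent works out to roughly 21.6 minutes a month, which is enough for a few reboots after kernel updates.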

It’s very frustrating.

Or at least it used to be. Nowadays, I no longer worry about it. I just do the only sensible thing: I look at the server logs to find the window during which the server is least busy, and that’s when I schedule downtime. I notify as many clients as I can, but I don’t ask their permission. Maintenance has to happen during some window of time, and server load is really the only fair, impartial, and sensible way to decide when that window will be. Downtime scheduled for when the server is least busy affects the smallest number of clients and their visitors.
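One way to find that window can be sketched as follows. This is only an illustration, not necessarily how it is done on these servers: it assumes a web access log with Common Log Format timestamps (the kind Apache and nginx write by default), and the function names and the two-hour window length are hypothetical.

```python
import re
from collections import Counter

# Captures the hour from a Common Log Format timestamp,
# e.g. [10/Oct/2023:13:55:36 +0000] -> "13"
TIMESTAMP_HOUR = re.compile(r"\[\d{2}/[A-Za-z]{3}/\d{4}:(\d{2}):\d{2}:\d{2} ")


def requests_per_hour(lines):
    """Count log lines per hour of day (0-23); lines without a timestamp are skipped."""
    counts = Counter({hour: 0 for hour in range(24)})
    for line in lines:
        match = TIMESTAMP_HOUR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def quietest_window(counts, length=2):
    """Return the starting hour of the least-busy run of `length` hours.

    The window may wrap past midnight; ties go to the earliest starting hour.
    """
    def load(start):
        return sum(counts[(start + i) % 24] for i in range(length))

    return min(range(24), key=load)


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open log file instead.
    sample = [
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
        '203.0.113.9 - - [10/Oct/2023:13:58:01 +0000] "GET /feed HTTP/1.1" 200 2048',
        '198.51.100.7 - - [10/Oct/2023:02:14:09 +0000] "GET / HTTP/1.1" 200 512',
    ]
    start = quietest_window(requests_per_hour(sample))
    print(f"Quietest two-hour window starts at {start:02d}:00 (log time)")
```

The hours come out in whatever time zone the log records, so they need converting before they go into a client notice. Running it over several weeks of logs rather than a single day gives a steadier picture, and it helps to remember that the overnight requests include the scheduled backups and syncs.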

It’s really that simple. Clients have to understand that and work around it.

That, by the way, is why I’m posting this blog entry a bit earlier than I usually would: The server this site is hosted on is scheduled for multiple upgrades this evening. I’m not exempt from my own downtime. But I wouldn’t want to disappoint all my fans and followers — both of them — so I worked my blogging schedule around it.

I’ll be back tomorrow, folks, God and fsck willing.