Friday, June 19, 2009

Fud Buster Friday #45 - Our Systems Never Go Down

Better late than never. As you can see, I made it to the UK and have only just gotten my broadband hooked up at the house.

For anyone who still doesn't believe you need redundancy or clustering, or whatever you prefer to call it, I leave you with this real-life scenario.

A prominent cell phone provider (T-) in the UK was down ALL day, from around 9 AM until after 5 PM. Service was sporadic for some people, but my wife and I simply could not get our phones and the new SIM cards to work.

So what should have been a 5-minute effort turned into a 12-hour one. My wife's Blackberry only started getting email this morning. Mine didn't get its proper OTA configuration until this afternoon, when I had to change some email settings manually.

And it was not helpful at all that T- answered their free phone help line with "Sorry, but our internal systems are down. Can we help you with general questions?"

Nor were the four different people I got disconnected from, each of whom claimed they would call me back. None ever did.

Evidently their internal servers were not set up with either redundancy or disaster planning in mind. I have no idea whether they had server issues or telecom issues, but either way, this is NOT how you ensure 100% uptime of your network.

So the next time someone tells you they don't need clustering, or don't want to spend the money on a second server or an enterprise license, do them a favor and figure it out for them. Let them know they can pay you for it when they use it, or lease them a backup box or co-location space.

Living proof of this problem is everywhere. Does your customer want to have these problems? Even Gmail has outages, but clustered Lotus Domino servers don't have these problems. Well, they do if you put all the clustered servers on the same VM host, as one of my Fortune 150 clients did, and that host fails.

1 comment:

  1. We've had outages (and at the moment management won't let us cluster the Domino servers).

    So... let me tell you about our outages...
    - Cables cut
    - Power loss to the building
    - Crazy misbehaving Symantec software
    - ISP failure (many times)
    - Hard drive failure (several times)
    - Video card failure
    - RAID controller failure
    - Microsoft automatically rebooting when installing an update (turned off now)

    Hey... I never mentioned Domino failure. That's right... because in the ten years I've worked at my current location (and the 10 before it at various others), I've never seen a "Domino" failure.

    I'm not prepared for one and couldn't handle one, simply because it doesn't happen. Our Domino servers haven't ever crashed.

    Of course, we maintain them well... A reboot every few months (if I remember).

    Try doing that with one of the competing systems.