Friday, February 27, 2009

Fud Buster Friday #29 - One Outage isn't so bad.. is it?

Naturally it depends on who has the outage and how long it lasts and well, a million other things.

As we know, Google had an outage this week. Ho hum. Welcome to IT, guys. The world keeps spinning, we get blamed, we fix it, life moves on. Or does it?

I just spent over a week on a server that was crashing daily, a problem that dragged on because of my laziness and lack of attention to the client's plight. Well, to be fair, I was at 2 new clients this week, plus demoing for a 3rd and rewiring my office, but the bottom line is I took a casual attitude with a client's business and that's not good.

The client is extremely happy I have resolved the problem, though it will be 72 hours before total acceptance.

But this taught me, and should teach you too, that one outage is too many for ANY client. Repeated ones spell disaster for IT's credibility and the client's expectations. It is VERY hard to recover from this, and in some cases it leads the customer to think about alternative email solutions.

The client had turned down clustering, and even a backup replicated server (I am sure some PC could have been found, but I digress). Their backup procedures are appalling, and they have no idea why they were running LDAP, POP3, DECS, DIIOP or IMAP on the server. (Note to jr admins: just because you see a check box does NOT mean it must be checked.)
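For the jr admins, here is a rough sketch of how you might audit that list instead of eyeballing check boxes. It assumes the tasks are listed on a comma-separated ServerTasks line like the one in notes.ini. The sample line, the list of "needed" tasks and the helper name unneeded_tasks are made up for illustration; they are not this client's actual configuration.

```python
# Hypothetical audit helper: flag server tasks that nobody has asked for.
# The ServerTasks line and the "needed" list below are illustrative only.

def unneeded_tasks(server_tasks_line, needed):
    """Return tasks on a ServerTasks line that are not in the needed list."""
    # Keep only the part after "ServerTasks="
    _, _, value = server_tasks_line.partition("=")
    # Split on commas and drop empty entries and stray whitespace
    tasks = [t.strip() for t in value.split(",") if t.strip()]
    # Compare case-insensitively, since task names are often typed inconsistently
    needed_lower = {n.lower() for n in needed}
    return [t for t in tasks if t.lower() not in needed_lower]


line = "ServerTasks=Update,Replica,Router,AMgr,AdminP,LDAP,POP3,DECS,DIIOP,IMAP"
needed = ["Update", "Replica", "Router", "AMgr", "AdminP"]

print(unneeded_tasks(line, needed))
# prints ['LDAP', 'POP3', 'DECS', 'DIIOP', 'IMAP']
```

Anything the script prints is a task someone should be able to justify. If no one can, that is a conversation to have with the client before it gets turned off.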

The server was throwing out the usual errors about calendar profiles, users not found in the NAB and a bunch of agent signing errors on agents that happen to run their apps but were signed by a previous admin/developer, of course. And they had small quotas like 150MB. And of course you can guess which mail files had these errors. Funny how nonexistent people can fill up a mail file, waste disk space, routing, backup time and space and even have agents running! Also, if you have a address, be prepared to have many replies to it. The fix is to just have an entry in the NAB and a mail file for it that purges every day or 7 or whatever you want. Otherwise, like the client, you could have 100s of dead mails in your 4 files.

I suggested the client take care of these, only to realize they could have done so all along and never did. So as any good consultant would, we billed them for the mini clean up and are planning on a bigger clean up under a new project.

I saw the errors upfront, but we were working on what we thought was a bigger problem and figured we could fix these later. Remember, Eat the Frog: do the stuff you don't like or don't want to do first, and move forward. Taking the easy way out will never help you as an admin.

The other thing is delegation. The problem was bumped to me after others had already tried to tackle it. I was not happy that none of them had resolved the errors or the small things.

Training is important, but walk the talk. As the Worst Practices session at LS08 showed, even the great ones don't always follow their own guidelines.

So I apologized to the client and oddly enough they were just happy anyone could resolve the problem. It had been going on for a month or so, off and on.

The downside is their clients just expect the server to be down and, even if it stays up, won't care. The worst part, for me, was that after it all, they told me they are going to move off Domino for mail. I am working on that problem next and may blog about it if it gets interesting.