Thursday, December 4, 2008

SnTT - Waiter, there's a local ID lost on my server

This one's for everyone at the IBM Support team, thank you!

Ripped from the headlines... well, maybe from one of my clients' servers. I am happy to bring to light something that happens sometimes, for no reason my client can name, yet I must fix it anyway.

Good thing I am not hands-on anymore these days, right? Riiiiight! Fifteen years of Domino admin experience gets me all the problem clients, no matter what I am doing these days.

So, back to why I asked you here today to read my mind's thoughts.

Every hour, consistently mind you, the server console spits out the following:
Error updating local ID file: The public keys specified in the Name Change Request do not match those specified in the new certificate


Now raise your hand if you haven't seen a "similar" message before. It happens; in fact, there are tons of technotes that start with "Error updating local ID file", but only one or two that include my exact error.

Remember how in my previous post I said to do at least 5 things before calling support? Well, I did the following:
1) Immediately checked the server ID file and its expiration date, domain, certifier, etc. All checked out OK.
2) Went to the Admin4.nsf database, otherwise known as the AdminP database. Oddly enough, there was no reference to name changes, certifier changes, or anything even remotely close to the name of the server at issue.
3) Next up, Certlog.nsf, just checking expirations in case someone inside had decided to change the server IDs so they wouldn't expire in 2080 as designed. Nothing unusual, although a number of people's IDs are going to expire this year. I made a note and told the client to check.
4) When specifics don't work, go back a step to more general areas like the Server document. Now, why would I do that for a name change, you ask? Simple: what if someone decided to change the name textually without recertifying? Guess which server isn't going to talk to anyone? While everything looked okay, I noticed there was no port entry on the Ports tab. Odd, I thought, since I hadn't noticed any errors in the Admin4.nsf database. I should go check...
5) The server's log.nsf. And what do I see? An Administration Process error every time the server was restarted. I swapped over to the administration server, told AdminP to process all (tell adminp process all), and saw errors along with some other pieces getting done. AdminP must have been turned off or stuck.

Next, I replicated the changes to the server in question, and the Ports field was fine. I then checked security and some other fields I like to check, and found that the lookup to the NAB failed. Errors. So I checked it out: on the server in question, the NAB showed a number of replication conflicts. I fixed them, deleted the conflict documents, ran a quick CTRL-SHIFT-F9 (rebuild all views), and restarted the server. No more Administration Process error, but the local ID error was still there.

IBM Technote #1097801 explains how to resolve the problem.

Now, for you newbies out there: doing what the technote says should scare you.
I hate playing with public and private key information unless I really must. There are so many ways to mess this up and render your server DOA if you are not careful.

That is when I called IBM Level 2 support, because playing with certificates is NOT something to be taken lightly. Luckily, Geno was able to work with me on it.
First we turned on some debugging (set config debug_threadid=1) to see whether the error came from an agent or an application. Then you run (sh ta debug), but it didn't show us anything beyond the fact that the error came from a server process.

We went over many options and ideas, each searched our respective knowledge bases, and came up with the technote above. Well, when all else fails, you try it.
In simple terms, the technote steps are:

First, make a backup of the certified public key you are deleting from the Server document, just in case.
From the administrative client of the server:
1. Select File > Tools > User ID (assuming the client is using the same ID as the server).
2. Select More Options.
3. Click Copy Public Key.
4. Open the Server document, delete the existing key from the Certified Public Key field and then paste the newly copied key into the field.
5. Save the document and restart the server.

And it worked. Another client happy.

To turn off the debugging, run the same command with 0 instead of 1 (set config debug_threadid=0).

Plus, I learned that Geno follows me on Twitter (edited January 2009 with his permission) and reads my posts. As nice as this is to know, it's better for me to know that someone on top of it all is there when I (or we) need him.

So to Geno and his boss, Mark H.: thanks, and give Geno an extra coffee break. Answering our calls can drive a person crazy.

3 comments:

  1. Those error messages always drive me batty. Glad it worked out. Kudos to Geno and Mark.
    happy holidays, mtw

  2. It's the descriptive explanation right? :-)
    Some are better than others, but you know it happens: we all find these odd examples out in the wild.
    I like to post about the people who help me when I can. If I missed anyone in past posts, I will catch you next time.

  3. I happened to re-visit this post from a more recent post support has changed.

    Just wanted to let you know that I updated technote 1097801 with the slightly different error message we were seeing.

    Also I have no problem with providing my twitter ID @genosis.
