Wednesday, July 31, 2024

Stress testing HCL Domino and your "Other" Mail Infrastructure

How do 1.2M emails sound?

There are people out there who say HCL Domino can't handle the stress of modern times.

It is an old system (truth be told, Exchange isn't much younger) with limitations.

Well, yesterday, I did an unintended stress test of Domino and a client's internal infrastructure.

Like most large and well-known companies, they run many Domino applications that handle millions of dollars a day but also have an O365 infrastructure.

Mail doesn't come into Domino, but it does go out from there, and this is where it got interesting.

Domino sent over 275,000 emails in roughly 15 minutes.

This was all internal SMTP, so we didn't get spam-blocked or anything.

Normally, a company would have a choke hold on mail sent per minute, but for some reason, that was not in place or was bypassed because of routing configurations we have yet to pin down.
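
For anyone who has not seen one, here is roughly what a per-minute choke hold does. This is a minimal Python sketch for illustration only; the class name and the limit are hypothetical, and it is not how Domino, Exchange, or any relay actually implements throttling.

```python
import time
from collections import deque


class PerMinuteThrottle:
    """Allow at most `limit` sends in any rolling 60-second window (illustrative only)."""

    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock      # injectable so the window can be tested without waiting
        self.sent = deque()     # timestamps of sends still inside the window

    def allow(self):
        now = self.clock()
        # Forget sends that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False            # over the limit: the caller should queue or defer
```

A relay would call allow() before each outbound message and hold anything that returns False, which is the kind of brake that turns a tsunami into a steady stream.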

And my 275K grew to 1.2M inside the network architecture.

Fun day at the office, right?

No, I caused the problem and also knew how to fix it, but that small window of time was enough to unleash the tsunami.

Sometimes, the simplest things become the hardest things, and this was such a case. 

Even veterans of the email wars screw up.

I now have a great session for F*uck Up Nights if anyone is interested.

I definitely used up one of my nine IT lives; this brings me down to about 5.

HCL Domino was amazing throughout. Not only did the servers not complain about anything, but they just kept going and going and going. Multithreading FTW!

O365/Exchange did not do so well; it got clogged up and had issues because it still is not a multithreaded service.

I pulled the numbers from the Mail statistics and showed them on my Admin client Monitoring dashboard. The left side is tasks, which I covered in my previous blog post. On the right side, you can add statistics, something not many people do. I like to see some things there and probably should do a separate session on the statistics, but that is for another time.

My peers, I am sure, can guess what caused the tsunami, so there is no reason to elaborate. But let's just say when you are a junior admin, this is one of the outcomes of your trial-by-fire Domino Administration education.

For those who think Ambassadors and long-time Yellowbleeders are great gods of tech (and some really are titans): I admit my mistake, and indeed, you are never too young or too old to learn something new... or mess up royally.


Tuesday, July 30, 2024

Domino RESTAPI Bug and WorkAround

This is not my usual line of thought as an Admin, but sometimes, AdminOps is better than DevOps because troubleshooting is not an exact science.

While customers over the last year or so have been asking about the HCL Domino REST API, my reply is usually something like "I can install it, but you are on your own afterward," or I point them to a Developer friend.

To be fair, HCL will help them/me with getting started or "where is/How do I" questions. But this is about the bug my client and I discovered and how to work around it.

While updating the v1202FP3 servers to v14FP1, all went okay, even with the change in Java classes, until we got to the REST API server.

If you had downloaded and added my tasks update for the Admin client, you would see that RESTAPI was running on the machine, as I saw. I let the customer know I would upgrade it from 1.0.4 to 1.0.14.

Usually, it's not a big deal; you run one long command, or break it into 4 lines like the example shown in the documentation, which I prefer as my typing is not perfect.

java -jar restapiInstall-14r.jar ^
 -d="C:\Program Files\HCL\Domino\Data" ^
 -i="C:\Program Files\HCL\Domino\notes.ini" ^
 -p="C:\Program Files\HCL\Domino" ^
 -r="C:\Program Files\HCL\Domino\restapi" ^
 -u

You then type an A to accept the update, which will upgrade the REST API code.

It replaces the existing files with the new and updated ones.

Great. I rebooted the server, and it looked like everything was up. The task view in the Admin client showed the REST API was up.

I tested a few things, but Swagger was not connecting to anything.

My initial thought was that the developers had some code that had been deprecated or maybe not valid with something in v14. I wasn't far off. I tried a few things to get it to work, and then I decided to look at the schemas. I found them deactivated, which seemed wrong to me, so I enabled one to test.

And it worked again.

Great! I did the same trick with the others, and they also worked.

One last test: shut down Domino and reboot the box clean to ensure everything was OK.

No luck. Back to square one: the schemas now all showed as activated, but I still got errors like this one:

[Image: Domino RESTAPI Fetch Error]

This not being my first effort, I figured I would deactivate everything, reactivate it, and test again.
Sure enough, it worked.

So, in my mind, the problem was somewhere between the REST API code and some type of flag on the schemas/databases that just wasn't being accepted.

I opened a ticket with HCL to discuss the bug I found.

After the usual back and forth (get us this, debug that, a copy of the NAB, a db, the schema db, etc.), HCL said they could not recreate my problem, yet here it was.

We had an online meeting so the dev team could get a good look at my testing and poke around deeper.

They agreed something was fishy and went off to look into it further.

In the interim, given this was a key production server, we wanted to revert to the old REST API version, which worked fine prior to the upgrade. But would it work on v14, among other questions we had?

HCL and I discussed it with the customer, and this is how you revert to an older version.

We had the old code downloaded, which is key as sometimes HCL has a way to make older versions "disappear" from the public. The steps included:
  1. Shut down Domino
  2. Copy the Domino\restapi folder contents out from Domino to a backup space.
  3. Once the restapi folder is empty, walk through the installation steps as I showed them above, but the last line should be -a instead of -u.
  4. This will reinstall all you need (presuming all URLs, folders, etc. are NOT being edited/changed)
  5. Start Domino
  6. Test

This worked well, and the customer was back up and running on v14FP1, albeit with the older REST API code.
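
If you would rather script step 2 than drag folders around, something along these lines would do it. This is a minimal Python sketch, not an HCL tool: the function name and the backup location are hypothetical, and it assumes Domino is already shut down (step 1) so that no files are locked. It moves the contents rather than copying them, which leaves the restapi folder empty as step 3 requires.

```python
import shutil
from pathlib import Path


def backup_and_empty(restapi_dir, backup_dir):
    """Move everything out of restapi_dir into backup_dir and return the names moved."""
    src = Path(restapi_dir)
    dst = Path(backup_dir)
    # exist_ok=False: refuse to write on top of an earlier backup.
    dst.mkdir(parents=True, exist_ok=False)
    moved = []
    for item in sorted(src.iterdir()):
        shutil.move(str(item), str(dst / item.name))
        moved.append(item.name)
    return moved


# Example call (the backup path is made up; adjust to your own layout):
# backup_and_empty(r"C:\Program Files\HCL\Domino\restapi", r"D:\Backup\restapi-old")
```

Keep that backup until the customer has signed off on the reverted version; it is also your way back if the reinstall surprises you.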

After a few days, we got back some information from HCL, which I quote directly below:

Problem:
REST API 1.014 Running the APIs in local swagger returns Error "Failed to Load API Definition"

Possible cause
The issue is that the scope's Server name is not correct.
This setting was not working correctly in older versions of DRAPI and was fixed in v1.0.6. It was pretty much ignored in v1.0.4.

Possible solution/Workaround:
With DRAPI 1.0.14 you have to add the CN name/hierarchical name of the Domino server instead of host name for example: CN=customerTest/O=HCLLabs in the Edit scope>>Server field>>CN=customerTest/O=HCLLabs or CustomerTest/HCLLabs
Earlier server name field in scope>>Server field was: Host name of the server i.e keithbrooks.com

Change it to either CN name or the hierarchical name of the Domino server or can be left blank stating any server that has this scope in the KeepConfig.nsf assumes the database / schema exists.

Moving further the product dev team is going to update the error message in the product code as 'You need to query a different server' so it makes more sense to the testing on affected DRAPI versions.

There you have it. My guess initially was partially correct. 

The configuration was correct for the old version, but because the customer did not upgrade the REST API code along the way, they missed the change made in 1.0.6. That change would probably have prompted an HCL ticket at the time as well, but the issue would have been easier for all of us to spot, as the change would have been fresh in their minds.
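
To make the difference between the two name formats in HCL's note concrete, here is a small Python sketch. It is only an illustration of the naming, not part of the REST API or any HCL tooling: both function names are hypothetical, and it assumes a server name without a country (C=) component.

```python
def to_canonical(abbreviated):
    """Turn 'CustomerTest/HCLLabs' into 'CN=CustomerTest/O=HCLLabs'.

    Assumes no country component: first part is CN, last is O, any middle parts are OU.
    """
    parts = [p.strip() for p in abbreviated.split("/") if p.strip()]
    if len(parts) < 2:
        raise ValueError("expected at least 'Name/Org'")
    if any("=" in p for p in parts):
        return "/".join(parts)  # already in CN=.../O=... form, leave it alone
    labels = ["CN"] + ["OU"] * (len(parts) - 2) + ["O"]
    return "/".join(f"{label}={part}" for label, part in zip(labels, parts))


def looks_like_hostname(value):
    """Rough check for the old-style value: dots but no slashes, e.g. 'keithbrooks.com'."""
    return "/" not in value and "." in value
```

Running looks_like_hostname() over the Server field of each scope before an upgrade would flag the entries that need to change to the Domino server name (or be left blank).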

The story's moral is, of course, ABU or "Always Be Upgrading" because of security, code, and functionality adjustments over time. While we don't want to break production, sometimes you must do so for your benefit.

The second moral is to have a Development or Staging environment for these critical applications. In this case, the lower environment did not match the upper one entirely, so the problem was not seen when we updated the lower environment.



Thursday, July 4, 2024

An Admin Present You Didn't Know You Needed

Hi, welcome back to my burnt-out blog. 1,500 posts and, well, I am kind of burnt out, but that doesn't stop me from giving the community these little bits.

I'd like to write more, but a lot of what I work on is internal client material that I can't write about. I am still active, though.

Preamble and excuses out of the way, who wants some goodies?

About 2 weeks ago, I gave an impromptu webinar for OpenNTF.org as a last-minute fill-in.

OpenNTF, for those that don't know, is the Notes/Domino+ community, where devs, admins, business people, HCL, and others share code and ideas, templates, and projects for the benefit of the greater worldwide community.

I wanted to inform people that monitoring Tasks in the Administrator client has some changes.

Why is this important? Because unless you are a one-server company, you have a lot of information to remember, such as:

  1. How do you know if DBMT ran? 
  2. How do you know which server Certmgr runs on?
  3. Which web server do you run the Domino REST API on?
  4. Which server handles your Backups and Restores, presuming you leverage the v14 options?
  5. Is NOMAD running?
  6. Is your DirSync working?
  7. Are you sure the awesome OnTime Group calendar is running?
  8. Have you enabled AutoUpdate yet? One look and you know.

Intriguing questions, right?

Between v9 and v12, nothing changed in the tasks that could be monitored. Traveler seems to have been the last item added, and that was from 8.5, but it found its way into the Monitoring Dashboard in v9.

Now comes v14, and HCL has cleared out some older items, like x500 info and the Fax server... but did not add any of the newer tasks that have come along since v9.

To be fair to HCL, it is not as simple as a few fields and renaming a file.

But fear not, my fellow Admins, for I have not only explained it all in my presentation, which you can watch over here on the OpenNTF YouTube page, but I have also made the tasks available for everyone to update their Admin Monitoring Dashboard.

If you just want the slides, go here.

And because I know you are probably as lazy as I am, I have made the forms available with instructions to help you get more from your Monitoring Dashboard.

Go get the tasks from my OpenNTF project over here.

If I missed a task that is not listed, let me know, and I will update the project database.