In the past year I have experienced two incidents in which important applications were no longer available. In both cases the cause turned out to be an expired internal certificate. Although both incidents could be resolved with the help of KB articles, the lesson learned is to check these critical components at least once a year. The start of a new year is a good moment to pay attention to this topic. First up: vRealize Operations Manager (vROPS).
Expiration of the vROPS internal certificate has the following symptoms:
– Unable to log into the Admin UI.
– The cluster is Offline, and trying to bring it Online fails with the message “Data Retriever is not initialized yet. Please wait.”
The procedure to replace an expired internal certificate in vRealize Operations can be found in this KB.
The best way to check the validity of the certificate is with a browser: connect to the vROPS master node on port 6061 and check the validity period of the certificate.
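If you prefer the command line over a browser, the same check can be sketched with the openssl client. This is a minimal sketch, assuming the openssl CLI is installed on the machine you run it from and that port 6061 on the master node is reachable from there; the function name, script name and hostname in the comments are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the expiry date (notAfter) of the certificate that a
# vROPS node presents on port 6061. Assumes the openssl CLI is installed.

# cert_enddate HOST [PORT]   (hypothetical helper name)
cert_enddate() {
  local host="$1" port="${2:-6061}"
  # s_client performs the TLS handshake and prints the server certificate;
  # </dev/null makes it close the connection right after the handshake.
  # openssl x509 then parses that certificate and prints only its end date.
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# Only run when a hostname is passed on the command line, e.g.
#   ./cert_enddate.sh vrops-master.example.local
if [[ $# -gt 0 ]]; then
  cert_enddate "$@"
fi
```

The output is a single line starting with notAfter= followed by the expiry date, which makes the check easy to repeat without opening a browser.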
You can also run the following command on a vROPS node, although I don’t find it very useful: it only searches the logs for certificate validation errors and does not return an expiration date.
# /bin/grep -E --color=always -B1 'java.security.cert.CertPathValidatorException: validity check failed|java.security.cert.CertificateExpiredException' $ALIVE_BASE/user/log/*.log | /usr/bin/tail -20
The KB describes the replacement procedure for both scenarios, “Certificate has expired” and “Certificate has not yet expired”. Unfortunately, the procedure for an already expired certificate is a bit more cumbersome to perform.
The KB mentioned before states that “Starting in vRealize Operations 8.0, a pop up is displayed in the UI, warning when certificate expiration will occur.” But better safe than sorry: perform the check at a regular interval.
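To run the check at a regular interval, for example from cron or a monitoring job, openssl can also evaluate the remaining lifetime for you. A minimal sketch, assuming the openssl CLI is available; the 30-day threshold and port 6061 are example defaults, and the function and script names are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch of a periodic expiry check. Assumes the openssl CLI is installed.

# valid_for_days DAYS < cert.pem   (hypothetical helper name)
# Reads a PEM certificate on stdin. Returns 0 if it is still valid for at least
# DAYS days, non-zero if it expires within that window or cannot be parsed.
valid_for_days() {
  local days="$1"
  # -checkend takes a number of seconds and exits non-zero if the certificate
  # expires within that many seconds from now.
  openssl x509 -noout -checkend $(( days * 86400 )) >/dev/null 2>&1
}

# Usage: ./check_expiry.sh HOST [PORT] [DAYS]   (PORT defaults to 6061, DAYS to 30)
if [[ $# -gt 0 ]]; then
  host="$1"; port="${2:-6061}"; days="${3:-30}"
  # Fetch the certificate presented on HOST:PORT and test its remaining lifetime.
  if ! openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null | valid_for_days "$days"; then
    echo "WARNING: certificate on ${host}:${port} expires within ${days} days or could not be read" >&2
    exit 1
  fi
  echo "OK: certificate on ${host}:${port} is valid for at least ${days} more days"
fi
```

Because the script signals a problem through its exit status, it can be hooked into cron or any monitoring system that alerts on a non-zero result.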
Another product that comes with internal certificates is vCenter Server. Once the STS certificate has expired, you can no longer log in to vCenter Server. In some cases (see the KB below for details), the STS certificate has a lifetime of only 2 years!
VMware KB “Checking Expiration of STS Certificate on vCenter Server” (79248) helps you identify the expiration date. Attached to the KB is a Python script named checksts.py. Follow the instructions and run the script. In my case (a recent vCSA 7.x), no action was needed.
However, if the STS certificate has expired, the KB provides instructions for replacing it on the vCSA or on a vCenter Server on Windows.
VMware KB “Signing certificate is not valid” error in VCSA 6.5.x/6.7.x and vCenter Server 7.0.x (76719) contains instructions and another script, fixsts.sh, for replacing the STS certificate.
Steps 5 and 6 of the resolution are important: restarting the services may fail if other certificates have expired as well. Step 6 provides a one-liner to check for this.
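For reference, a check along the lines of that one-liner can be sketched as a small script that lists the alias and “Not After” date of every entry in the VECS stores. This is a sketch, not the KB's exact command; use the one-liner from the KB itself on production systems. It assumes it runs as root in the vCSA shell with vecs-cli in its usual location, /usr/lib/vmware-vmafd/bin/vecs-cli; the VECS_CLI override and the function name are hypothetical additions.

```shell
#!/usr/bin/env bash
# Sketch: list the alias and "Not After" date of every entry in every VECS store.
# Assumes it runs as root on the vCSA. VECS_CLI is a hypothetical override;
# by default the usual vecs-cli location on the appliance is used.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# list_vecs_expiry   (hypothetical helper name)
list_vecs_expiry() {
  local store
  # "store list" prints one store name per line; TRUSTED_ROOT_CRLS holds
  # revocation lists rather than certificates, so it is skipped.
  for store in $("$VECS_CLI" store list | grep -v TRUSTED_ROOT_CRLS); do
    echo "[*] Store : $store"
    # --text dumps each entry in readable form; keep only alias and expiry lines.
    "$VECS_CLI" entry list --store "$store" --text | grep -i -e "Alias" -e "Not After"
  done
}

if [[ -x "$VECS_CLI" ]]; then
  list_vecs_expiry
else
  echo "vecs-cli not found at $VECS_CLI; run this on the vCSA or set VECS_CLI" >&2
fi
```

Compare each Not After date with today's date; any certificate that has already expired should be replaced before the services are restarted.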
In my case, the check shows that the first certificates are about to expire in July 2022.
Here too, the KB refers to other KBs for replacing these certificates with the vSphere Certificate Manager. Also be aware that you may run into the situation described in VMware KB “Failed to login to vCenter as extension, Cannot complete login due to an incorrect user name or password”, ESX Agent Manager (com.vmware.vim.eam) solution user fails to log in after replacing the vCenter Server certificates in vCenter Server 6.x (2112577).
If you are on vSphere 7 (and some editions of vSphere 6.7), there is an even more convenient option.
One of my colleagues (thank you, Joop Kramp) discovered a pre-configured vSphere alarm for this purpose, available in more recent versions (at least in vSphere 7).
Hopefully these checks will help you avoid unexpected downtime of important management applications in your vSphere environment.
As always, I thank you for reading.