Certificates: without them the Internet would not be possible (or at least it would be very, very unsafe). Unfortunately, security and user-friendliness are often at odds. Working with certificates is always risky business, and you’re bound to run into some issues along the way.
Last week I replaced the vCenter and NSX Machine SSL certificates in one of our environments. Having done this before, and having learned from the troubles I had then, I thought I was in for a smooth ride.
Anyone who has replaced the certificates in vSphere 6.x with NSX has probably run into this issue at one point or another:
The dreadful NSX LookupService failure. You replace your certificates, get the SSO working smoothly, but then there’s that one other red bubble that just doesn’t want to turn green! Maddening!
What most posts about this issue have in common is that they rely on a discrepancy between the presented thumbprint and the actual Machine SSL certificate. The thumbprint is presented by the Security Token Service (STS) and is (sometimes?) not updated by the certificate manager utility in vCenter.
Using the Managed Object Browser (MOB) you can find the STS service, locate the certificate that it has, extract the thumbprint and use it in a Python script. This Python script, ls_update_certs.py, then finds the certificate by this thumbprint, updates it with the correct certificate (which you have to supply) and presto change-o, now it works! Well, according to almost all the blogs out there.
The fix to all your problems, according to every blog on the subject:
python /usr/lib/vmidentity/tools/scripts/ls_update_certs.py --url https://psc.domain.com/lookupservice/sdk --fingerprint <fingerprint> --certfile <path_to_certfile> --user Username --password Password
But what if it doesn’t?
So there you are, your maintenance window minutes ticking away. Every blog you can find tells you that your problems are solved. And yet you have that red bubble still staring away at you.
The problem I was facing was that the thumbprint STS presented to NSX belonged to the correct certificate! Everything was fine, according to, well, everything! The MOB showed no discrepancies, and the commands that echo the certificates showed the same values for the Machine SSL certificate and the Lookup Service. Nothing I did showed me where things were wrong.
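For comparing thumbprints by hand, a minimal sketch like the one below can help. It assumes openssl is available on the machine where you run it and that the certificate has been saved as a PEM file; the function name cert_thumbprint is my own, not part of any VMware tooling.

```shell
#!/bin/sh
# Print the SHA-1 thumbprint of a PEM certificate file in the
# colon-separated form that openssl emits (AA:BB:CC:...).
# cert_thumbprint is a hypothetical helper name used for illustration.
cert_thumbprint() {
    # openssl prints "<label> Fingerprint=AA:BB:..."; keep only the value.
    openssl x509 -in "$1" -noout -fingerprint -sha1 | cut -d= -f2
}

# Example:
#   cert_thumbprint machine.crt
```

The value it prints can then be compared against the thumbprint shown in the MOB.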
vCenter stores its certificates in the VMware Endpoint Certificate Store (VECS). vecs-cli is the tool that allows you to manipulate this certificate store directly. Naturally, this is a dangerous place to be, but with great responsibility also comes great power (right?).
With vecs-cli you can list the certificate stores, list the certificates in each store, create certificates, delete certificates, etc. Using this tool I was finally able to find a discrepancy, which led me to the root cause.
With the command vecs-cli store list you are presented with a list of all the different certificate stores that VECS currently has.
You can see the two stores I’m most interested in: MACHINE_SSL_CERT and STS_INTERNAL_SSL_CERT. Using the same utility I can check which certificate is in each of these stores: vecs-cli entry list --store STS_INTERNAL_SSL_CERT and vecs-cli entry list --store MACHINE_SSL_CERT.
And lo and behold, there’s a difference! You can see by the alias that they are supposed to be the same certificate (__MACHINE_CERT), but the actual certificate itself is different.
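Instead of eyeballing the two listings, the comparison can be sketched as a small shell helper. This is only an illustration: it relies on the vecs-cli entry getcert command used later in this post, and the names store_cert and stores_match, as well as the VECS_CLI variable, are hypothetical.

```shell
#!/bin/sh
# Path to vecs-cli on the appliance; override VECS_CLI if yours differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"

# Print the certificate stored under alias $2 in store $1.
store_cert() {
    "$VECS_CLI" entry getcert --store "$1" --alias "$2"
}

# Return 0 if stores $1 and $2 hold identical certificate output for the
# alias ($3, default __MACHINE_CERT), 1 if they differ, and 2 if either
# lookup fails.
stores_match() {
    alias_name="${3:-__MACHINE_CERT}"
    a=$(store_cert "$1" "$alias_name") || return 2
    b=$(store_cert "$2" "$alias_name") || return 2
    [ "$a" = "$b" ]
}

# Example (run on the vCenter/PSC appliance):
#   if stores_match MACHINE_SSL_CERT STS_INTERNAL_SSL_CERT; then
#       echo "stores agree"
#   else
#       echo "stores differ (or a lookup failed)"
#   fi
```

In my case a check like this would have reported a difference between the two stores right away.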
From that point on it was a simple case of backing up the wrong/old certificate, copying over the correct one, and restarting the services.
- Back up the existing certificate and key entry for __MACHINE_CERT in the STS_INTERNAL_SSL_CERT store
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > oldmachine.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT > oldmachine.key
- Copy out the __MACHINE_CERT certificate and key from the MACHINE_SSL_CERT store
/usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store MACHINE_SSL_CERT --alias __MACHINE_CERT > machine.crt
/usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store MACHINE_SSL_CERT --alias __MACHINE_CERT > machine.key
- Delete the old __MACHINE_CERT entry from the STS_INTERNAL_SSL_CERT store
/usr/lib/vmware-vmafd/bin/vecs-cli entry delete --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT -y
- Add the copied certificate into the store
/usr/lib/vmware-vmafd/bin/vecs-cli entry create --store STS_INTERNAL_SSL_CERT --alias __MACHINE_CERT --cert machine.crt --key machine.key
- Restart services
service-control --stop --all && service-control --start --all
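The certificate steps above can be wrapped in one function that stops at the first failure, so the old entry is never deleted without a backup on disk. Treat this as a rough sketch rather than a tested procedure: replace_sts_cert, VECS_CLI and the backup directory argument are names I made up, and the service restart is deliberately left as a separate manual step.

```shell
#!/bin/sh
# Path to vecs-cli on the appliance; override VECS_CLI if yours differs.
VECS_CLI="${VECS_CLI:-/usr/lib/vmware-vmafd/bin/vecs-cli}"
ALIAS="__MACHINE_CERT"

# Copy the __MACHINE_CERT entry from MACHINE_SSL_CERT into
# STS_INTERNAL_SSL_CERT. $1 is a directory that receives the backup of
# the old entry and the exported copy of the current one.
replace_sts_cert() {
    dir="$1"
    mkdir -p "$dir" || return 1

    # 1. Back up the old STS entry.
    "$VECS_CLI" entry getcert --store STS_INTERNAL_SSL_CERT --alias "$ALIAS" > "$dir/oldmachine.crt" || return 1
    "$VECS_CLI" entry getkey --store STS_INTERNAL_SSL_CERT --alias "$ALIAS" > "$dir/oldmachine.key" || return 1

    # 2. Export the current Machine SSL certificate and key.
    "$VECS_CLI" entry getcert --store MACHINE_SSL_CERT --alias "$ALIAS" > "$dir/machine.crt" || return 1
    "$VECS_CLI" entry getkey --store MACHINE_SSL_CERT --alias "$ALIAS" > "$dir/machine.key" || return 1

    # Private keys should not stay world-readable on disk.
    chmod 600 "$dir/oldmachine.key" "$dir/machine.key" || return 1

    # Refuse to continue if either export came back empty.
    [ -s "$dir/machine.crt" ] && [ -s "$dir/machine.key" ] || return 1

    # 3. Delete the old entry, then 4. add the copied one.
    "$VECS_CLI" entry delete --store STS_INTERNAL_SSL_CERT --alias "$ALIAS" -y || return 1
    "$VECS_CLI" entry create --store STS_INTERNAL_SSL_CERT --alias "$ALIAS" \
        --cert "$dir/machine.crt" --key "$dir/machine.key"
}

# Example (run on the appliance, then restart the services as shown above):
#   replace_sts_cert /root/sts-cert-backup
```

Keeping the backup directory around until NSX reports a healthy Lookup Service gives you something to roll back to.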
And now my problem is fixed.