NSX Manager backup to SDDC Manager fails

When I ran the pre-check for a VCF upgrade (I’m on 3.7.1), it failed on the ‘Backup Availability Check’ for the NSX-V manager:

Logging in to the NSX-V manager, I was greeted with this error:

"Unable to connect to server <ip> at 22. Either server details are invalid or invalid credentials are presented."

Luckily, I had encountered this error in my testlab when fiddling with some settings regarding backups, so I was able to resolve it quite quickly. Should you run into this issue (or similar), here is how to solve it.

Background

The SDDC Manager has a number of users configured. These are the most relevant:

  • vcf (or ‘super-user’), the one you can log in with
  • root, can only be accessed via su
  • backup, the backup user

Now, the backup user is a bit of a strange beast. Its password isn’t something that you as a user can set; in fact, it’s stored in the PostgreSQL database. This user isn’t really well documented, either. Fun fact: this password expires…

After some Google-fu I found this VMware KB article, which for some strange reason no longer exists:

NSX Manager backups in VMware Cloud Foundation fail with the error “Invalid credentials are presented” (67638)

I have no clue why this article isn’t live anymore! Nothing in the release notes since 3.7.1 says that this issue has been addressed. Thankfully, I found a cached version.

I’m fully expecting that future versions will have a better way of handling the backup user, but for the time being this is the best way to handle it.

Symptoms

If you see the alerts above, check on the SDDC Manager whether the backup user’s password is set to expire:

chage -l backup

Then check the sshd logs for the cause with the following command:

journalctl -u sshd.service -r | grep -m 10 backup

The password could be expired:

Apr 21 14:15:07 sddc-manager-controller.vcf.vxrack.local sshd[11753]: Accepted password for backup from 172.30.0.24 port 41460 ssh2
Apr 21 14:15:07 sddc-manager-controller.vcf.vxrack.local sshd[11753]: pam_unix(sshd:account): expired password for user backup (password aged)

The password could also have been changed, which will result in these messages:

Apr 09 11:37:36 sddc-manager-controller.vcf.vxrack.local sshd[76309]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=127.0.0.1 user=backup
Apr 09 11:37:38 sddc-manager-controller.vcf.vxrack.local sshd[76309]: Failed password for backup from 127.0.0.1 port 42396 ssh2
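To tell the two cases apart without reading the log by eye, something like the following could work. It is a minimal sketch: classify_backup_ssh_failure is a hypothetical helper name, and it only matches the exact message fragments shown in the log excerpts above, so other sshd or PAM versions may word them differently.

```shell
# Hypothetical helper: read sshd journal lines on stdin and print
# "expired", "mismatch" or "unknown" for the backup user.
classify_backup_ssh_failure() {
  lines=$(cat)
  if printf '%s\n' "$lines" | grep -q 'expired password for user backup'; then
    # pam_unix accepted the password but the account check rejected it.
    echo expired
  elif printf '%s\n' "$lines" | grep -q 'Failed password for backup'; then
    # The password presented does not match the one set on the account.
    echo mismatch
  else
    echo unknown
  fi
}
```

On the SDDC Manager it could be fed from the journal, for example: journalctl -u sshd.service -r | grep -m 10 backup | classify_backup_ssh_failure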

VMware makes the following statements in the article:

  • This is a known issue affecting VMware Cloud Foundation. There is currently no resolution.
  • Never manually change the password for the “backup” account or the “backupuser” account. Both accounts need to match what is in the SDDC Manager PostgreSQL database.
  • For VMware Cloud Foundation 3.0 to 3.7 you won’t be able to rotate the “backup” account password. The account is not part of the Password Rotation workflow yet.

Resolution/workaround

1. Retrieve the password (hint: it’s VMware123!)

  1. Log in to the SDDC Manager as the vcf user and switch to the root user by using the su command.
  2. Run the following command to retrieve the backup-user password:

curl http://localhost/css/credentials | json_pp | grep backup -C 5

This gives you output like the following:

"credentialType" : "FTP",
"modificationTime" : 1555480009830,
"id" : "84f06ff5-976b-49eb-aa65-88baea7f3d45",
"username" : "backup",
"creationTime" : 1555480009830,
"entityId" : "38a3f4d1-5bb8-11e9-8efc-a7ef5fefc939",
"secret" : "VMware123!",
"entityType" : "BACKUP"
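If you only want the secret itself rather than scanning the grep context, a small awk filter over the pretty-printed JSON could do it. This is a minimal sketch: extract_backup_secret is a hypothetical helper name, and it assumes that each credential is a flat object (no nested braces) with every field on its own line, as in the output above.

```shell
# Hypothetical helper: print the "secret" of every credential whose
# "username" is "backup". Reads json_pp-style output on stdin.
extract_backup_secret() {
  awk '
    # Strip everything up to the opening quote of the value, then the closing quote.
    function value(line) {
      sub(/^[^:]*:[ \t]*"/, "", line)
      sub(/",?[ \t]*$/, "", line)
      return line
    }
    /"username"[ \t]*:/ { user = value($0) }
    /"secret"[ \t]*:/   { secret = value($0) }
    # A closing brace ends one credential object.
    /^[ \t]*[}]/ {
      if (user == "backup") print secret
      user = ""; secret = ""
    }
  '
}
```

It would be used as: curl -s http://localhost/css/credentials | json_pp | extract_backup_secret. The -s flag only silences curl’s progress meter.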

2. Change the password

With your super-duper-secret password in hand, set the backup user’s password to a temporary value with passwd backup, then run passwd backup again to set it back to the password you just retrieved. So that means you have to make 2 changes!

3. Clear the login failures

To make sure the user can log in again after possibly being locked out by failed attempts, use pam_tally2 -u backup -r to clear the failure counter.

4. Issue a new backup from NSX-V manager

Log in to the NSX-V manager and run a new backup:

After a few minutes a new entry should appear in the table below. If it doesn’t, try using the ‘change’ buttons and re-enter the correct details. Enter the password in the ‘passphrase’ field.

5. Fix permanently

The KB article stops at step 4. This means that in 90 days you’ll be facing the same issue again. In order to prevent the backup-user password from expiring, use the following command as the root user on the SDDC manager:

chage -I -1 -m 0 -M 99999 -E -1 backup

This sets the following:

  • Minimum Password Age to 0
  • Maximum Password Age to 99999
  • Password Inactive to -1
  • Account Expiration Date to -1

Verify again with chage -l backup:
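To check the result in a script rather than by eye, the relevant line can be pulled out of the chage -l output. This is a minimal sketch: password_expiry is a hypothetical helper name, and it assumes the usual English “Password expires : …” label that chage -l prints.

```shell
# Hypothetical helper: print the value of the "Password expires" line
# from `chage -l` output supplied on stdin.
password_expiry() {
  sed -n 's/^Password expires[[:space:]]*:[[:space:]]*//p'
}
```

Running chage -l backup | password_expiry should then print never once the maximum password age is 99999.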

The backup-user should be good to go from now on. Hope this helps!