I just wanted to post this up in case it helps someone else out.
Starting yesterday, after our VBR server was inadvertently rebooted and finally finished applying patches that had been outstanding for many weeks (don’t get me started on the fiasco of a backstory), we started getting multiple failures and error messages. I only caught this while doing a job reconfiguration and trying to map a cloned job to an existing backup chain, which failed with “Unable to perform data sovereignty check…”. The size on disk listed under the Backup Repository selection on the Storage screen of the Backup Job wizard showed a helpful “Failed” instead of a value.
Our configuration backup job just after the reboot failed with an error message of “Error The keyset is not defined. Failed to create Cryptographic Service Provider. --tr:Error code: 0x80090019”
Windows proxies with local repositories also failed their configuration database resync task with the same error of “Warning Failed to synchronize repo Details: The keyset is not defined. Failed to create Cryptographic Service Provider. --tr:Error code: 0x80090019”
Our Linux proxies all failed manual host discovery with an error message of “Warning [hostname] Unable to detect IP address” (to be specific, these are all ExaGrid nodes running ExaGrid-Veeam Accelerated Data Mover shares).
The repositories hosted on those ExaGrid nodes all reported the same cryptographic error as above during their config database resync tasks.
Attempts to reset the credentials for the Windows proxies were all met with a message along the lines of “action canceled by user” (the one error I didn’t capture the exact text for), while the Linux proxies failed with an error of “No such interface detected.”
READ THIS CAREFULLY: None of our backup jobs ran last night. They did NOT run and then fail or throw an error; they appear to have never even attempted to run.
A manual run of a smaller backup job hung at the task of “Polling scale-out backup repository extents” for the very first VM.
Going off of an old issue that I remembered from years ago (forum post here), I wiped the user profile directory for the service account on the VBR server and logged back into the console. That resolved the Windows issue and allowed the VBR server to rescan the proxies (after they had been rebooted and their patches completely applied).
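For anyone needing to do the same, the profile wipe can be scripted rather than done by hand. A hedged sketch in PowerShell — `svc-veeam` is a placeholder for your own service account name, and going through `Win32_UserProfile` (rather than just deleting the folder under C:\Users) also clears the matching ProfileList registry entry:

```powershell
# Remove the local Windows profile for the Veeam service account.
# 'svc-veeam' is a placeholder - substitute your own service account.
# Removing via Win32_UserProfile deletes both the profile directory
# and its ProfileList registry entry; deleting the folder alone does not.
$prof = Get-CimInstance -ClassName Win32_UserProfile |
    Where-Object { $_.LocalPath -like '*\svc-veeam' -and -not $_.Loaded }

if ($prof) {
    $prof | Remove-CimInstance
}
```

The `-not $_.Loaded` guard keeps you from trying to remove a profile that is currently in use; log the service account off first.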
To resolve the issues with the Linux proxies, I had to reset the credentials within the configuration (I just ran the ‘Set-VBRCredentials’ cmdlet to re-enter the same password), which allowed me to successfully rescan the Linux (ExaGrid) proxies and the repositories.
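For reference, the reset looked roughly like the following. This is a sketch, not gospel — the credential name ‘root’ is a placeholder, and you should confirm the exact parameter set with `Get-Help Set-VBRCredentials` on your VBR version:

```powershell
# Requires the Veeam Backup PowerShell module on the VBR server.
Import-Module Veeam.Backup.PowerShell

# Find the stored Linux credential record by name ('root' is a placeholder).
$cred = Get-VBRCredentials -Name 'root'

# Re-enter the same password; updating the record forces it to be
# re-encrypted, which is what cleared the keyset error for me.
Set-VBRCredentials -Credential $cred -Password (Read-Host -Prompt 'Password')
```

After the update, a rescan of the Linux proxies and repositories went through cleanly.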
I suspect that I will have to re-import all of the Veeam credentials (shameless plug – hope y’all paid attention in my VeeamON preso and started getting proficient with your PowerShell cmdlets 😀).
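If it comes to that, the bulk re-import might look something like the sketch below. Hedged heavily: you cannot read the existing passwords back out of the configuration, so each one has to be re-entered at the prompt or pulled from your own password vault:

```powershell
Import-Module Veeam.Backup.PowerShell

# Walk every stored credential record and re-enter its password,
# forcing each record to be re-encrypted under the new key material.
foreach ($cred in Get-VBRCredentials) {
    $newPassword = Read-Host -Prompt "Password for $($cred.Name)"
    Set-VBRCredentials -Credential $cred -Password $newPassword
}
```

In practice you would swap the `Read-Host` prompt for a lookup against whatever secrets store you already use, which is exactly where having this automated pays off.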
Once the Windows/Linux credentials were set (just an update of existing data), and everything had been rescanned, I was able to successfully perform a manual run of the prior failed backup job.
I have not yet opened a ticket with Veeam to investigate further, and I would like to know where to find details about the backup jobs that appear to have simply skipped their scheduled runs last night. I will do my best to get a ticket opened tomorrow, work through this with Veeam support, and provide additional details once we have dug in. My current priority is making sure everything is updated and our backups are back on track for the day and rolling into the weekend.
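On the “where did last night’s runs go” question, my plan is to compare the job schedules against the session history — if a job has no session at all in the window, it never even started. A rough sketch of that check:

```powershell
Import-Module Veeam.Backup.PowerShell

# List every backup session started in the last day; any scheduled job
# with no session in this window never attempted to run (as opposed to
# running and failing, which would leave a Failed session behind).
$since = (Get-Date).AddDays(-1)
Get-VBRBackupSession |
    Where-Object { $_.CreationTime -ge $since } |
    Sort-Object CreationTime |
    Select-Object JobName, CreationTime, Result
```

I will confirm with support whether there is a better place to look for silently skipped schedules.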
If nothing else, take this as a real-world case of how automating your backup environment can minimize the effort required to reconfigure the entire thing, even when it is just re-applying all of the current settings to get them “reset”.