Details
Bug
Status: Closed
Normal
Resolution: Fixed
Froyo
Automated Test
Needs Assessment
Description
Implementing CRL re-downloading uncovered an existing bug in HA failover. Puppet finds a functional server by attempting a node request. If that attempt makes additional requests as a by-product, and the first server in server_list is unavailable, Puppet crashes: rather than continuing to check later servers in the list, it simply exits with an "unable to connect" error.
This was probably unlikely to surface in the past, because such additional requests were only made on initial agent runs, during SSL bootstrapping. However, we now attempt to re-download the CRL on every request, which causes an additional request to be made to the server being tested, so failover no longer works at all.
The easiest solution is probably to restrict when the CRL re-download happens, possibly so that it happens explicitly after failover has been resolved. However, this does not fix the underlying issue: failover still breaks on initial agent runs.
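For illustration, here is a minimal sketch of the failover behavior this ticket expects. It is not Puppet's actual code. `ConnectionError`, `request`, `find_functional_server`, and the `reachable` set that simulates the network are all hypothetical names. The point it shows is that a connection failure in any request made while testing a server, including a by-product request such as a CRL download, should move the search on to the next entry in server_list instead of aborting the run.

```ruby
# Hypothetical error type standing in for whatever a failed connection raises.
class ConnectionError < StandardError; end

# Hypothetical request helper. `reachable` simulates the network: any server
# not in it refuses connections.
def request(server, path, reachable)
  unless reachable.include?(server)
    raise ConnectionError, "unable to connect to #{server} for #{path}"
  end
  "#{path} from #{server}"
end

# Returns the first server in server_list that answers every request made
# while testing it, or nil if none does.
def find_functional_server(server_list, reachable)
  server_list.each do |server|
    begin
      request(server, "crl", reachable)   # by-product request (assumed to come first)
      request(server, "node", reachable)  # the node request used to test the server
      return server
    rescue ConnectionError
      # A failure in either request means this server is unusable: try the next.
      next
    end
  end
  nil
end

# The first server is down, so the search falls through to the second.
puts find_functional_server(%w[primary secondary tertiary], %w[secondary tertiary])
```

Wrapping both requests in the same `begin`/`rescue` is the design choice that matters here: if only the node request were covered, an error from the by-product request would escape the loop, which is the failure this ticket describes.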