Details
New Feature
Status: Closed
Normal
Resolution: Fixed
None
None
5
Client 2016-06-29, Client 2016-07-13 (HA, 1.5.3)
New Feature
Description
There will be a new capability to provide a list of hosts the agent should try to contact when initiating a run. The agent will always go down the list, in order, until it finds a master it can contact. It will use that master for the remainder of the run, and it will abort the run if that master becomes unavailable partway through.
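The selection rule above can be sketched in Python. This is a minimal sketch, not Puppet's implementation (Puppet itself is Ruby): `select_server`, `run_agent`, `is_reachable`, and the two exception classes are hypothetical names, and reachability is abstracted as a caller-supplied predicate rather than a real connection attempt.

```python
from typing import Callable, List, Sequence, Tuple


class NoServerAvailable(Exception):
    """No entry in the server list could be contacted (hypothetical)."""


class ServerLost(Exception):
    """The master chosen for this run stopped responding partway through (hypothetical)."""


def select_server(server_urls: Sequence[str], is_reachable: Callable[[str], bool]) -> str:
    # Walk the list strictly in order; the first reachable host wins.
    for url in server_urls:
        if is_reachable(url):
            return url
    raise NoServerAvailable("none of %d servers could be contacted" % len(server_urls))


def run_agent(
    server_urls: Sequence[str],
    is_reachable: Callable[[str], bool],
    steps: Sequence[str],
) -> List[Tuple[str, str]]:
    # Choose the master once, at the start of the run.
    server = select_server(server_urls, is_reachable)
    completed = []
    for step in steps:
        # No re-selection mid-run: losing the chosen master ends the run.
        if not is_reachable(server):
            raise ServerLost("%s became unavailable during the run" % server)
        completed.append((step, server))
    return completed
```

Note that a mid-run outage raises instead of moving on to the next list entry, matching the rule that the run stops if its master goes away.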
We'll need to add a server_urls option to Puppet that accepts an array. Other considerations include:
- Interaction with the old server option: what happens when both are specified in the config file? On the command line? A mix of the two?
- The actual failover logic: catching a failed attempt to contact a master and rolling over to the next one.
- Logging: from Kylo: I'd suggest a notice-level message for any decision point in the failover logic. It'll be easier to reason about this once we have the failover code in front of us, but I'm thinking we want to leave breadcrumbs for people debugging funky failovers, and we want those breadcrumbs on by default. The report element will be ticketed separately.
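The failover loop and its notice-level breadcrumbs might look like the following minimal Python sketch, which assumes connection failures surface as `OSError`. Python's `logging` module has no NOTICE level, so the sketch registers a custom level 25 to stand in for Puppet's notice level; `first_working_server`, `connect`, and the message wording are hypothetical and are not Puppet's actual log output.

```python
import logging

# Assumption: level 25 (between INFO and WARNING) stands in for Puppet's "notice".
NOTICE = 25
logging.addLevelName(NOTICE, "NOTICE")
log = logging.getLogger("agent.failover")
# Breadcrumbs on by default: without this, the root WARNING threshold would drop them.
log.setLevel(NOTICE)


def first_working_server(server_urls, connect):
    """Try each URL in order and return (url, connection) for the first that connects.

    `connect` is a hypothetical callable that raises OSError when a master is unreachable.
    """
    for index, url in enumerate(server_urls):
        try:
            conn = connect(url)
        except OSError as err:
            # Decision point: this master is skipped, so say which one and why.
            log.log(NOTICE, "Unable to connect to %s (server %d of %d): %s",
                    url, index + 1, len(server_urls), err)
            continue
        # Decision point: this master is used for the rest of the run.
        log.log(NOTICE, "Selected server %s for this run", url)
        return url, conn
    # Decision point: the list is exhausted, so the run cannot proceed.
    log.log(NOTICE, "No server in the list was reachable; stopping the run")
    raise ConnectionError("no reachable server in %r" % (list(server_urls),))
```

Each branch that changes what the agent does next emits exactly one notice, which is the "message for any decision point" suggestion above.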
Also to consider: we need to keep track of which server the agent successfully failed over to, so that pluginsync, report submission, etc. hit the correct master.
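Tracking the failed-over master amounts to recording it once per run and routing every later request through that record. The Python sketch below uses hypothetical names (`RunContext`, `pin`, `send`), and its service strings such as "pluginsync" and "report" are illustrative labels, not Puppet's HTTP endpoints.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RunContext:
    """Per-run record of the master the agent failed over to (hypothetical)."""
    server: Optional[str] = None
    requests: List[Tuple[str, str]] = field(default_factory=list)

    def pin(self, server: str) -> None:
        # Record the master chosen by failover; it must not change mid-run.
        if self.server is not None and self.server != server:
            raise RuntimeError("server already pinned to %s" % self.server)
        self.server = server

    def send(self, service: str) -> str:
        # Every later request (pluginsync, report, ...) goes to the pinned master.
        if self.server is None:
            raise RuntimeError("no server selected yet for this run")
        self.requests.append((service, self.server))
        return self.server
```

Refusing to re-pin makes an accidental switch to a different master within the same run fail loudly rather than silently splitting a run across masters.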
Also, from Kylo: One thing to be careful about here is making sure this works both with and without use_cached_catalog. I mention this because that setting may change which endpoint the agent hits first during a given run (and thus the code path and context in which it might fail over).
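One way to make failover independent of which endpoint comes first is to defer server selection until the first network request of the run, whatever that request is. The Python sketch below assumes this lazy design; `LazyFailover`, `agent_run`, and the simplified request order (an optional catalog fetch followed by a report) are hypothetical and do not reflect Puppet's real request sequence.

```python
from typing import Callable, List, Optional, Sequence


class LazyFailover:
    """Choose a master on the first network request of a run (hypothetical sketch).

    `is_reachable` stands in for a real connection attempt.
    """

    def __init__(self, server_urls: Sequence[str], is_reachable: Callable[[str], bool]):
        self.server_urls = list(server_urls)
        self.is_reachable = is_reachable
        self.server: Optional[str] = None
        self.history: List[str] = []

    def request(self, service: str) -> str:
        if self.server is None:
            # Failover runs here, so it happens on whichever request comes first.
            self.server = next((u for u in self.server_urls if self.is_reachable(u)), None)
            if self.server is None:
                raise ConnectionError("no reachable server")
        self.history.append("%s -> %s" % (service, self.server))
        return self.server


def agent_run(use_cached_catalog: bool, failover: LazyFailover) -> List[str]:
    # Simplified, hypothetical request order: only the catalog step depends on the setting.
    if not use_cached_catalog:
        failover.request("catalog")
    failover.request("report")
    return failover.history
```

With this design the same selection code runs whether the first request is a catalog fetch or something else, so the cached-catalog path needs no separate failover logic.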
When failing over, are we restricted to using only the master the cached catalog came from if we need file content? And are we restricted to that master for report submission? Edit: the answer is no, we are not restricted in this way!
Also, if you're running with cached catalogs, do we fail if we can't reach the master we originally got that catalog from? Or do we fail over to a different master? Edit: once again, not restricted!
Issue Links
- is supported by: PUP-6384 Acceptance: test that agent failover behavior works correctly (Closed)