Puppet / PUP-6376

Add `server_urls` option to puppet with agent failover logic

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: PUP 4.6.0
    • Component/s: None
    • Story Points:
      5
    • Sprint:
      Client 2016-06-29, Client 2016-07-13 (HA, 1.5.3)
    • Release Notes:
      New Feature
    • Release Notes Summary:
      This change adds master failover functionality to the puppet agent. Using the new {{server_list}} option to specify multiple masters, an agent will now attempt to fall back to a functional master should a failure to download a catalog occur. The {{server_list}} setting can be either provided on the command line or configured in {{puppet.conf}}, and has the format {{server_list = master1_hostname:port,master2_hostname:port,master3_hostname:port}}.

      The old {{server}} option can still be used to specify a single master, in which case failover will not be attempted and puppet will behave as it always has. Specifying a single server with the {{server_list}} option has the same effect.
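
      To make the {{server_list}} format concrete, here is a minimal Python sketch that parses such a value into an ordered list of (host, port) pairs. It is an illustration only, not Puppet's actual parser: the function name parse_server_list is hypothetical, and falling back to port 8140 when the port is omitted is an assumption of this sketch. IPv6 literals are not handled.

```python
DEFAULT_PORT = 8140  # assumption: used only when an entry has no ":port" part


def parse_server_list(value):
    """Parse 'host1:port,host2:port' into an ordered list of (host, port) tuples."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate blank entries and trailing commas
        host, _, port = entry.partition(":")
        servers.append((host, int(port) if port else DEFAULT_PORT))
    return servers
```

      Order is preserved on purpose, since the agent is described as trying the masters in the order given.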

      Description

      There will be a new capability to provide a list of hosts the agent should try to contact when initiating a run. The agent will always go down the list, in order, until it finds a master it can contact. It will use that master for the remainder of the run and stop that run if that master ceases to be available during the run.
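
      The selection behaviour described above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the agent's real code: select_server, NoServerAvailable, and the can_contact callback are hypothetical names, and treating OSError as "unreachable" is a simplification chosen for the sketch.

```python
class NoServerAvailable(Exception):
    """Raised when every server in the list has been tried without success."""


def select_server(servers, can_contact):
    """Return the first server, in list order, for which can_contact(server) is true."""
    for server in servers:
        try:
            if can_contact(server):
                return server  # use this server for the remainder of the run
        except OSError:
            pass  # connection-level errors count as "unreachable"; try the next one
    raise NoServerAvailable("could not contact any of: %r" % (servers,))
```

      The caller would invoke select_server once at the start of a run and keep the result. If the chosen server later becomes unavailable, the run stops instead of selecting again, matching the description above.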

      We'll need to add a server_urls option to puppet which accepts an array (the setting shipped as {{server_list}}, per the Release Notes Summary above). Other considerations include:

      • Interaction with the old server option: what happens when both are specified in config? By command line? A mix?
      • The actual failover logic - catching a failed attempt to contact a master and rolling off onto the next one.
      • Logging: from Kylo Ginsberg: I'd suggest a notice-level message for any decision point in the failover logic. It'll be easier to reason about this when we have the failover code in front of us, but I'm thinking we want to leave bread crumbs for people debugging funky failovers, and we want those breadcrumbs on by default. The report element will be ticketed separately.
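
      One possible answer to the first consideration, with a breadcrumb logged at the decision point, is sketched below in Python. The precedence rule (a non-empty list wins over the single server) is an assumption made for illustration, since the ticket leaves that question open. candidate_servers and the logger name are hypothetical, and because Python's logging module has no notice level, INFO stands in for it here.

```python
import logging

log = logging.getLogger("failover_sketch")  # hypothetical logger name


def candidate_servers(server=None, server_list=None):
    """Assumed rule: a non-empty server_list wins; otherwise fall back to server."""
    if server_list:
        if server:
            # breadcrumb for anyone debugging why `server` was not used
            log.info("both server and server_list are set; using server_list")
        return list(server_list)
    return [server] if server else []
```

      Because the same function handles values from the command line and from config, a mix of the two sources would follow the same rule once the settings have been merged.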

      Also to consider: we need to keep track of which server was successfully failed over to so we make sure to hit the correct master for pluginsync, report submission, etc.
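
      A simple way to picture this bookkeeping is a per-run object that stores the chosen server and builds every later request from it. The Python sketch below is hypothetical throughout: AgentRun, its request method, and the URL layout are invented for illustration and do not reflect Puppet's real endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRun:
    """Remembers the server chosen at failover time for all later steps."""
    server: tuple  # (host, port) of the master that answered first
    requests: list = field(default_factory=list)

    def request(self, step):
        """Build and record the URL a step (pluginsync, report, ...) would hit."""
        url = "https://%s:%d/%s" % (self.server[0], self.server[1], step)
        self.requests.append(url)
        return url
```

      For example, AgentRun(server=("master2", 8140)).request("report") targets master2, regardless of which entries came earlier in the configured list.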

      Also, from Kylo: One thing to be careful about with this is making sure it works with and without use_cached_catalog. I mention this because that setting may change which endpoint the agent hits first during a given agent run (and thus the code and the context in which it might fail over).

      When failing over, are we restricted to only using the master which the cached catalog came from if we need file content? And are we restricted to that master for report submission? Edit: the answer is no, we are not restricted in this way!

      Also, if you're running with cached catalogs, do we fail if we can't reach the master we got that catalog from originally? Or do we fail over to a different master? - Edit: once again, not restricted!


                People

                • Assignee: Unassigned
                • Reporter: William Hopper (whopper)