PUP-10844: Agent failures with server_list when one puppetserver fails

Details

    • Night's Watch
    • 5
    • NW - 2021-01-20, NW - 2021-02-03, NW - 2021-04-28
    • Customer Feedback
    • 40535
    • 1
    • Bug Fix
    • When puppet processes server_list and tries to find a functional server, it goes through each server in turn. Previously, if it could not connect to a server, it reported an error but still moved on to the next server in server_list.

      Now it only issues a warning for each server it cannot connect to, and it reports an error only if no server in server_list is functional.
    • Needs Assessment

    Description

      Puppet Version: 6.15.0+
      Puppet Server Version: 6.x
      OS Name/Version: Any

      The changes in 6.15.0 altered the behavior of the server_list setting. Previously, when server_list was configured and the first puppetserver in the list failed, the agent continued its run by connecting to the next puppetserver in the list. Since 6.15.0, if the primary puppetserver (the first entry in server_list) fails while an agent is running, the agent run fails.

      Desired Behavior:
      When the first puppetserver in the server_list goes offline, the agents should automatically try to connect to the second puppetserver in the server_list, even in the middle of an agent run.

      Actual Behavior:
      The agent run fails if the first puppetserver in the server_list goes offline while the agent is in the middle of a run.
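
The desired behavior can be illustrated with a minimal Python sketch. This is not Puppet's implementation (the agent is written in Ruby); the names `with_failover` and `NoFunctionalServer`, and the host `secondary.example.com` used in the usage note, are hypothetical. The sketch assumes every request in a run is retried against the next entry in server_list, logging a warning per unreachable server and raising an error only when no server responds.

```python
import logging
from typing import Callable, Iterable, Tuple, TypeVar

log = logging.getLogger("server_list_sketch")

T = TypeVar("T")
Server = Tuple[str, int]  # (host, port)


class NoFunctionalServer(Exception):
    """Raised when every entry in server_list fails (hypothetical name)."""


def with_failover(servers: Iterable[Server], request: Callable[[str, int], T]) -> T:
    """Run request(host, port) against each server in order.

    Returns the first successful result. A connection failure is only
    a warning; an error is raised once all servers have been tried.
    """
    failed = []
    for host, port in servers:
        try:
            return request(host, port)
        except OSError as exc:  # covers connection refused and timeouts
            log.warning("Unable to connect to %s:%d: %s", host, port, exc)
            failed.append(f"{host}:{port}")
    raise NoFunctionalServer(
        "Could not connect to any server in server_list: " + ", ".join(failed)
    )
```

With a server list of `primary.example.com:8140` followed by `secondary.example.com:8140`, a request that is refused by the first host would be answered by the second, and the run would continue.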

      Some failures are below.

      Could not evaluate: Could not retrieve file metadata for puppet:///pe_packages/2019.8.1/windows-x86_64/puppet-agent-x64.msi: Request to https://primary.example.com:8140/puppet/v3/file_metadata/pe_packages/2019.8.1/windows-x86_64/puppet-agent-x64.msi?links=manage&checksum_type=sha256lite&source_permissions=ignore&environment=windows_testing failed after 21.011 seconds: Failed to open TCP connection to primary.example.com:8140 (A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) for "primary.example.com" port 8140)
      

      puppet-agent[7053]: Could not retrieve catalog from remote server: Request to https://primary.example.com:8140/puppet/v3/catalog/agent.example.com?environment=development failed after 0.004 seconds: Failed to open TCP connection to primary.example.com:8140 (Connection refused - connect(2) for "primary.example.com" port 8140)
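
When reproducing across several agents, it can help to count these failures per server. The following Python sketch assumes the log wording matches the two messages quoted above; the function name `count_connection_failures` is hypothetical, and other Puppet versions may phrase the message differently.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Pattern taken from the two messages quoted in this report (an assumption
# about the log format, not a documented Puppet interface).
FAILED_CONNECT = re.compile(
    r"Failed to open TCP connection to (?P<host>[^\s:]+):(?P<port>\d+)"
)


def count_connection_failures(lines: Iterable[str]) -> "Counter[Tuple[str, int]]":
    """Count 'Failed to open TCP connection' messages per (host, port)."""
    counts: "Counter[Tuple[str, int]]" = Counter()
    for line in lines:
        match = FAILED_CONNECT.search(line)
        if match:
            counts[(match.group("host"), int(match.group("port")))] += 1
    return counts
```

Feeding each agent's log lines to `count_connection_failures` shows which entry in server_list the failed requests were sent to.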
      

      Reproduction
      1. Configure server_list with two puppetservers
      2. Configure 10 agents with that server_list and a run interval of one minute
      3. Shut down the puppetserver service on the first server in the server_list

      At least one of the agents will likely hit the failure. It seems to be easier to reproduce when the catalog contains file resources.

      We believe this is related to the changes in PUP-10363.

      Attachments

        1. puppet.txt
          42 kB
          Vadym Chepkov

            People

              dorin.pleava Dorin Pleava
              jarret.lavallee Jarret Lavallee