Uploaded image for project: 'Puppet'
  1. Puppet
  2. PUP-5389

PID race condition allows two puppet instances, which can result in failure to request certificate

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Normal
    • Resolution: Duplicate
    • None
    • None
    • None
    • None

    Description

      By using mcollective's puppet agent, we're triggering puppet to retrieve and apply the new catalog (mco puppet runonce) while the puppet daemon is idle. This has been automated and sometimes can happen while a node is rebooting. During the start-up, mcollective starts first (S24) and can receive and execute the mco puppet call about the same time the daemon is starting (S98).

      If agent.pid file is not detected by the mcollective agent, it proceeds executing puppet agent --onetime. There's a time window from the moment puppet starts until it creates the pid file, which the mcollective agent could miss the fact that a puppet process is already running and start another instance of the puppet agent.

      Normally this doesn't result in a bad state, since a second lock is acquired before applying the catalog. But if the certificates aren't generated yet or new ones are required, both instances are attempting a certificate request, which result in failure to sign them.

      Oct 16 18:17:19 node1 puppet-agent[52032]: Reopening log files
      Oct 16 18:17:20 node1 puppet-agent[52032]: Creating a new SSL key for node1.p.com
      Oct 16 18:17:20 node1 puppet-agent[52026]: Creating a new SSL key for node1.p.com  <--- another pid 
      Oct 16 18:17:20 node1 puppet-agent[52032]: Creating a new SSL certificate request for node1.p.com
      Oct 16 18:17:20 node1 puppet-agent[52032]: Certificate Request fingerprint (SHA256): 29:7D:9B:17:88:03:F0:DA:B1:86:CC:21:0A:C0:D0:9F:4A:92:03:1C:F1:19:0E:6A:B7:7C:7E:F1:66:02:EA:4B
      Oct 16 18:17:20 node1 puppet-agent[52032]: Caching certificate for node1.p.com
      Oct 16 18:17:20 node1 puppet-agent[52032]: Starting Puppet client version 3.3.2
      Oct 16 18:17:20 node1 puppet-agent[52047]: Retrieving plugin
      Oct 16 18:17:22 node1 puppet-agent[52026]: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.
      Oct 16 18:17:22 node1 puppet-agent[52026]: Certificate fingerprint: 2E:BC:7A:5F:D8:FB:33:03:CD:BC:CB:68:59:B5:BD:75:86:37:A9:62:2C:84:53:7F:D9:90:10:50:62:B1:32:59
      Oct 16 18:17:22 node1 puppet-agent[52026]: To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certficate.
      Oct 16 18:17:22 node1 puppet-agent[52026]: On the master:
      Oct 16 18:17:22 node1 puppet-agent[52026]:   puppet cert clean node1.p.com
      Oct 16 18:17:22 node1 puppet-agent[52026]: On the agent:
      Oct 16 18:17:22 node1 puppet-agent[52026]:   rm -f /var/lib/puppet/ssl/certs/node1.p.com.pem
      Oct 16 18:17:22 node1 puppet-agent[52026]:   puppet agent -t
      Oct 16 18:45:55 node1 puppet-agent[55039]: Retrieving plugin
      Oct 16 18:45:56 node1 puppet-agent[55039]: (/File[/var/lib/puppet/lib]) Failed to generate additional resources using 'eval_generate': SSL_CTX_use_PrivateKey:: key values mismatch
      Oct 16 18:45:56 node1 puppet-agent[55039]: (/File[/var/lib/puppet/lib]) Could not evaluate: SSL_CTX_use_PrivateKey:: key values mismatch Could not retrieve file metadata for puppet://ms1/plugins: SSL_CTX_use_PrivateKey:: key values mismatch
      Oct 16 18:46:01 node1 puppet-agent[55039]: Could not retrieve catalog from remote server: SSL_CTX_use_PrivateKey:: key values mismatch
      Oct 16 18:46:02 node1 puppet-agent[55039]: Using cached catalog
      Oct 16 18:46:06 node1 puppet-agent[55039]: Applying configuration version '2015-10-14 09:47:23.386140'
      ...
      Oct 16 18:46:22 node1 puppet-agent[55039]: Finished catalog run in 18.19 seconds
      Oct 16 18:46:23 node1 puppet-agent[55039]: Could not send report: SSL_CTX_use_PrivateKey:: key values mismatch
      ...
      

      This is surprisingly easy to reproduce manually, as it seems that it takes more than 1 second from the moment puppet starts until a pid file is created. Here's a normal startup with a bit of additional log messages.

      Oct 16 15:59:42 node1 puppet-agent: About to run puppet. Time: 1445007582.080518313
      Oct 16 15:59:43 node1 puppet-agent[30750]: daemonizing
      Oct 16 15:59:43 node1 puppet-agent[30755]: Time: 1445007583.30188 About to create PID file /var/run/puppet/agent.pid
      Oct 16 15:59:43 node1 puppet-agent[30755]: Reopening log files
      Oct 16 15:59:43 node1 puppet-agent[30755]: Starting Puppet client version 3.3.2
      Oct 16 15:59:43 node1 puppet-agent[30755]: Setting signal traps
      Oct 16 15:59:43 node1 puppet-agent[30755]: Time: 1445007583.58079 About to create PID file /var/run/puppet/agent.pid
      Oct 16 15:59:43 node1 puppet-agent[30764]: Retrieving plugin
      ...
      

      The first timestamp (up to nanosecs) is just before executing puppet from the init script and the second timestamp (up to microsecs) is just before PID creation. Should the agent acquire a lock before attempting to request new certificates?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              jirauser18479 user-b87ca (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Zendesk Support