[PUP-8905] New certificate download code breaks autosigning workflow Created: 2018/06/05  Updated: 2018/09/19  Resolved: 2018/06/26

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: None
Fix Version/s: PUP 6.0.0

Type: CI Blocker Priority: Normal
Reporter: Maggie Dreyer Assignee: Maggie Dreyer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
blocks PDB-3926 Find/fix integration test failures wi... Closed
CI Pipeline/s:
platform puppetserver
Epic Link: Robust Intermediate CA
Team: Server
Release Notes: Not Needed
QA Risk Assessment: Needs Assessment


Autosigning appears to be broken with the new certificate download code (PUP-8652). When using an autosigning workflow, the first agent run fails with

Error: Could not request certificate: Error 400 on SERVER: pto1to2xvyajwdw.delivery.puppetlabs.net already has a signed certificate; ignoring certificate request

It seems that the agent is re-submitting a CSR even though it has successfully downloaded the signed host certificate. This could be related to some weird interaction between the remaining indirector code and the new certificate download code.

This issue appears to be affecting both Puppetserver's beaker acceptance tests (whose pre-suites use autosigning to set up their environment) and PuppetDB's clojure integration tests.

Comment by Maggie Dreyer [ 2018/06/05 ]

This occurs because we actually create two SSL::Host objects during bootstrap: one at the start of the agent run to represent the agent in question, and a persistent localhost representing the same host, which is lazily created and then used when validating HTTP connections (see https://github.com/puppetlabs/puppet/blob/master/lib/puppet/ssl/validator/default_validator.rb#L114). localhost is created the first time the indirector tries to make an HTTP request to check the CA's SSL record for this host. As part of its creation it sets up the host's certs, which the host object from the agent run then sees and uses.

Before the recent code changes, this first HTTP request occurred as part of this call to certificate: https://github.com/puppetlabs/puppet/blob/5.3.x/lib/puppet/ssl/host.rb#L329. But now that we have replaced the internals of the certificate method to avoid the indirector, the setup done by localhost occurs during the call to generate on the line below instead, here: https://github.com/puppetlabs/puppet/blob/5.3.x/lib/puppet/ssl/host.rb#L231. However, the CSR that gets submitted as a side effect of the indirection.find for it is apparently not returned by that find, so existing_request stays nil. Later on in the call to generate we therefore issue another request, which the server rejects with the error above.
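
The sequence described above can be modelled with a small toy. This is only a sketch under stated assumptions: FakeCA, ToyHost, bootstrap_localhost and find_request are hypothetical names invented for illustration and are not Puppet classes or methods. It shows only how a lookup that submits a CSR as a side effect, yet returns nil, leads to a second submission that an autosigning CA rejects.

```ruby
# Toy model only: FakeCA and ToyHost are hypothetical stand-ins, not Puppet classes.
class FakeCA
  def initialize
    @signed = {}
  end

  # Autosign the first CSR for a name; reject any later one,
  # mimicking the server-side error quoted in this ticket.
  def submit_csr(name)
    if @signed.key?(name)
      raise "#{name} already has a signed certificate; ignoring certificate request"
    end
    @signed[name] = "cert-for-#{name}"
  end
end

class ToyHost
  attr_reader :certificate

  def initialize(name, ca)
    @name = name
    @ca = ca
    @localhost_ready = false
  end

  # Stands in for the lazy localhost setup that the first HTTP request triggers:
  # it submits a CSR and stores the autosigned cert.
  def bootstrap_localhost
    return if @localhost_ready
    @localhost_ready = true
    @certificate = @ca.submit_csr(@name)
  end

  # Stands in for the remote CSR lookup: the request triggers the lazy setup,
  # but the lookup itself still returns nothing.
  def find_request
    bootstrap_localhost
    nil
  end

  # Captures the lookup result once, then submits because it was nil.
  def generate
    existing_request = find_request
    @ca.submit_csr(@name) unless existing_request
  end
end

ca = FakeCA.new
host = ToyHost.new("agent.example.test", ca)
error = begin
  host.generate
  nil
rescue RuntimeError => e
  e.message
end
```

In this model, error holds the "already has a signed certificate" message while host.certificate is already populated, which matches the observation that the first run fails even though the agent ends up with a valid cert.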

Note that in all of this, no actual inconsistent state occurs, and the autosigned cert is downloaded as part of the implicit initialization by localhost, so the agent actually ends up fully initialized and further runs work correctly.

The simplest fix is to re-check for a signing request, e.g. changing this line https://github.com/puppetlabs/puppet/blob/5.3.x/lib/puppet/ssl/host.rb#L252 to say "certificate_request" instead of "existing_request". However, that only really obfuscates the problem further, so I am going to look into not using the indirector to manage signing requests, similar to what we have done with certs and CRLs.
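
To illustrate the difference between reusing the captured value and re-checking, here is a minimal sketch. AutosignCA, Agent and the recheck: keyword are hypothetical and exist only for this example. The sketch also assumes that a second lookup does see the request created by the first one, which is what the re-check relies on.

```ruby
# Toy model only; none of these names are Puppet APIs.
class AutosignCA
  def initialize
    @signed = {}
  end

  # Accept the first CSR for a name, reject any repeat.
  def submit(name)
    raise "#{name} already has a signed certificate" if @signed[name]
    @signed[name] = true
  end
end

class Agent
  attr_reader :submissions

  def initialize(name, ca)
    @name = name
    @ca = ca
    @request = nil
    @submissions = 0
  end

  # Lookup with a side effect: the first call submits a CSR but returns nil;
  # later calls return the stored request (an assumption of this sketch).
  def certificate_request
    return @request if @request
    submit
    nil
  end

  # recheck: false reuses the value captured before the side effect took hold;
  # recheck: true looks the request up again before deciding to submit.
  def generate(recheck:)
    existing_request = certificate_request
    pending = recheck ? certificate_request : existing_request
    submit unless pending
  end

  private

  def submit
    @ca.submit(@name)
    @submissions += 1
    @request = "csr-for-#{@name}"
  end
end

rechecking = Agent.new("agent.example.test", AutosignCA.new)
rechecking.generate(recheck: true)

stale = Agent.new("agent.example.test", AutosignCA.new)
stale_failed = begin
  stale.generate(recheck: false)
  false
rescue RuntimeError
  true
end
```

With the re-check, exactly one CSR reaches the toy CA. Without it, the stale nil triggers a second submission and the CA raises. As noted above, this only hides the side effect rather than removing it.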

Generated at Wed Nov 20 21:29:37 PST 2019 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.