Details
- Improvement
- Status: Closed
- Normal
- Resolution: Declined
- Froyo
Description
The CA service currently creates, modifies, and retrieves keys, CSRs, and certs by storing everything in PEM files on disk, in a very particular directory structure. This is problematic for several reasons:
1. It'll be a giant pain to make it HA and keep all of the files in sync.
2. We get a lot of bug reports caused by users creating or moving these files improperly.
3. There are race conditions: if two actions attempt to modify the same thing (e.g. the CRL) at the same time, the data will be corrupted.
4. Puppet likes to keep two copies of many of these things around, and has two settings to specify their locations (e.g. 'cacert' vs. 'localcacert'). It's often hard to predict which one of the two is used by certain code paths, and we've had several bugs around that.
5. The 'cert_status' endpoint needs to be able to do things like query for the count of valid certs. Current implementations of this require walking the directories and serially reading all of the PEM files into in-memory objects. This is expensive, and we've had bug reports of it causing timeouts for PE users.
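To make item 3 concrete, here is a minimal sketch of one way a file-backed read-modify-write cycle can go wrong. It is not the CA service's code: the JSON file, the serial numbers, and the helper names are all hypothetical stand-ins for the real CRL, and the interleaving is performed by hand so the result is deterministic. It shows a lost update only; a real race could also leave a partially written file.

```python
import json
import tempfile
from pathlib import Path


def read_revoked(path):
    """Load the list of revoked serial numbers from disk."""
    return json.loads(path.read_text())


def write_revoked(path, serials):
    """Overwrite the file with the given list of serial numbers."""
    path.write_text(json.dumps(serials))


def simulate_lost_update():
    """Interleave two read-modify-write cycles as two concurrent requests might."""
    with tempfile.TemporaryDirectory() as tmp:
        crl = Path(tmp) / "crl.json"  # hypothetical stand-in for the CRL file
        write_revoked(crl, [])

        # Both actions read the same starting state before either one writes.
        seen_by_a = read_revoked(crl)
        seen_by_b = read_revoked(crl)

        write_revoked(crl, seen_by_a + [101])  # action A revokes serial 101
        write_revoked(crl, seen_by_b + [202])  # action B overwrites A's result

        return read_revoked(crl)


if __name__ == "__main__":
    print(simulate_lost_update())
```

After both actions finish, only serial 202 is recorded; the revocation of 101 has silently disappeared. Any fix on the file-based design needs some form of locking around the whole read-modify-write cycle.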
We need to move the data to a different persistence store. We need to put some thought into what the backend should be, but my initial leaning is to just put it into PostgreSQL. The data maps very well to an RDBMS with a very simple schema.
We might need to go the route that PuppetDB went and also support HSQLDB for this functionality, to avoid introducing a large new dependency for OSS users.
Eventually, when we want to add HA, we can probably do it as a PE-only feature and implement it via PostgreSQL clustering, Hazelcast layered over the PostgreSQL data, or similar. There's a good chance we can re-use some of whatever the classifier uses to solve its HA issues.
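As a rough illustration of how simple such a schema could be, the sketch below uses Python's built-in sqlite3 module in place of PostgreSQL so that it runs without a server. The table name, column names, and helper functions are assumptions made up for this example, not a proposed design, and "valid" here only means "not revoked" (expiry is ignored).

```python
import sqlite3

# Hypothetical single-table schema; every name here is illustrative only.
SCHEMA = """
CREATE TABLE certificates (
    certname   TEXT PRIMARY KEY,
    serial     INTEGER NOT NULL UNIQUE,
    pem        TEXT NOT NULL,
    revoked_at TEXT
);
"""


def open_store():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_cert(conn, certname, serial, pem):
    """Store a signed certificate; revoked_at stays NULL while it is valid."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO certificates (certname, serial, pem) VALUES (?, ?, ?)",
            (certname, serial, pem),
        )


def revoke(conn, certname):
    """Mark a certificate as revoked inside a single transaction."""
    with conn:
        conn.execute(
            "UPDATE certificates SET revoked_at = datetime('now') "
            "WHERE certname = ? AND revoked_at IS NULL",
            (certname,),
        )


def count_valid(conn):
    """Count unrevoked certificates with one query instead of a directory walk."""
    row = conn.execute(
        "SELECT COUNT(*) FROM certificates WHERE revoked_at IS NULL"
    ).fetchone()
    return row[0]


def demo():
    """Add three placeholder certs, revoke one, and return the valid count."""
    conn = open_store()
    for i, name in enumerate(["agent1", "agent2", "agent3"], start=1):
        add_cert(conn, name, i, "placeholder PEM for " + name)
    revoke(conn, "agent2")
    return count_valid(conn)


if __name__ == "__main__":
    print(demo())
```

With a layout along these lines, the count needed by 'cert_status' becomes a single indexed query, and concurrent revocations are serialized by the database's transactions rather than by whichever process writes a file last.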
This ticket is just to capture that there is work to be done. We need to scope it in much greater detail before getting started.
Issue Links
- relates to
  - SERVER-115 Concurrent access to the CRL can corrupt it (Resolved)
  - PUP-3535 certificate_status endpoint is slow (Closed)
  - SERVER-137 Compose X509CRL once and reuse for get-certificate-statuses (Closed)