[TK-293] tk-auth should support x.509 extensions for authentication instead of just certname Created: 2015/11/10  Updated: 2016/09/22  Resolved: 2016/09/22

Status: Closed
Project: Trapperkeeper
Component/s: None
Affects Version/s: None
Fix Version/s: PE 2016.2.0

Type: New Feature Priority: Normal
Reporter: Chris Barker Assignee: Erik Dasher
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks SERVER-1245 Pass puppet oids and custom_trusted_o... Closed
blocks SERVER-1252 Document new tk-auth feature Closed
blocks TK-352 Support subjectAltName in tk-auth Resolved
Relates
relates to SERVER-1268 Build acceptance test(s) for TK-293 Closed
Template:
Epic Link: Securing SSL Extensions
Sub-team: jade
Story Points: 5
Sprint: Server Jade 2016-03-23, Server Jade 2016-04-06, Server Jade 2016-04-20
QA Contact: Erik Dasher

 Description   

Summary

tk-auth should allow auth rules that match on CSR attributes (aka SSL Extensions) instead of just certname. tk-auth's service should accept a mapping of OID to short names that can be used to define the auth rules.

In Scope

  • Ability to provide the tk-auth service with a map of OID -> shortname
  • Ability to set an :allow or :deny stanza to use a map of shortnames (or raw OIDs) to values instead of just a hostname
  • Extracting a certificate's extensions as part of tk-auth middleware and translating them from OID -> shortname using the provided mapping
  • Affordance for an empty mapping
  • Failing fast if a given extension key in the auth rules can't be found as a shortname in the OID -> shortname mapping AND isn't a valid OID

Out of Scope

  • Supporting any shortnames outside of what tk-auth's service gets in the passed mapping. This includes puppet's shortnames. Such things should be passed into consumers of tk-auth (like puppet server)
  • Wildcard/pattern matching on extension values

Bonus Round

There is still dead code detritus leftover from when tk-auth was originally adopted from the community. This should be cleaned up as needed to implement this new functionality.

Design

The following are all valid values for allow or deny in the auth conf.
Please note that the puppet shortnames used are not supported by tk-auth
natively but must be passed in.

# classic
"node.foo.com"
 
# simple csr attr case
{attrs: {pp_role: compile_master}}
 
# complex csr attr case
{attrs: {pp_role: [compile_master, console],
         pp_environment: pe_inf}}
 
# mix
["node.foo.com",
 "node.bar.com",
 {attrs: {pp_role: compile_master}}]



 Comments   
Comment by Chris Barker [ 2015/11/10 ]

This will also remove the issues of having trouble parsing tk-auth files, and having to do things like reuse a pe-internal certificate for services.

Comment by Chris Price [ 2015/11/12 ]

Chris Barker this sounds like a really cool idea to me, thanks for the ticket!

ping Jeremy Barlow Eric Sorenson just as food for thought.

Chris Barker can you elaborate a bit on what you mean by "remove the issues of having trouble parsing tk-auth files"?

Comment by Jeremy Barlow [ 2015/11/12 ]

Ping Nicholas Fagerlund. Thought you'd be interested since you've brought this up in the past as well.

Comment by Chris Barker [ 2015/11/12 ]

A lot of tools / PE components still reference a pe-internal-dashboard certificate as a hard coded name that they can use, so that is what is added to PE's tk-auth file. The PE installer itself still halts on finding auth.conf versions that don't match previous releases checksums, for example.

Being able to just say "nodes with this csr_attribute can talk to this service" means that the likelihood of needing PE to manage / change the auth.conf in the future is less.

It is another ticket to make sure that tk-auth is easily parseable by puppet (this is again assuming that some customers will want to make changes to auth.conf for some services, so an erb template isn't sufficient - for example allowing the PE radiator page viewable without having to be logged in).

Comment by Chris Price [ 2015/11/12 ]

We have a puppet module shipping in Ankeny that is intended to support managing the tk-auth file. That's tangential to most of the benefits you're getting at with this ticket, but, just for the record.

Comment by Jeremy Barlow [ 2015/11/12 ]

We have a puppet module shipping in Ankeny that is intended to support managing the tk-auth file.

For reference, the github project for that is here - https://github.com/puppetlabs/puppetlabs-puppet_authorization. This should at least allow users to be able to augment the new "auth.conf" that puppetserver uses with additional rules via puppet code – without having to massage the more "global" erb template.

The sort-order range reservation that we're using for the new auth.conf should allow custom user rules to be added into the auth.conf in such a way that the PE-managed rules can be updated on upgrade without global checksum matching / forcing the user to hand-reconcile conflicts.

I'm not arguing that it wouldn't be valuable to have a way to have authorization rules express users that should be allowed/denied via other characteristics that can be shared across multiple users - e.g., more role-based like in your example - as opposed to just the ability to identify unique names as both the legacy and new auth.conf only do today.

Comment by Nicholas Fagerlund [ 2015/11/13 ]

Yeah!!! I'm all in favor of more flexible allow criteria.

My initial spitball design was that allow: could accept maps in its array of values as well as strings. So something like

allow: [
  node1.example.com,
  {certname: node2.example.com},
  $1,
  {pp_role: compile_master}
]

So if it's a string, it's a certname, same as doing {certname: name}, but you can use maps to put together arbitrarily complicated combinations of cert metadata.

Comment by Chris Barker [ 2015/11/13 ]

Chris Price Yeah, being able to parse tk-auth file is great, but csr_attributes rules specifically allow for us to not have to modify the tk-auth for every service when a compile master is spun up, for example.

Currently, if I provision a compile master, puppetdb and nc services need their whitelists updated before it can take traffic. But with rules based auth, a node with pp_role: compile_master and pp_environment: pe_inf in their csr_attributes would be able to immediately talk to the NC service, the PuppetDB service, the code manager service, etc, minimize the bootstrap time from a user requesting a new compile master and it being able to take load.

See https://tickets.puppetlabs.com/browse/SERVER-1005 for the ability to specify autosigners that handle alt names.

Comment by Chris Price [ 2015/11/13 ]

Chris Barker - yeah, I totally agree with you, not having to modify tk-auth at all is preferable. Just pointing it out because a few of your comments made it sound like you weren't sure about whether the file could be managed with puppet.

Comment by Joshua Partlow [ 2016/02/18 ]

What allows a particular node to request a particular pp_role extension in its CSR? Does this mean we now need a PE specific custom policy executable? (https://docs.puppetlabs.com/puppet/latest/reference/ssl_autosign.html#policy-based-autosigning) And if so, wouldn't that preclude customers from using this facility?

Comment by Nicholas Fagerlund [ 2016/02/18 ]

Joshua Partlow It's just a client-side config file. Nodes can request whatever they want.

However, I think the puppet cert command and the PE console node manager still don't have any affordances for viewing the requested extensions, which makes them unusable for anyone who's not doing custom policy autosigning. We could make this a practical feature by updating puppet cert and the node manager.

Comment by Joshua Partlow [ 2016/02/18 ]

Nicholas Fagerlund I know the csr_attributes are just a file on client prior to the csr submission--I think that's where I'm getting confused, without some policy around the ca deciding to accept a pe_role attribute, aren't we allowing any node to sign up for being allowed to talk to a pe infrastructure service that tk-auth has a pe_role specific rule for?

Comment by Nicholas Fagerlund [ 2016/02/18 ]

Joshua Partlow Yeah—that's why I consider their absence from puppet cert (list / sign) to be the underlying problem, because if you aren't using automation you're probably automatically approving all extensions.

Comment by Chris Price [ 2016/02/19 ]

Chris Barker can you clarify (or point me to another ticket that clarifies) what parts of the PEM/PE Installer proposal would be blocked by this and SERVER-1005? We are interested in doing this work but there are a lot, lot, lot of things competing for our time in Couch. I'm relying on Archana Sridhar to help prioritize and unless we end up with more wiggle room on some project like Direct Puppet it may be hard to get to this, but if I fully understood the implications on the PEM/PE Installer it might make it easier for me to advocate.

Joshua Partlow Nicholas Fagerlund it seems like some of the discussion about the autosigning / puppet cert bits of this might be better to have on SERVER-1005, so that if that work is picked up, your (very legitimate) concerns can be taken into account.

Comment by Chris Barker [ 2016/02/24 ]

Chris Price - This and SERVER-1005 aren't blockers for PEM/MEEP, but are speedbumps to getting rapid deployment of compilemasters setup for people in scaling situations / wanting more flexible / better installation experience of PE. Without it, we will need to be dependent on coordinating puppet runs on compile masters and puppetdb servers for example (to promote a node to be a compile master, it needs a certificate with an altname - hence server-1005, and puppetdb would need to have it's whitelist updated to include the new compile masters certificate for it to connect).

Joshua Partlow Nicholas Fagerlund - that is a really good point, I hadn't considered the bad actor tactic that this would enable to get access to the catalogs / infrastructure. Would it be safe to say that if we enable rules based tk-auth, we need to enable rules based routing of certificate requests - instead of just having SERVER-1005 be about dns_altnames? In this situation, i'd want to ensure that any certificate with pp_environment = pe_inf doesn't accidentally get confused with a normal agent certificate, etc. This has made this a much larger feature to add, but may answer Chris Price's question of priorities (which is this is now probably too big for Couch).

Comment by Chris Price [ 2016/02/24 ]

Chris Barker so, if you had this stuff, you'd include something in MEEP that allowed a user to promote a node to a compile master via autosigning... and without it, MEEP would have to include a call to a signing command? Still not quite understanding how these would change the UX, but I know I'm just missing something...

Comment by Chris Barker [ 2016/02/24 ]

Without it, currently the user would just have to perform the signing manually, which is how we tell users to do the promotion process right now.

Comment by Chris Price [ 2016/02/24 ]

Ah, OK. Starting to make sense now.

Comment by Chris Price [ 2016/04/08 ]

As time has progressed and I've started to understand this more completely, it's now fully sunk in that there's a possible security risk here because any agent can submit a CSR with a privileged role extension in it, and the CLI cert tool doesn't show the extensions, and the existing autosigner doesn't enforce that anyone's script validate these extensions in any way. I know I'm late to the party

Anyway, since this work to allow the auth is already in flight, and since we're assuming we want to target this part at Couch, we should probably think about what we consider the minimum amount of additional work that we'd need to do in order to feel like these auth changes aren't going to introduce any significant additional security risk. Otherwise we might need to target this feature at a branch that lands after Couch?

Chris Barker Joshua Partlow Nicholas Fagerlund Eric Sorenson interested in your feedback on this.

Comment by Chris Barker [ 2016/04/08 ]

Right now - and we would have to be explicit about this in the documentation and our implementation - is that all tk_auth based rules include a requirement of the dns_altnames validation for certificate - we've already established that dns_altnames certificate requests should be a specially handled request and that users shouldn't just autosign them with the same process they autosign certificates without dns_altnames.

It is not the best solution, but in the short term it would mean that in our product someone couldn't get accidentally sign or autosign a certificate that has access to sensitive information - for the certificate to be valid against the tk_auth rules, it has to contain dns_altnames, for it to be signed, the user has to override the default signing behavior with --allow-dns-altnames, and until SERVER-1005 is in the product, no autosigner will process the request either.

Comment by Chris Price [ 2016/04/08 ]

Chris Barker so you're saying that any places where we were going to add some kind of new "pe_role" extension to the certs for use in the installer / PE modules, to allow us to avoid the requirement of explicitly adding all of the compile master certnames individually to those rules, that we are also going to have the rule match on altname for now?

Comment by Nicholas Fagerlund [ 2016/04/08 ]

Chris Price Heheh, welcome to team "`puppet cert` sucks".

To my eyes, we need to do ALL of the following before trying to use extensions for real authorization:

  • Modify `puppet cert` to do 3 new things:
    • List all extensions for every cert when you do puppet cert list, same way we list alt names. Improve the formatting so they're easily legible and scannable (instead of just overflowing the line like we do with alt names).
    • Balk when trying to sign a CSR with extension requests unless you provide a special flag, same as with alt names.
    • Provide a way to filter the list of certificates (and/or CSRs) to show:
      • Only those that have a given extension (or list of extensions) set/requested.
      • Only those that have a specific value for a given extension set/requested.
  • Stress-test and improve Puppet Server and PuppetDB's ability to reject revoked certificates when in a PE-like environment.
    • Make sure we have an easy (read: automatic) workflow for securely distributing updated CRLs from the CA server to the non-CA compile masters and the PuppetDB servers. (B/c puppet agent never updates its CRL and never retrieves it securely.)
      • Stretch goal: fix revocation on agents too.
    • Make sure the CRL actually works, and ID any actions you must take to make it work (do we have to restart the JVM? do a HUP?), and make them part of the automatic CRL distribution.
  • Update the PE console node manager to mirror our changes to `puppet cert`.
    • Right now it doesn't allow you to sign certs with alt name requests, right? If so, it should do the exact same for extension requests.
    • ...but really, it should provide an informative interface that lets you sign either if you know what you're doing. That's probably out of scope for now, but the more important extensions become, the more useless the node manager will become.
    • It should also let you browse the existing certificates with extensions in the same way described above for `puppet cert`, but again, probably out of minimum scope?

That list will at least make it possible to, with a bunch of manual work and vigilance:

  • Identify which certificates have special permissions in your PE infrastructure, even if you have a realistic (i.e. huge) number of issued certificates.
  • Revoke any certificates that should NOT have those permissions, and have the revocation actually take.
  • Refrain from accidentally signing certificates with unapproved extension requests.

IDEK what to do about autosigning. Probably, we should at least change naïve whitelist autosigning to refuse CSRs with either alt-names or extensions. (Right now it just approves everything if the name matches, right?) Policy-based autosigning already allows you to refuse based on extension requests, but the policy executable author has to do all the work, and we all know how easy it is to get certificates right. Maybe we should be providing more affordances for this kind of check in the policy exe API? Perhaps even add a new, non-script-based interface that lets you write declarative policies instead.

Comment by Chris Price [ 2016/04/08 ]

Nicholas Fagerlund I agree with you on all of those things as medium-to-long-term goals. However, we've had a bunch of discussions about doing a major rework of the CA stuff, like using a database to store them on the server, possibly using OCSP, doing a total re-write of the CLI tools, etc. I think that stuff is all fairly high on Eric Sorenson's priority list, and if it is, then what you've described is obviously more effort than we can afford to put into the legacy system.

Are you saying that you'd consider all of that stuff to be prerequisite to using extensions for auth in the PE infrastructure?

Comment by Nicholas Fagerlund [ 2016/04/08 ]

Chris Price Well, I might have written the post backwards, sorry. What I actually think is mandatory are those three bullets at the bottom:

  • Identify any existing certs with extensions they shouldn't have.
  • Be able to neutralize any such certs.
  • Not get silently tricked into signing any more such certs.

If can't get all three of those, extension-based auth is too great a risk to use.

(The list at the top was my best guess at what we'd need for that, but I might have been putting some carts before horses.)

Comment by Chris Price [ 2016/04/08 ]

Nicholas Fagerlund well, one problem with what you're proposing is that today's CA doesn't officially keep any historical record of all of the outstanding certs. It happens to store them in a certain directory after it signs them, and serve them to the agents from that directory when they check back in, but from that point forward there is no contract that says that we have any way of authoritatively knowing what certs are out there or what extensions they have in them. You could totally blow away that directory and everything would just keep on chugging along, and I would strongly suspect that most customers who have been using Puppet for a while and have ever needed to rebuild their MoM for any reason are likely to have lost data and not have any way to inventory all of the outstanding certs. (This is part of the motivation for switching to some other kind of persistence store in the future.)

Given that, it seems like bullet #1 is probably not possible?

Comment by Nicholas Fagerlund [ 2016/04/08 ]

Chris Price Ah, you're right. So then... extension-based auth is a non-starter until we get that rework of the CA infrastructure, right? And we should just punt this ticket until then? We'll need some way to prevent people from using this unless their CA has lived its whole lifetime under the new regime.

Having a leaky record of certs works fine for name-based auth because your name is your name, but as soon as you start using non-unique metadata for auth... I think you have to have bookkeeping. Or am I missing something again?

Comment by Chris Price [ 2016/04/08 ]

Nicholas Fagerlund I definitely see what you're getting at, but I don't feel qualified to weigh in on whether those concerns would be an absolute blocker for rolling out this feature. Maybe Eric Sorenson can weigh in when he gets back. (godspeed, Eric Sorenson!)

Comment by Chris Barker [ 2016/04/08 ]

Chris Price Nicholas Fagerlund

Yes, I am saying at all tk-auth rules that are created also have to check for the presence of the environments dns_altnames. All PE related services that would be created almost always already contain dns_altnames because the reason we are deploying more compile masters, or puppetdb instances, or activemq spokes is to handle load or HA configures, which usually involve a loadbalancers, which need their hostnames added to the dns_altnames already.

By ensuring our rules that we we create in PE require the presence of dns_altnames, it means that we are providing the short term functionality until updating the CA / cert tools to handle ssl extensions is done. This does make it dependent on the tk-auth rules to be validated and tested thoroughly, but they should be already, and since we still can't control an enduser from changing auth rules from exposing themselves to security risks anyway, I don't see that we need to wait to introduce this feature as long as we follow a standard on how we implement the checks against which are valid ssl extensions / certificates.

Comment by Chris Price [ 2016/04/08 ]

Chris Barker I think I agree with that, as long as Eric Sorenson signs off on it when he gets back.

I wonder, though, if we're going to require the altnames to be included in any out-of-the-box PE rules, then do we even need to add an additional extension to those rules? Or would the altnames be sufficient for identifying the PE nodes that need access to the endpoints in question?

Comment by Chris Barker [ 2016/04/08 ]

We should still require more than just altnames, ie, I don't want to grant every PE Infrastructure node access to PuppetDB if it doesn't need it - activemq spokes don't need that information, etc.

So you'd have rulesets on the services with (sorry I'm munging all the UX work above):

pp_role = [compilemaster,rbac,console] && dns_altnames = customer.foo.com

dns_altnames means a certificate had to go through a non automated / accidental signing process, but pp_role has to also contain the right role's allowed to talk to the service.

Comment by Nathaniel Smith [ 2016/04/08 ]

It sounds like the minimum work needed to make this ticket's work viable is captured by TK-352. Chris Barker, Nicholas Fagerlund, can you confirm that?

Comment by Nicholas Fagerlund [ 2016/04/08 ]

Hmm, I hadn't thought of (has > 0 alt names) && (particular extension value), so I guess I WAS missing something! ...Yeah, I think that makes it viable, as long as everyone making those auth rules understands the problem.

Documenting the safe use of extension rules for users is going to be kind of rough.

Comment by Chris Barker [ 2016/04/08 ]

Yeah, I realize its not ideal, but it enables us to get this out of the door. In theory the rules remove a lot of the use cases for users to have to hand edit the auth in the first place.

It may have to fall under an advanced / here be dragons (or apiarists only) header until we get the rest of the toolchain in place to make signing ssl extensions a lot clearer to the users.

Comment by Reid Vandewiele [ 2016/04/08 ]

It's worth noting that despite the long laundry list of things it would be nice to have in place for a complete cert-based auth experience (which Nicholas Fagerlund did a great job of enumerating above), there's really only one that's an MVP hard-line requirement.

Balk when trying to sign a CSR with [a given, known extension request] unless you provide a special flag, same as with alt names.

Using dns-alt-names is a cheap way of doing this today because it is effectively a given, known extension request which we balk on when signing without an explicit flag.

If we have any engineering effort to devote to this and want something slightly cleaner than hijacking dns-alt-names, we could choose to designate a particular OID arc as "secure", and require an explicit flag to sign certs that request extensions under this arc—the same way dns-alt-name requests do. e.g.

puppet cert sign --allow-secure-extensions

We would then be free to define an OID under the secure arc, name it something like "pe_infrastructure", and require that a node have BOTH a value in the pe_infrastructure extension AND an appropriate pp_role extension in order to grant authorization.

Again, same idea, but a little more engineering effort and I think a lot cleaner.

Comment by Chris Price [ 2016/04/09 ]

Reid Vandewiele I was thinking of some things along those same lines, but I think that Nicholas Fagerlund's point is that that doesn't give us any guarantee that there aren't already existing certs that were signed with one of these newly-declared-"secure" extensions, and those certs would be grandfathered in, and that might be a security risk.

Comment by Nicholas Fagerlund [ 2016/04/11 ]

Yeah exactly.

Comment by Reid Vandewiele [ 2016/04/12 ]

Chris Price Nicholas Fagerlund I just talked to Adrien Thebo about this. Due to thoughtful planning in how our OID arcs were designed, we have the ability to implement a real solution cleanly, and easily.

Our root OID is 1.3.6.1.4.1.34380 for "puppetlabs". We then have:

puppetlabs.1:   "Puppet Certificate Extension"
puppetlabs.1.1: "Puppet Registered Certificate Extension" (ppRegCertExt)
puppetlabs.1.2: "Puppet Private Certificate Extension"    (ppPrivCertExt)

See details here.

Our CA will currently ONLY accept extension requests if they are under one of the existing ppRegCertExt or ppPrivCertExt OIDs. See here.

If we want to implement "authorization" certificate extensions, we could do so by defining another OID arc under puppetlabs.1. For example, we could define:

puppetlabs.1.3: "Puppet Authorization Certificate Extension" (ppAuthCertExt)

Because any extensions under this OID are flatline rejected today, we have full control over how we implement the methodology for accepting them. We could require on the command line something like:

puppet cert sign --allow-authorization-extensions

This would not be subject to the security concerns expressed above.

Making an immediate-term plan to base trust on dns-alt-names existing in a cert is an acceptable work-around, but it's a hack. It is not a good design, and it is something we should be intentionally seeking a solution to. It's technical debt. If we can solve the problem cleanly now, it seems worth it to me to avoid paying back technical debt on this later.

Thoughts?

Comment by Chris Price [ 2016/04/12 ]

Reid Vandewiele interesting.

Our CA will currently ONLY accept extension requests if they are under one of the existing ppRegCertExt or ppPrivCertExt OIDs. See here.

That's only the ruby code, though. Jeremy Barlow Nate Wolfe Justin May do you know if this logic is also present in the Clojure CA?

Assuming that we do have that protection in the Clojure code and that we decide we're safe, then it sounds like what we'd have to do to support this would be:

1. Add the new OID arc in both Ruby and Clojure
2. Add something to the Ruby ca cli tool to list extensions in a CSR?
3. Add the '--allow-authorization-extensions' to the Ruby CLI?

That doesn't sound like a ton of work, and I agree with you that it would avoid some stuff that I'd consider tech debt in the default PE rules, but much of the work above will end up being thrown away whenever we redo the CA... so it's a different kind of tech debt. I also think that since the PE tk-auth rules are managed by PE modules, translating the old rules that include altnames to new, better rules in the future would be pretty straightforward and invisible to the user?

I see pros and cons to both, just trying to enumerate them, not necessarily advocate for one or the other. Probably what this comes down to is how confident Eric Sorenson is that we'll get to the CA rework any time soon, or whether that's far enough out that investing some energy in the Ruby cli tool is justifiable for the short term.

Comment by Kevin Corcoran [ 2016/04/12 ]

I'm going to resolve this ticket since the associated PR has been merged and the feature has landed in tk-auth. TK-352 and PE-14835 are currently tracking follow-up work.

Comment by Reid Vandewiele [ 2016/04/12 ]

Chris Price Nicholas Fagerlund Chris Barker Nathaniel Smith Eric Sorenson with this ticket marked resolved I thought it prudent to create a new one to track the proposal to implement Puppet Authorization Certificate Extensions instead of keying off of the presence of subject alternative names. See PUP-6166.

Comment by Kurt Wall [ 2016/04/12 ]

Kevin Corcoran Okay. Thx.

Comment by Nate Wolfe [ 2016/04/12 ]

Our CA will currently ONLY accept extension requests if they are under one of the existing ppRegCertExt or ppPrivCertExt OIDs

That's only the ruby code, though. Jeremy Barlow Nate Wolfe Justin May do you know if this logic is also present in the Clojure CA?

Chris Price Behavior should match. Corresponding implementation here: https://github.com/puppetlabs/puppet-server/blob/master/src/clj/puppetlabs/puppetserver/certificate_authority.clj#L550-L568

Generated at Tue Jul 14 04:17:12 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.