4.x Sensitive Type
4 - 50-90% of Customers
4 - Major
5 - $$$$$$
We currently have no good way of handling sensitive data in a Puppet manifest.
In PUP-5831 we are adding a Binary type. Ticket PUP-3600 is about making that Binary type work as content in an unmodified 3.x PSON/JSON encoded catalog for the File content attribute, but only there. That is not enough for Binary, and certainly not enough for Sensitive data.
While other attributes could possibly also be handled as special cases, there is no general handling of binary data until we have better serialization. (TBD Ref to ticket). While Binary works for some attributes, it is hard to support because the receiving end must know that it may get binary data, and what that binary data represents. Say that it should be a secret Integer: how does the receiver know this and not mistake it for, say, a GIF?
To support this, we add a new type, Sensitive[T, cipher], a subtype of Binary where T represents the unencrypted value type and cipher is a cipher reference as in OpenSSL::Cipher. A runtime object maintains either both the encrypted and the unencrypted value, or just the encrypted value. A Sensitive value may have to present itself in a surrogate form in certain situations; such a surrogate value must be of the actual data type, or usage may break.
It is uncertain whether the surrogate is of value or whether this is a presentation issue, as each presentation technology may be happy with just a sequence of asterisks for any data type. A very narrow type (like an Enum) would also not be able to have a surrogate if the surrogate is one of the acceptable values. For those reasons a surrogate value should not be included in the design. The Boolean 'value_available' attribute is used to denote whether the value attribute is available. This Boolean is needed because the value type T may accept Undef.
Thus a sensitive object can be described by the struct type:
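As a rough illustration, the runtime object described above could be approximated in Ruby as follows. Only 'value_available' is named in the text; the class name SensitiveValue, the other field names, and the helper methods are assumptions made for this sketch and are not the proposed struct type.

```ruby
# Hypothetical runtime shape. Only `value_available` comes from the proposal;
# `value`, `cipher` and `encrypted` are assumed field names.
SensitiveValue = Struct.new(:value_available, :value, :cipher, :encrypted,
                            keyword_init: true) do
  # Builds an instance that still holds its clear text value.
  def self.from_clear(value, cipher: 'aes-256-cbc')
    new(value_available: true, value: value, cipher: cipher, encrypted: {})
  end

  # Builds an instance that carries only encrypted data, keyed by recipient.
  def self.from_encrypted(encrypted, cipher: 'aes-256-cbc')
    new(value_available: false, value: nil, cipher: cipher, encrypted: encrypted)
  end

  # Returns the clear text value, or raises when only the encrypted form is held.
  def clear_value
    raise ArgumentError, 'clear text value is not available' unless value_available
    value
  end
end
```

The separate flag matters when T accepts Undef: `SensitiveValue.from_clear(nil)` holds a legitimate clear text value of nil, which a nil check on `value` alone could not distinguish from "only the encrypted form is present".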
When creating an instance of Sensitive, the data may come from a data source where the value is already encrypted. Data sources that support this should return an instance of Sensitive instead of the String representation. The data source will have some way of representing the cipher as well as the encrypted value. When a Sensitive is created from an unencrypted source, enough information must be given to enable encryption. The sensitive data can hold multiple encrypted values, one for each recipient. The typical use case is to allow the master to see the value for the purpose of deriving new values, and to allow a node (or nodes) to see it.
The most common case would be to encrypt for a node, so that the clear data is only readable by that node, but other scenarios may apply (a certificate shared by a group, information available only to the master that is then made available to a node, etc.). The method that creates a Sensitive must therefore be able to handle these various options.
In this proposal, it is assumed that the key name uniquely identifies a recipient, and that a node's FQDN is a suitable key name.
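A minimal sketch of how one value might be made readable by several recipients, using Ruby's standard openssl library. It assumes a hybrid scheme that the proposal does not prescribe: the value is encrypted once with a random AES session key, and that session key is then wrapped with each recipient's RSA public key. The function names and the envelope layout are hypothetical.

```ruby
require 'openssl'
require 'json'

CIPHER_NAME = 'aes-256-cbc' # assumed default cipher for this sketch

# Encrypts `value` once and wraps the session key for every recipient.
# `recipients` maps a key name (for example a node's FQDN) to an RSA public key.
def encrypt_for(value, recipients)
  cipher = OpenSSL::Cipher.new(CIPHER_NAME)
  cipher.encrypt
  session_key = cipher.random_key
  iv = cipher.random_iv
  # Wrap the value in an array so that scalars round-trip through JSON.
  data = cipher.update(JSON.generate([value])) + cipher.final
  wrapped = recipients.transform_values { |pub| pub.public_encrypt(session_key) }
  { cipher: CIPHER_NAME, iv: iv, data: data, keys: wrapped }
end

# Decrypts the envelope for one recipient; raises KeyError when the
# envelope was not encrypted for `keyname`.
def decrypt_as(envelope, keyname, private_key)
  wrapped = envelope[:keys].fetch(keyname)
  cipher = OpenSSL::Cipher.new(envelope[:cipher])
  cipher.decrypt
  cipher.key = private_key.private_decrypt(wrapped)
  cipher.iv = envelope[:iv]
  JSON.parse(cipher.update(envelope[:data]) + cipher.final).first
end
```

Encrypting for both the master and a node then amounts to passing two entries in `recipients`; encrypting for a node only leaves the master unable to decrypt, and therefore unable to derive new values.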
Handling data of Sensitive type in the 3.x catalog is a separate ticket (TBD REF). How that is done has implications for the type algebra. If we assume that the serialization layer is capable of performing the decryption and producing instances of Sensitive, then it can be a very general and powerful mechanism. If Sensitive is seen as a data type different from all other types, then the algebra is simple, but also not very flexible.
Here is a description of the most flexible way of handling Sensitive[T].
A Sensitive[T] is equivalent to a T in all cases. Operations on the value produce new instances that are also Sensitive, e.g.:
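One way to picture this behavior is a Ruby wrapper that forwards operations to the wrapped value and wraps every result again. This is only a sketch: the class name, `unwrap`, and `map` are assumptions for illustration, not the evaluator's actual implementation.

```ruby
# Hypothetical wrapper: behaves like the wrapped value, but every result of
# an operation is itself Sensitive.
class Sensitive
  def initialize(value)
    @value = value
  end

  # Explicit access to the clear text value.
  def unwrap
    @value
  end

  # Applies a block to the clear text value and wraps the result again.
  def map
    Sensitive.new(yield(@value))
  end

  # Forwards any operation the wrapped value supports. Sensitive operands are
  # unwrapped first, and the result is wrapped so sensitivity is never lost.
  def method_missing(name, *args, &block)
    return super unless @value.respond_to?(name)

    plain = args.map { |a| a.is_a?(Sensitive) ? a.unwrap : a }
    Sensitive.new(@value.public_send(name, *plain, &block))
  end

  def respond_to_missing?(name, include_private = false)
    @value.respond_to?(name, include_private) || super
  end

  # Presentation never shows the clear text.
  def to_s
    '********'
  end
  alias inspect to_s
end

base_port = Sensitive.new(8000)
derived = base_port + 80 # still a Sensitive; derived.unwrap is 8080
```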
When operations are performed the value is not encrypted until the value is serialized (reaches the end of the secure domain) as it would be wasteful to compute this for temporary variables.
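The deferred encryption described above can be sketched as a memoized computation that runs only when the serialization layer asks for the encrypted form. The class LazySensitive, its method names, and the use of a single symmetric key are assumptions for this sketch.

```ruby
require 'openssl'

# Hypothetical sketch: encryption happens on first serialization only.
class LazySensitive
  attr_reader :encrypt_count

  def initialize(value, key)
    @value = value
    @key = key
    @encrypted = nil
    @encrypt_count = 0
  end

  # Produces a derived value without encrypting anything.
  def derive
    LazySensitive.new(yield(@value), @key)
  end

  # Called when the value leaves the secure domain; the result is memoized.
  def serialized
    @encrypted ||= begin
      @encrypt_count += 1
      cipher = OpenSSL::Cipher.new('aes-256-cbc')
      cipher.encrypt
      cipher.key = @key
      iv = cipher.random_iv
      iv + cipher.update(@value.to_s) + cipher.final
    end
  end
end

key = OpenSSL::Random.random_bytes(32)
intermediate = LazySensitive.new(20, key)
final = intermediate.derive { |v| v * 2 }
final.serialized # only `final` is ever encrypted
```

Temporary values such as `intermediate` never pay the encryption cost, because nothing ever serializes them.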
Maintaining the sensitivity of the value is vital and enables such things as auditing. (We may want to add something to describe the degree of sensitivity: Confidential, Secret, Top Secret. Lower levels than Confidential are probably not needed, as that is a level that is handled in general by the security at a site.) These levels exist mainly for auditing and RBAC purposes.
In many systems dealing with sensitive data, an aggregation of sensitive data is considered to have a higher degree of sensitivity (in warfare, the amount of ordnance in one warehouse is not as important as the sum of all ordnance in a region). While probably complete overkill, Puppet could assign a higher sensitivity whenever aggregation takes place. As it would be doing this blindly, it must be conservative, meaning that any derived value would have a higher sensitivity degree (which in practice means that everything becomes Top Secret).
An alternative to the Sensitive name is Classified.
Calling functions with Sensitive arguments is an issue. Should they have access to the clear text value? Are all functions and all modules treated equally? A simple way to handle this would be an environment setting that defines the modules that may get sensitive data: unspecified (meaning only Puppet core functions), all (turning the check off), or a list of modules. This should probably be a whitelist/blacklist setting.
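A minimal sketch of such a setting in Ruby, covering the three whitelist cases named above. The class name, the 'puppet' label for core functions, and the choice that an explicit list still includes core functions are assumptions; a blacklist variant is not shown.

```ruby
# Hypothetical policy object deciding which modules may see clear text.
class ClearTextPolicy
  CORE = 'puppet' # assumed label for Puppet core functions

  # setting: nil (unspecified, core only), :all (check turned off),
  # or an Array of module names.
  def initialize(setting = nil)
    @setting = setting
  end

  # True when functions from `module_name` may receive clear text values.
  def allowed?(module_name)
    case @setting
    when nil then module_name == CORE
    when :all then true
    else module_name == CORE || @setting.include?(module_name)
    end
  end
end

policy = ClearTextPolicy.new(['vault_tools'])
policy.allowed?('vault_tools') # whitelisted module
policy.allowed?('stdlib')      # not whitelisted
```

The same shape of check would apply on the agent side, with types and providers in place of modules.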
On the agent side the corresponding mechanism is required for types and providers. Thus the agent will refuse to send sensitive data to a type/provider if it is not whitelisted.
It is difficult to protect the source code in an AST from leaking sensitive data, as it would require the source code repository itself to be highly protected, i.e. stored in an encrypted file system. The parser would then need to ensure (when it is producing pre-parsed files) that it is not leaking clear text values in logs, in the embedded source code, or through someone deserializing a pre-parsed (.xpp) file and finding literal strings, integers, etc. As this becomes almost impossible to handle, the entire .xpp file would need to be encrypted (.expp). This could be done by the parser; it would then parse for a particular master.
A security-conscious user will want to bless all Ruby code, as it is not possible to prevent loaded Ruby code from doing something malicious (it can sniff out the clear text values if it wants to). The security level provided by the proposals in this ticket guards against inadvertent leakage by only allowing certain specially vetted functions to have access to clear text values, and by defining whether it is allowed to send sensitive data without the clear text or whether an attempt to do so should raise an error.
- The Sensitive Data Type and runtime object must be implemented
- Handling of Binary, and extended data in catalogs is required (to be able to also handle Sensitive)
- The native parser should have an encrypted output mode that encrypts the output for a master cert
- The loader of .xpp must be aware of .expp and search for those first; loader must decrypt and then give it to the deserializer to create the AST
- The evaluation of expressions must be able to operate on Sensitive[T] data as if it was the original value of type T, and must fail if the value cannot be decrypted (for the master; the data is for someone else only, and derived values cannot be created).
- A security profile must be created that defines how sensitive data may be given in clear text to modules and/or individual functions, and under which conditions logging in clear text is allowed (root, debugging).
- The security profile must be applied at all points where values escape from the evaluator:
  - function invocation: calling a Ruby function means giving up control, so a function must be trusted, as it can leak information
  - template invocation: epp and erb are different, since erb is Ruby code and can execute arbitrary code; Puppet has no control after that and must trust the Ruby code
- Logging routines when receiving sensitive data should write surrogate values
- It must be possible to create Sensitive data from literal values - a function can be used for this
- Data providers must be developed for secure storing of data.
- Tools must be available to allow users to manage (write) values in a secure key-value store
- Agent side logic must be able to handle Sensitive objects and decide which providers are trusted, that is, which can receive either Sensitive data directly or values in clear text, and can be trusted not to leak them.
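The logging requirement in the list above can be sketched in Ruby as a formatter that substitutes a surrogate for any sensitive argument unless the security profile explicitly allows clear text. The minimal Sensitive class, the function name `log_line`, and the `clear_text_allowed` flag are assumptions for illustration.

```ruby
# Minimal stand-in for a sensitive value (hypothetical).
class Sensitive
  def initialize(value)
    @value = value
  end

  def unwrap
    @value
  end
end

SURROGATE = '********'

# Formats a log message, writing a surrogate for Sensitive arguments.
# `clear_text_allowed` models a security profile that permits clear text
# logging under special conditions (for example, debugging as root).
def log_line(message, *values, clear_text_allowed: false)
  rendered = values.map do |v|
    if v.is_a?(Sensitive)
      clear_text_allowed ? v.unwrap.to_s : SURROGATE
    else
      v.to_s
    end
  end
  format(message, *rendered)
end

line = log_line('password for %s is %s', 'db', Sensitive.new('hunter2'))
```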
The above work can be broken down into several sets of features where security increases. First would be the fundamental handling of Sensitive data (as its own type), and a secure backend that returns Sensitive data. Second, an entire resource could be made sensitive via a flag; its values are delivered as Sensitive in the catalog, but turned into clear text on resource instantiation. Providers, reporters, and loggers must then all check whether the resource is sensitive and, if so, not leak the clear text values. Here, the agent side handling may be the most difficult to secure.