[SERVER-584] Ability to specify arbitrary values for environment variables in JRuby container Created: 2015/04/29  Updated: 2016/09/08  Resolved: 2016/08/12

Status: Closed
Project: Puppet Server
Component/s: None
Affects Version/s: None
Fix Version/s: SERVER 2.6.0

Type: Task Priority: Normal
Reporter: Jeremy Barlow Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to SERVER-1250 puppetserver subcommands gem, ruby an... Resolved
relates to SERVER-377 "puppetserver gem" command doesn't wo... Closed
relates to SERVER-297 Consolidate environment variable hand... Closed
Template:
Epic Link: Improve handling of environment variables
Sub-team: emerald
Story Points: 3
Sprint: Server Emerald 2016-08-10, Server Emerald 2016-08-24
Release Notes: New Feature
Release Notes Summary: Puppet Server now supports the ability to specify a whitelist of environment variables which are made available to Ruby code. This can be done via the 'environment-vars' section under the 'jruby-puppet' configuration section.
QA Contact: Kurt Wall

 Description   

This is an offshoot of the environment handling consolidation discussed in SERVER-297.

During the initial development of Puppet Server, a conscious decision was made to shield any Ruby code being executed within Puppet Server’s Java process from directly inheriting any shell environment variables. The primary factor motivating this decision was a fear about the potential contamination of the Puppet Server JRuby runtime by shell environment variables configured for execution under MRI Ruby.

For example, gems with native C extensions require different implementations for the MRI vs. JRuby execution environments. Assuming the default value for a system's GEM_HOME environment variable, if defined, would most commonly contain a path under which MRI-compatible gems reside, we thought it would be better for the JRuby-based Puppet Server to maintain its own setting for this path which would be completely independent from the shell environment. This is captured under a setting called “gem-home” in the “jruby-puppet” section of the “puppetserver.conf” file. Effectively, the value for this “gem-home” setting is injected into the configuration of the JRuby container in Puppet Server such that Ruby code sees it as the value of the GEM_HOME "environment variable”.

In current Puppet Server code, GEM_HOME is the only “environment variable” that any Ruby code would see. All other environment variables which may have been active in the shell under which the Puppet Server process was launched - PATH, USER, etc. - are removed / not made visible to any Ruby code running within the Puppet Server process.

We realized later, however, that by not providing a mechanism for Puppet Server to configure other environment variables into a JRuby container that some Ruby code would be impossible to configure under Puppet Server. For example, assume that your Puppet code needed to make use of a gem that is designed to only be able to read its configuration from an environment variable called FOO. Under current Puppet Server code, it would not be possible for the gem to get a value for this variable.

To address this use case while preserving the ability to avoid contaminating the JRuby environment with content intended only for use with MRI Ruby, we’ve discussed the possibility of adding an “environment variable” map to the “jruby-puppet” section of the “puppetserver.conf” file.

For example, assume that an init script had defined a shell environment variable as….

export FOO=for_mri_ruby

… whereas the “jruby-puppet” section of Puppet Server’s “puppetserver.conf” file were to have:

jruby-puppet {
  gem-home: /var/lib/puppet/jruby-gems 
  environment-variables: { 
     FOO: for_jruby 
  }
  ...
}

When the gem mentioned earlier were to access ENV[‘FOO’] while running under Puppet Server, the gem would get a value of “for_jruby”. This would happen because the source of the environment variable for the JRuby container would be drawn from the configuration file as opposed to the actual shell environment.

The particulars of the API design may need to change a bit before implementation but this ticket is intended to capture the work needed to allow for the population of "custom" environment variables into a JRuby container.



 Comments   
Comment by Jeremy Barlow [ 2015/07/15 ]

The Puppet Server team has been discussing this one a bit more and have honed in on a couple of basic approaches that we're considering. Pinging a few folks that have been involved in the previous discussions on this topic for any feedback they may have in favor of one or the other approaches listed below - Owen Rodabaugh, Geoff Williams, Nan Liu, Nick Walker, Charlie Sharpsteen, Eric Sorenson, Archana Sridhar, and Aaron Armstrong.

1) whitelist-based approach, fairly close to what Geoff Williams had suggested here - https://tickets.puppetlabs.com/browse/SERVER-297?focusedCommentId=162370&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-162370.

With this approach, the values for any environment variables with a pre-defined setting in jruby-puppet would be seen by the running Ruby code in Puppet Server. Today, the only variable this applies to is gem-home. By default, no other environment variables would be set to a value within the context of the running Ruby code. Through an optional environment-variable-whitelist, however, the user would have the ability to specify a list of environment variables whose values would be populated into Ruby code directly from the values set in the shell environment.

Consider the following jruby-puppet configuration for Puppet Server:

jruby-puppet {
    gem-home: /my/path/for/jruby
    environment-variable-whitelist: [
      FOO,
      bar,
      GEM_PATH
    ]
    ...
  }

... and that an init script upstream from the start of the Puppet Server process had set:

export GEM_HOME=/my/path/for/mri-ruby
export GEM_PATH=/my/path/for/mri-ruby
export RUBYLIB=/my/path/for/mri-ruby
export FOO=FOOVAL
export bar=barval
export BAR=BARVAL

For this example, the Ruby code running in Puppet Server would see:

ENV['x'] Value
GEM_HOME /my/path/for/jruby
RUBYLIB <nil>
FOO FOOVAL
bar barval
BAR <nil>
GEM_PATH /my/path/for/mri-ruby

The only significant distinction between this and what Geoff Williams had proposed is that we wouldn’t immediately be adding in a way to specify values for other arbitrary variables in the configuration file. (This wouldn’t necessarily preclude adding on such an approach later if needed but wouldn’t seem to be strictly required upfront).

2) blacklist-based approach

With this approach, most environment variables would be flowed from the shell environment through to the running Ruby code. A blacklist would be used to wall off a subset of environment variables that would be undesirable to have populated into the running Ruby code. The blacklist would be optional. If not provided, Puppet Server would blacklist a default set of variables embedded in code. We’re thinking of starting with just a set of variables that are commonly used by the Ruby runtime which could suffer from the MRI/JRuby pollution problem that we’re concerned about - GEM_HOME, GEM_PATH, RUBYOPT, RUBY_OPTS, and RUBYLIB. We’d document the default set which would be used and could expand this over time if it were proven necessary. If the blacklist were defined in the configuration, only the entries in that list would be honored.

Consider the following jruby-puppet configuration for Puppet Server:

jruby-puppet {
    gem-home: /my/path/for/jruby
    ...
  }

... and that an init script upstream from the start of the Puppet Server process had set:

export GEM_HOME=/my/path/for/mri-ruby
export GEM_PATH=/my/path/for/mri-ruby
export RUBYLIB=/my/path/for/mri-ruby
export FOO=FOOVAL
export bar=barval
export BAR=BARVAL

For this example, the Ruby code running in Puppet Server would see:

ENV['x'] Value
GEM_HOME /my/path/for/jruby
RUBYLIB <nil>
FOO FOOVAL
bar barval
BAR BARVAL
GEM_PATH <nil>

If the jruby-puppet configuration were changed to the following…

jruby-puppet {
    gem-home: /my/path/for/jruby
    ...
    environment-variable-blacklist: [
      FOO,
      bar,
      GEM_PATH
    ]
  }

… the Ruby code running in Puppet Server would see:

ENV['x'] Value
GEM_HOME /my/path/for/mri-ruby
RUBYLIB /my/path/from/mri-ruby
FOO <nil>
bar <nil>
BAR BARVAL
GEM_PATH <nil>

A few considerations:

a) The blacklist approach optimizes for what is, arguably, more standard Linux ecosystem tool behavior of allowing arbitrary shell environment variables to be flowed through to the running application code, allowing environment variables to win over configuration settings (where not prohibited by a blacklist entry).

b) The whitelist approach optimizes more for “safety” in terms of prohibiting environment variables that might have been established for proper use with MRI Ruby from contaminating the JRuby environment. With the blacklist approach, there probably would be some continued discussion about which variables should be in the default set. That debate would be irrelevant with the whitelist approach.

c) Both approaches should provide “power users” with the ability to overcome default behaviors and have environment variables be populated as the user sees fit, including the *_proxy variables being discussed in SERVER-377.

We’re not unanimous in this opinion but the majority of those who have voiced an opinion within the team have been leaning toward the blacklist approach. Would like to get feedback from others, though, as to whether there’s a preference for one or the other solution - or if we’ve completely missed the boat in these discussions. Understanding that a number of other solutions have been proposed as well, we’re sensitive about meeting essential user needs without overly complicating the configuration.

Comment by Chris Price [ 2015/07/16 ]

I think it is worth explicitly calling out that with a blacklist approach, there is the possibility of environment variables that we hadn't thought of flowing through the system and causing unexpected behaviors that are very challenging to identify / diagnose / debug, and that resulting in support escalations. The blacklist approach would be a much more significant change from the current behavior of Puppet Server than would the whitelist approach, since we currently filter out most environment variables.

I would extend the phrasing about "optimizing for safety" to say "optimizing for avoiding risk w/rt tricky support escalations".

The blacklist approach probably would be more intuitive for power users, so weighing that against the risk seems like the key decision to be made here.

Comment by Jeremy Barlow [ 2015/07/16 ]

I should also mention that in the latest Puppet Server code, we already have a hard-coded (non-externally configurable) whitelist. This includes just the HOME and PATH environment variables.

If we were to go with a configurable whitelist like is mentioned above, it would raise a couple of additional points related to the current list...

1) Assuming we would want the environment-variable-whitelist to be "optional", then, for backward compatibility, I'd assume we'd want the default behavior to still implicitly whitelist the HOME and PATH environment variables.

2) If a user were to explicitly define an environment-variable-whitelist, would it be better for the user-configured list to just "augment" or to "replace" the internal list?

In other words, would the user have to add HOME and PATH into the custom environment-variable-whitelist in order for those to be set when running in Puppet Server? I can't think of a good case for why someone would want to have the HOME and PATH variables not be set by Puppet Server. (If anything, I think it would case a number of problems like Puppet Server not being able to start or for various CLI tools to not function properly).

That said, though, there's also the related ticket about the *_proxy environment variables in SERVER-377 to consider. For the case that the environment-variable-whitelist were not set by a user in configuration, I would think the expectation would be that the *_proxy environment variables – like HOME and PATH – would still implicitly be whitelisted. If an environment-variable-whitelist were set by a user in configuration, would it be reasonable to assume that the *_proxy environment variables would also still be whitelisted even if not explicitly put into that list? I haven't heard of a case yet where users have specifically said they don't want the *_proxy environment variables to be whitelisted.

From an ease of use and practicality perspective, I'd be more inclined to just have the environment-variable-whitelist in configuration just "augment" the default whitelist. From the perspective of allowing total control for "power users", though, I could understand the argument of just having the environment-variable-whitelist, when set in configuration, be the single-source of truth for what is whitelisted. It does seem that a "power user" could black-out an undesired setting at a higher level than the Puppet Server configuration anyway - via service init script modification, etc. I'd lean a little toward the "augment" approach in this case but really don't have a strong opinion on this.

Chris Price - any preferences on this?

All of the above would only pertain to the whitelist approach. With the blacklist approach, these standard/safe variables would just flow through along with any others not explicitly prohibited by the blacklist.

Comment by Chris Price [ 2015/08/06 ]

I dunno, I'd say keep HOME and PATH on the implicit whitelist and document it as such, but I don't have a strong opinion. Mostly just want to make a decision on this and stick to it.

Comment by Jeremy Barlow [ 2016/08/10 ]

Merged to puppetserver#master at 6e7d274.

Comment by Kurt Wall [ 2016/08/12 ]

Vis-â-vis Chris Price's last remark, it appears the decision was to go with a whitelist?

Comment by Chris Price [ 2016/08/12 ]

@krw yes, that is what we ended up implementing.

Comment by Kurt Wall [ 2016/08/12 ]

Given the following ruby code and puppetserver.conf file:

# x.rb
puts ENV['JENNY']
puts ENV['FOO']
 
# /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf
{"jruby-puppet":{"environment-vars":{"JENNY":"867-5309"},"ruby-load-path":["/opt/puppetlabs/puppet/lib/ruby/vendor_ruby"],"gem-home":"/opt/puppetlabs/server/data/puppetserver/jruby-gems","master-conf-dir":"/etc/puppetlabs/puppet","master-code-dir":"/etc/puppetlabs/code","master-var-dir":"/opt/puppetlabs/server/data/puppetserver","master-run-dir":"/var/run/puppetlabs/puppetserver","master-log-dir":"/var/log/puppetlabs/puppetserver","use-legacy-auth-conf":true},"http-client":{},"profiler":{}, versioned-code : { }}

The specified value of JENNY was available in a ruby script run under a puppetserver jruby instance, as expected. Also as expected, the value of the envvar FOO was not available under a puppetserver jruby instance:

# export JENNY=bogus
# export FOO=fooval
# puppetserver ruby x.rb
867-5309
 
# echo $JENNY
bogus
# echo $FOO
fooval
# puppetserver ruby -e "puts ENV['JENNY']"
867-5309
# puppetserver ruby -e "puts ENV['FOO']"

Comment by Kurt Wall [ 2016/08/12 ]

Closing after manual verification.

Generated at Fri Aug 07 09:35:24 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.