[HI-559] Danish UTF-8 chars kills Puppet Server in v2.7.2 (not in 2.5.0) Created: 2017/02/07  Updated: 2017/03/03  Resolved: 2017/03/03

Status: Resolved
Project: Hiera
Component/s: None
Affects Version/s: HI 3.2.0
Fix Version/s: None

Type: Bug Priority: Normal
Reporter: Klavs Klavsen Assignee: Joe Pinsonault
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File puppetserver-debug3.log     File puppetserver-trace.log     File server-1724-example.tar.bz2    
Issue Links:
duplicates HI-519 Hiera Filecache doesn't support encod... Closed
relates to PUP-7195 on huawei: puppet lookup / hiera fail... Closed
Team: Systems Engineering
Story Points: 3
Sprint: SE 2017-02-22, SE 2017-03-08
QA Risk Assessment: Needs Assessment


This content in my hieradata/common.yaml
realname: "Lisbeth Ammitzbøll Hansen"
uid: '3004'

Kills puppet (after I upgraded from 2.5.0 who had no issues with it) to 2.7.2 - with this error:
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Lookup of key 'common::system::enable_selinux' failed: DataBinding 'hiera': ""\xC3"" on US-ASCII

Comment by Henrik Lindberg [ 2017/02/07 ]

Ping Klavs Klavsen, how did you manage to test this on Puppet 4.9.2 - it is not released yet. Are you running from source?
The problem sounds similar to older issues where hiera data was read without proper UTF-8 encoding being enforced.

Ping Thomas Hallgren, can you think of anything here that could be a regression?

Comment by Thomas Hallgren [ 2017/02/07 ]

Klavs Klavsen, I tested the above in a yaml file with the latest puppet. It finds and presents the 'realname' without problems.
An od -c on the relevant part gives me:

A   m   m   i   t   z   b 303 270   l   l

Is that what you have in your common.yaml too?

Comment by Klavs Klavsen [ 2017/02/07 ]

It was late - so I wrote the puppetserver version in the subject.. googleing couldn't give me a hit showing which puppet version that is
I did restart puppetserver (twice to be sure).

I just looked a bit further, and that exact array, is not loaded via autoloading, but loaded using:
$users = hiera_hash('eit_users', {})

so its the hiera_hash function most likely.

Comment by Lindsey Smith [ 2017/02/08 ]

I moved this to the server project, but perhaps that was premature.

Comment by Henrik Lindberg [ 2017/02/08 ]

Klavs Klavsen It is not totally clear what you are running, and how you manage to trigger the problem you are seeing.
We have not been able to reproduce the problem on the version you say you are using.

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

Hi Henrik Lindberg,

Puppetserver is 2.7.2:

[root@puppet01 ~]# /opt/puppetlabs/bin/puppetserver --version
puppetserver version: 2.7.2

agent is 4.8.1:

[root@test02 ~]# /opt/puppetlabs/bin/puppet --version

Thomas Hallgren, yes, we have the same in that file:

$ grep Lisbeth  envs/production/hieradata/common.yaml  | hexdump -C
00000000  20 20 20 20 72 65 61 6c  6e 61 6d 65 3a 20 22 4c  |    realname: "L|
00000010  69 73 62 65 74 68 20 41  6d 6d 69 74 7a 62 c3 b8  |isbeth Ammitzb..|
00000020  6c 6c 20 48 61 6e 73 65  6e 22 0a                 |ll Hansen".|

(0xc3 0xb8 == 0303 0270)

Don't hesitate to ask if there's anything else we can do to help.

Comment by Thomas Hallgren [ 2017/02/08 ]

I tried with hiera_hash too, just to be sure. I still cannot reproduce the problem. Also, how a lookup of 'eit_users' triggers an error with "Lookup of key 'common::system::enable_selinux' failed" is incomprehensible.

I tried with Puppet 4.8.1 too. Still not able to reproduce.

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

Thomas Hallgren agreed, I don't see what the enable_selinux variable has to do with this. Changing the ø to oe makes the error go away and everything work fine...

Comment by Henrik Lindberg [ 2017/02/08 ]

Rune Juhl Jacobsen Thomas Hallgren The diff may be the locale setting that defines the default locale of the server.

Comment by Thomas Hallgren [ 2017/02/08 ]

Rune Juhl Jacobsen, can you please enable debug logging on the server and convey the server log where the error is triggered?

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

Example environment which exhibits the same behavior

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

I created a tiny test environment that reproduces the error: server-1724-example.tar.bz2

The error message is a bit different though:

Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Function Call, ""\xC3"" on US-ASCII at /etc/puppetlabs/code/envs/test/manifests/site.pp:2:8 on node rune-test-rsnapshot.petardo

I'll enable debug logging and try again.

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

puppetserver-debug3.log is a log with debug output. There are a few puppet runs in there – didn't want to redact too much. The puppet run with errors are on rune-test-rsnapshot.

If I've cut too much away, let me know and I'll provide the full log. The lines in the file are the ones with identifiers qtp2111054879-103 and qtp2111054879-106.

The puppetservers locale is as follows:

[root@puppet01 envs]# locale

Comment by Thomas Hallgren [ 2017/02/08 ]

Unfortunately, that debug log doesn't contain exception traces, so the only thing it tells us is that it fails to evaluate a function call at:
Is that where the hiera_hash call is?

Rune Juhl Jacobsen, would it be possible for you to also enable exception trace output? I think that is done by using <root level="trace"> instead of <root level="debug"> in the logback.xml.

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

Thomas Hallgren of course – here's one: puppetserver-trace.log

Yes, that's where the hiera_hash call is – I've attached the env as server-1724-example.tar.bz2.

Comment by Thomas Hallgren [ 2017/02/08 ]

Great. Thanks! Now I think the problem is pin-pointed. It's Jruby specific, which explains why it's impossible to reproduce using the hiera or puppet CLIs. The culprit is the java class org.jruby.ext.psych.PsychParser that is bundled with the puppet-server-release.jar. I'll drop the ball here, to be picked up by someone on the Puppet Server team.
Ping Jeremy Barlow.

2017-02-08 12:40:53,906 ERROR [qtp2111054879-106 - /puppet/v3/catalog/rune-test-rsnapshot.petardo?environment=test] [puppetserver] Puppet Evaluation Error: Error while evaluating a Function Call, ""\xC3"" on US-ASCII at /etc/puppetlabs/code/envs/test/manifests/site.pp:2:8 on node rune-test-rsnapshot.petardo
:in `parse'
/opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:375:in `parse_stream'
/opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:323:in `parse'
/opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:250:in `load'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/hiera/backend/yaml_backend.rb:19:in `lookup'

Comment by Rune Juhl Jacobsen [ 2017/02/08 ]

You're very welcome – we appreciate your effort!

Comment by Joe Pinsonault [ 2017/02/23 ]

A small update, sorry for the delay.

Well I've tried to reproduce the error using the provided test environment under puppetserver 2.7.2, but I'm not having any luck

I also tried directly using the puppetserver ruby command directly, which uses the jruby version puppetserver is using. I used this yaml file and ruby script, but it doesn't produce an error

test: "ø"

require 'psych'
puts Psych.load(open("test.yml"))

However, if I don't load the file through the open function and give it the string directly, it errors

require 'psych'
puts Psych.load('test: "ø"')

puppetserver ruby repro.rb outputs

SyntaxError: repro.rb:3: invalid multibyte char (US-ASCII)

This seems to indicate that there might be a problem determining the character encoding when hiera loads the file, but I'm still not sure why you are having this problem but we can't reproduce it locally when running it through puppetserver.

Comment by Jeremy Barlow [ 2017/02/23 ]

Joe Pinsonault Thanks for that update. From the stack trace above, it appears that hiera is calling YAML.load with a string it read back from a file. This makes me wonder if this issue is the same as one Moses Mendoza had pointed out to me, HI-519, where hiera itself was reading a file from disk without specifying UTF-8 encoding.

Thomas Hallgren, I wonder if the reason you may not have been able to reproduce the problem when running under MRI Ruby is that you might have been running with a version of Hiera which has the fix for HI-519 rather than Hiera 3.3.0, which is what was delivered in the latest version of the puppet-agent package, 1.9.1.

From this comment, it appears that the fix for HI-519 would land in Hiera 3.3.1 and presumably the next Puppet Agent release?

Comment by Thomas Hallgren [ 2017/02/24 ]

Jeremy Barlow, I've tested with and without the fix in HI-519. As long as UTF-8 is the default encoding (and Rune Juhl Jacobsen writes that it is), it makes no difference.

[thhal@home puppet]$ LANG=en_US.UTF-8 puppet lookup --explain --environment server_1724 eit_users
Data Binding "hiera"
  Found key: "eit_users" value: {
    "test" => {
      "realname" => "ø"

Comment by Jeremy Barlow [ 2017/02/24 ]

Hmm, interesting. Using puppetserver ruby, which just runs arbitrary code under JRuby like the full Puppet Server stack does, I played around with loading yaml from a file that has the latin small letter o with stroke character (U+00F8) (0xC3B8) character in it. I had Puppet Server 2.7.2 and Puppet Agent 1.9.1 (which contains Hiera 3.3.0) installed.

# xxd /opt/puppetlabs/server/data/puppetserver/myfile.yaml
0000000: 7465 7374 3a20 22c3 b822 0a              test: "..".

I used a File.read to read the file content without specifying an encoding, like Hiera 3.3.0 does.

First, I tried with the LANG environment variable set to "en_US.UTF-8". This seemed to work fine:

# LANG=en_US.UTF-8 /opt/puppetlabs/bin/puppetserver ruby -ryaml -e "puts YAML.load(File.read('/opt/puppetlabs/server/data/puppetserver/myfile.yaml'))" | xxd
0000000: 7b22 7465 7374 223d 3e22 c3b8 227d 0a    {"test"=>".."}.

If I change the LANG environment variable value to "en_US.ASCII", I get an error:

# LANG=en_US.ASCII /opt/puppetlabs/bin/puppetserver ruby -ryaml -e "puts YAML.load(File.read('/opt/puppetlabs/server/data/puppetserver/myfile.yaml'))"
Encoding::InvalidByteSequenceError: ""\xC3"" on US-ASCII
         parse at org/jruby/ext/psych/PsychParser.java:178
  parse_stream at /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:375
         parse at /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:323
          load at /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/psych.rb:250

This seems to suggest that JRuby is honoring the LANG environment variable value when File.read is called with no encoding parameter, just like MRI Ruby does.

This makes me wonder whether the problem has to do with the LANG environment variable not being propagated from the shell environment into the Java environment of Puppet Server, running as a daemon. On my test VM, which was CentOS 7, I added the following lines to the /opt/puppetserver/server/apps/puppetserver/cli/apps/start script, which is the script that is run to launch the Puppet Server daemon when service puppetserver start is performed:

echo "LANG is $LANG" >/opt/puppetlabs/server/data/puppetserver/the-lang.txt
${JAVA_BIN} ${JAVA_ARGS} -Djava.security.egd=/dev/urandom \
  -cp ${INSTALL_DIR}/puppet-server-release.jar \
  clojure.main -m puppetlabs.puppetserver.cli.ruby \
  --config "${CONFIG}" -- -ryaml \
  -e "puts YAML.load(File.read('/opt/puppetlabs/server/data/puppetserver/myfile.yaml'))" \
  >/opt/puppetlabs/server/data/puppetserver/myfile-read.txt 2>&1 &

The above basically runs what the same puppetserver ruby command from above runs, but with args like the full Puppet Server daemon process uses. I ran service puppetserver start to have the "ruby" command launched as a background process. In my case, again, though, it appeared that "en_US.UTF-8" was being passed through to the "ruby" command because the file was parsed correctly:

# cat /opt/puppetlabs/server/data/puppetserver/myfile-read.txt | xxd
0000000: 7b22 7465 7374 223d 3e22 c3b8 227d 0a    {"test"=>".."}.
# cat /opt/puppetlabs/server/data/puppetserver/the-lang.txt
LANG is en_US.UTF-8

This makes me wonder if this problem is only reproducible if the shell environment that the Puppet Server process is running under is somehow different between what I am using vs. what Rune Juhl Jacobsen is using.

Under what OS distribution(s) has this been reproduced? If instrumentation similar to what I did above is done, do you see results which are the same as what I am seeing?

Is the problem no longer reproducible with Puppet Server if line 52 in /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/hiera/filecache.rb is changed from the following:

data = File.read(path)

... to ...

data = File.read(path, :encoding => 'BOM|UTF-8')

If so, this would suggest that the forthcoming Hiera release would fix this specific problem. It still would be good to try to figure out why Puppet Server / JRuby doesn't seem to be processing the LANG environment variable in some environments, though.

Comment by Rune Juhl Jacobsen [ 2017/03/02 ]

Hi Jeremy Barlow,

Changing line 52 in /opt/puppetlabs/puppet/lib/ruby/vendor_ruby/hiera/filecache.rb as described fixed the issue. For reference, this Puppetserver is running on CentOS Linux release 7.2.1511.

Comment by Jeremy Barlow [ 2017/03/03 ]

Thanks for confirming, Rune Juhl Jacobsen. At this point, it appears that the immediate problem of the hiera yaml with the "latin small letter o with stroke character" not being parsed correctly should be resolved when the fix for HI-519 is delivered. I still am unaware of a way to reproduce the behavior seen with the locale environment variable values not controlling how a file is parsed in JRuby when no character encoding is specified in code. Without a way to progress on this further, I'm inclined to move this particular issue over to the HI project in JIRA and close as a duplicate of HI-519. Sound okay to others here?

Klavs Klavsen, since you had originally reported this issue, can you confirm that this addresses the issue for you?

Comment by Rune Juhl Jacobsen [ 2017/03/03 ]

Jeremy Barlow you're very welcome! I work with Klavs Klavsen, so his issue is my issue

Comment by Jeremy Barlow [ 2017/03/03 ]

Excellent, thanks again, Rune Juhl Jacobsen. Given that, I'm going to close this one out.

Generated at Sun Feb 16 18:20:40 PST 2020 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.