[PUP-5584] Cached catalogs are loaded using the agent's string locale which can result in corrupted data Created: 2015/12/07  Updated: 2019/04/04  Resolved: 2016/01/20

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: PUP 3.8.4, PUP 4.2.3
Fix Version/s: PUP 4.3.2

Type: Bug Priority: Normal
Reporter: Zee Alexander Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: support
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2015-12-07 15.23.05.png     File chocolatey.config.erb    
Issue Links:
Relates
relates to PUP-5727 acceptance: cached catalog should all... Resolved
Template:
Epic Link: Unicode Encodings
Story Points: 3
Sprint: Client 2016-01-13, Client 2016-01-27
CS Priority: Major
CS Frequency: 3 - 25-50% of Customers
CS Severity: 4 - Major
CS Business Value: 4 - $$$$$
CS Impact: This impacts the use of cached catalogs on Windows, making them potentially destructive. If this is not resolved it will have future impacts on the proposed functionality of direct puppet, which relies on cached catalogs.

The behavior seen by users is that the initial puppet run works as expected, but if the following run uses a cached catalog it may cause misconfiguration. This is particularly bad in the Application Orchestration/Direct Puppet scenario because the initial deploy appears to work but then 30 minutes later the follow-up checkin uses the cached catalog and only then does this problem surface.
Release Notes: Bug Fix
Release Notes Summary: When a catalog contained inlined file content (typically from a template) with non-ASCII unicode characters, those characters could be corrupted when the agent used a cached catalog. This has been resolved for the JSON cache.
QA Contact: Eric Thompson

 Description   

When a cached catalog is applied on Windows, and that catalog contains a file resource whose content parameter begins with a byte-order mark (BOM), the BOM is written to the managed file as garbage text.

This does not occur during normal master-agent puppet runs, only when applying a cached catalog, and only on Windows.

Reproduction:

Download the attached ERB file. If you view it with vim -b chocolatey.config.erb you will see <feff> at the beginning, indicating the byte-order mark.

Use the template with a file resource, and deploy it to a Windows System, in any location.

View the file and it will appear normally.

Now, apply the cached catalog, either by disabling the puppet master and performing a puppet run (note that --test disables the use of cached catalogs), or via puppet apply C:\ProgramData\PuppetLabs\puppet\var\client_data\catalog\<certname>.json replacing <certname> with the test node's actual certname.

Open the file on disk, and it will look like the attached screenshot (Screenshot 2015-12-07 15.23.05.png):

Note the junk characters at the beginning.

Tested with a CentOS 6.6 master running PE 3.8.3 (Puppet 3.8.4) and a Windows Server 2012 agent running PE 3.8.3. I also tested with both the 64 and 32 bit clients.
Also tested with the same setup, running PE 2015.2.3 (Puppet 4.2.3) on the master and Windows agent.

The current workaround would be to disable cached catalogs. I'm also working on an additional workaround that would not require this, using concat to retrieve the first part of the file via the source parameter, to avoid storing the BOM in the catalog.

Note: This is not a PE bug. I did my testing with PE, but this is not a PE-specific bug, as far as I can tell. It deals with generic Puppet functionality only. E.g. cached catalogs on Windows.



 Comments   
Comment by Zee Alexander [ 2015/12/07 ]

I'll test to see if this still affects 2015.2.3 shortly.

Edit: Confirmed, this does affect 2015.2.3 / Puppet 4.

Comment by Charlie Sharpsteen [ 2015/12/09 ]

This issue isn't specific to Windows and occurs on Linux as well. Basically, what is happening is that when the puppet agent loads a cached catalog from disk, it assigns a string encoding equal to whatever locale Ruby's Encoding.default_external is set to. This usually isn't noticeable on Linux because the server and agents are likely using the same locale and both likely using UTF-8. Windows agents are likely to be using a different locale than a Linux master. For example, my 2012 R2 box lists "IBM 437" as the default encoding:

C:\ProgramData\PuppetLabs\puppet\etc>cd "C:\Program Files\Puppet Labs\Puppet\sys\ruby\bin"
 
C:\Program Files\Puppet Labs\Puppet\sys\ruby\bin>ruby -e 'puts Encoding.default_external'
IBM437

IBM 437 is backwards-compatible with ASCII but not UTF-8. Therefore any UTF-8 content produced by the master will be garbled if rendered as IBM 437.
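
The effect can be seen in isolation with a minimal Ruby sketch. It does not use any Puppet code; it only assumes that the UTF-8 bytes of a byte-order mark end up tagged as IBM437 and are later transcoded to UTF-8, which is the situation described above. The variable names are illustrative.

```ruby
# Sketch: what happens when UTF-8 bytes are mislabeled as IBM437 (code page 437).
bom = "\uFEFF" # byte-order mark, stored as UTF-8
puts bom.bytes.map { |b| format('%02X', b) }.join(' ') # EF BB BF

# Pretend the bytes were read under an IBM437 locale, then transcoded to UTF-8.
mislabeled = bom.dup.force_encoding(Encoding::IBM437)
garbled = mislabeled.encode(Encoding::UTF_8)
puts garbled.length # one character has become three

# ASCII-only content survives, because IBM437 agrees with ASCII for 0x00-0x7F.
ascii = 'plain text'
ascii_roundtrip = ascii.dup.force_encoding(Encoding::IBM437).encode(Encoding::UTF_8)
puts ascii_roundtrip == ascii
```

Each of the three BOM bytes is treated as a separate code page 437 character, which is why the junk appears only for non-ASCII content.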

Reproduction Case

Add the following file resource to site.pp:

$test_path = $::osfamily ? {
  'Windows' => 'C:/tmp/utf_test',
  default   => '/tmp/utf_test',
}
 
file { $test_path:
  content => @(UTF8)
    Mønti Pythøn ik den Hølie Gräilen
    Røtern nik Akten Di
    Wik
    Alsø wik
    Alsø alsø wik
    Wi nøt trei a høliday in Sweden this yër?
    See the løveli lakes
    The wøndërful telephøne system
    And mäni interesting furry animals
    Including the majestik møøse
    A Møøse once bit my sister...
    No realli! She was Karving her initials on the møøse with the sharpened end of an interspace tøøthbrush given her by Svenge - her brother-in-law - an Oslo dentist and star of many Norwegian møvies: "The Høt Hands of an Oslo Dentist", "Fillings of Passion", "The Huge Mølars of Horst Nordfink"...
 
    We apologise for the fault in the subtitles. Those responsible have been sacked.
    | UTF8
}

Run puppet agent -t. The correct contents should appear in /tmp/utf_test (C:/tmp/utf_test on Windows).

For a Windows agent, run puppet agent -t --use_cached_catalog. For a POSIX agent, shift the locale into ISO-8859-1 (or another non-UTF-8 locale listed by locale -a) by running LC_ALL=en_GB.iso88591 puppet agent -t --use_cached_catalog. The contents of the test file will be replaced with corrupted text.

Comment by Zee Alexander [ 2015/12/09 ]

Charlie Sharpsteen would it be appropriate to update the bug description to more accurately reflect the issue and the conditions under which it occurs, given your last comment? As things stand it might be misleading if you didn't read the comments.

Comment by Josh Cooper [ 2016/01/19 ]

Couple of notes about what Branan Riley and I talked about leading up to merging this.

  1. We call Puppet::Indirector::JSON#to_json to save the catalog to the cache. The method converts the catalog to a Hash, and calls into the PSON (pure json) library that we vendor. PSON serializes the nested structure, and for each string, e.g. hash value, calls String#to_json. The PSON library's utf8_to_pson method encodes the string as PSON, and returns it as an ASCII-8BIT encoded string. It appears the latest version of the upstream pson library returns a UTF-8 encoded string instead. We decided to not try to update our vendored PSON library right now.
  2. After serializing the catalog to pson, we force the encoding to ASCII-8BIT so that ruby will not transcode the contents when writing the pson-encoded string to the file (based on the default external encoding). This step is not strictly necessary, because due to the above issue, the pson-encoded catalog is already ASCII-8BIT encoded. However, that could change in the future, if/when we update our vendored pson.
  3. When reading in the cached catalog, ruby will force encode the string contents based on the current external encoding. On Windows, this depends on the current code page, e.g. CP437. On *nix, it is typically UTF-8 though doesn't have to be. In either case, we need to override ruby's force encoding so that the catalog we read in is tagged with an encoding that matches what it started out in, i.e. ASCII-8BIT.
  4. Prior to this PR, when reading the cached catalog, we were feeding the Windows-specific encoded string into PSON.from_json. If given something other than ASCII-8BIT, then from_json calls source.encode(::Encoding::UTF_8) which causes the BOM corruption described in the ticket.
  5. In the future, we may want to patch our vendored pson library. At that point, we should force_encoding the catalog read from disk to UTF-8.
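
The write and read steps above can be sketched in plain Ruby. This is not the code from the PR: the serialized string is a hypothetical stand-in for a PSON-encoded catalog, the IBM437 read simulates an agent whose external encoding is CP437, and the final force_encoding to UTF-8 corresponds to the possible future change in item 5.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a serialized catalog: UTF-8 text that starts with a BOM.
serialized = "{\"content\":\"\uFEFFMønti\"}"
raw = wrong = right = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'catalog.json')

  # Write the bytes untouched (binary mode, so no transcoding on the way out).
  File.binwrite(path, serialized)

  # Simulated old behaviour: the string is tagged with the locale's encoding,
  # and a later encode to UTF-8 reinterprets every byte.
  wrong = File.read(path, encoding: 'IBM437').encode(Encoding::UTF_8)

  # Locale-independent read: the bytes come back tagged ASCII-8BIT.
  raw = File.binread(path)

  # Relabel (not transcode) the bytes as the UTF-8 they really are.
  right = raw.dup.force_encoding(Encoding::UTF_8)
end

puts raw.encoding        # ASCII-8BIT
puts right == serialized # true
puts wrong == serialized # false
```

The key distinction is that force_encoding only changes the label on the bytes, while encode rewrites the bytes according to the label they already carry.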

Additional info:

  1. replace_file uses Puppet::FileSystem::Uniquefile to open the file for writing, which defaults to text mode. On Windows, this will cause each \n to be written out as \r\n. However, to_pson escapes newlines, so the pson-encoded string contains the byte sequence [backslash, r, backslash, n]. As a result, no CRLF translation takes place on Windows, and we don't need to worry about text vs binary modes.
  2. The PSON library assumes utf8_to_pson receives a UTF-8 encoded string, but that is not always the case. It appears that any data read from puppet.conf has an encoding based on the default external encoding:

On Windows:

Notice: Encoding US-ASCII: service_provider
Notice: Encoding UTF-8: windows
Notice: Encoding UTF-8: clientcert
Notice: Encoding Windows-1252: agent1
Notice: Encoding UTF-8: clientversion
Notice: Encoding UTF-8: 4.3.2

On *nix with an alternate encoding:

$ env LC_CTYPE=de_DE.ISO8859-1 bundle exec puppet agent -t --server arcturus
...
Notice: Encoding UTF-8: /var/root
Notice: Encoding US-ASCII: service_provider
Notice: Encoding UTF-8: launchd
Notice: Encoding UTF-8: clientcert
Notice: Encoding ISO-8859-1: agent2

We should file a separate ticket to ensure data read from settings files is converted to UTF-8 when read in.
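
A minimal sketch of what such a conversion could look like, assuming the value's encoding tag is correct for its bytes. The to_utf8 helper is hypothetical and is not part of Puppet; the certname is an invented example containing one non-ASCII character.

```ruby
# Hypothetical helper: transcode a settings value to UTF-8 if it is tagged otherwise.
def to_utf8(value)
  return value if value.encoding == Encoding::UTF_8

  value.encode(Encoding::UTF_8)
end

# Simulate a certname read under an ISO-8859-1 locale: byte 0xE9 is "é" there.
certname = "agent\xE9".b.force_encoding(Encoding::ISO_8859_1)
normalized = to_utf8(certname)

puts normalized.encoding               # UTF-8
puts normalized.bytes.last(2).inspect  # [195, 169]
```

Unlike the cached catalog case, encode is the right call here, because the bytes really are in the locale's encoding and need to be rewritten rather than relabeled.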

Comment by Eric Thompson [ 2016/01/20 ]

validated on ubuntu12.04 at stable SHA: 49af8b004d8341e2f690b6629d42f043e0aa8b8a

root@e3cqp6105ksash0:~# puppet agent -t --server xr6scl53fsl3wii.delivery.puppetlabs.net
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for e3cqp6105ksash0.delivery.puppetlabs.net
Info: Applying configuration version '1453315001'
Notice: /Stage[main]/Main/File[/tmp/utf_test]/ensure: defined content as '{md5}3f6c1e2b400d2e18b01ce60648ca83c9'
Notice: Applied catalog in 0.03 seconds
root@e3cqp6105ksash0:~# cat /tmp/utf_test
Mønti Pythøn ik den Hølie Gräilen
Røtern nik Akten Di
Wik
Alsø wik
Alsø alsø wik
Wi nøt trei a høliday in Sweden this yër?
See the løveli lakes
The wøndërful telephøne system
And mäni interesting furry animals
Including the majestik møøse
A Møøse once bit my sister...
No realli! She was Karving her initials on the møøse with the sharpened end of an interspace tøøthbrush given her by Svenge - her brother-in-law - an Oslo dentist and star of many Norwegian møvies: "The Høt Hands of an Oslo Dentist", "Fillings of Passion", "The Huge Mølars of Horst Nordfink"...
 
We apologise for the fault in the subtitles. Those responsible have been sacked.
 
 
# with a codepage not installed:
root@e3cqp6105ksash0:~# LC_ALL=en_GB.iso88591 puppet agent -t --use_cached_catalog --server xr6scl53fsl3wii.delivery.puppetlabs.net
2016-01-20 10:37:47.355419 WARN  puppetlabs.facter - locale environment variables were bad; continuing with LANG=C
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Error: Cached catalog for e3cqp6105ksash0.delivery.puppetlabs.net failed: Could not parse JSON data for catalog e3cqp6105ksash0.delivery.puppetlabs.net: Could not intern from pson: "\xC3" on US-ASCII
Notice: Using cached catalog
Info: Caching catalog for e3cqp6105ksash0.delivery.puppetlabs.net
Info: Applying configuration version '1453315068'
Notice: Applied catalog in 0.02 seconds
root@e3cqp6105ksash0:~# locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
root@e3cqp6105ksash0:~# LC_ALL=en_DK.utf8 puppet agent -t --use_cached_catalog --server xr6scl53fsl3wii.delivery.puppetlabs.net
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Notice: Using cached catalog
Info: Applying configuration version '1453315068'
Notice: Applied catalog in 0.02 seconds
root@e3cqp6105ksash0:~# cat /tmp/utf_test
Mønti Pythøn ik den Hølie Gräilen
Røtern nik Akten Di
Wik
Alsø wik
Alsø alsø wik
Wi nøt trei a høliday in Sweden this yër?
See the løveli lakes
The wøndërful telephøne system
And mäni interesting furry animals
Including the majestik møøse
A Møøse once bit my sister...
No realli! She was Karving her initials on the møøse with the sharpened end of an interspace tøøthbrush given her by Svenge - her brother-in-law - an Oslo dentist and star of many Norwegian møvies: "The Høt Hands of an Oslo Dentist", "Fillings of Passion", "The Huge Mølars of Horst Nordfink"...
 
We apologise for the fault in the subtitles. Those responsible have been sacked.
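
The "\xC3" on US-ASCII error in the run with the uninstalled locale is the kind of message Ruby raises when bytes tagged as US-ASCII contain a value above 0x7F and are then transcoded. The sketch below reproduces that message in isolation; it is an illustration of the Ruby behaviour, not a trace of the actual Puppet code path.

```ruby
# UTF-8 bytes for "Mønti", mislabeled as US-ASCII (as under LANG=C).
bytes = "M\xC3\xB8nti".b
as_ascii = bytes.dup.force_encoding(Encoding::US_ASCII)

message =
  begin
    as_ascii.encode(Encoding::UTF_8)
    nil
  rescue Encoding::InvalidByteSequenceError => e
    e.message
  end

puts message
```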

Generated at Tue Jan 28 09:15:10 PST 2020 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.