Details
- Bug
- Status: Closed
- Normal
- Resolution: Fixed
- None
- None
- 3
- Windows 2016-01-27, Windows 2016-02-10
- Bug Fix
Description
As part of PUP-2564, we changed our language to state that manifest files must be in UTF-8. However, this is not enforced when we actually lex files with the parser, and it causes a host of issues on Windows in particular.
https://docs.puppetlabs.com/puppet/latest/reference/lang_summary.html#files
As Josh alluded to, Windows reads the file and then treats its contents as being in whatever the current codepage is, which causes a number of problems.
For instance, take the manifest:
user { 'Umlautä':
  ensure => present,
  password => 'password'
}
When running with codepage 437, the string returned by Puppet::FileSystem.read is unsuitable for subsequent use, because ä is carried as the raw bytes \xC3\xA4 in a string that is treated as being in the codepage's encoding rather than UTF-8:
"user { 'Umlaut\xC3\xA4':\n ensure => present,\n password => 'password'\n}\n"
So while the user is created, the name is Umlaut├ñ instead of Umlautä, because each of those two bytes is converted as if it were a separate character. Calling .encode('utf-8') on the above string produces:
"user { 'Umlaut\u251C\u00F1':\n ensure => present,\n password => 'password'\n}\n"
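The double conversion described above can be reproduced in plain Ruby, without Puppet or Windows. This is only a minimal sketch: it assumes Ruby's built-in IBM437 encoding is a fair stand-in for codepage 437, and the variable names are illustrative rather than taken from the Puppet code base.

```ruby
# The UTF-8 bytes of the resource title, as they sit on disk.
on_disk = "Umlaut\u00E4".b

# Under codepage 437 the bytes are labelled IBM437 ...
misread = on_disk.dup.force_encoding(Encoding::IBM437)
# ... and a later transcode to UTF-8 maps each byte to its own character.
mojibake = misread.encode(Encoding::UTF_8)

# Labelling the very same bytes as UTF-8 yields the intended name.
intended = on_disk.dup.force_encoding(Encoding::UTF_8)

puts mojibake.dump
puts intended.dump
```

The bytes never change in either path; only the label attached to them does, which is why the fix belongs at the point where the file is read.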
If the local codepage is instead set to 65001 (UTF-8), the behavior is correct. However, we should not expect users to do this; we should treat the incoming file as UTF-8, since that is what our documentation specifies and what we claim to expect.
The prior PUP-2564 ticket mentioned the inability to specify an encoding when reading a file, at https://tickets.puppetlabs.com/browse/PUP-2564?focusedCommentId=125526&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-125526. I think that problem can be fixed easily, at least in this narrow use case to start.
I spiked a very quick solution to this problem for the sake of discussion. With the change, the parsed string now correctly represents ä as \u00E4:
"user { 'Umlaut\u00E4':\n ensure => present,\n password => 'password'\n}\n"
I know that it fixes the problem with the manifest above (i.e. you can run under any local codepage and the correct user name, taken from the UTF-8 file contents, is used during creation). However, there could be some additional fallout, and there are certainly other places where files are read that could take a similar tack.
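The spiked change itself is not shown in this ticket. As a rough sketch of the general idea only, the snippet below reads a file with an explicit UTF-8 external encoding so that the process default no longer matters. The helper name read_manifest_utf8 is hypothetical and is not Puppet's implementation; setting Encoding.default_external to IBM437 is just a way to simulate a codepage 437 session on any platform.

```ruby
require 'tempfile'

# Hypothetical helper for illustration; not Puppet's actual implementation.
def read_manifest_utf8(path)
  File.read(path, encoding: 'UTF-8')
end

manifest = "user { 'Umlaut\u00E4':\n  ensure => present,\n}\n"
original_default = Encoding.default_external

naive, explicit = Tempfile.create('manifest') do |f|
  f.binmode          # write the UTF-8 bytes untouched
  f.write(manifest)
  f.flush
  begin
    # Simulate a session whose default codepage is 437.
    Encoding.default_external = Encoding::IBM437
    [File.read(f.path), read_manifest_utf8(f.path)]
  ensure
    Encoding.default_external = original_default
  end
end

puts naive.encoding
puts explicit.encoding
puts explicit == manifest
```

A read that names its encoding gives the same result under any codepage, whereas the plain read inherits whatever the environment's default happens to be.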
Issue Links
- is duplicated by
  - PUP-4059 Managing a innexistant Windows service result in error "Could not evaluate" (Closed)
  - PUP-5733 The Puppet Parser should ensure files are read with UTF-8 (Closed)
- relates to
  - PUP-2937 Windows: Unable to reference packages with UTF-8 Characters in their names (Closed)
  - PUP-5819 Lexer should raise a better error when loading manifest files containing a UTF-8 BOM (Byte Order Mark) (Closed)
  - PUP-5731 Specify that Puppet Language Source must be in UTF-8 (Resolved)
  - PUP-5851 FR - Puppet Lookup with Unicode Facts File (Resolved)
  - PUP-5538 Puppet fails to convert Windows Unicode group or user names to sids (Closed)
  - PUP-5879 Ensure Puppet uses FileSystem.read where applicable to read JSON, settings and other files as UTF-8 (Closed)
  - PUP-5884 Internationalization for Windows (Closed)
  - PUP-2564 Specify UTF-8 as a default encoding for puppet (Closed)
- supports
  - PUP-4361 4.x Language - Puppet 4 Language fixes and enhancements (Closed)