[PUP-2700] Puppet 3.6.1 File recurse improperly handles spaces in filenames Created: 2014/05/29  Updated: 2018/04/03  Resolved: 2014/06/17

Status: Closed
Project: Puppet
Component/s: Types and Providers
Affects Version/s: PUP 3.6.0
Fix Version/s: PUP 3.7.0

Type: Bug Priority: Critical
Reporter: Tristan Smith Assignee: Unassigned
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Puppet 3.6.1 server, Puppet 3.6.1 client. Separate puppet server for catalog and puppet fileserver for this content, using passenger for both.


Attachments: File test-test_url_encode-0.1.0.tar.gz    
Issue Links:
Blocks
Relates
relates to PUP-3135 Square brackets in file names not han... Accepted
relates to PUP-1892 PR (2420) Puppet remote fileserver fa... Closed
relates to PUP-2687 File Names with "?" cause failures wi... Accepted
relates to PUP-1890 URI.unescape mangles UTF-8 paths prod... Closed
Template:
Story Points: 2
Sprint: Week 2014-6-4 to 2014-6-11, Week 2014-6-11 to 2014-6-18
QA Contact: Narmadha Perumal

 Description   

This one's new on me.
We're using file{ : recurse => remote} to pull a directory structure from a remote puppet fileserver. It would appear that there's a mismatch causing it to multiply urlencode resources it's recursing over, causing it to render a space ' ' as %2520 in the file_content call.

Here's a convenient, if ugly, example:

In the Fileserver:

[29/May/2014:14:15:18 -0700] "GET /production/file_content/puppet/python340/lib/python3.4/site-packages/setuptools/__pycache__/script%2520template%2520(dev).cpython-34.pyc HTTP/1.1" 404 147 "-" "-"

in the client:

Error 404 on SERVER: Not Found: Could not find file_content puppet/python340/lib/python3.4/site-packages/setuptools/__pycache__/script%20template%20(dev).cpython-34.pyc
Error: /Stage[main]/.../File[/opt/python340/lib/python3.4/site-packages/setuptools/__pycache__/script template (dev).cpython-34.pyc]/ensure: change from absent to file failed:
Could not set 'file' on ensure: 
Error 404 on SERVER: Not Found: 
Could not find file_content puppet/python340/lib/python3.4/site-packages/setuptools/__pycache__/script%20template%20(dev).cpython-34.pyc



 Comments   
Comment by Tristan Smith [ 2014/05/29 ]

This appears to be server-side, as it did not manifest with puppet 3.4.3 server and does still happen with 3.4.3 clients.

Comment by Kylo Ginsberg [ 2014/05/29 ]

Charlie Sharpsteen if you get a chance to look at this one, that would be great. We had another report of this added as a comment to PUP-2687 but I think those are separate issues. If this one is a regression in the 3.6.x series, it would be nice to verify that.

Comment by Charlie Sharpsteen [ 2014/05/30 ]

I haven't been able to reproduce this using a simple module containing files with spaces.

Tristan Helmich: Could you share a complete example of the failing resource declaration? Also, are you serving the files from a custom fileserver mount? If so, could you share an example of the fileserver.conf entry?

Comment by Charlie Sharpsteen [ 2014/05/30 ]

Oops, that should have been Tristan Smith, not Tristan Helmich. Twitchy tab completion finger.

Comment by Tristan Smith [ 2014/05/30 ]

Yeah, it's a custom mount on a separate fileserver. I'll do some isolation work and get you a simplified example.

Comment by Trent Lloyd [ 2014/06/04 ]

Running into this here. Puppet 3.6.1-1puppetlabs1 Ubuntu 14.04. This worked fine on Puppet 3.4.

There seems to be two bugs here

(1) 'puppet master' appears to be processing encoded url entities differently than Apache2 with Passenger. These are possibly being decoded by Apache but not 'puppet master' but that is just a guess.
(2) Puppet 3.6 is either URL escaping an extra time, or (more likely) failing to URL un-escape the path for file_content, where as it works as expected for file_metadata.

Test Manifest 1: This works with 'puppet master' but fails with 'apache2+passenger'

    file{'/tmp/x': ensure => present, source => 'puppet:///files/lathiat file'}

Test Manifest 2: This fails with both, and you see %2520 (encoded %20) under 'puppet master' for
file_content (and %20 for file_metadata)

    file{'/tmp/x': ensure => present, source => 'puppet:///files/lathiat%20file'}

The following applies to Test 1:
file_metadata shows the path with a space, and file_content shows %20. This appears both in the debug log and in the actual file access as shown below.
Only the file_content stage is failing, file_metadata passes.

[pid 18295] stat("/etc/puppet/files/lathiat file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 18295] lstat("/etc/puppet/files/lathiat file", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
[pid 18295] open("/etc/puppet/files/lathiat file", O_RDONLY) = 12
[pid 18295] stat("/etc/puppet/files/lathiat%20file", 0x7f608bf2d540) = -1 ENOENT (No such file or directory)
[pid 18295] lstat("/etc/puppet/files/lathiat%20file", 0x7f608bf2d540) = -1 ENOENT (No such file or directory)

Debug: Did not match path ("/development/file_metadata/files/lathiat%20file")
Debug: Did not match path ("/development/file_content/files/lathiat%2520file")

The source file must exist to trigger the error. If the file does not exist you get a different error as expected:

Error: /Stage[main]/Main/Node[default]/File[/tmp/x]: Could not evaluate: Could not retrieve information from environment development source(s) puppet:///files/lathiat file

The destination file must also not exist to ensure it retrieves a copy of the content and not just the metadata.

rack/config.ru in case it matters:

$0 = "master"
ARGV << "--rack"
ARGV << "--confdir" << "/etc/puppet"
ARGV << "--vardir"  << "/var/lib/puppet"
require 'puppet/util/command_line'
run Puppet::Util::CommandLine.new.execute

Comment by Trent Lloyd [ 2014/06/04 ]

Probably obvious but this issue is not recurse specific, though how it handles the escaping may come into play to when the error is triggered based on the environment differences noted above. I'm getting this with a plain old single file source.

Not sure if it's acceptable for me to change the title, so I'll leave that up to a dev

Comment by Josh Cooper [ 2014/06/04 ]

Trent Lloyd thanks for great debugging information!

file_metadata shows the path with a space, and file_content shows %20. This appears both in the debug log and in the actual file access as shown below. Only the file_content stage is failing, file_metadata passes.

To me that means file_metadata requests are not correctly URL encoded on the agent, and I suspect that the puppet master is not URL decoding the value. So source => 'puppet:///files/lathiat file' works when using the puppet master, because both agent and master are misbehaving. We'll need to dig into this some more to figure out what's going on.

Comment by Charlie Sharpsteen [ 2014/06/05 ]

Thanks for the debugging info everyone! Dug into this some more and it appears that running Puppet under Passenger is a key to re-producing this problem. I was able to run a git bisect and determined that this behavior did indeed regress in 3.6.0.

The specific commit is 89b8ac1. Tristan Smith: this appears to be one of yours, any idea what might be going wrong?

Comment by Kylo Ginsberg [ 2014/06/06 ]

That makes sense. Before that commit the code was just source.to_s, now it's going through the uri method at https://github.com/puppetlabs/puppet/blob/3.6.1/lib/puppet/type/file/source.rb#L211-L213 which both parses and escapes. It seems like the intent was to get the parsing, but not the escaping.

Comment by Charlie Sharpsteen [ 2014/06/06 ]

So, the core issue appears to be taking place on the client in Type::File::Content#get_from_source. Prior to commit 89b8ac1, we would take the string representation of a path, such as dir/foo bar, and pass it to indirection2uri which would perform escaping and return the path with an escaped space:

dir/foo%20bar

Now, we take the URI representation, which is escaped, and end up with a double-escaped request after indirection2uri is finished:

dir/foo%2520bar

The string %2520 is the result of %-escaping a space to get %20 and then %-escaping the percent sign to get %2520. The result of this double encoding is that the server tries to find a file named dir/foo%20bar instead of dir/foo bar.

Not sure at the moment why the double-encoding problem does not seem to affect Webrick.

Comment by Kylo Ginsberg [ 2014/06/11 ]

Let's target the PR at stable. We're not sure if there will be a 3.6.3 or not, but if there is, it should be in there. If not, it will come out in 3.7.0.

Comment by Charlie Sharpsteen [ 2014/06/11 ]

Did some more investigation today. Prior to 3.6.0, both Passenger and WebRick masters will not handle paths with the following printable ASCII characters:

[
]
?

Use of brackets anywhere in a filename throws hard errors in eval_generate that doom an entire recursive copy. The ? character just causes whichever files use it to fail.

After 3.6.0 (commit 89b8ac1), Passenger is additionally unable to process file_content requests for paths containing any of the following printable ASCII characters:

[space]
"
#
%
<
>
\
^
`

WebRick is unaffected and continues to process requests for files that do not include brackets or ?. A likely explanation for the difference in behavior is that Passenger is only unescaping the incoming path once while WebRick is applying multiple unescape operations.

The question now is how to fix Type::File::Content#get_from_source in a way that preserves the intent of PUP-1892.

Comment by Kylo Ginsberg [ 2014/06/12 ]

Charlie Sharpsteen I think the behavior around ? (and perhaps the square brackets) is long-standing, see PUP-2687, and not a regression. So that can be tackled separately from this issue about the regression wrt spaces (which likely accounts for the other characters you've documented).

For the current issue, since this was introduced by calling source.rb's uri method which consists of

 @uri ||= URI.parse(URI.escape(metadata.source)) 

would this be addressed simply by a new method which skips the escape logic e.g.

 @uri ||= URI.parse(metadata.source) 

?? I haven't tried this, just tossing an idea out there. Puppet does some odd things wrt escaping, so it might not be that simple, but might be worth a look.

Comment by Charlie Sharpsteen [ 2014/06/12 ]

Kylo Ginsberg I agree that the issues with ? ] [ are outside the scope of getting the specific regression in 3.6.x fixed. My main reason for noting them was to separate new bugs from old bugs

As for a fix, I noticed that after generating source = source_or_content.uri, the very next thing we do is build an Indirector request using source.to_s. So, a quick fix appears to be wrapping that string in a URI unescape:

--- a/lib/puppet/type/file/content.rb
+++ b/lib/puppet/type/file/content.rb
@@ -208,7 +208,7 @@ module Puppet
     def get_from_source(source_or_content, &block)
       source = source_or_content.uri
 
-      request = Puppet::Indirector::Request.new(:file_content, :find, source.to_s, nil, :environment => resource.catalog.environment)
+      request = Puppet::Indirector::Request.new(:file_content, :find, URI.unescape(source.to_s), nil, :environment => resource.catalog.environment)
 
       request.do_request(:fileserver) do |req|
         connection = Puppet::Network::HttpPool.http_instance(req.server, req.port)

This should cancel the extra escaping added by source_or_content.uri.

Comment by Kylo Ginsberg [ 2014/06/12 ]

That would do the trick too, though I'm not crazy about a gratuitous URI.escape - URI.unescape pair of operations, both from the perspective of efficiency, and from the perspective of readability (i.e. someone reading the code would wonder "why is there an unescape here?"). That's why I suggested a new method that only does the URI-fication needed (apparently) by the PR that introduced this regression.

Comment by Charlie Sharpsteen [ 2014/06/13 ]

Unfortunately, you have to escape when building a URI:

> URI.parse("http://foo.bar.com/some file")
URI::InvalidURIError: bad URI(is not URI?): http://foo.bar.com/some file
from /usr/local/var/rbenv/versions/1.9.3-p547/lib/ruby/1.9.1/uri/common.rb:176:in `split'

However, looking at the code some more, it looks like Puppet::Indirector::Request calls both to_s and URI.parse/escape internally during initialization. So, instead of passing in a stringified URI that gets stringified again and then re-constituted, we can probably pass source_or_content.metadata.source and save a bunch of trouble.

PR submitted

Comment by Charlie Sharpsteen [ 2014/06/17 ]

Notes for functional review:

The issue exposed here is that file resources with a source URI that contains reserved characters are double encoded. The issue only shows up under Puppet 3.6.0 – 3.6.2 and only causes a failure when the Puppet Master is running under Passenger.

Attached to this ticket is a test module, test_url_encode, that contains a bunch of files named with reserved characters. Adding the following to a node definition will trigger a recursive copy of the files to /tmp/test_url_encode:

class{"test_url_encode":}

Under Puppet 3.6.0 – 3.6.2, the agent should fail to copy the files listed in my comment from June 11th. After updating to a version of Puppet containing commit 591198c, char_63_question_mark?.txt should be the only file that fails to transfer.

Comment by Kurt Wall [ 2014/06/17 ]

Using Charlie's module, this verifies in master at SHA=c24626ec2a93e7fc4c1b4530d3e8b2c701d925cc. The only file that fails to transfer is char_63_question_mark?.txt:

[root@centos6-5-base puppet]# be puppet apply -e "class{'test_url_encode':}"
Notice: Compiled catalog for centos6-5-base.localdomain in environment production in 0.10 seconds
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode]/ensure: created
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files]/ensure: created
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_94_caret_hat^.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_59_semi_colon;.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_124_pipe|.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_37_percent%.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_44_comma,.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_36_dollar$.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e'
Error: Could not find any content at puppet:///modules/test_url_encode/test_files/char_63_question_mark?.txt
Error: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_63_question_mark?.txt]/ensure: change from absent to file failed: Could not find any content at puppet:///modules/test_url_encode/test_files/char_63_question_mark?.txt
Notice: /Stage[main]/Test_url_encode/File[/tmp/test_url_encode/test_files/char_40_open_bracket(.txt]/ensure: defined content as '{md5}d41d8cd98f00b204e9800998ecf8427e
[...]
Notice: Finished catalog run in 0.27 seconds

I also confirmed that the original bug report does not reproduce. Given the file /etc/puppet/files/foo file (with a space in the name), puppet handles it properly:

# be puppet apply -e "file { '/tmp/x': ensure=>present,source=>'puppet:///files/foo file'}"
Notice: Compiled catalog for centos6-5-base.localdomain in environment production in 0.09 seconds
Notice: /Stage[main]/Main/File[/tmp/x]/ensure: defined content as '{md5}307d782814a21e490229732d72440e39'
Notice: Finished catalog run in 0.06 seconds
# diff /etc/puppet/files/foo\ file /tmp/x
#

Comment by Kurt Wall [ 2014/06/17 ]

Resolved per previous comment.

Generated at Sat Aug 08 07:27:24 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.