  1. Puppet
  2. PUP-7251

gzip decompression mangles utf-8 content in catalog

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • PUP 4.10.0
    • None
    • Not Needed
    • Automate

    Description

      It appears a regression in UTF-8 handling by Puppet was introduced when gzip compression was fixed in trapperkeeper / puppet server via https://tickets.puppetlabs.com/browse/TK-429

      This was discovered while investigating a local failure of the agent 1.9 series running https://github.com/puppetlabs/puppet/blob/master/acceptance/tests/resource/file/ticket_6448_file_with_utf8_source.rb against puppet server 2.7.2.master-0.1SNAPSHOT.2017.02.22T1106. We are not seeing this failure in CI, presumably because CI is pinned to a released version of puppet server without this change.

      The catalog returned by the server in this test contains a UTF-8 named file resource,

      "\u9759\u7684"

      Prior to the change, the catalog received by the agent was uncompressed and contained:

      [5] pry(#<Puppet::Resource::Catalog::Rest>)> response.body
      => "{\"tags\":[\"settings\"],\"name\":\"cpq55y0vuphp44u.delivery.puppetlabs.net\",\"version\":1487871618,\"code_id\":null,\"catalog_uuid\":\"cf89d5fd-ebdf-426a-832b-68812066c165\",\"catalog_format\":1,\"environment\":\"ticket_6448_file_with_utf8_source_j9w3rhmk\",\"resources\":[{\"type\":\"Stage\",\"title\":\"main\",\"tags\":[\"stage\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"Class\",\"title\":\"Settings\",\"tags\":[\"class\",\"settings\"],\"exported\":false},{\"type\":\"Class\",\"title\":\"main\",\"tags\":[\"class\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"File\",\"title\":\"/tmp/ticket_6448_file_with_utf8_source_j9w3rhmk.7EeVuV/\xEF\xBD\xB2\xEF\xBD\xA7\xE3\x83\x95\xE3\x83\xAB\",\"tags\":[\"file\",\"class\"],\"file\":\"/etc/puppetlabs/code/environments/ticket_6448_file_with_utf8_source_j9w3rhmk/manifests/site.pp\",\"line\":2,\"exported\":false,\"parameters\":{\"ensure\":\"present\",\"source\":\"puppet:///modules/utf8_file_module/\xE9\x9D\x99\xE7\x9A\x84\"}}],\"edges\":[{\"source\":\"Stage[main]\",\"target\":\"Class[Settings]\"},{\"source\":\"Stage[main]\",\"target\":\"Class[main]\"},{\"source\":\"Class[main]\",\"target\":\"File[/tmp/ticket_6448_file_with_utf8_source_j9w3rhmk.7EeVuV/\xEF\xBD\xB2\xEF\xBD\xA7\xE3\x83\x95\xE3\x83\xAB]\"}],\"classes\":[\"settings\"]}"
      

      It thus did not go through gzip decompression, here:
      https://github.com/puppetlabs/puppet/blob/master/lib/puppet/network/http/compression.rb#L23

      After the change, the catalog requires decompression - and the decompressed result is mangled UTF-8:

      [4] pry(#<Puppet::Resource::Catalog::Rest>)> Zlib::GzipReader.new(StringIO.new(response.body)).read
      => "{\"tags\":[\"settings\"],\"name\":\"cpq55y0vuphp44u.delivery.puppetlabs.net\",\"version\":1487871046,\"code_id\":null,\"catalog_uuid\":\"413cbbbb-9035-4ffd-84bd-d792a7cf7976\",\"catalog_format\":1,\"environment\":\"ticket_6448_file_with_utf8_source_j9w3rhmk\",\"resources\":[{\"type\":\"Stage\",\"title\":\"main\",\"tags\":[\"stage\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"Class\",\"title\":\"Settings\",\"tags\":[\"class\",\"settings\"],\"exported\":false},{\"type\":\"Class\",\"title\":\"main\",\"tags\":[\"class\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"File\",\"title\":\"/tmp/ticket_6448_file_with_utf8_source_j9w3rhmk.7EeVuV/イァã\x83\x95ã\x83«\",\"tags\":[\"file\",\"class\"],\"file\":\"/etc/puppetlabs/code/environments/ticket_6448_file_with_utf8_source_j9w3rhmk/manifests/site.pp\",\"line\":2,\"exported\":false,\"parameters\":{\"ensure\":\"present\",\"source\":\"puppet:///modules/utf8_file_module/é\x9D\x99ç\x9A\x84\"}}],\"edges\":[{\"source\":\"Stage[main]\",\"target\":\"Class[Settings]\"},{\"source\":\"Stage[main]\",\"target\":\"Class[main]\"},{\"source\":\"Class[main]\",\"target\":\"File[/tmp/ticket_6448_file_with_utf8_source_j9w3rhmk.7EeVuV/イァã\x83\x95ã\x83«]\"}],\"classes\":[\"settings\"]}"
      

      This is because Ruby's Zlib::GzipReader, when given no encoding option, returns the decompressed data as Encoding.default_external, and in our environment this is not always UTF-8.
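
The effect described above can be reproduced in isolation. The following is a minimal sketch, not code from Puppet: `gzip_bytes` is a hypothetical helper, and ISO-8859-1 is passed explicitly to stand in for a non-UTF-8 `Encoding.default_external` (an assumption made so the example does not change global state). It assumes `Encoding.default_internal` is unset, which is Ruby's default.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip a string in memory and return the raw bytes.
def gzip_bytes(payload)
  sink = StringIO.new(''.b)
  writer = Zlib::GzipWriter.new(sink)
  writer.write(payload)
  writer.finish # writes the gzip trailer without closing the StringIO
  sink.string
end

original = "\u9759\u7684" # the UTF-8 file name used by the acceptance test
compressed = gzip_bytes(original)

# No :encoding option: the result carries Encoding.default_external.
default_read = Zlib::GzipReader.new(StringIO.new(compressed)).read
puts default_read.encoding == Encoding.default_external # true

# Stand-in for a host whose default_external is not UTF-8.
latin1_read = Zlib::GzipReader.new(StringIO.new(compressed),
                                   :encoding => Encoding::ISO_8859_1).read

# The bytes are intact, but they are now labelled as Latin-1 characters...
puts latin1_read.bytes == original.bytes # true

# ...so any later transcoding to UTF-8 produces different, mangled text.
puts latin1_read.encode(Encoding::UTF_8) == original # false
```

The decompressed bytes themselves are never wrong in this sketch; only the encoding label is, which is why the damage shows up once the string is interpreted or transcoded.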

      The solution is to specify BINARY encoding as part of the decompression, i.e. at
      https://github.com/puppetlabs/puppet/blob/master/lib/puppet/network/http/compression.rb#L23:

      Originally proposed, then withdrawn:
      Zlib::GzipReader.new(StringIO.new(response.body), :encoding => Encoding::UTF_8).read
      Current proposal:
      Zlib::GzipReader.new(StringIO.new(response.body), :encoding => Encoding::BINARY).read
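
As a rough check of the BINARY approach, the sketch below compresses the test's file name in memory and reads it back with `:encoding => Encoding::BINARY`. The variable names are illustrative only, and the final relabelling step is an assumption about what a downstream consumer that knows the content is UTF-8 might do; it is not taken from Puppet's code.

```ruby
require 'zlib'
require 'stringio'

utf8_name = "\u9759\u7684"

# Build a gzip body in memory (stand-in for the server's compressed response).
sink = StringIO.new(''.b)
writer = Zlib::GzipWriter.new(sink)
writer.write(utf8_name)
writer.finish
compressed = sink.string

# Decompress as binary, regardless of Encoding.default_external.
body = Zlib::GzipReader.new(StringIO.new(compressed),
                            :encoding => Encoding::BINARY).read

puts body.encoding == Encoding::BINARY # true
puts body.bytes == utf8_name.bytes     # true

# Because no transcoding happened, relabelling the bytes recovers the text.
relabeled = body.dup.force_encoding(Encoding::UTF_8)
puts relabeled == utf8_name     # true
puts relabeled.valid_encoding?  # true
```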
      

      Note - originally it was thought that we should coerce the decompressed body to UTF-8, but per https://docs.puppet.com/puppet/latest/http_api/pson.html#differences-from-json it was determined that PSON specifies ASCII-8BIT (binary), so the decompressed result must be returned as binary. This is confirmed by analyzing the body returned when puppet server's webserver is run with compression disabled. In that case we fall through to the final statement in the compression determination, here: https://github.com/puppetlabs/puppet/blob/master/lib/puppet/network/http/compression.rb#L27, which simply returns response.body. With compression disabled, the encoding of response.body is binary:

      [1] pry(#<Puppet::Resource::Catalog::Rest>)> response.body
      => "{\"tags\":[\"settings\"],\"name\":\"beth2uuaz9mwtt3.delivery.puppetlabs.net\",\"version\":1487877686,\"code_id\":null,\"catalog_uuid\":\"f2cb0993-b259-4c77-8776-7df5500caace\",\"catalog_format\":1,\"environment\":\"ticket_6448_file_with_utf8_source_w1h8my9a\",\"resources\":[{\"type\":\"Stage\",\"title\":\"main\",\"tags\":[\"stage\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"Class\",\"title\":\"Settings\",\"tags\":[\"class\",\"settings\"],\"exported\":false},{\"type\":\"Class\",\"title\":\"main\",\"tags\":[\"class\"],\"exported\":false,\"parameters\":{\"name\":\"main\"}},{\"type\":\"File\",\"title\":\"/tmp/ticket_6448_file_with_utf8_source_w1h8my9a.4Nw2GO/\xEF\xBD\xB2\xEF\xBD\xA7\xE3\x83\x95\xE3\x83\xAB\",\"tags\":[\"file\",\"class\"],\"file\":\"/etc/puppetlabs/code/environments/ticket_6448_file_with_utf8_source_w1h8my9a/manifests/site.pp\",\"line\":2,\"exported\":false,\"parameters\":{\"ensure\":\"present\",\"source\":\"puppet:///modules/utf8_file_module/\xE9\x9D\x99\xE7\x9A\x84\"}}],\"edges\":[{\"source\":\"Stage[main]\",\"target\":\"Class[Settings]\"},{\"source\":\"Stage[main]\",\"target\":\"Class[main]\"},{\"source\":\"Class[main]\",\"target\":\"File[/tmp/ticket_6448_file_with_utf8_source_w1h8my9a.4Nw2GO/\xEF\xBD\xB2\xEF\xBD\xA7\xE3\x83\x95\xE3\x83\xAB]\"}],\"classes\":[\"settings\"]}"
      [2] pry(#<Puppet::Resource::Catalog::Rest>)> response.body.encoding
      => #<Encoding:ASCII-8BIT>
      

      Note - we also rely on Zlib::Inflate, but this defaults to decompressing as ASCII-8BIT rather than Encoding.default_external, which may be appropriate for our purposes.
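
To illustrate why a binary result keeps the three paths consistent, here is a simplified sketch of a content-encoding dispatch. It is not Puppet's actual `compression.rb`: `FakeResponse` and `decode_body` are hypothetical names, and the branch layout is an assumption modelled on the behaviour described in this ticket (gzip, deflate, and an uncompressed fall-through).

```ruby
require 'zlib'
require 'stringio'

# Hypothetical stand-in for an HTTP response object.
FakeResponse = Struct.new(:content_encoding, :body)

# Hypothetical dispatch: every branch should hand back a binary string.
def decode_body(response)
  case response.content_encoding
  when 'gzip'
    Zlib::GzipReader.new(StringIO.new(response.body),
                         :encoding => Encoding::BINARY).read
  when 'deflate'
    Zlib::Inflate.inflate(response.body) # already returns ASCII-8BIT
  when nil, 'identity'
    response.body
  else
    raise ArgumentError, "unknown content encoding: #{response.content_encoding}"
  end
end

payload = "\u9759\u7684".b # raw bytes, as an HTTP body would arrive

gzip_sink = StringIO.new(''.b)
gzip_writer = Zlib::GzipWriter.new(gzip_sink)
gzip_writer.write(payload)
gzip_writer.finish

responses = [
  FakeResponse.new('gzip', gzip_sink.string),
  FakeResponse.new('deflate', Zlib::Deflate.deflate(payload)),
  FakeResponse.new(nil, payload)
]

decoded = responses.map { |response| decode_body(response) }
puts decoded.all? { |body| body.encoding == Encoding::BINARY } # true
puts decoded.all? { |body| body == payload }                   # true
```

With the gzip branch pinned to BINARY, a caller sees the same encoding whether or not the server compressed the catalog.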

            People

              branan Branan Riley
              moses Moses Mendoza