Uploaded image for project: 'Puppet'
  1. Puppet
  2. PUP-6059

File type does not fsync temp file before checksum verification during file creation

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: PUP 3.6.2
    • Fix Version/s: PUP 4.5.0
    • Component/s: Types and Providers
    • Labels:
      None
    • Environment:

      RHEL 6.7 2.6.32-573.3.1.el6.x86_64
      Red Hat Gluster Storage 3.1u1 glusterfs-3.7.1-16.el6.x86_64
      Puppet 3.6.2 (PE build) pe-puppet-3.6.2.14-1.pe.el6.noarch

    • Template:
    • Story Points:
      1
    • Sprint:
      Client 2016-05-04
    • Release Notes:
      Bug Fix
    • Release Notes Summary:
      A potential race condition verifying file checkums on GlusterFS has been fixed

      Description

      Summary

      We are seeing checksum verification errors when using Puppet to create files in GlusterFS volumes.

      The apparent cause of this error is that checksum verification happens before the temporary file is completely synced to the file system, and the verification code uses incomplete cached data that doesn't match what was written to disk.

      Modifying file.rb to call fsync() on the temporary file descriptor before checksum verification allows file installation on our GlusterFS volumes to succeed.

      We are submitting this as a Puppet bug because the read-before-sync order of operations seems incorrect regardless of the file system type being used.

      Demonstration

      # cat /etc/redhat-release 
      Red Hat Enterprise Linux Server release 6.7 (Santiago)
      # rpm -qa glusterfs
      glusterfs-3.7.1-16.el6.x86_64
      # rpm -qa pe-puppet
      pe-puppet-3.6.2.14-1.pe.el6.noarch
      # uname -r
      2.6.32-573.3.1.el6.x86_64
      # mount -t glusterfs glusterhost:/glustervolume /mnt
      # cat file-test.pp
      file { '/mnt/this-is-a-file':
        ensure   => 'file',
        content  => "This is a file!\n",
        group    => 'root',
        mode     => '644',
        owner    => 'root',
      }
      # puppet apply file-test.pp
      Warning: Setting modulepath is deprecated in puppet.conf. See http://links.puppetlabs.com/env-settings-deprecations
         (at /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/settings.rb:1095:in `block in issue_deprecations')
      Notice: Compiled catalog for local-hostname in environment production in 0.13 seconds
      Error: File written to disk did not match checksum; discarding changes ({md5}3e4e89714360452d113f009f37f742ed vs {md5}d41d8cd98f00b204e9800998ecf8427e)
      Error: /Stage[main]/Main/File[/mnt/this-is-a-file]/ensure: change from absent to file failed: File written to disk did not match checksum; discarding changes ({md5}3e4e89714360452d113f009f37f742ed vs {md5}d41d8cd98f00b204e9800998ecf8427e)
      Notice: Finished catalog run in 0.51 seconds
      
      

      Discussion
      System call traces using strace show the following sequence of events:

      1) Puppet opens a temp file and writes data:

      27047 open("/mnt/this-is-a-file20151112-27047-112t46h", O_RDWR|O_CREAT|O_EXCL, 0600) = 5
      27047 write(5, "This is a file!\n", 16) = 16
      

      2) The temp file descriptor remains open while the checksum
      verification code opens a new file descriptor on the same filename:

      27047 open("/mnt/this-is-a-file20151112-27047-112t46h", O_RDONLY) = 6

      3) The checksum code reads temp file data from the new FD. This
      incorrectly returns zero bytes, possibly because of inconsistent
      caching within the clustered file system environment.

      27047 read(6, "", 4096)                 = 0
      27047 close(6)                          = 0
      

      4) The checksum verification fails and prints an error because the data from the second file descriptor does not match the data written to the first one.

      The key point is that the file checksum is performed on a second file descriptor before the first file descriptor has been completely synced to disk. That is a data consistency risk regardless of the type of file system, but in practice most file systems serve both file descriptors out of the same cache so no error is found. Our particular GlusterFS environment exhibits the risk to a greater extent.

      Potential Patch

      As a test, we added an fsync call immediately before the call to fail_if_checksum_is_wrong in type/file.rb:

      # cat file.rb.patch
      --- /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/type/file.rb.dist	2015-11-11 12:05:46.103283761 -0600
      +++ /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/type/file.rb	2015-11-11 11:46:13.464903043 -0600
      @@ -759,6 +759,18 @@
               file.binmode
               content_checksum = write_content(file)
               file.flush
      +        begin
      +          file.fsync
      +        rescue NotImplementedError
      +          # fsync may not be implemented by Ruby on all platforms, but
      +          # there is absolutely no recovery path if we detect that.  So, we just
      +          # ignore the return code.
      +          #
      +          # However, don't be fooled: that is accepting that we are running in
      +          # an unsafe fashion.  If you are porting to a new platform don't stub
      +          # that out.
      +        end
      +
               fail_if_checksum_is_wrong(file.path, content_checksum) if validate_checksum?
               if self[:validate_cmd]
                 output = Puppet::Util::Execution.execute(self[:validate_cmd].gsub(self[:validate_replacement], file.path), :failonfail => true, :combine => true)
      

      With this patch file installation operates as expected, even on the GlusterFS file system:

      # strace -f -o works.trace puppet apply file-test.pp 
      Warning: Setting modulepath is deprecated in puppet.conf. See http://links.puppetlabs.com/env-settings-deprecations
         (at /opt/puppet/lib/ruby/site_ruby/1.9.1/puppet/settings.rb:1095:in `block in issue_deprecations')
      Notice: Compiled catalog for local-hostname in environment production in 0.16 seconds
      Notice: /Stage[main]/Main/File[/mnt/this-is-a-file]/ensure: defined content as '{md5}3e4e89714360452d113f009f37f742ed'
      Notice: Finished catalog run in 0.91 seconds
      

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              mpkut Michael Kaczmarczik
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Zendesk Support