Affects Version/s: PUP 3.6.2
Fix Version/s: PUP 4.5.0
Component/s: Types and Providers
RHEL 6.7 2.6.32-573.3.1.el6.x86_64
Red Hat Gluster Storage 3.1u1 glusterfs-3.7.1-16.el6.x86_64
Puppet 3.6.2 (PE build) pe-puppet-18.104.22.168-1.pe.el6.noarch
Release Notes:Bug Fix
Release Notes Summary:A potential race condition when verifying file checksums on GlusterFS has been fixed
We are seeing checksum verification errors when using Puppet to create files in GlusterFS volumes.
The apparent cause of this error is that checksum verification happens before the temporary file is completely synced to the file system, and the verification code uses incomplete cached data that doesn't match what was written to disk.
Modifying file.rb to call fsync() on the temporary file descriptor before checksum verification allows file installation on our GlusterFS volumes to succeed.
We are submitting this as a Puppet bug because the read-before-sync order of operations seems incorrect regardless of the file system type being used.
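The ordering we are proposing can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions, not the actual change to file.rb: the method name write_synced_and_verify, the "puppet-sketch" temp file prefix, and the use of MD5 are all hypothetical. The only point it makes is that IO#fsync runs on the writing descriptor before a second descriptor reads the file back.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper (not part of Puppet): write content to a temp file,
# sync it, then verify the checksum through a second file descriptor.
def write_synced_and_verify(content, dir: Dir.tmpdir)
  expected = Digest::MD5.hexdigest(content)

  Tempfile.create("puppet-sketch", dir) do |tmp|
    tmp.binmode
    tmp.write(content)
    tmp.flush   # push Ruby's userspace buffer to the kernel
    tmp.fsync   # ask the OS to commit the data before anyone reads it back

    # Verification happens on a new descriptor while tmp is still open.
    actual = File.open(tmp.path, "rb") { |f| Digest::MD5.hexdigest(f.read) }
    raise "checksum mismatch: expected #{expected}, got #{actual}" unless actual == expected

    actual
  end
end
```

Calling write_synced_and_verify("some content\n") returns the verified checksum, or raises if the data read back differs from what was written.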
System call traces using strace show the following sequence of events:
1) Puppet opens a temp file and writes data:
2) The temp file descriptor remains open while the checksum
verification code opens a new file descriptor on the same filename:
3) The checksum code reads temp file data from the new FD. This
incorrectly returns zero bytes, possibly because of inconsistent
caching within the clustered file system environment.
4) The checksum verification fails and prints an error because the data from the second file descriptor does not match the data written to the first one.
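The four steps above can be approximated outside Puppet with a short Ruby script. This is a hypothetical reproduction, not the code in file.rb: the names checksum_via_second_fd and write_then_verify are invented for illustration, and MD5 is assumed as the checksum type. On most local file systems the result reports a match; on our GlusterFS volumes, this pattern is where the second descriptor returned zero bytes.

```ruby
require "digest/md5"
require "tempfile"

# Hypothetical helper: checksum a file through a fresh descriptor,
# the way a separate verification step would.
def checksum_via_second_fd(path)
  File.open(path, "rb") { |f| Digest::MD5.hexdigest(f.read) }
end

# Hypothetical reproduction of steps 1-4. Returns a Hash describing the outcome.
def write_then_verify(content, dir: Dir.tmpdir)
  expected = Digest::MD5.hexdigest(content)

  Tempfile.create("puppet-sketch", dir) do |tmp|
    tmp.binmode
    tmp.write(content)                         # step 1: write to the temp file
    tmp.flush                                  # data reaches the kernel, but no fsync
    actual = checksum_via_second_fd(tmp.path)  # steps 2-3: second FD, tmp still open
    { expected: expected, actual: actual, match: expected == actual }  # step 4
  end
end
```

Passing dir: with a path on the affected volume (for example, write_then_verify(data, dir: "/path/on/gluster"), where the path is a placeholder) runs the same sequence against that file system.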
The key point is that the file checksum is performed on a second file descriptor before the first file descriptor has been completely synced to disk. That is a data consistency risk regardless of the type of file system, but in practice most file systems serve both file descriptors out of the same cache so no error is found. Our particular GlusterFS environment exhibits the risk to a greater extent.
As a test, we added an fsync call immediately before the call to fail_if_checksum_is_wrong in type/file.rb:
With this patch applied, file installation works as expected, even on the GlusterFS file system: