[PUP-6861] [regression] Mount options not idempotent Created: 2016/11/02  Updated: 2016/11/22  Resolved: 2016/11/16

Status: Closed
Project: Puppet
Component/s: Types and Providers
Affects Version/s: PUP 4.8.0
Fix Version/s: PUP 4.8.1

Type: Bug Priority: Major
Reporter: Arnout Boks Assignee: Unassigned
Resolution: Fixed Votes: 4
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian 8, CentOS 7.2.1511


Attachments: JPEG File systemd_0_days_since_bugs.jpg    
Issue Links:
Duplicate
is duplicated by PA-670 mount resources incorrectly report as... Closed
Relates
relates to PUP-6457 Mount resources could handle invalid ... Closed
Template:
Team: Agent
Story Points: 3
Sprint: AP 2016-11-16
Release Notes: Bug Fix
Release Notes Summary: In Puppet 4.8.0, we added new functionality to the mount provider intended to detect if a mounted filesystem did not match /etc/fstab. Unfortunately, there is not a reliable mechanism for doing this and the detection did not work as intended. It has been removed.

 Description   

After upgrading to Puppet 4.8.0, I keep seeing Puppet reports along the following lines:

2016-11-02 09:32:16 +0100 /Stage[main]/Mount[/]/options (notice): options changed 'defaults,acl,user_xattr' to 'defaults,acl,user_xattr'
2016-11-02 09:32:16 +0100 /Stage[main]/Mount[/] (notice): Triggered 'refresh' from 1 events
2016-11-02 09:32:17 +0100 /Stage[main]/Mount[/var/redacted]/options (notice): options changed 'rw,bind,acl' to 'rw,bind,acl'
2016-11-02 09:32:17 +0100 /Stage[main]/Mount[/var/redacted] (notice): Triggered 'refresh' from 1 events

Even though the actual mount options match the desired state, Puppet keeps treating them as out of sync and remounting the mount point. This worked fine with Puppet 4.7.x.



 Comments   
Comment by Arnout Boks [ 2016/11/02 ]

Maybe relevant: / mounts an ext4 filesystem, /var/redacted is a bind mount (fstype => 'none'). I also see the same problem for an NFS mount.

Comment by Moses Mendoza [ 2016/11/02 ]

From a git bisect it appears this is a regression introduced in https://github.com/puppetlabs/puppet/pull/5125/commits/59cb02950c2d77e13aac6aeda8ce6dd8f1f7238e

Comment by Jason Slagle [ 2016/11/02 ]

I suspect PUP-6457 is likely the culprit.

Comment by Branan Riley [ 2016/11/02 ]

I've managed to reproduce this, and my research makes me believe two things.

1) You have an invalid option in your mount resources (I suspect acl, as it came up as invalid for XFS on centos 7 in my testing)
2) There is a bug in the OS that is causing mount -o remount /path/to/mountpoint to silently fail when an invalid option is present in /etc/fstab

You should be able to check this yourself by comparing /etc/fstab and /etc/mtab. If I am correct, the mount options for the affected mountpoints will be different. If they are different, you can confirm that there is a bad option by doing a manual umount and then mount of the affected mountpoint. It ought to fail, and then dmesg can give you the invalid option.
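The fstab-versus-mtab comparison described above can be scripted. The following is a minimal Python sketch, not part of Puppet; the helper names `parse_table` and `option_diff` are hypothetical. It assumes both files use the standard whitespace-separated fields (device, mountpoint, fstype, options, dump, pass) and ignores the octal escapes (such as `\040` for a space) that mount tables use in paths. The sample lines come from the Debian /boot example elsewhere in this ticket.

```python
def parse_table(text):
    """Map each mountpoint to its option list, from fstab- or mtab-format text.

    Both files use whitespace-separated fields:
    device, mountpoint, fstype, options, dump, pass.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments (e.g. Puppet's HEADER lines).
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line with no options field
        entries[fields[1]] = fields[3].split(",")
    return entries


def option_diff(fstab_text, mtab_text, mountpoint):
    """Return (only_in_fstab, only_in_mtab) as sets for one mountpoint."""
    wanted = set(parse_table(fstab_text).get(mountpoint, []))
    actual = set(parse_table(mtab_text).get(mountpoint, []))
    return wanted - actual, actual - wanted


# Sample lines from the Debian /boot example in this ticket.
fstab = "UUID=82a9230e-17a6-4b70-b1f2-2ff2d4bfa8c6 /boot ext2 defaults,acl 0 2"
mtab = "/dev/sda1 /boot ext2 rw,relatime 0 0"
only_fstab, only_mtab = option_diff(fstab, mtab, "/boot")
print(sorted(only_fstab), sorted(only_mtab))  # ['acl', 'defaults'] ['relatime', 'rw']
```

On a live system, the contents of /etc/fstab and /etc/mtab would be read from disk and passed in place of the sample strings.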

So what has actually happened is that Puppet has been silently failing to remount filesystems when a mount resource contains invalid options. The point of PUP-6457 was to surface error messages about failed remounts after the first failed run. In this case, since mount exits with status zero, we have no error message to share. But the change has correctly revealed that the mounted filesystem does not match Puppet's expectations.
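To see why such a mismatch produces a change on every run rather than only once, here is a deliberately simplified Python model. It is an assumption, not Puppet's provider code: it treats the options as in sync only when both fstab and the kernel's mount table match the desired list, and it models the kernel as a function that reports its own view of the options regardless of what fstab says. `options_in_sync`, `simulate_runs`, `echo_kernel`, and `runtime_kernel` are hypothetical names.

```python
def options_in_sync(desired, fstab_opts, mounted_opts):
    # Hypothetical check: both fstab and the live mount must match exactly.
    return fstab_opts == desired and mounted_opts == desired


def simulate_runs(desired, fstab_opts, kernel_view, runs):
    """Return one flag per agent run: True if that run reported a change."""
    changes = []
    for _ in range(runs):
        mounted = kernel_view(fstab_opts)
        if options_in_sync(desired, fstab_opts, mounted):
            changes.append(False)
        else:
            # "Fix" the resource: rewrite fstab, then remount.
            fstab_opts = list(desired)
            changes.append(True)
    return changes


def echo_kernel(opts):
    # Like a userspace-managed mtab: reports exactly what fstab says.
    return list(opts)


def runtime_kernel(opts):
    # Like /proc/self/mounts for ext4: reports only its runtime view.
    return ["rw", "relatime", "data=ordered"]


desired = ["defaults", "acl", "user_xattr"]
print(simulate_runs(desired, ["defaults"], echo_kernel, 3))        # [True, False, False]
print(simulate_runs(desired, list(desired), runtime_kernel, 3))    # [True, True, True]
```

With the echoing model the resource converges after one run; with the runtime-view model every run reports a change and triggers a refresh, which is the pattern in the reports above.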

Below is my reproduction case. Note that mount -o remount /boot provides no output and returns 0.

[root@fad1og1xao9yeal ~]# cat test.pp
mount { '/boot':
  dump    => 1,
  ensure  => 'mounted',
  device  => 'UUID=4a9725cc-739a-45f5-8ea0-a83885eaeea0',
  fstype  => 'xfs',
  options => 'defaults,foo',
  pass    => '2',
}
[root@fad1og1xao9yeal ~]# cat /etc/fstab
# HEADER: This file was autogenerated at 2016-11-02 14:38:42 -0700
# HEADER: by puppet.  While it can still be managed manually, it
# HEADER: is definitely not recommended.
 
#
# /etc/fstab
# Created by anaconda on Thu Jul 10 12:29:49 2014
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /       xfs     defaults        1       1
UUID=4a9725cc-739a-45f5-8ea0-a83885eaeea0       /boot   xfs     defaults        1       2
/dev/mapper/centos-swap swap    swap    defaults        0       0
/lib    /tmp/lib        none    bind    0       0
[root@fad1og1xao9yeal ~]# /opt/puppetlabs/bin/puppet apply test.pp
Notice: Compiled catalog for fad1og1xao9yeal.delivery.puppetlabs.net in environment production in 0.15 seconds
Notice: /Stage[main]/Main/Mount[/boot]/options: options changed 'defaults' to 'defaults,foo'
Notice: /Stage[main]/Main/Mount[/boot]: Triggered 'refresh' from 1 events
Notice: Applied catalog in 0.08 seconds
[root@fad1og1xao9yeal ~]# /opt/puppetlabs/bin/puppet apply test.pp
Notice: Compiled catalog for fad1og1xao9yeal.delivery.puppetlabs.net in environment production in 0.13 seconds
Notice: /Stage[main]/Main/Mount[/boot]/options: options changed 'defaults,foo' to 'defaults,foo'
Notice: /Stage[main]/Main/Mount[/boot]: Triggered 'refresh' from 1 events
Notice: Applied catalog in 0.07 seconds
 
[root@fad1og1xao9yeal ~]# cat /etc/fstab
# HEADER: This file was autogenerated at 2016-11-02 14:55:38 -0700
# HEADER: by puppet.  While it can still be managed manually, it
# HEADER: is definitely not recommended.
 
#
# /etc/fstab
# Created by anaconda on Thu Jul 10 12:29:49 2014
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
/dev/mapper/centos-root /       xfs     defaults        1       1
UUID=4a9725cc-739a-45f5-8ea0-a83885eaeea0       /boot   xfs     defaults,foo    1       2
/dev/mapper/centos-swap swap    swap    defaults        0       0
/lib    /tmp/lib        none    bind    0       0
[root@fad1og1xao9yeal ~]# mount | grep /boot
/dev/sda1 on /boot type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
[root@fad1og1xao9yeal ~]# mount -o remount /boot
[root@fad1og1xao9yeal ~]# echo $?
0
[root@fad1og1xao9yeal ~]# umount /boot
[root@fad1og1xao9yeal ~]# mount /boot
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error
 
       In some cases useful info is found in syslog - try
       dmesg | tail or so.

If I change the manifest to include an entirely known-good set of options (in this case, a subset of the values provided by defaults), then it behaves as-expected.

[root@fad1og1xao9yeal ~]# cat test.pp
mount { '/boot':
  dump    => 1,
  ensure  => 'mounted',
  device  => 'UUID=4a9725cc-739a-45f5-8ea0-a83885eaeea0',
  fstype  => 'xfs',
  options => 'rw,attr2,inode64,noquota',
  pass    => '2',
}
[root@fad1og1xao9yeal ~]# puppet apply test.pp
Notice: Compiled catalog for fad1og1xao9yeal.delivery.puppetlabs.net in environment production in 0.14 seconds
Notice: /Stage[main]/Main/Mount[/boot]/options: options changed 'defaults,foo' to 'rw,attr2,inode64,noquota'
Notice: /Stage[main]/Main/Mount[/boot]: Triggered 'refresh' from 1 events
Notice: Applied catalog in 0.06 seconds
[root@fad1og1xao9yeal ~]# puppet apply test.pp
Notice: Compiled catalog for fad1og1xao9yeal.delivery.puppetlabs.net in environment production in 0.13 seconds
Notice: Applied catalog in 0.06 seconds

Comment by Branan Riley [ 2016/11/02 ]

I think we may be able to attribute the misbehavior of the mount command to systemd, as the other reporter of PA-670 is on Debian 8. There's not a lot in common between EL7 and Deb8, and especially little that could cause something like this to fail.

I've also confirmed that the PUP-6457 changes work as-expected on EL6, using my above reproduction cases.

Comment by Branan Riley [ 2016/11/02 ]

Arnout Boks My apologies, I got mixed up as to which of the two tickets was Debian and which was Red Hat.

In the Debian case, it does seem to fail as expected with an obviously invalid option such as foo. But there is a very interesting behavior with acl - it appears to just silently ignore it. See below:

root@cysowaxcttc26jn:~# grep /boot /etc/mtab
/dev/sda1 /boot ext2 rw,relatime 0 0
root@cysowaxcttc26jn:~# grep /boot /etc/fstab
# /boot was on /dev/sda1 during installation
UUID=82a9230e-17a6-4b70-b1f2-2ff2d4bfa8c6       /boot   ext2    defaults,acl    0       2
root@cysowaxcttc26jn:~# mount -o remount,defaults,acl /boot
root@cysowaxcttc26jn:~# grep /boot /etc/mtab
/dev/sda1 /boot ext2 rw,relatime 0 0
root@cysowaxcttc26jn:~# umount /boot
root@cysowaxcttc26jn:~# mount /boot
root@cysowaxcttc26jn:~# grep /boot /etc/mtab
/dev/sda1 /boot ext2 rw,relatime 0 0

Can you confirm the same behavior in your Jessie environment?

Comment by Michael Watters [ 2016/11/03 ]

One thing I've noticed with bind mounts on a few servers is that the device name reported by the kernel is different than what's in fstab. For example:

[root@tc-util ~]# grep git /etc/mtab
/dev/vdb1 /srv/git_home xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0

[root@tc-util ~]# grep git /etc/fstab
/storage/git_home /srv/git_home none bind 0 0

As you can see, the device names and the FS options are different. This may be confusing the agent, depending on how it checks for mounted devices.

cifs mounts also show a difference between what's reported by the kernel and fstab.

[root@tc-util ~]# grep volumes /etc/fstab
//mas-cad55/volumes /srv/mas-cad55-volumes cifs credentials=/etc/tc-credentials,dir_mode=0555 0 0
//mas-cad23/volumes /srv/mas-cad23-volumes cifs rw,uid=0,gid=54321,file_mode=0660,noperm,credentials=/etc/tc-credentials,dir_mode=0775 0 0

[root@tc-util ~]# mount | grep volume
//mas-cad55/volumes on /srv/mas-cad55-volumes type cifs (rw,relatime,vers=1.0,cache=strict,username=user,domain=DOMAIN,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.0.5,file_mode=0755,dir_mode=0555,nounix,serverino,rsize=61440,wsize=65536,actimeo=1)
//mas-cad23/volumes on /srv/mas-cad23-volumes type cifs (rw,relatime,vers=1.0,cache=strict,username=user,domain=DOMAIN,uid=0,forceuid,gid=54321,forcegid,addr=192.168.0.5,file_mode=0660,dir_mode=0775,nounix,serverino,noperm,rsize=61440,wsize=65536,actimeo=1)

The options specified in fstab are correct, I've double checked against the mount.cifs man page and the options specified are valid.

Comment by Arnout Boks [ 2016/11/03 ]

Branan Riley I can confirm that the acl and user_xattr options do not show up in my /etc/mtab (at least for the ext4 filesystem):

$ grep "/dev/sda1" /etc/mtab
/dev/sda1 / ext4 rw,relatime,data=ordered 0 0

However, they seem to be part of the default mount options for that filesystem according to tune2fs:

$ tune2fs -l /dev/sda1 | grep "Default mount options"
Default mount options:    user_xattr acl

Comment by Tim Bishop [ 2016/11/03 ]

Similar issue with an NFS mount. I have options bg and retry that are valid mount options, but don't appear in /etc/mtab.

Comment by Branan Riley [ 2016/11/03 ]

Tim Bishop What's your OS?

Comment by Tim Bishop [ 2016/11/03 ]

Branan Riley Sorry - should have given that. It's Ubuntu 16.04.

Comment by Branan Riley [ 2016/11/03 ]

Arnout Boks With a bit of testing, it looks like the noacl flag will show up in /etc/mtab, so perhaps anything in the filesystem's default mount options doesn't show up in /etc/mtab. If that's what's going on, we may be able to handle that case in Puppet (for at least the common set of Linux filesystems). This may fix the ext4 on Debian 8 case for you, although it's probably not a full solution (and it requires lots of filesystem-specific code in Puppet, which I'm not totally thrilled about).
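This hypothesis suggests normalizing the desired options before comparing them: drop the `defaults` keyword and anything listed in the filesystem's default mount options, then check that whatever remains appears in the mount table. The Python below is a sketch of that idea for ext-family filesystems only, not what Puppet shipped. It assumes the `Default mount options:` line format shown in the `tune2fs -l` output earlier in this ticket (with `(none)` when no defaults are set); the function names are hypothetical.

```python
def parse_tune2fs_defaults(tune2fs_output):
    """Extract the set of default mount options from `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Default mount options:"):
            values = line.split(":", 1)[1].split()
            return set() if values == ["(none)"] else set(values)
    return set()


def expected_in_mtab(desired_opts, fs_defaults):
    """Options we would still expect the kernel to report after mounting."""
    return {o for o in desired_opts if o != "defaults" and o not in fs_defaults}


def looks_in_sync(desired_opts, mounted_opts, fs_defaults):
    # Everything not implied by defaults must show up in the mount table.
    return expected_in_mtab(desired_opts, fs_defaults) <= set(mounted_opts)


tune2fs = "Default mount options:    user_xattr acl"
fs_defaults = parse_tune2fs_defaults(tune2fs)
mounted = ["rw", "relatime", "data=ordered"]
print(looks_in_sync(["defaults", "acl", "user_xattr"], mounted, fs_defaults))  # True
print(looks_in_sync(["defaults", "noacl"], mounted, fs_defaults))              # False
```

A `noacl` resource would only pass once the kernel reports `noacl`, consistent with the observation above. The sketch does nothing for options such as NFS `bg` or `retry`, which are not filesystem defaults.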

Tim Bishop I suspect a similar thing with NFS - perhaps bg and retry are considered "default behaviors" now, and an NFS mount with nobg or noretry would show them in /etc/mtab? I'm not sure of an easy way to query what's considered "defaults" for NFS, though. It'd take me a little while to get an NFS reproduction environment set up (I have never done anything with NFS), so if you have a test environment that you can mess with and let me know if the negative options appear in mtab that'd be super helpful.

Comment by Tim Bishop [ 2016/11/03 ]

Branan Riley nobg isn't an option, but fg is the opposite of bg. Mounting with either of them doesn't result in anything in mtab. And retry is a numerical option, so I'm not sure it's the same issue there either. In both cases they're options that control the behaviour of mounting the mountpoint (whether to background the task, and how to retry if it fails), rather than properties of the mountpoint itself afterwards. That's a bit of a guess, though.

Comment by Branan Riley [ 2016/11/03 ]

In both cases they're options that control the behaviour of mounting the mountpoint

aha! This may change my mental model of mtab. Rather than a reflection of what's specified in fstab with a bunch of weird exceptions, it is more along the lines of "options that affect the operation of the filesystem compared to its defaults". This matches the missing NFS connection options, as well as acl being missing on ext4 (as acl is a default behavior).
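Under that model, a comparison would also have to ignore options that only affect how the mount is performed. The sketch below hard-codes a small, assumed, and incomplete table of such options (generic userspace ones like `_netdev` and `nofail`, plus NFS `bg`, `fg`, and `retry`) and strips them, matching on the option name before any `=`. `MOUNT_TIME_ONLY`, `option_name`, and `runtime_options` are hypothetical names; a real implementation would need a vetted, per-filesystem list.

```python
# Assumed, incomplete table of options that govern the mount operation
# itself rather than the mounted filesystem's runtime behaviour.
MOUNT_TIME_ONLY = {
    "*": {"defaults", "auto", "noauto", "_netdev", "nofail"},
    "nfs": {"bg", "fg", "retry"},
}


def option_name(opt):
    """'retry=5' -> 'retry'; 'soft' -> 'soft'."""
    return opt.split("=", 1)[0]


def runtime_options(options, fstype):
    """Drop options we would not expect to see in /proc/self/mounts."""
    skip = MOUNT_TIME_ONLY["*"] | MOUNT_TIME_ONLY.get(fstype, set())
    return {o for o in options if option_name(o) not in skip}


# NFS options from one of the fstab lines reported in this ticket.
fstab_opts = "_netdev,vers=3,bg,soft,intr,retry=5".split(",")
print(sorted(runtime_options(fstab_opts, "nfs")))  # ['intr', 'soft', 'vers=3']
```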

This makes the entire premise of PUP-6457 kinda iffy. If the system isn't able to tell us whether a mounted filesystem is in sync with fstab, then we can't really handle the failure case correctly.

Comment by Thomas Kornack [ 2016/11/04 ]

As Branan Riley suggested in his answer on PUP-6457, I have added some examples where the Puppet agent remounts a share on every run and where the fstab options and the output of mount differ. None of these cases occurred prior to the update to puppet agent v1.8.0:

1st Example: (2 Hosts)
Distro: Ubuntu 12.04.05 LTS
FS type, issue: nfs, remount with every agent run
fstab:

some.ip.add.ress:/home/some.user	/home/some.user	nfs	_netdev,vers=3,bg,soft,intr,retry=5	0	0

mtab:

some.ip.add.ress:/home/some.user /home/some.user nfs rw,vers=3,bg,soft,intr,retry=5 0 0

2nd example:
Distro: Ubuntu 14.04.05 LTS
FS type, issue: tmpfs, remount with every agent run
fstab:

/dev/shm	/var/lib/amavis/tmp	tmpfs	defaults,size=250m,mode=770,uid=amavis,gid=amavis	0	0

mtab:

/run/shm /var/lib/amavis/tmp tmpfs rw,size=250m,mode=770,uid=109,gid=115 0 0

Finally, another example where the fstab options and the output of mount differ but the agent does not remount the share on every run:

Distro: Ubuntu 14.04.05 LTS
FS type, issue: nfs, no issue
fstab:

some.ip.add.ress::/home	/home	nfs	_netdev,vers=3,bg,soft,intr,retry=5	0	0

mtab:

some.ip.add.ress:/home /home nfs rw,vers=3,bg,soft,intr,retry=5,addr=some.ip.add.ress,_netdev 0 0

Comment by Ruben Laban [ 2016/11/07 ]

Distro: Ubuntu 16.04.1

Filesystem: bindmounts (ext4)

fstab:

/var/www/vhosts/Beta/source/pub/sitemap /home/sftp/sftp_belloya_beta_01/pub-sitemap     none    bind,rw 0       0

mtab:

/dev/mapper/belloya--web01--vg001-sys--var--www /home/sftp/sftp_belloya_beta_01/pub-sitemap ext4 rw,relatime,data=ordered 0 0

Comment by Alexander Ivanes [ 2016/11/07 ]

Same thing for me.

Distro: Ubuntu 14.04.1

fstab:

hydra1:/hg  /mnt/data1      glusterfs       defaults,_netdev        0       0

mtab:

hydra1:/hg /mnt/data1 fuse.glusterfs rw,default_permissions,allow_other,max_read=131072 0 0

Comment by Bill Glick [ 2016/11/07 ]

I have 1 CentOS 7 node that is having these issues, mounting both NFS and GPFS volumes.
The weird thing is that I have 2 other CentOS 7 nodes that are almost identical that are NOT having the mount issues.
(Most of our systems are CentOS 6 and do not have any of these issues with mount regression.)

The main difference is the EL7 systems that work are running kernel 3.10.0-327.36.2.el7.x86_64, while the broken system is running kernel 3.10.0-327.36.3.el7.x86_64 (the patch for Dirty COW).

Is there any chance that this issue is somehow triggered by the combination of the Dirty COW kernel patches and the changes in puppet-agent 1.8.0-1?

Comment by Branan Riley [ 2016/11/07 ]

Bill Glick I have definitely entertained the idea that this is related to some version difference in the kernel or a userland utility, since it seems to mostly affect modern distros. I'd be curious to see the affected fstab and mtab lines from the systems that work and from those that don't, and maybe also your systemd and util-linux package versions on those systems.

Thanks

Comment by Bill Glick [ 2016/11/07 ]

I was mistaken. The reason I wasn't seeing the mount issues on the pre-Dirty COW systems was because those systems use different mount types. We aren't seeing issues with mount for mounting GPFS filesystems, but rather when using bind mounts to mount into subdirectories which are under our GPFS mount. If I use a bind mount on a kernel 3.10.0-327.36.2.el7.x86_64 system, I see the same issues.

But here are some details from one of our systems that is having issues:

# uname -r
3.10.0-327.36.3.el7.x86_64
 
# cat /etc/fstab | grep gpfs
/gpfs/fs0/scratch	/scratch	none	bind	0	0
/gpfs/fs0/software	/software	none	bind	0	0
/gpfs/fs0/datasets	/datasets	none	bind	0	0
/dev/fs0	/gpfs/fs0	gpfs	rw,dev=gpfs.server.hostname:fs0,ldev=fs0,noauto	0	0
 
# cat /etc/mtab | grep gpfs
/dev/fs0 /gpfs/fs0 gpfs rw,relatime 0 0
/dev/fs0 /scratch gpfs rw,relatime 0 0
/dev/fs0 /software gpfs rw,relatime 0 0
/dev/fs0 /datasets gpfs rw,relatime 0 0
 
# rpm -qa | egrep 'systemd|util-linux'
systemd-libs-219-19.el7_2.13.x86_64
systemd-219-19.el7_2.13.x86_64
systemd-sysv-219-19.el7_2.13.x86_64
util-linux-2.23.2-26.el7_2.3.x86_64

And then, here is what a puppet run is showing:

2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/scratch]/Mount[/scratch]/options (notice): options changed 'bind' to 'bind'
2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/scratch]/Mount[/scratch] (notice): Triggered 'refresh' from 1 events
2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/software]/Mount[/software]/options (notice): options changed 'bind' to 'bind'
2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/software]/Mount[/software] (notice): Triggered 'refresh' from 1 events
2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/datasets]/Mount[/datasets]/options (notice): options changed 'bind' to 'bind'
2016-11-07 11:56:04 -0600 /Stage[main]/Gpfs_client/Gpfs::Bindmount[/datasets]/Mount[/datasets] (notice): Triggered 'refresh' from 1 events

Comment by Branan Riley [ 2016/11/07 ]

I think I've tracked this down.

On Linux systems without systemd, /etc/mtab is managed entirely in userspace by the mount command. It contains all options that are specified in /etc/fstab, and the change in PUP-6457 works entirely as expected. On Linux systems with systemd, /etc/mtab is now a symlink to /proc/self/mounts. It now reflects the kernel's view of what is mounted, which may omit options that don't affect the "runtime" behavior of the filesystem. This would include acl on ext4 (since that's a default behavior), and retries or bg for nfs (since those only affect the connection).

I don't think there's any reasonable way to work around this. I'm going to file tickets with the major distros that are approximately of the form "It's impossible to tell if a mounted filesystem is in-sync with fstab".

In the meantime, there are two options for fixing this issue:

  • Rip out PUP-6457
  • Add a check for /etc/mtab being a symlink, and leave the mtab validation in place otherwise. This will keep the fix for PUP-6457 on Linuxes that aren't broken, but that number will continue to decrease.
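The second option comes down to a file-type check. A minimal Python sketch is below; `mtab_is_kernel_view` is a hypothetical name, and treating any symlink that resolves under `/proc/` as the kernel's mount table is an assumption rather than anything Puppet implements.

```python
import os


def mtab_is_kernel_view(path="/etc/mtab"):
    """True if `path` is a symlink that resolves into /proc (e.g. /proc/self/mounts)."""
    if not os.path.islink(path):
        return False  # regular file (or missing): maintained in userspace by mount
    # realpath follows the whole chain, e.g. /proc/mounts -> /proc/self/mounts
    # -> /proc/<pid>/mounts on Linux.
    return os.path.realpath(path).startswith("/proc/")


print(mtab_is_kernel_view())
```

Keeping the mtab validation only when this returns False is the intent of the second option. Systems where /etc/mtab is a regular file but the mismatch still occurs would not be caught by this check.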

I'll discuss those options with my team internally, but any feedback from the community there would also be greatly appreciated.

Comment by Reinhard Vicinus [ 2016/11/14 ]

Linux systems with upstart instead of systemd are also affected by this bug (such as Ubuntu 14.04). On these systems /etc/mtab is a regular file, not a symlink, so option two will not work there.

Comment by Kenn Hussey [ 2016/11/16 ]

Branan Riley did this get through CI yet?

Comment by Maggie Dreyer [ 2016/11/16 ]

Kenn Hussey yes it did, looks good.

Comment by Stig Sandbeck Mathisen [ 2016/11/17 ]

In Debian, /etc/mtab was made a symlink to /proc/mounts in the "wheezy" release in 2013.

https://www.debian.org/releases/wheezy/amd64/release-notes/ch-information.en.html#mtab

It does not look related to systemd.

Comment by Steph Gosling [ 2016/11/17 ]

Late to the party, it appears, but this also affects Ubuntu 12.04 with puppet-agent 1.8.0-1precise, though not 12.04 with 1.5.2-1precise.

Comment by Daniel Klockenkämper [ 2016/11/17 ]

Also affected: Gentoo, no systemd, puppet-agent 1.8.0 when having samba or nfs mounts in place

Example puppet options:

options   => "credentials=/etc/samba/fstab_credentials,rw,uid=${samba_owner},gid=${samba_owner},forceuid,file_mode=${samba_mode},dir_mode=${samba_mode},nobrl,iocharset=utf8,_netdev,auto,noserverino,cache=loose,noacl,vers=3.0,echo_interval=60"

/etc/fstab:

credentials=/etc/samba/fstab_credentials,rw,uid=apache,gid=apache,forceuid,file_mode=0777,dir_mode=0777,nobrl,iocharset=utf8,_netdev,auto,noserverino,cache=loose,noacl,vers=3.0,echo_interval=60

/proc/self/mounts / /etc/mtab:

rw,relatime,vers=3.0,sec=ntlmssp,cache=loose,username=***,domain=***,uid=81,forceuid,gid=81,forcegid,addr=***,file_mode=0777,dir_mode=0777,iocharset=utf8,nounix,mapposix,nobrl,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1

Generated at Wed Nov 13 16:56:26 PST 2019 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.