[PUP-4461] manifest changes are ignored when using hiera_include Created: 2015/04/21  Updated: 2015/06/26  Resolved: 2015/05/20

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: PUP 3.8.0, PUP 4.0.0
Fix Version/s: PUP 3.8.1, PUP 4.1.0

Type: Bug Priority: Critical
Reporter: Roman Mueller Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Relates
relates to PUP-4789 hiera_include does not have access to... Closed
Template:
Epic Link: 4.x Language
Story Points: 2
Sprint: Language 2015-04-29, Language 2015-05-13
Release Notes: Bug Fix
QA Contact: Eric Thompson

 Description   

I'm experimenting with puppetserver 2 / puppet-agent 1 and I'm running into an issue where changes to manifests don't get picked up by the server.
When I'm making changes to a manifest after starting puppetserver it is working for the first 2 or 3 changes I make.
After that new changes don't get picked up anymore until I restart puppetserver.
I can reproduce that behavior.

For an example session see: https://gist.github.com/roman-mueller/cf776fd180a085a551c2

I'm running:
CentOS 7.0.1406 (but also reproduced it on CentOS 7.1.1503)
puppetserver-2.0.0-1.el7.noarch
puppet-agent-1.0.0-1.el7.x86_64

I've set "environment_timeout = 0" in the "production/environment.conf" - which also should be the default.
Everything else is kept on default settings too.
I've committed my experimental Vagrant environment here:
https://github.com/roman-mueller/puppet4-sandbox
Note that this is using vboxsf but I get the same behavior with local files.

When I'm running strace on the puppetserver PID I can see for the first 2 puppet agent runs that global/manifests/init.pp gets accessed.
After that it is not getting read anymore.
I've also waited 45 minutes, but still the change was not picked up.
Only a restart of the puppetserver fixes it at that point.

WORK AROUND

There is no simple work around, but there is a simple patch that can be applied.

The problem is on line 17 of lib/puppet/pops/loaders.rb which says:

@@puppet_system_loader ||= create_puppet_system_loader()

This can be patched to read:

@@puppet_system_loader = create_puppet_system_loader()

QA


risk: medium (manual validate only for now)
probability: medium (hiera_include picking up manifest changes downstream only)
severity: medium (changes not picked up, confusing but possibly not blocking)
test level: acceptance (we should update existing acceptance to ensure an altered manifest is picked up)



 Comments   
Comment by Chris Price [ 2015/04/21 ]

Hmmm... that doesn't sound good.

I'm wondering if maybe the issue is that you are setting that setting only for your production environment, and Puppet itself is using a different environment timeout for the 'global' environment.

One thing you could try is to hit the Admin API to force a flush of the environment cache, and see if that does anything for you. If it does, that would at least narrow down the scope of the issue.

https://docs.puppetlabs.com/puppetserver/latest/services_master_puppetserver.html#admin-api-service

Comment by Roman Mueller [ 2015/04/21 ]

Sorry, didn't explain that properly.
I only have one environment: "production"
I'm making changes in a "global" module in that environment.
I've also tried not setting environment_timeout anywhere (since default should be 0).
And I also tried setting it in puppet.conf to 0.
Still, always the same issue.

Calling `/puppet-admin-api/v1/environment-cache` does not help either, it logs:

2015-04-21 09:19:12,187 INFO  [p.s.j.puppet-environments] Marking all registered environments as expired.

But running puppet agent the change still does not get applied.

Comment by Chris Price [ 2015/04/21 ]

And re-starting the server does cause the change to be applied? What about if you hit the jruby flush endpoint?

Comment by Roman Mueller [ 2015/04/21 ]

Yes, restarting the server and then running puppet agent does apply the change.

I haven't used jruby yet, I need to figure that out. I'll get back to you on that one.

Comment by Roman Mueller [ 2015/04/21 ]

OK, when I'm in the state that no changes get picked up anymore and then call `/puppet-admin-api/v1/jruby-pool`: next puppet agent run applies the changes.
I can then make changes again, usually 3.
Then it again does not pick up anymore changes until I again call `/puppet-admin-api/v1/jruby-pool`.

Comment by Chris Price [ 2015/04/21 ]

OK, thanks. That helps. Something about the environment cache flushing seems to have gone awry.

We'll have to try to set up a repro case--thanks for the Vagrant environment, that should help.

Henrik Lindberg you're not aware of anything that has changed related to environment caching in Puppet 4.0 that would be an obvious first place for us to look on this one?

Comment by Henrik Lindberg [ 2015/04/21 ]

Chris Price Sorry, I am not aware of anything obvious that has changed. Afaik there has been no recent changes to the environment caching logic.

In the described scenario, with a timeout of 0, each request for an environment should get a fresh instance and everything should again be loaded from (puppet) source. Changes to ruby code is different, but the description does say "changes to manifests".

Comment by Roman Mueller [ 2015/04/21 ]

Just to confirm: yes, I'm only making changes in manifests.
For testing this I'm just changing the target IP of "node1" here:
https://github.com/roman-mueller/puppet4-sandbox/blob/master/code/environments/production/modules/global/manifests/init.pp

Comment by Jeremy Barlow [ 2015/04/24 ]

I was able to reproduce this problem. Here are the files I used:

/etc/puppetlabs/code/hiera.yaml

---
:backends:
  - yaml
:hierarchy:
  - defaults
  - "%{clientcert}"
  - "%{environment}"
  - global
 
:yaml:
# datadir is empty here, so hiera uses its defaults:
# - /etc/puppetlabs/code/hieradata on *nix
# - %CommonAppData%\PuppetLabs\code\hieradata on Windows
# When specifying a datadir, make sure the directory exists.
  :datadir:

/etc/puppetlabs/code/hiera.yaml

---
classes:
  - global

/etc/puppetlabs/code/environments/production/manifests/site.pp

hiera_include('classes')

/etc/puppetlabs/code/environments/production/modules/global/manifests/init.pp

class global {
  host { 'node1':
    ip => '127.0.0.1',
  }
}

/etc/puppetlabs/code/environments/production/environment.conf

# Just comments, so "environment_timeout" should default to 0 since not defined in puppet.conf either

/etc/puppetlabs/puppet/puppet.conf

[master]
vardir = /opt/puppetlabs/server/data/puppetserver
logdir = /var/log/puppetlabs/puppetserver
rundir = /var/run/puppetlabs/puppetserver
pidfile = /var/run/puppetlabs/puppetserver/puppetserver.pid
codedir = /etc/puppetlabs/code

I did the same rolling experiment, changing the "ip" attribute on the "host" resource and running "puppet agent" afterward, without restarting the master or doing an admin API "flush" call of any kind.

I interpreted "success" in the new manifest being read as corresponding to the appearance of a message like the following in the agent output...

Notice: /Stage[main]/Global/Host[node1]/ip: ip changed '127.0.0.35' to '127.0.0.36' 

...as well as the updated /etc/hosts entry for the IP address.

For "failure" cases, the "notice" line was not written and the /etc/hosts file was not updated.

For a Puppet Server 2.0 master running with Puppet 4.0 code, the manifest changes were only picked up for each run up to the max-active-instances that I had configured in the jruby-puppet section of /etc/puppetlabs/puppetserver/conf.d/puppetserver.conf for the master at startup. For example, if max-active-instances was set to 1, only the first agent run picked up the manifest changes. Subsequent runs did not. If max-active-instances was set to 2, only the first two agent runs picked up the manifest changes. This suggests that some state may be cached with each the JRubyPuppet instance that is persisted since the JRubyPuppet instances are handed out from a FIFO queue.

Using the same configuration, I ran a Puppet 4.0 Ruby master under WEBrick. For this case, there would only be 1 Ruby environment in the master. The behavior basically followed the Puppet Server 2.0 behavior. Manifest changes were picked up for the first run but not for any subsequent runs.

For comparison purposes, I tried the same tests with a Puppet Server 1.0.8 master running with Puppet 3.7.5 code. Both for a Puppet Server master and a WEBrick master, the manifest changes were picked up for every Puppet agent run.

I did one additional test for Puppet Server 2.0 and Puppet 4.0. I changed the "site.pp" file to...

/etc/puppetlabs/code/environments/production/manifests/site.pp

include global

..., restarting the master afterward.

This change eliminates the use of Hiera in the inclusion of the class.

With this change, every Puppet agent run against both the Puppet Server 2.0 master and the WEBrick Puppet 4 master – even ones done for the Puppet Server master where a JRubyPuppet instance would have been used more than once – picked up the manifest changes.

The above results seem to indicate that there was some sort of change done in Puppet 4 core code which, when Hiera is involved, causes referenced manifests to be cached even if the environment_timeout is set to 0. This change seems to be independent of Puppet Server in that the behavior seems to follow that of the underlying Ruby Puppet code.

Henrik Lindberg, does this suggest that this is a problem in core Ruby Puppet 4 code and nothing Puppet Server specific? Seems like there were some known issues around caching with Hiera and requiring restarts of the master so that changes could be applied, although I hadn't understood that this might affect the manifests that Hiera pulls in as opposed to just Hiera configuration or data changes. What do you think?

Comment by Henrik Lindberg [ 2015/04/24 ]

If I read what you are saying correctly:

  • the problem is that on subsequent runs, an include via hiera_include causes the catalog to not contain the resource
    • this as opposed to having the resource but with the "old/previous" value
    • this as opposed to when doing an include in the puppet language directly

This makes me suspect the hiera_include/include combination - it must be looking at the wrong information when deciding if the class is included or not.

What happens if doing this:

include hiera_array('classes')

or

$the_classes = hiera_array('classes')
include $the_classes

or

include ['global']

Does those result in the same problem?

Ping Thomas Hallgren Any ideas what could be wrong here?

Comment by Henrik Lindberg [ 2015/04/24 ]

oh, I have a hypothesis about what could be going wrong here:

  • Functions in puppet are loaded by the system loader
  • As the functions are instantiated, they are bound to a closure - the scope they are defined in
  • When the 4.x hiera_include calls 'include', the 'include' function is called in its closure scope, not in the scope of the caller
  • The scope/environment and everything bound by the system functions are retained for the life of the master held alive because the functions hold on to their closure (holding on to a banana, held on by gorilla, holding a palm tree, rooted in a jungle).

I suspect the problem is on line 17 of lib/puppet/pops/loaders.rb which says:

    @@puppet_system_loader ||= create_puppet_system_loader()

This can be patched to read:

@@puppet_system_loader = create_puppet_system_loader()

If that works, we know what the problem is. We have however lost a bit of performance in doing this, and we should instead do something else (say, have a special scope (that does not involve a gorilla and a jungle) - this since most functions do nothing with their closure scope (a separate issue though).

Roman Mueller Are you up for patching that to see if it solves the problem?

Comment by Henrik Lindberg [ 2015/04/24 ]

I am changing this to a PUP ticket and making it critical !

Comment by Roman Mueller [ 2015/04/24 ]

Henrik Lindberg I had to change it to:

    @@puppet_system_loader = create_puppet_system_loader()

(@@ not @)

And yes, that solves the problem.

Comment by Henrik Lindberg [ 2015/04/24 ]

Thanks Roman Mueller (yeah, doh, sorry, the rest of the code obviously refers to @@puppet_system_loader).

Ok, then we know that what the cause is. Have to think about if that is the only fix we are going to make.
We should change this in 3.8.1 as well.

Comment by Henrik Lindberg [ 2015/04/27 ]

In the most pathological case if a sequence of catalogs are requested, each loading one system function there would be one copy of each such environment, up to around ~20 (the number of 4.x. functions that can be loaded from the system).

While it would be beneficial to keep the loaded system code, it can not be instantiated and bound to a closure across environment instances.
The fix to this problem is simply to change the class instance variable to a instance variable.

Comment by Thomas Hallgren [ 2015/04/27 ]

Merge to 3.x at fef28c6

Comment by Thomas Hallgren [ 2015/04/27 ]

Merged to stable at 087071d and onward to master at 6960815

Comment by Eric Thompson [ 2015/05/04 ]

validated on ubuntu14 at 3.x SHA: 761ccb5

[root@n87zti4y85shins puppet]# puppet agent -t --server $(hostname -f) --environment production
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for n87zti4y85shins.delivery.puppetlabs.net
Info: Applying configuration version '1430762305'
Notice: /Stage[main]/Global/Host[node1]/ensure: created
Info: Computing checksum on file /etc/hosts
Notice: Finished catalog run in 0.05 seconds
[root@n87zti4y85shins puppet]# cat /etc/hosts
# HEADER: This file was autogenerated at 2015-05-04 11:01:47 -0700
# HEADER: by puppet.  While it can still be managed manually, it
# HEADER: is definitely not recommended.
127.0.0.1	localhost	localhost.localdomain localhost4 localhost4.localdomain4
::1	localhost	localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1	node1
[root@n87zti4y85shins puppet]# vi /etc/puppet/environments/production/modules/global/manifests/init.pp
[root@n87zti4y85shins puppet]# puppet agent -t --server $(hostname -f) --environment production
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for n87zti4y85shins.delivery.puppetlabs.net
Info: Applying configuration version '1430762542'
Notice: /Stage[main]/Global/Host[node1]/ip: ip changed '127.0.0.1' to '127.0.0.42'
Info: Computing checksum on file /etc/hosts
Notice: Finished catalog run in 0.05 seconds

Comment by Eric Thompson [ 2015/05/04 ]

validated on ubuntu14.04 at SHA: 04320b0

[root@d21roj3e8z8dbun global]# puppet agent -t --server $(hostname -f)
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for d21roj3e8z8dbun.delivery.puppetlabs.net
Info: Applying configuration version '1430777733'
Notice: /Stage[main]/Global/Host[node1]/ensure: created
Info: Computing checksum on file /etc/hosts
Notice: Applied catalog in 0.05 seconds
[root@d21roj3e8z8dbun global]# cat /etc/hosts
# HEADER: This file was autogenerated at 2015-05-04 15:15:34 -0700
# HEADER: by puppet.  While it can still be managed manually, it
# HEADER: is definitely not recommended.
127.0.0.1	localhost	localhost.localdomain localhost4 localhost4.localdomain4
::1	localhost	localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1	node1
[root@d21roj3e8z8dbun global]# vi /etc/puppetlabs/code/environments/production/modules/global/manifests/init.pp
[root@d21roj3e8z8dbun global]# puppet agent -t --server $(hostname -f)
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Caching catalog for d21roj3e8z8dbun.delivery.puppetlabs.net
Info: Applying configuration version '1430777761'
Notice: /Stage[main]/Global/Host[node1]/ip: ip changed '127.0.0.1' to '127.0.0.42'
Info: Computing checksum on file /etc/hosts
Notice: Applied catalog in 0.05 seconds

Generated at Sat Dec 14 02:58:08 PST 2019 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.