PUP-5380: Slow catalog run after updating to Puppet 3.7.5



    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: PUP 3.8.4
    • Component/s: None
    • Labels:
    • Environment:

      Puppet Server VM with 8 vCPUs and 16 GB RAM

    • Template:
    • Story Points:
    • Sprint:
      Language 2015-10-28
    • Release Notes:
      Bug Fix


      After upgrading from 3.7.4 to 3.7.5 (I was actually upgrading to 3.8.2, but was able to trace the performance problems to 3.7.5) I am seeing slower catalog compilations and a lot of failed puppet runs.

      I have about 800-1000 puppet clients connecting to a single puppetserver VM with 8 cores and 16GB RAM. Puppetserver is configured with 9 JRuby instances and a 12GB Java heap (Xmx). With 3.7.4 I had compilation times of 1-5s; now the compilation time is 15-100s, if it succeeds at all. The majority of my nodes fail with the WritePendingException error. I'm not using puppetdb.

      Here is the log output from puppetserver-access.log for one failed run:

      - - - 12/Oct/2015:11:18:27 +0000 "GET /production/node/lxserv641.smhi.se?transaction_uuid=95384261-d8f5-48a1-bf35-a4faf21fe5c9&fail_on_404=true HTTP/1.1" 200 7011 8140 36744
      - - - 12/Oct/2015:11:19:03 +0000 "GET /production/file_metadatas/pluginfacts?links=manage&recurse=true&checksum_type=md5&ignore=.svn&ignore=CVS&ignore=.git HTTP/1.1" 200 305 8140 36117
      - - - 12/Oct/2015:11:20:05 +0000 "GET /production/file_metadata/pluginfacts?links=manage&source_permissions=use HTTP/1.1" 404 51 8140 61710
      - - - 12/Oct/2015:11:20:38 +0000 "GET /production/file_metadatas/plugins?links=manage&recurse=true&checksum_type=md5&ignore=.svn&ignore=CVS&ignore=.git HTTP/1.1" 200 60788 8140 33207
      - - - 12/Oct/2015:11:21:17 +0000 "GET /production/file_metadata/plugins?links=manage&source_permissions=ignore HTTP/1.1" 200 316 8140 39250
      - - - 12/Oct/2015:11:25:33 +0000 "POST /production/catalog/lxserv641.smhi.se HTTP/1.1" 200 229376 8140 255513

      and from puppetserver.log

      2015-10-12 11:23:58,642 INFO  [puppet-server] Puppet Compiled catalog for lxserv641.smhi.se in environment production in 43.13 seconds
      2015-10-12 11:25:33,160 WARN  [o.e.j.s.HttpChannel] /production/catalog/lxserv641.smhi.se
      java.nio.channels.WritePendingException: null
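
To compare per-request timings across many agents, the access-log entries above could be parsed with something like the following. This is only a sketch: the access-log pattern is not shown in this report, so the field names `port` and `elapsed_ms` are assumptions. Reading the last field as elapsed milliseconds is a guess, although it is consistent with the gaps between the timestamps (for example 36117 ms between 11:18:27 and 11:19:03).

```python
import re

# Assumed layout of the tail of each entry:
#   "<METHOD> <PATH> HTTP/x.y" <status> <bytes> <port> <elapsed_ms>
# The names "port" and "elapsed_ms" are guesses, not taken from the
# puppetserver configuration.
ENTRY = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+) (?P<port>\d+) (?P<elapsed_ms>\d+)'
)


def parse_entry(line):
    """Return (method, path without query string, status, elapsed_ms) or None."""
    match = ENTRY.search(line)
    if match is None:
        return None
    path = match["path"].split("?", 1)[0]
    return match["method"], path, int(match["status"]), int(match["elapsed_ms"])


# Two of the entries quoted in this report.
sample = [
    '- - - 12/Oct/2015:11:18:27 +0000 "GET /production/node/lxserv641.smhi.se'
    '?transaction_uuid=95384261-d8f5-48a1-bf35-a4faf21fe5c9&fail_on_404=true'
    ' HTTP/1.1" 200 7011 8140 36744',
    '- - - 12/Oct/2015:11:25:33 +0000 "POST /production/catalog/lxserv641.smhi.se'
    ' HTTP/1.1" 200 229376 8140 255513',
]

for line in sample:
    method, path, status, elapsed_ms = parse_entry(line)
    print(f"{method:4} {path} status={status} elapsed={elapsed_ms / 1000:.1f}s")
```

If that reading of the last field is right, the catalog request took far longer overall than the 43.13 seconds reported for compilation in puppetserver.log, which would suggest that much of the time is spent outside compilation itself.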

      Somehow my puppet master is getting overloaded. If I downgrade to 3.7.4, everything is fine. Looking at the release notes for 3.7.5 I can't really see what could cause such a performance difference. Any ideas?

      My guess is that since the catalog run takes so much longer for each agent, the number of concurrent agent runs rises until it overloads the server completely. So the underlying problem is probably a dramatic performance loss. When I use mcollective to trigger a puppet run with 20 concurrent agents, it works without any failed runs. It's slow, but it works. Normally we run puppet via cron with a 900s splay time, so when a single agent run takes up to 100s there will be a lot of concurrent runs overloading the server.
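
The concurrency argument above can be made concrete with a rough estimate based on Little's law (average concurrency = arrival rate x time per run). This is a back-of-the-envelope sketch, not a measurement. It assumes that every agent starts exactly one run per 900s splay window and that starts are spread evenly over that window; the actual cron interval is not stated in this report. The function name `concurrent_runs` is only illustrative.

```python
def concurrent_runs(agents, splay_seconds, seconds_per_run):
    """Estimate the average number of agent runs in flight (Little's law).

    Assumes each agent starts one run per splay window and that the starts
    are spread uniformly across the window.
    """
    arrival_rate = agents / splay_seconds  # runs started per second
    return arrival_rate * seconds_per_run


JRUBY_INSTANCES = 9  # from the puppetserver configuration described above

# Per-run times taken from the report: 1-5s on 3.7.4, 15-100s on 3.7.5.
for seconds_per_run in (5, 15, 100):
    in_flight = concurrent_runs(1000, 900, seconds_per_run)
    verdict = "fits within" if in_flight <= JRUBY_INSTANCES else "exceeds"
    print(f"{seconds_per_run:>3}s per run: about {in_flight:.1f} concurrent runs, "
          f"which {verdict} {JRUBY_INSTANCES} JRuby instances")
```

Under these assumptions, the load at 5s per run stays below the number of JRuby instances, while at 15s or more the number of runs in flight exceeds it and requests start to queue, which matches the behaviour described above.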


    • Assignee: Unassigned
    • Reporter: Adam Winberg (adam.winberg@smhi.se)