[SERVER-993] Catalog compilation time Created: 2015/10/28  Updated: 2018/02/14  Resolved: 2018/02/14

Status: Closed
Project: Puppet Server
Component/s: Performance, Puppet Server
Affects Version/s: SERVER 2.1.2
Fix Version/s: None

Type: Bug Priority: Normal
Reporter: Vincent Assignee: Vincent
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux Centos 7 - Fully installed by yum
32vcpu / 32Go Ram

Java HEAP 25Go
Tried max-actives-instances 4/16/32 without performances notable differences

puppetlabs-release-pc1.noarch 1.0.0-1.el7
puppetserver version: 2.1.2
puppet --version 4.2.2
ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-linux]
openjdk version "1.8.0_65"


Attachments: HTML File debug_puppetserver     File performance.tgz     File puppet-server-release.jar.part-aa     File puppet-server-release.jar.part-ab     File puppet-server-release.jar.part-ac     File puppet-server-release.jar.part-ad     File puppet-server-release.jar.part-ae     Zip Archive site.zip    
Issue Links:
Relates
relates to SERVER-1052 Consider allowing JRuby compile.mode ... Closed
relates to PUP-5554 Improve lexer speed on jruby Closed
Template:
QA Contact: Erik Dasher

 Description   

I recently started the migration of our production puppet from Puppet 3 / Apache with passenger to Puppet 4 / PuppetServer

The problem I encounter is about catalog compilation performance, indeed the duration passed from 1~3seconds on old platform to 20~50seconds on the new one.

I tried to get some DEBUG information by puppetserver with activating DEBUG loglevel, and with activating ruby profiling.
Unfortunately I can't obtain the precise informations I need, because it seems that the problem resides with compilation (ruby/puppet code).

[...]
2015-10-28 10:21:17,416 DEBUG [c.p.p.LoggingPuppetProfiler] [compiler find_node] (87 ms) Found node information
2015-10-28 10:21:20,731 WARN  [puppet-server] Puppet Unrecognized escape sequence '\.'
2015-10-28 10:21:20,801 WARN  [puppet-server] Puppet Unrecognized escape sequence '\.'
2015-10-28 10:21:20,843 WARN  [puppet-server] Puppet Unrecognized escape sequence '\.'
2015-10-28 10:21:21,877 WARN  [puppet-server] Puppet Unrecognized escape sequence '\;'
2015-10-28 10:21:22,213 WARN  [puppet-server] Puppet Unrecognized escape sequence '\.'
2015-10-28 10:21:22,218 WARN  [puppet-server] Puppet Unrecognized escape sequence '\.'
2015-10-28 10:21:22,503 WARN  [puppet-server] Puppet Unrecognized escape sequence '\*'
2015-10-28 10:21:22,506 WARN  [puppet-server] Puppet Unrecognized escape sequence '\*'
2015-10-28 10:21:39,209 DEBUG [c.p.p.LoggingPuppetProfiler] [compiler set_node_params] (2 ms) Compile: Set node parameters
[...]
2015-10-28 10:21:40,676 DEBUG [c.p.p.LoggingPuppetProfiler] [compiler finish_catalog] (1 ms) Compile: Finished catalog
2015-10-28 10:21:40,682 DEBUG [c.p.p.LoggingPuppetProfiler] [compiler compile myenv vincent-centos7.mydomain.local] (23266 ms) Compiled catalog for vincent-centos7.mydomain.local in environment myenv

Most of the 23 seconds (~17s) duration is with compilation, which I suppose is starting just after the puppet parser validate, generating the output "Unrecognized escape..." which I can produce manually.

My need I to have output on what happens next, the compilation.
Is there a way have a trace to find why there is such a performance issue ?

As an attachment there is Profiler output of the full test catalog application



 Comments   
Comment by Chris Price [ 2015/10/28 ]

Vincent one idea to help narrow it down might be to run `puppet master --compile` for a given node, both in your old Puppet 3.x environment and then in your new Puppet 4.x environment. This might help determine whether the webserver has anything to do with the compilation times or not.

Comment by Chris Price [ 2015/10/28 ]

ping Henrik Lindberg just in case you recognize these error messages.

Comment by Henrik Lindberg [ 2015/10/28 ]

Two regressions were found in versions after 3.7.5 that will be fixed in 3.8.4.

One regression was that file watching was accidentally turned on for files in directory environments. That coupled with the changed behavior in directory environments caused a lot of overhead.

The other was about excessive calls to check if future parser is in effect or not. This code is refactored in 3.8.4 which resulted in far fewer loads of environment.conf. The slowdown is irrespective of if future parser is in use or not. It also depends on what are in the manifests.

Comment by Henrik Lindberg [ 2015/10/28 ]

Those problems would not affect Puppet 4.x though as both concepts are gone (there is only one parser, and only directory environments, and file watching have been removed).

Comment by Henrik Lindberg [ 2015/10/28 ]

On 3.x - was future parser in use or not? I would expect a small speedup between 3.x future parser and 4.x.

Comment by Henrik Lindberg [ 2015/10/28 ]

I know there is a bug reported about accuracy of puppet compile profiling information (there are steps missing in the output on 4.x) - we hope to get that fixed before 4.3.0 release.

Comment by Vincent [ 2015/10/29 ]

Chris Price
On puppet 3 :
for a given node not already migrated passenger shows "Compiled catalog in 0.56 seconds"
for the same node with command puppet master --compile shows "Compiled catalog in 11.98 seconds"

On puppet 4 :

= Puppetserver
for a given node show "Compiled catalog in 27.25 seconds"
and with puppet master --compile "Compiled catalog in 7.05 seconds"

For information, due to these poor performance I installed this morning a new server on puppet4 but using passenger (5.0.21) instead of puppetserver, to compare performance in my case
And the performances are around 3 times faster on passenger. I don't know how to explain this.
Also, I don't understand the 0.xx compilation time I have on puppet3 and not on puppet4

= Puppetmaster on passenger
for a given node "Compiled catalog in 6.20 seconds"
and with puppet master --compile "Compiled catalog in 9.27 seconds"

Henrik Lindberg

My puppet 3 server is 3.7.4 with passenger-4.0.56, future parser isn't activated.

Comment by Chris Price [ 2015/10/29 ]

Vincent OK, that seems like an issue that we'd definitely like to dig into on the server side.

Our best chance of being able to sort this out would be if we are able to create a repro case on our side; is there any chance that we could get a copy of some of your module code and classification info about a node that is triggering this performance issue?

Comment by Vincent [ 2015/10/30 ]

Chris Price

I finally found exactly where the performance issue is.
This isn't with our modules code because I tried this on a new raw environment :

  • my test node
  • adding modules one by one to find which cause the bottleneck

No results... compilation in less than 1 second with all of our modules.

So I added

  • all nodes manifest

And now, again, 30seconds of compilation.

So the problem is :

  • 12 node manifests (very different sizes)
  • 790 nodes declarations
  • 69041 lines non-empty, non-commented

Which takes at least 20seconds to puppetserver to parse it

I don't really know how to optimize this... I thought about modules just like you suggest, but nodes, this is unexpected.

Comment by Chris Price [ 2015/10/30 ]

That's very interesting. Henrik Lindberg do you do any caching of the nodes / site.pp manifest? Is it possible that somehow we aren't leveraging the caching of that on Puppet Server?

Vincent I wonder if we could just substitute in some 'notify' resources in place of all of your real classes in your `site.pp` file, so that it wouldn't have any dependencies at all on module code, and see if compiles are still slow. If they are, then perhaps you could send us the site.pp file so we'd have a concrete example for us to do some testing against?

Comment by Henrik Lindberg [ 2015/10/30 ]

With directory based environments, I think the effect is that there is no caching of the site manifest.

A work around would be to break out the 69000 lines of code in the site manifest and place that in classes. That would also reduce the amount of time it takes for the cold run (since not all of the node logic would be used at the same time).

Comment by Vincent [ 2015/11/18 ]

Hi, sorry for the time since this answer...

So I tried some other tests :

1 - I tried the suggestion of Henrik Lindberg to manage the 70000 lines of code within node manifests by replacing all declarations just with one class for each node

Same performances at all

2 - Then I tried the suggestion of Chris Price to replace all classes content by : notify

{"$name":}

Same performances at all

3 - Finally I tried both at the same time (only one class for each node, and each class contain only notify)

1 second to compile.

So It really seems that the combination of number of nodes and number of elements in classes is the concrete drawback in this case.
But, It doesn't explain why I don't have this problem with Passenger, which I have installed in production now, because it can handle enough nodes and java don't.

Comment by Chris Price [ 2015/11/18 ]

Vincent thanks for the additional info. This is definitely something we'd like to investigate on our end.

Can you provide a few more details on your Passenger setup? Is it using the same version of Puppet, and the same exact directory of manifest/module code, as your Puppet Server setup does/did?

Also, would it be possible for you to share the version of your site.pp that you used for #2 above, where it's just got all of the 'notify' resources in it? It seems like that would accelerate our ability to set up a repro case on our end.

Comment by Henrik Lindberg [ 2015/11/18 ]

Vincent I am interested in how you organized the breakout of the code. For what I envisioned being relevant, only nodes, and no classes at all would be in the site wide part of the logic and all classes being autoloaded. Could you elaborate on how you organized the experiments?

Comment by Vincent [ 2015/11/19 ]

Chris Price Sorry, I have misunderstood your suggestion and my #2 test is a replacement of the content of all "init.pp" for all classes
So I made a new test, as your suggestion in site.pp with no classes but only notify for each node

Result = 2s of compilation

And of course I used the exact same code for my tests.

My passenger setup is :
Centos-7.1.1503
 
# yum 
Apache 2.4.10
puppetlabs-release-pc1 1.0.0-1
puppet-agent 1.2.7-1
puppetdb-termini 3.2.0-1
 
#gems
bigdecimal (1.2.4)                                                                                                                                                                                                                           
deep_merge (1.0.1)                                                                                                                                                                                                                           
hocon (0.9.3)                                                                                                                                                                                                                                
io-console (0.4.3)                                                                                                                                                                                                                           
json (1.8.1)                                                                                                                                                                                                                                 
minitest (5.8.2, 4.7.5)                                                                                                                                                                                                                      
net-ssh (3.0.1, 2.9.2)                                                                                                                                                                                                                       
passenger (5.0.21)                                                                                                                                                                                                                           
power_assert (0.2.5)                                                                                                                                                                                                                         
psych (2.0.5)                                                                                                                                                                                                                                
rack (1.6.4)                                                                                                                                                                                                                                 
rake (10.1.0)                                                                                                                                                                                                                                
rdoc (4.1.0)                                                                                                                                                                                                                                 
rubygems-update (2.4.8)                                                                                                                                                                                                                      
stomp (1.3.4, 1.3.3)                                                                                                                                                                                                                         
test-unit (3.1.5, 2.1.7.0)
 
#Tuning Apache
StartServers 32
ServerLimit 128
MaxClients 128
MaxRequestsPerChild 1000
ThreadsPerChild 32
KeepAlive On
KeepAliveTimeout 15
MaxKeepAliveRequests 100
 
#Tuning Passenger
PassengerMaxPoolSize 128
PassengerMaxRequests 1000
PassengerStatThrottleRate 30
PassengerPoolIdleTime 300
PassengerHighPerformance on
PassengerMinInstances 32

My site.pp is really simple in sense I made it by a script.
Only one true machine on this environment for my test, you can create the 794 fake nodes just as me by this simple template, a simple for loop will do the trick

node "host${i}.toto.com" {
  notify {"$name":}
}

But since I tried your suggestion correctly, there is no real problem here.

Comment by Vincent [ 2015/11/19 ]

Henrik Lindberg There is the ordering of my experiments

#1
Thinking problem is about the way our classes are made, I have create a raw environment, with 1 node, and adding classes slowly to see if compilation time grows.
No changes at all.

#2
Then I thought the problem is about nodes, so I've added my differents nodes.pp files, one by one.
There I see the compilation time growing next to each node file declaration I add.

#3
Then I tried your suggestion, by shorten the nodes definitions, and I made just ONE class declaration on all of my nodes (class which calls some others to manage system basics : crond, ntp, ssh_keys, editors...etc..) with no parameters.
No changes at all

#4 Then I tried to shorten my classes, but using full nodes definitions (many classes, many objects).
So I replaced ALL code in init.pp and in some of defined types and sub-classes (and just init.pp, I dont removed templates and parameters) with a notify.

class toto (
  parameter1 => false,
  ....
) {
  notify {"$name"':}
}

No changes at all... Which surprises me

#5 Then I tried the combination of #3 and #4 : Only one class for each node and all classes just containing a notify
1second to compile

#6 Then I tried the suggestion of Chris Price : Only notify for each node without any call to any classes. Same result if classes are normal or modified like #4
1 second to compile

Comment by Chris Price [ 2015/11/19 ]

Vincent thanks...

Only one true machine on this environment for my test, you can create the 794 fake nodes just as me by this simple template, a simple for loop will do the trick

node "host${i}.toto.com" {

notify

Unknown macro: {"$name"}

}

But since I tried your suggestion correctly, there is no real problem here.

Can you suggest a modification to the script that you describe above that would generate something that does reproduce the issue for you? It sounds like instead of the notify I'd just be putting in an include toto?, and that the toto class could just have a notify? If that is the case, is toto the same class for each node? Or do you have a unique class name for each node?

Or if you can share the site.pp and classes from your #4 above in a tarball or something, that seems like it would allow us to repro on our end.

Comment by Vincent [ 2015/11/23 ]

Test environment to reproduce performance difference between puppetserver and puppetmaster

Comment by Vincent [ 2015/11/23 ]

Hello,

So I managed to reproduce the problem with community modules, and following my previous tests I have done, where I really tried to find the bottleneck in our code, it seems that the performance problem is also seen with community modules.

As an attachment you can find an environment with 36 modules (our have 97) and 1000 nodes (1 true for test at first line, 999 generated by script, containing only class calls).
Using puppetserver I have this result :

INFO [puppet-server] Puppet Compiled catalog for mytrue_vm_test.com in environment performance in 19.33 seconds

And with puppet master --compile :

Notice: Compiled catalog for mytrue_vm_test.com in environment performance in 6.40 seconds

Comment by Jeremy Barlow [ 2015/11/24 ]

I may have been able to reproduce the problem locally with this example. I used the puppet module tool to load the latest versions of many of the same community modules as were included in the attached tarball. The tool encountered conflicts with some of the modules but also ended up recursively loading some additional modules that weren't in the tarball. In total, I ended up with 49 modules being loaded. I also created a site.pp, attached, which looks a lot like the one that Vincent had uploaded - with one node resource including just the apache class and 999 others including a named class matching the corresponding module.

With this manifest and modules loaded, I did a few agent runs against a puppetserver 2.2.0 master vs. a WEBrick-hosted Ruby 4.3.0 puppet master on a CentOS 7 VM. Note that the Ruby puppet master was being run under the bundled MRI Ruby 2.1.7. puppetserver runs JRuby under Ruby 1.9.3.

I looked for the "Compiled catalog for ... in environment production in X seconds" message to see how long the catalog compiles were taking. On average, the runs against the puppetserver 2.2.0 master were taking about 12 - 15 seconds vs. just 5 - 5.5 seconds for the WEBrick master. With some extra instrumentation added to the Ruby Puppet code, I was able to see that the big difference in performance between the puppetserver and WEBrick-hosted masters was the amount of time taken behind the yyparse call that occurs here - https://github.com/puppetlabs/puppet/blob/4.3.0/lib/puppet/pops/parser/parser_support.rb#L242 - when parsing the site.pp manifest for known_resource_types.

From this, I created the following stripped down Ruby script:

doparse.rb

require 'benchmark'
 
require 'puppet'
require 'puppet/pops'
 
file_to_parse = 'site.pp'
 
cnt = 1
if ARGV.length > 0
    cnt = Integer(ARGV[0])
end
 
parser = Puppet::Pops::Parser::Parser.new()
 
total = 0
 
cnt.times do |i|
    elapsed = Benchmark.realtime { parser.parse_file(file_to_parse) }
    puts "parse #{i+1}: #{elapsed}"
    total += elapsed
end
 
puts "mean: #{total/cnt}"

I then ran the following:

> RUBYLIB=/opt/puppetlabs/puppet/lib/ruby/vendor_ruby /opt/puppetlabs/puppet/bin/ruby ./doparse.rb 50
...
mean: 2.738 seconds  
 
> /opt/puppetlabs/server/bin/puppetserver ruby doparse.rb 50
...
mean: 8.705 seconds

I also ran this command against a few different versions of ruby under rbenv on my OSX laptop, using the jruby executable for runs under JRuby, as opposed to with the puppetserver wrapper. For this set of tests, I ran Puppet from source at the 4.3.0 git tag. The JRuby runs were faster in this case. Not sure if that's due in part to the JRuby executable not having to go through a ScriptingContainer like we do for Puppet Server - or maybe in part that I was running natively as opposed to going through a VM like I did with CentOS 7? In any event, I did still see a significant discrepancy across 50 site.pp parses from an MRI-based Ruby as opposed to JRuby:

version mean time
--------- ---------------
MRI 1.9.3 3.014 seconds
MRI 2.1.0 2.678 seconds
JRuby 1.7.20 (emulating Ruby 1.9.3) 5.495 seconds
JRuby 1.7.22 (emulating Ruby 1.9.3) 5.398 seconds
JRuby 9.0.4.0 (emulating Ruby 2.2.2) 4.886 seconds

Not sure at this point if this suggests that there are significant differences between the JRuby and MRI Ruby versions of Racc::Parser for parsing this particular kind of manifest file for some reason? The next step may to instrument deeper into the Racc parsing to see where the bottleneck may be. Chris Price or Henrik Lindberg - what do you think?

Comment by Henrik Lindberg [ 2015/11/24 ]

MRISeems to be a known problem: http://osdir.com/ml/lang.jruby.user/2008-05/msg00279.html (note the year).
MRI uses a native driver. If the JRuby Racc is written in Java I am not surprised. There seems to be a native extension since 2014. https://twitter.com/jruby/status/504323381509255168.

There is back and forth on this ticket (2013/2014) https://github.com/tenderlove/racc/issues/22

As usual in the Ruby world, the information and release notes are pretty terrible. I failed to get any good information on the state of racc on jruby.

Are we sure we are using the native Racc extension on JRuby ? (Not that I know how to check...)

Comment by Thomas Hallgren [ 2015/11/26 ]

FWIW, I tested this with jruby and racc 1.4.14 which incorporates the PR mentioned in Racc issue 22. I verified that it indeed is using the cparse-jruby.jar. Problem is, I don't see any performance boost compared to racc 1.4.11 where that change isn't present. When running profiling it seems most of the time is spent in Puppet::Pops::Parser::Lexer2#lex_token but I'm not sure whether the profiler includes time spent in native methods or not.

I looked at the culprit lex_token method and my qualified guess is that the optimizations made there can be part of the problem. While they provide great results on the MRI, some of them may actually be really bad for the jruby machine.

I'll make a minor effort just to check if this theory holds true.

Comment by Thomas Hallgren [ 2015/11/27 ]

I did the aforementioned minor effort and it gave some promising results. Attached as PR-4470 on PUP-5554. Would be great if someone could test that to verify it's impact on this issue.

Comment by Thomas Hallgren [ 2015/11/28 ]

The 'doparse.rb' attached in a previous comment needs to be modified to really parse all pp files found in an environment. At present, it will only parse one file (the 'site.pp'). The parser doesn't automatically load referenced classes.

I added this piece of code to really parse everything. It doesn't care about settings. It just assumes that all modules reside beneath the environment:

require 'benchmark'
 
require 'puppet'
require 'puppet/pops'
 
 
cnt = 1
if ARGV.length > 0
    cnt = Integer(ARGV[0])
end
 
env_to_parse = '/etc/puppetlabs/code/environments/production'
if ARGV.length > 1
    env_to_parse = ARGV[1]
end
 
parser = Puppet::Pops::Parser::Parser.new()
 
total = 0
 
Dir.chdir(env_to_parse)
cnt.times do |i|
    elapsed = 0
    Dir.glob('**/*.pp') do |file|
        elapsed += Benchmark.realtime { parser.parse_file(file) }
    end
    puts "parse #{i+1}: #{elapsed}"
    total += elapsed
end
 
puts "mean: #{total/cnt}"

Comment by Thomas Hallgren [ 2015/11/29 ]

Some additional info that might be valuable.

I tested with:

export JRUBY_OPTS="-Xcompile.invokedynamic=true"

and got very interesting results. The initial iterations takes longer but interesting things happen when JIT really kicks in. The iterations are now 25% faster on average and 15% faster than MRI.

Typical run with above option and JRuby:

parse 1: 11.730755757999873
parse 2: 6.260041269999874
parse 3: 5.014679593001688
parse 4: 4.05624749100059
...
parse 16: 3.0298004719992377
...
parse 37: 2.927366432998383
...
parse 64: 2.875723538999864

From there on the optimization seems to be complete.

Same run with MRI (2.2.1)

parse 1: 3.822166640006344
parse 2: 3.5026318540012653
...
parse 16: 3.4431655680054973
...
parse 64: 3.4878210889987713

Comment by Chris Price [ 2015/11/30 ]

Thomas Hallgren thanks for looking into this. I have a few questions:

1. Is the relevant code here only used for parsing the site.pp file? Or for parsing all Puppet code? If it's used for parsing all puppet code, then do we feel like we understand why Vincent was only seeing these performance issues with certain variants of his site.pp? Does that have to do with environment caching?
2. Do you see the performance benefit from invokedynamic without the linked PR against Puppet? Or do we need that patch before we can get the performance improvement?
3. Conversely, do you have any data on the performance improvements with just your Puppet patch, and without invokedynamic?

Comment by Thomas Hallgren [ 2015/11/30 ]

1. I adjusted the code to parse all files. The problem is apparent both when just parsing the site.pp and when parsing all code but it is more apparent when parsing all code. I can't see that the problem is tied to certain code variants.
2. invokedynamic will give a boost without the fix too. About 20%.
3. Just the puppet patch will yield about 55% better performance (i.e. it executes in 45% of the time). The numbers noted in the PR description are without invokedynamic.

Comment by Chris Price [ 2015/11/30 ]

Thomas Hallgren ok, thanks.

Vincent if you are interested in trying out either or both of those suggestions, we'd love to hear your results. The -Xcompile.invokedynamic=true can just be added to the java opts in, e.g. /etc/sysconfig/puppetserver. For the Puppet code changes in PUP-5554 you'd need to take the patch from that PR and apply it to your local Puppet ruby code, or, alternately you could check out that git branch and point puppet server to the source code by modifying the ruby-load-path setting in puppetserver.conf. If you're interested in trying that out but need any help figuring out how to set up, let us know and we can provide more detailed instructions.

Comment by Thomas Hallgren [ 2015/12/01 ]

I updated PUP-5554 with PR-4476 which targets stable instead of master.

Comment by Vincent [ 2015/12/01 ]

Chris Price So following your suggestion I tried out both :

With PR-4476 :
1 : 26.29s
2 : 15.85s
3 : 17.75s
4 : 17.78s
5 : 19.53s
6 : 18.20s

So, around 20% better.

But still so far from passenger :
1 : 7.46
2 : 7.16
3 : 5.43
4 : 5.01
5 : 5.64
6 : 5.23

With -Xcompile.invokedynamic=true the puppetserver dont start, no usefull information in log, even in debug mode.

Comment by Chris Price [ 2015/12/01 ]

Vincent whoops, my fault. Can you try -Djruby.compile.invokedynamic=true as the JVM arg?

Comment by Vincent [ 2015/12/01 ]

Chris Price I tried with java option :

1 : 23.98
2 : 17.68
3 : 17.63
4 : 20.11
5 : 21.21
6 : 19.89

And so I tried with both Java option and PR-4476
1 : 20.58
2 : 16.95
3 : 17.21
4 : 19.13
5 : 16.91
6 : 17.03

Substancial improvments indeed, but still far from Pupper master performance

Comment by Chris Price [ 2015/12/01 ]

Vincent thanks. These latest numbers you are showing in your last few comments - are they based on the site.zip file that you provided?

Comment by Vincent [ 2015/12/01 ]

yes, the 4 last tests I made are all with environment provided here

Comment by Chris Price [ 2015/12/01 ]

Great, then we should be able to dig in a bit deeper on our end. Are the numbers you are quoting just coming directly from the log messages (e.g. `Puppet Compiled catalog for mytrue_vm_test.com in environment performance in 19.33 seconds`) under Puppet Server and Passenger respectively?

Thomas Hallgren were you using that same site.zip file for your parsing benchmarks?

Comment by Chris Price [ 2015/12/01 ]

Vincent also what version of MRI ruby are you using in your passenger setup?

Comment by Thomas Hallgren [ 2015/12/01 ]

Yes, I used the same zip. The major part of the performance diff must come from some place other than the parser.

Comment by Vincent [ 2015/12/01 ]

Chris Price Yes, all my results come from puppetserver.log and puppetmaster.log
But dont have any idea how to get MRI ruby version, my ruby is : ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-linux]
But said in ticket environment, I suppose you need something else ?

Comment by Chris Price [ 2015/12/01 ]

Vincent no, sorry, that's what we needed. The text with the environment info was collapsed by Jira so I didn't see that part in there. Thanks! Will try to allocate some more time to perf testing on this on our end ASAP.

Comment by Jeremy Barlow [ 2015/12/01 ]

I did some more local testing on my CentOS 7 VM - using the "doparse.rb" example to benchmark parser performance of the MRI-based ruby 2.1.7 (delivered as part of the "puppet-agent 1.3.0" package) vs. puppetserver 2.2.0 (which includes JRuby 1.7.20.1). Note that puppetserver was running under JDK version 1.7.0_51.

For the first set of tests, I ran 50 iterations of each of the following combinations with code very similar to the original doparse.rb file that I'd listed earlier. To limit the scope a bit, I only had it parse the 1000 node "site.pp" file from before, rather than the full set of manifests delivered in the zip file. For the mean times below, I threw away the first 10 measurements from each run, considering that the first few runs tend to take significantly longer while code paths are still being optimized.

In the columns below, a "parser code version" of "pup-agt 1.3.0" means that just the standard code from puppet-agent 1.3.0 was used. "pup5554 pr" means that the Ruby code from Thomas Hallgren's PR for PUP-5554 was patched in for the test run. I also experimented with the JRuby "compile.mode" flag - see https://github.com/jruby/jruby/wiki/JRubyCompiler#tweaking-and-troubleshooting. In Puppet Server today, we hardcode a value of OFF for the "compile.mode" flag. As the table below shows, though, running with the JRuby default value of JIT instead seems like it might offer a significant improvement over OFF.

Configuration Parser code version invokedynamic JRuby compile.mode Mean time (in seconds)
MRI pup-agt 1.3.0 N/A N/A 2.909
MRI pup5554 pr N/A N/A 2.197
puppetserver pup-agt 1.3.0 false OFF 8.566
puppetserver pup5554 pr false OFF 6.247
puppetserver pup5554 pr true OFF 6.348
puppetserver pup5554 pr false JIT 3.218
puppetserver pup5554 pr true JIT 2.848

For at least this set of experiments, it appears that the changes from the PR for PUP-5554 are beneficial for both MRI and Puppet Server / JRuby. For Puppet Server specifically, using a "compile.mode" of JIT appears to have a significant benefit over the value of OFF. The results for the invokedynamic flag seemed a bit inconclusive. It may help most only when the "compile.mode" of JIT is being used.

It does still seem that MRI outperforms Puppet Server / JRuby in all configuration scenarios. It may be that for the one scenario Thomas Hallgren saw where JRuby could outperform MRI that parsing all of the manifests under the production environment vs. just the one "site.pp" manifest was important.

I also did a follow-on test where I ran the variation of doparse.rb which parses over all of the manifests in the production environment 100 times - comparing the performance of MRI vs. puppetserver with "compile.mode" set to JIT and "invokedynamic=true" and with the changes from the PR for PUP-5554 in place. In this case, the mean time for MRI across all runs was 3.74 seconds vs. 5.95 seconds for PuppetServer. The numbers for the final 10 or so iterations of each was a bit closer:

For MRI:

parse 90: 3.5940851399999962
parse 91: 3.396483578999994
parse 92: 3.3532608970000046
parse 93: 3.699473139000003
parse 94: 3.738742035999995
parse 95: 3.809521127999994
parse 96: 3.7104314179999958
parse 97: 3.390724071
parse 98: 3.3356633929999995
parse 99: 3.2623399600000074
parse 100: 3.3991133940000005

For Puppet Server:

parse 90: 4.385999999999984
parse 91: 4.403999999999978
parse 92: 4.330999999999979
parse 93: 4.349999999999985
parse 94: 4.441999999999974
parse 95: 4.836999999999996
parse 96: 4.361999999999984
parse 97: 4.28099999999997
parse 98: 4.312999999999977
parse 99: 4.31499999999998
parse 100: 4.32699999999998

Vincent, would you be interested in testing catalog compilation with JRuby "compile.mode" set to JIT instead of OFF? Unfortunately, to do this today, you would have to make a change to the Clojure code which is embedded in the puppet-server-release.jar from the puppetserver package - or we could build a custom, experimental one for you to play with. Note that we haven't done any significant testing with a value of JIT, so we can't vouch for the overall stability of using this value with Puppet.

A couple of other notes on the "compile.mode" setting...

The default value of this setting in standard JRuby is JIT. So presumably any test runs that Thomas Hallgren did earlier may have been done with this set to JIT rather than the OFF value we use in puppetserver today.

With a value of JIT, the first few parse attempts after the Java process can be considerably slower than with a value of OFF. For example, the very first time to parse all manifests in the production environment took 44.739 seconds for JIT and just 12.388 for OFF when I tried. You'd have to do a significant number of iterations before all of the optimizations kick in - seemed like about 30 - 40 from my one quick test.

Comment by Chris Price [ 2015/12/01 ]

Jeremy Barlow that is interesting stuff, thanks. It might be nice for us to also set up a test environment where we are trying to basically run the exact stuff that Vincent is running, though... with his same site.zip files and full catalog compiles, just to make 100% sure that the perf difference is all in the parsing? Or do you feel like on your first pass at this you did enough instrumentation to where you are confident that the only significant difference is in the parsing?

Also, Vincent, do you use the environment_timeout setting at all? If the performance difference is all related to parsing, then the environment caching stuff seems like it would probably make a big difference for you.

Comment by Thomas Hallgren [ 2015/12/01 ]

Three comments on Jeremy's thorough testing:

1. All tests that I ran used JRuby 9.0.4.0 / JDK 8. I made some tests with JRuby 1.7.19 / JDK 8 too and that was significantly slower. Jeremy Barlow, if you have the time, I'd be interested to see what numbers you get using JRuby 9.0.4.0 / JDK 8.
2. My testing didn't go through PuppetServer at all. I used plain JRuby so yes, JIT was always enabled.
3. I agree with Chris Price about testing full compiles. Reason being that with my PR we already reach the point where JRuby is faster in the parsing phase provided the latest versions are used with the JIT and dynamic invocations. What is unknown to me is the impact of that during evaluation. Is the evaluation significantly slower? Can it be optimized too?

Comment by Vincent [ 2015/12/02 ]

Jeremy Barlow No problem for me to try with a new jar.
But sorry I'm not able to make this change by myself, if you could provide a modified .jar it would be great

Chris Price trying changes with environment_timeout are not in my perspective for now, documentation is not encouraging this, and never used before. Why puppetmaster dont need it ?
In fact, when we migrate our platform on Puppet4 we wanted to go on new features provided with.
Puppetserver was one of it, especially because we heard the benefits from conferences which seems to be a good solution in our case (we had 2 puppetmaster on passenger to loadbalance our 300seconds runinterval and ~800 nodes).
But performance was so far from we expected (at least 3x less puppet nodes served by hour), we installed puppet4 on puppetmaster/passenger instead of puppetserver.

Comment by Henrik Lindberg [ 2015/12/02 ]

The environment timeout is set to 0 by default because it was surprising to users that changes to files in their environment did not have an immediate effect. With a reasonably current puppet, there are no known issues that should prevent environment timeout from being used in production (you may want it to be off, or short for development).

Rue of thumb: if an increased environment timeout does not have an impact on subsequent compilations (within that timeout), then the performance problem is unrelated to loading and parsing.

Comment by Vincent [ 2015/12/02 ]

Henrik Lindberg I tried with timeout 30m, and the compilation time is about 1s in logs. But, as expected, any modification at modules or at nodes level dont change anything during cache timeout. This is a valuable workaround.

Comment by Henrik Lindberg [ 2015/12/02 ]

I am glad that worked out well. Now we also definitively know it is a load/parse performance problem as oppose to an evaluation/compile problem.

If you need to evict the environment cache, there is a way to do that with PuppetServer. Note that if you have many environments (for testing and development) and you set 30m for all of them you may need a lot of memory. You can set those that are for development to have a lower timeout (or 0) in their environment.conf file.

Comment by Chris Price [ 2015/12/02 ]

Vincent here are the docs for how to flush the environment cache on demand: https://docs.puppetlabs.com/puppetserver/latest/admin-api/v1/environment-cache.html

For many users, we recommend setting the environment_timeout to 'unlimited' and then hitting that HTTP endpoint to flush the caches when code changes are rolled out. As Henrik Lindberg pointed out, you can set a different value for the timeout for different environments, so if you have a development environment where you want the changes to be visible immediately, you can set it to zero.

Until recently the default value for this setting was to enable this caching, so most of our performance testing of Puppet Server vs. Passenger so far has probably included the caching, which may explain why you are seeing big performance differences without the caching enabled. We'd definitely like to improve the performance for the case where the caching is not enabled, though, so thanks very much for bringing it to our attention and helping to provide repro steps!

I think it should be easy for us to build a jar that has Jeremy Barlow's changes in it and attach it to the ticket so that you can try it out. Thanks again!

Comment by Jeremy Barlow [ 2015/12/02 ]

Vincent, I attached a puppet-server-release.jar, split into 5 parts due to size restrictions on individual attachments for JIRA. To reassemble this, you should just need to download each of the puppet-server-release.jar.part* files into a directory and run:

cat puppet-server-release.jar.part-* > puppet-server-release.jar

Here's what I'd expect to see if you run md5sum against the reassembled jar:

dfcfc130755796c3a242e27f3e8d249e  puppet-server-release.jar

This jar is basically the same as what is packaged for the Puppet Server 2.2.0 release for CentOS 7. The only difference should be that I updated one of the Clojure namespaces within it to set the compile.mode for JRuby ScriptingContainers to JIT instead of the default, OFF. To install this, you should just need to swap out this puppet-server-release.jar file with the file installed by default at /opt/puppetlabs/server/apps/puppetserver/puppet-server-release.jar and just do a service puppetserver restart. It would be good to backup the original puppet-server-release.jar file there so that you can restore it when you are done experimenting with this file.

During the startup of the puppetserver service, you should see a message like the following in the /var/log/puppetlabs/puppetserver/puppetserver.log file:

2015-12-02 11:59:12,561 INFO  [clojure-agent-send-pool-0] [p.s.j.jruby-puppet-internal] prepping scripting container with compile-mode: JIT

This would indicate that the custom jar file has been installed / is working as expected.

We definitely wouldn't recommend trying to use this for production even if it does appear to improve the performance you've been seeing with your Puppet code on the Puppet Server platform. We would definitely be interested in what impact this would have for the performance with your test case. Note as mentioned earlier that you may need to do a significant number of catalog compilations against a running Puppet Server before all of the JIT optimizations have completed - about 30 - 40 or so in my case locally - and, therefore, any potential performance benefit could be seen.

If this does prove to be beneficial for you, we may consider making the compile.mode flag externally configurable for Puppet Server in some future release.

Thanks for any time you may have to look at this further!

Comment by Jeremy Barlow [ 2015/12/02 ]

Henrik Lindberg, it is interesting to me that when Vincent enabled environment caching that the time to compile dropped to ~1 second for Puppet Server. As mentioned earlier, I think the Racc parsing of the site.pp file, which has 1000 node resources, is what has appeared to be responsible for a large percentage of the time spent in the catalog compilation - several seconds at best from the testing I and Thomas Hallgren had been doing. You had mentioned earlier that you didn't think the parsed result for the site manifest would be cached. Do these results suggest that this file is actually being cached per the environment? Or maybe you meant that an environment-specific site.pp file would be cached and that if the site.pp were in the "global" manifests directory that it would not be cached?

Comment by Henrik Lindberg [ 2015/12/02 ]

I have not looked at what it actually does - but I do think that the "initial import" (i.e. Puppet settings "code" , and "manifest") may possibly always be parsed - but I may be wrong, Requires debugging to figure out (or by making a change that is an error in site.pp while the environment is still cached).

The caching that takes place in an environment is the binding of "type/name" to code. The name of the code loaded from "code" and "manifest" (the initial import) does end up in a named entity - the class having the name '' (an empty string; aka. the 'main' class (although it does not have this name internally) - so, given that, it is possible that it could have such a cache entry.

Is Vincent still running with the very large site.pp ?

There is no difference wrt. where the site.pp comes from - it is all the code that is handled in the "initial import" that is either cached or not under the class 'main' (which we should check to see how it actually behaves).

Comment by Nick Walker [ 2015/12/02 ]

site.pp is cached

[root@master20153-centos ~]# puppet config set environment_timeout unlimited
[root@master20153-centos ~]# cat /etc/puppetlabs/puppet/puppet.conf
[main]
## site.pp ##
    certname = master20153-centos
    server = master20153-centos
    user  = pe-puppet
    group = pe-puppet
    archive_files = true
    archive_file_server = master20153-centos
    module_groups = base+pe_only
    environmentpath = /etc/puppetlabs/code/environments
environment_timeout = unlimited
 
[agent]
    graph = true
 
 
[master]
app_management = false
node_terminus = classifier
storeconfigs = true
storeconfigs_backend = puppetdb
reports = puppetdb
certname = master20153-centos
always_cache_features = true
[root@master20153-centos ~]# service pe-puppetserver restart
Stopping pe-puppetserver:                                  [  OK  ]
Starting pe-puppetserver:                                  [  OK  ]
[root@master20153-centos ~]# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for master20153-centos
Info: Applying configuration version '1449092598'
Notice: Applied catalog in 9.08 seconds
[root@master20153-centos ~]# vim /etc/puppetlabs/code/environments/production/manifests/site.pp
[root@master20153-centos ~]# puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for master20153-centos
Info: Applying configuration version '1449092598'
Notice: Applied catalog in 9.76 seconds
[root@master20153-centos ~]# service pe-puppetserver restart; puppet agent -t
Stopping pe-puppetserver:                                  [  OK  ]
Starting pe-puppetserver:                                  [  OK  ]
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Info: Caching catalog for master20153-centos
Info: Applying configuration version '1449092759'
Notice: test
Notice: /Stage[main]/Main/Node[default]/Notify[test]/message: defined 'message' as 'test'

Comment by Vincent [ 2015/12/03 ]

Jeremy BarlowThanks for your attachment. So I updated my puppetserver to 2.2, using puppetlabs-pc1 yum repo, and now puppetserver dont start (with standard jar and with the one you provided)

This is the error I have :

at clojure.main.main(main.java:37)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.lang.AFn.applyToHelper(AFn.java:178)
at clojure.lang.Var.invoke(Var.java:409)
at clojure.lang.RestFn.invoke(RestFn.java:512)
at clojure.main$main.doInvoke(main.clj:420)
at clojure.main$main_opt.invoke(main.clj:315)
at clojure.core$apply.invoke(core.clj:624)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.lang.AFn.applyToHelper(AFn.java:165)
at clojure.lang.Var.invoke(Var.java:394)
at clojure.lang.RestFn.invoke(RestFn.java:457)
at puppetlabs.trapperkeeper.main$_main.doInvoke(main.clj:7)
at clojure.core$apply.invoke(core.clj:624)
at clojure.lang.Var.applyTo(Var.java:700)
at clojure.lang.AFn.applyToHelper(AFn.java:165)
at clojure.lang.Var.invoke(Var.java:394)
at clojure.lang.RestFn.invoke(RestFn.java:457)
at puppetlabs.trapperkeeper.core$main.doInvoke(core.clj:159)
at puppetlabs.trapperkeeper.core$run.invoke(core.clj:144)
at puppetlabs.trapperkeeper.core$boot_with_cli_data.invoke(core.clj:113)
at puppetlabs.trapperkeeper.internal$boot_services_STAR_.invoke(internal.clj:462)
at puppetlabs.trapperkeeper.internal$boot_services_STAR_$fn__5789.invoke(internal.clj:463)
at puppetlabs.trapperkeeper.internal$build_app_STAR_.invoke(internal.clj:426)
at puppetlabs.trapperkeeper.internal$instantiate.invoke(internal.clj:108)
at puppetlabs.trapperkeeper.internal$instantiate$fn__5589.invoke(internal.clj:111)
at puppetlabs.trapperkeeper.internal$handle_prismatic_exception_BANG_.invoke(internal.clj:70)
Exception in thread "main" java.lang.RuntimeException: Service ':AuthorizationService' not found

Comment by Chris Price [ 2015/12/03 ]

Vincent that error sounds like your /etc/puppetlabs/puppetserver/bootstrap.cfg file didn't get updated properly from the new package, maybe? Did you get any warnings from yum on the upgrade?

I suspect that you are missing this line:

https://github.com/puppetlabs/puppet-server/blob/9a2db1144c9f8d5850ec3f64232b2010a289e04f/ezbake/config/bootstrap.cfg#L10

puppetlabs.trapperkeeper.services.authorization.authorization-service/authorization-service

Comment by Vincent [ 2015/12/03 ]

Chris Price Yes this is a misconfiguration by my side, sorry.

Comment by Jeremy Barlow [ 2015/12/03 ]

I spent some more time yesterday doing some tests with the full environment zip file from Vincent. I dumped the files into /etc/puppetlabs/code/environment/performance. I noticed that the zip file did not include the puppetlabs/stdlib module but the apache class that catalog compilation would exercise has a dependency on stdlib. In order to compile a full catalog properly, then, I used the puppet module tool to load version "4.9.0" of puppetlabs/stdlib under the base module directory, /etc/puppetlabs/code/modules.

Since we seem to have substantiated that both the module manifests and site manifest are cached per the environment timeout configuration, it seemed most interesting at this point to compare performance of a Puppet master running under Apache / Passenger vs. Puppet Server with environment caching disabled.

For these tests I again used a CentOS 7 VM with the following configurations:

For the Ruby master under Passenger:

  • Apache 2.4.6
  • Passenger 5.0.21

For Puppet Server:

  • Puppet Server 2.2.0
  • JRE 1.7.0_51

For both of the above, I configured core Ruby puppet/facter code from puppet-agent 1.3.1 to be used. I also applied the changes from Thomas Hallgren's PR for PUP-5554 for all test configurations.

I pre-generated a certificate named 'mytrue_vm_test.com' to match that of the node in Vincent's sample. I did one full noop agent run against the server in order to have a yaml facts file for the node be cached under the server's vardir. For the test iterations that I measured, I just did curl calls to the catalog endpoint for the 'mytrue_vm_test.com' node:

curl -sS "https://[myserver]:8140/puppet/v3/catalog/mytrue_vm_test.com?environment=performance" --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/mytrue_vm_test.com.pem --key /etc/puppetlabs/puppet/ssl/private_keys/mytrue_vm_test.com.pem

The times measured would include the roundtrip time for the full curl call from the client's perspective. This would add in a little more overhead than what is involved in just the catalog compilation phase on the server. It looked like for both the tests with Passenger and tests with Puppet Server that the amount of time spent in non-catalog compilation work was in the range of 0.3 - 0.4 seconds. I didn't see an appreciable difference between the two. As with prior measurements, I did 50 iterations of the test for each configuration, throwing away the first 10 to account for optimization that occurs the first time through code paths after startup.

Here are the results of the tests:

Configuration invokedynamic JRuby compile.mode Mean time (in seconds)
Passenger (using Ruby 2.0.0p598) N/A N/A 3.2588
Passenger (using Ruby 2.1.7p400) N/A N/A 2.9324
puppetserver false OFF 7.3884
puppetserver true OFF 7.4295
puppetserver false JIT 4.1309
puppetserver true JIT N/A - ERROR

Note that for the combination of compile.mode = JIT and compile.invokedynamic = true, the server wasn't able to complete a single catalog compilation. Each attempt resulted in the following error being thrown on the server:

Puppet Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Function Call, com.headius.invokebinder.InvalidTransformException: java.lang.IllegalAccessException: no such method: rubyjit.Module$$get_function_info_89777804a596e7525bfd76db55db4cfbbc2c53571301399523.__file__(Module$$get_function_info_89777804a596e7525bfd76db55db4cfbbc2c53571301399523,ThreadContext,IRubyObject,IRubyObject,Block)IRubyObject/invokeStatic at /etc/puppetlabs/code/environments/performance/modules/concat/manifests/fragment.pp:61:16

This error has been reported to the JRuby community - see https://github.com/jruby/jruby/issues/1518. Most references there seem to have been problems with using invokedynamic in conjunction with JDK 8. I didn't see any suggestions for workarounds - other than to not use invokedynamic (leave implicitly set to false). Maybe there's something specific about this code that could be reworked in order for invokedynamic to be viable in this case, but I didn't spend any further time looking into this.

For the above combinations, it does appear that Passenger is still faster in the best case - running under Ruby 2.1.7p400 (~ 2.9234 seconds) - vs. Puppet Server with compile.mode set to JIT (~ 4.1309 seconds). Note that the performance under Puppet Server for the last 10 measurements was a little better than the overall mean, so it may be that for an even longer run that further optimizations would occur.

Last 10 measurements for Passenger / Ruby 2.1.7p400:

parse 40: 2.977790956
parse 41: 2.74109675
parse 42: 2.876411698
parse 43: 2.871555633
parse 44: 2.804396444
parse 45: 3.044006175
parse 46: 2.763741381
parse 47: 2.851122733
parse 48: 2.865767259
parse 49: 2.72697933
parse 50: 2.999128202

Last 10 measurements for Puppet Server 2.2.0 with compile.mode set to JIT:

parse 40: 3.849922537
parse 41: 3.894616278
parse 42: 3.958217466
parse 43: 3.911266052
parse 44: 3.891125976
parse 45: 3.90737945
parse 46: 3.875692769
parse 47: 3.795240027
parse 48: 3.830240376
parse 49: 3.988967762
parse 50: 4.292533626

To comments that Thomas Hallgren made earlier, I suspect that we would see further performance improvements with Puppet Server when running on JDK 8 and JRuby 9.0.0.X. Note that for the Passenger tests that there was some improvement just in moving from Ruby 2.0.0p598 (~ 3.2588 seconds) to Ruby 2.1.7p400 (~ 2.9324 seconds). Note that JRuby 9.0.0.X is a Ruby 2.2-based implementation whereas the JRuby 1.7.20.1 version that we are currently using in Puppet Server is a Ruby 1.9.3-based implementation. At some point, we'll be looking to move Puppet Server up to JRuby 9.0.0.X - no set timeframe yet. Might be good to pick up some of these tests again at that point.

Where compile.mode = JIT seems to offer a noticeable improvement here, though, I'll go ahead and file a separate ticket for us to at least look into making this externally configurable for Puppet Server (even if not necessarily the default mode until further characterization could be done).

Comment by Jeremy Barlow [ 2015/12/03 ]

fyi, I submitted a ticket for making compile.mode be end-user configurable for Puppet Server as SERVER-1052.

Generated at Sat Feb 22 06:12:35 PST 2020 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.