[SERVER-1735] Performance testing with http client metrics enabled Created: 2017/02/28 Updated: 2017/06/28 Resolved: 2017/06/28
|Fix Version/s:||SERVER 5.0.0|
|Reporter:||Ruth Linehan||Assignee:||Ruth Linehan|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|Epic Link:||Http client metrics|
|Sprint:||Server 2017-04-05, Server 2017-04-19, Server 2017-05-03, Server 2017-05-31|
|Release Notes:||Not Needed|
|QA Risk Assessment:||No Action|
Run a gatling A/B performance test to ensure that adding http client metrics hasn't dramatically affected performance.
|Comment by Ruth Linehan [ 2017/05/24 ]|
For this ticket I did two types of testing, perf testing with Gatling and memory testing using a curl script.
For the performance testing I set up Gatling jobs with http-client-metrics enabled and disabled and ran them on our perf hardware. The two jobs are described in this branch: https://github.com/rlinehan/gatling-puppet-load-test/tree/SERVER-1735-http-client-metrics
The job reports can be seen on http://puppetserver-perf-driver68-dev.delivery.puppetlabs.net:8080/job/http-client-metrics/. I did two runs through the whole suite, which covered scenarios of 500, 1000, and 1250 agents. Overall, in each scenario the mean agent response times for the runs with metrics enabled and metrics disabled were within a standard deviation of each other. Thus, it seems that having the metrics enabled doesn't measurably affect agent response time.
In addition, I did some memory testing by running a curl script after installing Puppet Server + PuppetDB on two separate vmpooler instances. On one I had http client metrics enabled; on the other, metrics were disabled.
On each machine, I ran the following script, which curled the master's /puppet/v3/catalog endpoint with a different agent certname each time. After every 500 agents, it dumped the output of the status service to a JSON file. This was run 150 times, for a total of 75,000 catalog requests.
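The script itself wasn't captured in this export; below is a minimal sketch of the approach it describes. The hostname, certname scheme, output filenames, and the RUN_LOAD_TEST guard are all assumptions for illustration, not the original script.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of the memory-test loop described above.
# Host, certname scheme, and output paths are assumptions.
MASTER="${MASTER:-localhost}"

catalog_url() {
  # Each request hits the catalog endpoint with a unique certname
  echo "https://${MASTER}:8140/puppet/v3/catalog/$1?environment=production"
}

status_url() {
  echo "https://${MASTER}:8140/status/v1/services?level=debug"
}

# Set RUN_LOAD_TEST=1 to actually fire the 75,000 requests.
if [[ "${RUN_LOAD_TEST:-0}" == "1" ]]; then
  for run in $(seq 1 150); do
    for i in $(seq 1 500); do
      curl -ks "$(catalog_url "agent-${run}-${i}.example.com")" > /dev/null
    done
    # After every 500 agents, snapshot the status service (includes memory/GC data)
    curl -ks "$(status_url)" > "status-${run}.json"
  done
fi
```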
(Note that for this to work you need to change the trapperkeeper auth.conf to make the /puppet/v3/catalog rule more permissive, so that one client can request catalogs for arbitrary certnames.)
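The exact auth.conf change wasn't captured in this export; one way to loosen the rule is something like the following trapperkeeper-authorization rule (the sort-order and rule name here are illustrative, and `allow-unauthenticated: true` is suitable for throwaway test machines only):

```hocon
{
    # Hypothetical permissive rule for perf testing; not the ticket's exact config
    match-request: {
        path: "^/puppet/v3/catalog/([^/]+)$"
        type: regex
        method: [get, post]
    }
    allow-unauthenticated: true
    sort-order: 200
    name: "puppetlabs catalog (perf testing only)"
}
```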
Each catalog request in this script with a different agent generates 3 http-client metrics: 1) with-metric-id puppetdb.facts.find.&lt;certname&gt;, 2) with-url pdb/query/v4/nodes/&lt;certname&gt;/facts, 3) with-url-and-method pdb/query/v4/nodes/&lt;certname&gt;/facts.GET. (Note that if the catalog had actually compiled, additional url metrics - e.g. for sending the catalog to PuppetDB - would also have been generated. However, in these requests the catalog did not compile.)
In an actual agent run, 9 http client metrics are generated per certname: 5 with-metric-id, 2 with-url, and 2 with-url-and-method. For example:
[:classifier :nodes :<node name>] - POST /v1/classified/nodes/<certname>
(metric ids are in the vectors)
Since each request in my simulation created 3 metrics, and I ran 75,000 requests, the run produced 225,000 http client metrics. At the 9 metrics per certname of a real agent run, that is equivalent to the http client metric memory footprint of 25,000 nodes.
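The arithmetic behind that equivalence, spelled out:

```shell
# 150 passes over 500 unique certnames, 3 metrics per request,
# and 9 metrics per certname in a real agent run
requests=$((150 * 500))      # 75000 catalog requests
metrics=$((requests * 3))    # 225000 http-client metrics created
nodes=$((metrics / 9))       # footprint equivalent to 25000 real nodes
echo "${requests} ${metrics} ${nodes}"
```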
The data I collected can be found in a Google Docs spreadsheet here: https://docs.google.com/spreadsheets/d/1Yun7uuxMRGUl0T19MrXHuXNNHzchWTM73BMu94KtgXU
Ultimately, this data shows a few things: 1) heap memory usage increases by roughly 300-500 MB when http client metrics are enabled, 2) total GC time increases by about 22%, 3) GC CPU averaged over the second half of the runs (the first half showed a lot of variation) is 9% with http client metrics enabled versus 6% with them disabled.
Initially, I thought that since we don't have any real use for the with-url metrics in puppetserver (we rely on the metric-id metrics instead), it might make sense to add the ability to disable their automatic creation to the http client library. Unfortunately, that turned out to be more difficult than I expected (still doable, but more than the 2 hours I was hoping for).
Eliminating those would remove 4 of the 9 http client metrics we currently create per certname. Another option for reducing the number of metrics would be to stop including the certname in the metric ids. Beyond these, we also create quite a few other per-certname, per-resource, and per-puppetdb-query metrics via the puppet profiler. It might be best for us to audit all of the profiler metrics we currently provide and whittle some of them out.
|Comment by Ruth Linehan [ 2017/06/01 ]|
I did some further memory testing and examined the heap dumps in YourKit to see whether the additional memory used when http client metrics are turned on will ultimately be cleaned up under memory pressure - i.e. whether it is held by strong references, weak references, etc.
With the same setup as above, I ran the following script:
(I needed to modify the puppet user to give it a login shell first).
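This script also wasn't captured in the export; presumably the core of it is a jmap live heap dump taken as the puppet user (which is why the login shell was needed). The PID lookup, output path, and TAKE_HEAP_DUMP guard below are assumptions, not the original script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the heap-dump step; PID lookup and output path are assumptions.
heap_dump_cmd() {
  local pid="$1" out="$2"
  # -dump:live triggers a full GC first, so the dump contains only live objects
  echo "jmap -dump:live,format=b,file=${out} ${pid}"
}

# Set TAKE_HEAP_DUMP=1 on a machine actually running Puppet Server.
if [[ "${TAKE_HEAP_DUMP:-0}" == "1" ]]; then
  pid=$(pgrep -f puppetserver | head -n1)
  # jmap must run as the JVM's own user, hence giving 'puppet' a login shell
  sudo -u puppet $(heap_dump_cmd "${pid}" "/tmp/enabled.hprof")
fi
```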
The hprofs generated can be found here, with enabled.hprof being when http client metrics were enabled.
With http client metrics enabled, 587 MB was used, with 496 MB (shallow size) / 584 MB (retained size) reachable via strong references.
With http client metrics disabled, 387 MB was used, with 293 MB (shallow size) / 383 MB (retained size) reachable via strong references.
This is unfortunately a pretty sizeable difference, and looking at the diff between the two dumps, the additional objects present do all seem to come from metrics (rather than some fluke in the run).
In order to combat this additional memory usage while leaving http client metrics enabled by default, I have filed 4 tickets. The first two should be handled for Puppet Server 5; the latter two are later improvements: