Affects Version/s: None
Fix Version/s: None
Epic Name: Gatling Automation - Foundation
At one point, we were fairly close to having a reasonably automated environment for running Gatling tests against the Puppet Master / Puppet Server. The Jenkins jobs built around it would give us a means to do performance testing (e.g., to visualize historical performance trends and watch for improvements / regressions), scale testing, and long-running tests to help catch memory leaks and other issues.
The git repo that contains the code for this is here:
There is also a Jenkins plugin that can be used to graph Puppet-specific data here:
These repos have not been maintained, so they have fallen out of sync with the current state of the products. Given how often we find ourselves needing to kick off a Gatling test to validate some feature or another, resuscitating these repos and getting the Gatling tooling back into a usable, automated state would be a huge win. Gatling gives us very comprehensive performance data during the run, and lets us simulate much, much higher load than other perf testing options without requiring unreasonable amounts of hardware. We can run on bare metal in the office, rather than being forced to use cloud virtualization platforms whose variability can distort performance testing data.
Here are some things that have changed since the last time our Gatling repos were fully operational:
- We had to upgrade to Gatling 2.0 because Gatling 1.x didn't work well with the agent "keep-alive" changes introduced in Puppet 3.7.
- Puppet Server did not exist at the time, so the beaker setup scripts in those repos are almost certainly out of date when it comes to installing a server on an SUT (system under test).
- PL was not using JJB (Jenkins Job Builder) at the time, so the updated versions of all of this should probably use JJB to set up the Jenkins jobs.
- At the time we were focused almost exclusively on fairly short-term tests (hours, not days). Lately we've had just as much need, if not more, for long-running tests (weeks) to validate memory use over time. So we'll need additional flavors of jobs.
- "Directory Environments" did not exist, so now we'll need to take them into account when deciding where to install modules.
In an ideal world, when this work is done, we'd have (at least) the following:
- Some perf regression test jobs, which run nightly or weekly, against both OSS and PE. For each of these jobs, the steps would be:
1. Provision a bare-metal box for the test (using cobbler; we have three low-end blades dedicated for this purpose)
2. Install the latest version of Puppet Server / PE
3. Install all modules / environments / Hiera data for whatever our baseline Gatling simulation looks like
4. Classify all simulated agent names as necessary to ensure we trigger the correct catalog compilation
5. Do an automated gatling recording of an agent run, for each of the agent types that the simulation includes
6. Kick off the actual gatling run
7. Record results and make them visible in the Jenkins plugin's graphs.
Step 5 is important: when new versions of PE come out, any changes to the installer modules that add new or different file resources will change the HTTP requests made by an agent, so Gatling recordings made against previous releases will no longer be accurate. In practice, however, automating step 5 is going to be challenging and may require some PRs against upstream Gatling. So, in the meantime, we'll do step 5 manually at whatever frequency we decide is reasonable, and automate everything else.
- Ability to kick off one-off long-running tests. Steps here would be:
1. User goes to Jenkins and clicks "Build with Parameters" on this job. They are prompted to enter the following inputs:
a. OSS or PE
b. A build version number that matches a build that exists on builds.puppetlabs.net
c. ID of a scenario configuration from gatling-puppet-load-test
2. Run begins: provision a bare-metal box
3. Install modules / code
4. Classify all simulated nodes
5. Automated Gatling recording for each simulated agent (eventually; for now this will be done manually, as described above)
6. Set up some additional tooling on the SUT to track OS-level info like resident memory usage, and ideally graph that data somewhere at some interval
7. Kick off the actual gatling run
8. Record results somewhere (TBD)
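Step 6 of the one-off job calls for tracking OS-level data such as resident memory on the SUT at some interval. Below is a minimal sketch of what that tooling could look like, assuming a Linux SUT and that we already know the PID of the Puppet Server JVM (finding it is left out). It reads the kernel's `VmRSS` field from `/proc/<pid>/status` and writes timestamped samples as CSV, which could then be graphed or archived wherever results end up. The script name `rss_sampler.py`, its arguments, and the CSV layout are hypothetical; nothing like this exists in gatling-puppet-load-test today.

```python
import csv
import sys
import time
from typing import Optional, TextIO


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  204800 kB"; field 1 is the number.
            return int(line.split()[1])
    # Kernel threads and zombie processes have no VmRSS line.
    return None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the current resident set size of `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())


def sample_rss(pid: int, interval_s: float, count: int, out: TextIO) -> None:
    """Write `count` rows of (unix_timestamp, rss_kb) as CSV to `out`."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "rss_kb"])
    for i in range(count):
        writer.writerow([int(time.time()), read_rss_kb(pid)])
        out.flush()  # keep the file current during multi-week runs
        if i < count - 1:
            time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical usage: python rss_sampler.py <pid> <interval_seconds> <count>
    pid, interval, count = int(sys.argv[1]), float(sys.argv[2]), int(sys.argv[3])
    sample_rss(pid, interval, count, sys.stdout)
```

For a long-running test, something like `python rss_sampler.py <pid> 60 40320 > rss.csv` would take one sample per minute for four weeks (60 × 24 × 7 × 4 = 40320). If the server process dies mid-run, `read_rss_kb` raises `FileNotFoundError`, which surfaces the crash rather than silently recording gaps.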
One of the challenges of doing development work against this code has been getting a reasonable development environment set up. To that end, Joe Pinsonault has laid a great foundation here:
Hopefully this will make it much simpler for other devs to jump in and contribute to the project, and to reconcile our old Jenkins integration code with the realities of the latest products.