Set max-requests-per-instance to 0 ( disabled ).
As a result of exploring a customer support escalation (PE-13809), we discovered that there is a major memory leak caused by the hiera-eyaml gem.
The situation is this:
1. Adding eyaml to your configured list of hiera backends causes the hiera-eyaml code to be loaded
2. The hiera-eyaml code has a require that loads the highline namespace from the Highline gem
3. Loading the highline namespace has a side effect of creating a global variable called $terminal, which, when run under JRuby, causes an object called ConsoleReader (from the JLine API) to be constructed.
4. The constructor for the JLine ConsoleReader spawns a Java Thread in the background. The ConsoleReader API expects for you to call .shutdown to clean up this thread when you are done using it.
5. The Highline library never attempts to clean up the ConsoleReader, so this Thread just keeps running indefinitely.
6. In the JVM, Threads are GC roots, which means that anything referenced by the Thread is ineligible for GC. The thread has a reference to the ConsoleReader.
7. A JRuby instance flush is triggered in Puppet Server, either via the pool flush API, or because we hit the value of max-requests-per-instance.
8. Puppet Server creates a new JRuby instance and attempts to shut down the old one.
9. It appears that JRuby is unable to clean up the old instance properly, because it still sees the reference to the ConsoleReader via the highline global $terminal variable, and something about the fact that the ConsoleReader is ineligible for GC prevents the JRuby instance from being cleaned up.
10. We now have two JRuby instances in memory, both using large amounts of RAM, one of which is no longer referenced or used by Puppet Server, but is ineligible for GC.
The most frustrating part of all of this is that highline is really only useful for CLI tools, and should not be necessary for any server-side functionality of hiera-eyaml at all. On the server, we're basically only ever using hiera-eyaml for decryption, and that should not ever need to interact with a TTY / CLI / readline / highline / JLine at all.
Some possible options for fixing the situation:
1. File a PR against hiera-eyaml that prevents highline from ever being "require"d, unless it's actually going to be used. This might be the easiest fix but it's gross because it relies on the knowledge that simply loading highline is going to have a side effect that causes a memory leak.
2. File a PR against highline that gets rid of the global variable, does not construct a ConsoleReader unless it is absolutely necessary, cleans up the ConsoleReader appropriately when it is done with it, or some combination of the above. I'm not certain that it would be possible to make some of these changes without breaking the highline API, because it is a very magical DSL that involves monkey patching major Ruby classes, and may not be possible to implement without a global variable.
3. Supposedly the next major version of highline (2.0) is refactored to no longer use JLine at all. It hasn't been officially released yet, and I don't know if it's API-compatible such that it could be consumed by hiera-eyaml w/o changes to hiera-eyaml as well. Further, I don't know that whatever replacement they have for JLine won't have the same issue with creation of and poor management of background threads, so we'd need to investigate that.
We believe that this issue came up for this customer because of the switch to provide a default value for max-requests-per-instance in the latest PE releases. If that is true, this problem is likely to affect all PE users that are using hiera-eyaml, so we need to get a solution in place ASAP.
IMHO we should be very seriously considering the idea of making hiera-eyaml an officially supported thing and moving it into the PL namespace. We're already supporting it, frequently, whether we say it's supported or not. Pretending that it's not supported only makes it more difficult for us to actually fix issues when they arise.