In our setup, rather than having just a single Puppet environment, "production", we have multiple environments that ultimately correspond to VCS tags, and machines are moved from environment to environment to promote them to the desired version. Old environments are deleted, leaving only the latest N available.
Because an environment derived from a tag should rarely, if ever, change, we have historically set the `environment_timeout` setting to `unlimited` for any such environment, leaving it at the default of `0` for development-type environments that change frequently.
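For reference, the per-environment setting looks like this (assuming the default `codedir` layout; the environment name here is illustrative):

```ini
# /etc/puppetlabs/code/environments/release_1_2_3/environment.conf
# Tag-derived environment: contents are immutable, so cache indefinitely.
environment_timeout = unlimited
```

Development environments simply omit the setting, falling back to the default `environment_timeout = 0` (no caching).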
We have noticed that Puppet does not appear to bound the environment cache or expire entries in any way, so if you create many environments, all with unlimited caching, Puppet eventually consumes the entire heap trying to hold them all. The symptom is a GC log containing nothing but "Full GC" messages, at which point performance drops to a crawl.
Historically, before I identified the cause, restarting the Puppet server would restore performance (since that cleared the cache out), and I have since verified that using the `https://puppet:8140/puppet-admin-api/v1/environment-cache` API endpoint has the same effect.
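For anyone else hitting this, the flush I used looks roughly like the following (the certificate paths are the defaults and assume the client cert is whitelisted for the admin API in Puppet Server's auth config):

```shell
# Flush the entire environment cache on the master (no restart needed).
curl -i -X DELETE \
  --cert   /etc/puppetlabs/puppet/ssl/certs/$(hostname -f).pem \
  --key    /etc/puppetlabs/puppet/ssl/private_keys/$(hostname -f).pem \
  --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
  "https://puppet:8140/puppet-admin-api/v1/environment-cache"
```

The endpoint also accepts an `env_name` query parameter to evict a single environment rather than the whole cache.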
Can the environment cache size be visualized somehow with current metrics?
That would be useful in some respects; however, IMO the environment cache should also be bounded to prevent it growing too large relative to the heap, and should perhaps operate some form of LRU expiry so that space can be reclaimed by dropping the least-recently-used environment(s) regardless of the timeout setting. Extra metrics indicating that the cache is either too small or is being thrashed by constant purging (e.g. the cache can fit five environments but you have six live) would also help here.
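To make the suggestion concrete, here is a minimal sketch of the bounded-LRU behaviour I have in mind (a toy illustration, not Puppet Server's actual implementation; the class and method names are invented):

```python
from collections import OrderedDict

class BoundedEnvironmentCache:
    """Toy LRU cache: once max_size is exceeded, the least-recently-used
    environment is evicted, regardless of any per-entry timeout."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()  # insertion/access order = LRU order

    def get(self, name):
        if name in self._entries:
            self._entries.move_to_end(name)  # mark as most recently used
            return self._entries[name]
        return None  # cache miss: caller would re-parse the environment

    def put(self, name, parsed_env):
        self._entries[name] = parsed_env
        self._entries.move_to_end(name)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the LRU entry

cache = BoundedEnvironmentCache(max_size=2)
cache.put("release_v1", "catalog-v1")
cache.put("release_v2", "catalog-v2")
cache.get("release_v1")                  # touch v1, so v2 becomes LRU
cache.put("release_v3", "catalog-v3")    # exceeds max_size: evicts release_v2
print(cache.get("release_v2"))           # → None (evicted)
print(cache.get("release_v1"))           # → catalog-v1 (retained)
```

With something like this in place, an `unlimited` timeout would mean "never expire on age" but would no longer mean "never reclaimable"; eviction counts from such a cache would also give exactly the thrashing metric described above.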