[SERVER-2138] Threadsafe Puppet Created: 2018/03/05 Updated: 2020/02/24
|Fix Version/s:||SERVER 6.y|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|Epic Name:||Threadsafe Puppet|
|Epic Status:||In Progress|
|QA Risk Assessment:||Needs Assessment|
With the introduction of CD4PE's Impact Analysis, agentless device nodes, and compiling code snippets for Bolt, Puppet Server's ability to scale is more important than ever. Much of our current bottleneck comes from the need to only process one request per JRuby instance at a time, leading to the need to maintain a resource-heavy pool of JRuby instances. The JRuby pool also appears to be the largest generator of support escalations for Puppet Server. One of the main benefits of JRuby over MRI Ruby is its ability to use real multi-threading, but the Puppet codebase itself is not threadsafe, so we are completely missing out on this advantage. If we can make the Puppet codebase threadsafe, we unlock a large amount of scalability within our current architecture, while also having an opportunity to clean up some of the largest sources of technical debt.
This would be a win both for customer and developer efficiency and scalability.
|Comment by Henrik Lindberg [ 2019/07/31 ]|
It sounds like making the puppet runtime completely threadsafe is not what is needed here if we impose the constraint that any evaluation of puppet logic / compilation of one catalog is only performed on a single thread. I.e. there can be several compilations on different threads, but any one compilation always runs on a single thread. (I think that is implied, but I am pointing it out as I think the undertaking is otherwise really big).
With that assumption there is still a problem with the environment cache. If it is to be shared by parallel threads (doing separate compilations) there will be a need to make both the environment cache logic and all of the loaders thread safe. I think that is a major undertaking, and I would recommend re-architecting this so that all loading is done via a message queue, i.e. a single thread is responsible for loading all functions, defines, classes etc. This removes the need to make all the classes in the loaders framework thread safe, plus all "legacy loading" done with "old code". We can probably try with the env timeout being 0, but I suspect that the env cache mechanism must be made thread safe in any case.
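To make the message-queue idea concrete, here is a minimal sketch in plain Ruby. It is not Puppet's loader code: `LoaderService`, `Request`, and the block passed to `new` are hypothetical names invented for illustration. Compilation threads post a request and block on a per-request reply queue, and a single worker thread performs every load, so the cache of loaded entities is only ever touched from that one thread.

```ruby
# Hypothetical sketch: one thread owns all loading; other threads only
# exchange messages with it. None of these names exist in Puppet.
class LoaderService
  Request = Struct.new(:name, :reply)

  def initialize(&load_fn)
    @load_fn  = load_fn    # the (non-threadsafe) loading logic
    @cache    = {}         # read and written only by the loader thread
    @requests = Queue.new  # Ruby's built-in thread-safe queue
    @worker   = Thread.new { run }
  end

  # Called from any compilation thread; blocks until the loader replies.
  def load(name)
    reply = Queue.new
    @requests << Request.new(name, reply)
    status, value = reply.pop
    raise value if status == :error
    value
  end

  def shutdown
    @requests << :stop
    @worker.join
  end

  private

  def run
    loop do
      req = @requests.pop
      break if req == :stop
      begin
        value = @cache.fetch(req.name) { @cache[req.name] = @load_fn.call(req.name) }
        req.reply << [:ok, value]
      rescue StandardError => e
        # Hand the failure back to the requesting thread instead of
        # killing the loader thread.
        req.reply << [:error, e]
      end
    end
  end
end

# Usage: four "compilation" threads request the same entity.
service = LoaderService.new { |name| "loaded:#{name}" }
results = 4.times.map { Thread.new { service.load("mymodule::func") } }.map(&:value)
service.shutdown
```

The trade-off in this sketch is that loading is fully serialized: a slow load stalls every compilation waiting on the queue, in exchange for not having to audit the loading code for thread safety.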
Yet another hurdle is the logic that makes use of polymorphic dispatch. In order to speed things up it caches all resolutions of class to method. This is currently (IIRC) in a class instance variable (i.e. a global). This will need to be made thread safe. On a fresh system starting compilation on multiple threads at the same time, there would be a lot of contention. One approach other than locking (which may have a very negative impact on performance) is to have this cache in the environment (or thread local storage). There are several classes that make use of polymorphic dispatch and that use caching. (Turning off these caches is an option, but it would then always be slow).
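As an illustration of the thread-local option, the sketch below keeps the class-to-method cache in a thread variable rather than in a class instance variable. It is an assumption-laden toy, not Puppet's dispatch code: `ThreadLocalDispatch`, `Renderer`, and the `render_<ClassName>` naming scheme are all hypothetical.

```ruby
# Hypothetical sketch: resolve "<prefix>_<ClassName>" by walking the
# ancestor chain, caching the result per thread so no lock is needed.
class ThreadLocalDispatch
  def initialize(prefix)
    @prefix = prefix
    @key = :"dispatch_cache_#{prefix}_#{object_id}"
  end

  # Returns the method name on receiver that handles instances of klass.
  def method_for(receiver, klass)
    cache[klass] ||= resolve(receiver, klass)
  end

  # Number of resolutions cached for the calling thread.
  def cache_size
    cache.size
  end

  private

  # Each thread lazily gets its own Hash; threads never share it.
  def cache
    c = Thread.current.thread_variable_get(@key)
    if c.nil?
      c = {}
      Thread.current.thread_variable_set(@key, c)
    end
    c
  end

  def resolve(receiver, klass)
    klass.ancestors.each do |ancestor|
      short_name = ancestor.name.to_s.split('::').last
      candidate = :"#{@prefix}_#{short_name}"
      return candidate if receiver.respond_to?(candidate, true)
    end
    raise ArgumentError, "no #{@prefix}_* method for #{klass}"
  end
end

class Renderer
  DISPATCH = ThreadLocalDispatch.new("render")

  def render(obj)
    send(DISPATCH.method_for(self, obj.class), obj)
  end

  def render_Integer(o)
    "int:#{o}"
  end

  def render_Numeric(o)
    "num:#{o}"
  end

  def render_Object(o)
    "obj:#{o}"
  end
end
```

With this layout, `Renderer.new.render(1.5)` misses on `render_Float`, falls through the ancestors to `render_Numeric`, and caches that result for the calling thread only. The cost is that every new thread warms its own cache, which matters if threads are short-lived; keeping the cache on the environment instead would share the warm-up but bring back the need for synchronization.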
|Comment by Patrick Carlisle [ 2019/08/19 ]|
Yes, this is correct. I don't think there's any reason to try to do more than this. We expect a request to come in on a single thread from Jetty, and that thread can be expected to handle only a single task in a single environment.