Details
Type: New Feature
Status: Closed
Priority: Normal
Resolution: Fixed
None
Description
When a group of agents starts their Puppet runs at the same time, they form a "thundering herd" that can exceed server resources. The result is a growing backlog of agent requests waiting for a JRuby instance to become free. If this backlog exceeds the size of the Jetty thread pool, other requests, such as status checks, start timing out. The herd tends to persist until someone manually remediates it, for example with a rolling restart that spaces out the affected agents.
When Puppet Server is over capacity, it should signal agents to back off for a random period of time before resuming requests. This would let a thundering herd be re-splayed automatically, without human intervention.
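The back-off signal described above can be sketched as follows. This is a minimal, hypothetical model, not Puppet Server's implementation. It assumes, based on the linked issues (max-queued-requests, Retry-After headers), that the server rejects requests with HTTP 503 and a randomized Retry-After value once too many requests are waiting for a JRuby instance. It also assumes the agent sleeps for that many seconds before retrying. The names `JRubyPool`, `max_queued_requests`, `max_retry_delay`, and `agent_backoff` are illustrative only.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class JRubyPool:
    """Hypothetical model of requests waiting for a free JRuby instance."""
    max_queued_requests: int  # assumed limit on waiting requests
    max_retry_delay: int      # assumed upper bound (seconds) for the random back-off
    waiting: int = 0
    rng: random.Random = field(default_factory=random.Random)

    def handle(self) -> tuple[int, dict[str, str]]:
        """Return (status, headers) for an incoming agent request.

        Simplification: 200 here just means "accepted into the wait queue".
        """
        if self.waiting >= self.max_queued_requests:
            # Over capacity: shed the request. A *random* delay re-splays the
            # herd instead of sending every agent back at the same instant.
            delay = self.rng.randint(0, self.max_retry_delay)
            return 503, {"Retry-After": str(delay)}
        self.waiting += 1
        return 200, {}

    def release(self) -> None:
        """A JRuby instance finished a request, so one waiter leaves the queue."""
        self.waiting = max(0, self.waiting - 1)


def agent_backoff(status: int, headers: dict[str, str]) -> int | None:
    """Seconds an agent should sleep before retrying, or None if no back-off applies."""
    if status != 503:
        return None
    value = headers.get("Retry-After")
    # Retry-After may also be an HTTP-date; this sketch only handles delay-seconds.
    if value is None or not value.isdigit():
        return None
    return int(value)


pool = JRubyPool(max_queued_requests=2, max_retry_delay=1800, rng=random.Random(0))
responses = [pool.handle() for _ in range(3)]
# The first two requests are queued; the third is shed with a 503 and a
# Retry-After somewhere between 0 and 1800 seconds.
print(responses, agent_backoff(*responses[2]))
```

Randomizing the delay on the server side is what spreads the herd out. A fixed delay would postpone the spike without breaking it up.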
Issue Links
relates to:
- SERVER-2405 Skip enforcement of max-queued-requests if agent version is too old (Resolved)
- SERVER-2025 Track the number of times max-queued-requests is exceeded as a metric (Closed)
- PUP-3454 RFE: Allow master to hint when agent should reconnect (Closed)
- PUP-7451 Puppet HTTP client should respect Retry-After headers (Closed)