Details
- New Feature
- Status: Closed
- Normal
- Resolution: Fixed
- None
- None
- Platform Core
- 1
- Platform Core KANBAN
- Major
- 4 - 50-90% of Customers
- 3 - Serious
- 4 - $$$$$
- New Feature
- Automate
- tests added with code change
Description
On occasion, a puppet agent can end up waiting indefinitely on some process that will never return or terminate. Some examples:
- An HTTP connection that was established, but then broken (http_read_timeout defaults to infinity...)
- An I/O read that won't return (for example, networked file system that got interrupted)
- A subprocess that was executed without a timeout and that isn't going to return.
- Module or other plugin code that is susceptible to hangs and contains no defensive timeout or other guard logic.
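The subprocess case above can be guarded with a defensive timeout. The following is a minimal Python sketch, not Puppet code; `run_with_timeout` is a hypothetical helper name, and the timeout value is whatever the caller considers reasonable for the command.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run a command and return its exit code.

    Returns None if the command exceeded timeout_seconds. In that case
    subprocess.run kills the child and waits for it before raising
    TimeoutExpired, so no hung process is left behind.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # A child that would sleep far longer than the caller is willing to wait.
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(hung, 0.5))
```

Returning None rather than raising keeps the caller's control flow simple; a real caller would more likely log the timeout and mark the resource as failed.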
When this situation occurs, further agent runs are blocked because the hung run is still holding the required catalog locks. Often, manual remediation is required to restart the hung agents.
In situations where hangs can occur often due to transient environment issues (such as flaky networks), it would be useful for the Puppet Daemon to have logic for automatically determining when a hung run should be terminated so that a new one can be started.
For example: if the previous run has been holding the catalog lock for longer than n times the runinterval, kill it and start a new one.
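The rule proposed above could be sketched roughly as follows. This is an illustration in Python, not the logic that was implemented for this ticket. It assumes the lock is a file whose modification time marks when the run acquired it and whose content is the PID of the holding process. The names `lock_is_stale` and `terminate_stale_run` and the `multiplier` parameter (the "n" above) are hypothetical.

```python
import os
import signal
import time


def lock_is_stale(lock_path, run_interval_seconds, multiplier, now=None):
    """Return True if the lock file has been held longer than
    multiplier * run_interval_seconds. A missing lock is never stale."""
    now = time.time() if now is None else now
    try:
        held_since = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return False
    return (now - held_since) > multiplier * run_interval_seconds


def terminate_stale_run(lock_path, run_interval_seconds, multiplier):
    """Send SIGTERM to the process named in a stale lock file.

    Assumption: the lock file contains the holder's PID as text.
    Returns True if a signal was sent, False otherwise.
    """
    if not lock_is_stale(lock_path, run_interval_seconds, multiplier):
        return False
    try:
        with open(lock_path) as lock_file:
            pid = int(lock_file.read().strip())
        os.kill(pid, signal.SIGTERM)
    except (FileNotFoundError, ValueError, ProcessLookupError):
        # Lock vanished, held no PID, or the process already exited.
        return False
    return True
```

A process that is blocked in uninterruptible I/O may ignore SIGTERM, so a fuller version would probably wait briefly and escalate to SIGKILL before starting the next run.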
Issue Links
- is duplicated by
  - PUP-8680 Puppet should supervise and kill its applying child if it gets stuck (Closed)
- relates to
  - PUP-6387 Puppet agent hung waiting on read from puppetmaster (Resolved)
  - PUP-1965 Clients are hung when server has intermittent service (Closed)
  - PUP-4415 hung OS processes stall puppet (agent|apply) runs on client (Closed)
  - PUP-1427 a metaparameter to control the timeout of that resource evaluation/application, along with a default value. (Accepted)
  - MODULES-4605 add timeout option for keytool commands (Closed)
  - PUP-3238 puppet reports "end of file reached" if server closes HTTP connection (Closed)