[PA-1743] puppet-agent stuck on sched_yield Created: 2017/12/15 Updated: 2018/02/14 Resolved: 2018/02/12
|Affects Version/s:||puppet-agent 5.3.3|
|Fix Version/s:||puppet-agent 5.3.5, puppet-agent 5.4.0|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
|Sprint:||Platform OS Kanban|
|Method Found:||Needs Assessment|
|Release Notes:||Bug Fix|
|Release Notes Summary:||Apply an upstream Ruby patch to resolve a lockup in Exec resources|
|QA Risk Assessment:||Needs Assessment|
Using PA 5.3.3, we are experiencing the sched_yield busyloop documented here:
Example showing excessive CPU time, from a process that has been running for 11 days:
root 27413 1 99 Dec04 ? 11-01:52:23 puppet agent: applying configuration
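For anyone wanting to spot affected hosts, here is a minimal sketch that scans `ps -ef` style lines like the one above and flags puppet agent processes with large accumulated CPU time. It assumes the eight-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD). The function names and the one-hour threshold are hypothetical choices for illustration, not part of Puppet or of this report.

```python
def cpu_time_to_seconds(field):
    """Convert a ps TIME field ([DD-]HH:MM:SS or MM:SS) to seconds."""
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in field.split(":")]
    # Pad missing leading fields (e.g. MM:SS) with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def find_stuck_agents(ps_lines, threshold_seconds=3600):
    """Return (pid, cpu_seconds) for puppet agent processes at or above the threshold.

    threshold_seconds is an arbitrary illustrative value (assumption).
    """
    stuck = []
    for line in ps_lines:
        # Assumed layout: UID PID PPID C STIME TTY TIME CMD
        cols = line.split(None, 7)
        if len(cols) < 8 or not cols[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, cpu_time, cmd = cols[1], cols[6], cols[7]
        if "puppet agent" not in cmd:
            continue
        cpu_seconds = cpu_time_to_seconds(cpu_time)
        if cpu_seconds >= threshold_seconds:
            stuck.append((int(pid), cpu_seconds))
    return stuck


sample = [
    "UID PID PPID C STIME TTY TIME CMD",
    "root 27413 1 99 Dec04 ? 11-01:52:23 puppet agent: applying configuration",
]
print(find_stuck_agents(sample))
```

A high CPU time alone does not prove the process is in the sched_yield loop; attaching `strace -p <pid>` to a flagged process and seeing repeated `sched_yield` calls would be the confirming step.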
|Comment by Phil Oester [ 2017/12/20 ]|
Also, the bug report at upstream Ruby:
|Comment by Wiebe Verweij [ 2017/12/22 ]|
We are experiencing the same issue on Debian Stretch, also with Puppet 5.3.3. It happens about 2 to 4 times a week, each time on a random machine. The only error we see in the Puppet logs when this happens is a timeout-exceeded error for an exec command.
I can reproduce it with the sched_yield_loop.rb script attached from the ruby issue tracker and the following command:
This commit should fix it, but it looks like it won't be included until Ruby 2.5. Maybe it could be backported to the version shipped with Puppet?
|Comment by Matthias Baur [ 2018/01/17 ]|
Having the same issue. Any update on this?
OS: Gentoo/Ubuntu (14|16).04
|Comment by Matthias Baur [ 2018/01/22 ]|
As we're also using the Puppet Ruby for r10k, this also affects our deployments. Even worse, I think it also affects the Puppetserver, as we're currently seeing strange performance degradation from time to time.
|Comment by Kenn Hussey [ 2018/02/12 ]|
Branan Riley please add release notes for this issue, if needed. Thanks!