Details
Description
Recently, Windows acceptance VMs have started running more slowly due to infrastructure issues. This uncovered a ruby bug that causes the reboot module to not reboot the system. Instead the ruby thread that tries to shutdown the ruby VM, gets into an infinite loop, pinning one core of the VM, and the machine never reboots.
This issue has been seen internally https://jenkins.puppetlabs.com/view/Modules%20-%20Platform%20Owned/job/Puppet-Module-Reboot-Acceptance/ and externally https://github.com/puppetlabs/puppetlabs-reboot/issues/19
I have been able to reproduce the problem using pure ruby. It seems to be triggered when Kernel.spawn and using an IO.pipe to allow the parent process to write to the child's stdin. Eliminating the IO.pipe seems to avoid this particular ruby bug.
I captured a crash dump of the main ruby.exe process, and verified there is a thread in an infinite loop:
0:000> !runaway
|
User Mode Time
|
Thread Time
|
0:a14 0 days 0:05:53.451
|
1:89c 0 days 0:00:00.405
|
4:8d0 0 days 0:00:00.000
|
3:a50 0 days 0:00:00.000
|
where thread a14 is trying to terminate all threads and shutdown the VM:
0:000> kb
|
ChildEBP RetAddr Args to Child
|
0028fc38 76d315e9 00000001 0028fc88 00000001 ntdll!NtWaitForMultipleObjects+0x15
|
0028fcd4 75b71a2c 0028fc88 0028fcfc 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
|
0028fd1c 75b74220 00000001 7efde000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0
|
0028fd38 62e5f913 00000001 0028fe1c 00000000 kernel32!WaitForMultipleObjects+0x18
|
WARNING: Stack unwind information not available. Following frames may be wrong.
|
0028fda8 62e6288b 00000000 62e5e128 0062c668 msvcrt_ruby191!rb_mutex_unlock+0x152b
|
0028fe38 62d425d9 0062c78c 00000000 0028fec8 msvcrt_ruby191!rb_thread_terminate_all+0x187
|
0028fec8 62d42710 00000006 006223c8 0062235c msvcrt_ruby191!ruby_cleanup+0x2c1
|
0028fef8 0040136f 015ad9b8 00622b70 0028ff28 msvcrt_ruby191!ruby_run_node+0x50
|
0028ff28 004010b9 00000005 00622b70 006214e8 ruby+0x136f
|
0028ff68 00401284 00000001 00000000 00000000 ruby+0x10b9
|
0028ff88 75b733aa 7efde000 0028ffd4 774b9f72 ruby+0x1284
|
0028ff94 774b9f72 7efde000 74b8f904 00000000 kernel32!BaseThreadInitThunk+0xe
|
0028ffd4 774b9f45 0040126c 7efde000 ffffffff ntdll!__RtlUserThreadStart+0x70
|
0028ffec 00000000 0040126c 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b
|
The call to WaitForMultipleObjects has the following parameters:
DWORD nCount = 00000001
|
HANDLE *lpHandles = 0028fe1c
|
BOOL bWaitAll = 00000000
|
DWORD dwMilliseconds = ffffffff
|
Address 0028fe1c contains the value 1c8 corresponding to the handle value, and timeout ffffffff corresponds to INFINITE. We see that handle 1c8 is not locked:
0:000> !handle 1c8 f
|
Handle 000001c8
|
Type Mutant
|
Attributes 0
|
GrantedAccess 0x1f0001:
|
Delete,ReadControl,WriteDac,WriteOwner,Synch
|
QueryState
|
HandleCount 2
|
PointerCount 4
|
Name <none>
|
Object specific information
|
Mutex is Free
|
Mutant Owner 0.0
|
So I think that there is a cycle in the thread's mutex list causing it to repeatedly try to unlock the already unlocked mutex:
void
|
rb_threadptr_unlock_all_locking_mutexes(rb_thread_t *th)
|
{
|
const char *err;
|
rb_mutex_t *mutex;
|
rb_mutex_t *mutexes = th->keeping_mutexes;
|
|
while (mutexes) {
|
mutex = mutexes;
|
/* rb_warn("mutex #<%p> remains to be locked by terminated thread",
|
mutexes); */
|
mutexes = mutex->next_mutex;
|
err = rb_mutex_unlock_th(mutex, th);
|
if (err) rb_bug("invalid keeping_mutexes: %s", err);
|
}
|
}
|
Attachments
Issue Links
- blocks
-
MODULES-448 puppetlabs/reboot: Publish new reboot module
-
- Closed
-
- is duplicated by
-
PUP-1110 [BUG] Reboot Module - When finished prevents puppet from finishing
-
- Resolved
-
- links to