Uploaded image for project: 'Puppet'
  1. Puppet
  2. PUP-1578

puppetlabs/reboot: Ruby on windows can get into an infinite loop when exiting

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: PUP 3.5.0, PUP M
    • Component/s: None
    • Labels:
    • Template:
    • Story Points:
      3
    • Sprint:
      Week 2014-1-29 to 2014-2-05

      Description

      Recently, Windows acceptance VMs have started running more slowly due to infrastructure issues. This uncovered a ruby bug that causes the reboot module to not reboot the system. Instead the ruby thread that tries to shutdown the ruby VM, gets into an infinite loop, pinning one core of the VM, and the machine never reboots.

      This issue has been seen internally https://jenkins.puppetlabs.com/view/Modules%20-%20Platform%20Owned/job/Puppet-Module-Reboot-Acceptance/ and externally https://github.com/puppetlabs/puppetlabs-reboot/issues/19

      I have been able to reproduce the problem using pure ruby. It seems to be triggered when Kernel.spawn and using an IO.pipe to allow the parent process to write to the child's stdin. Eliminating the IO.pipe seems to avoid this particular ruby bug.

      I captured a crash dump of the main ruby.exe process, and verified there is a thread in an infinite loop:

      0:000> !runaway
       User Mode Time
        Thread       Time
         0:a14       0 days 0:05:53.451
         1:89c       0 days 0:00:00.405
         4:8d0       0 days 0:00:00.000
         3:a50       0 days 0:00:00.000
      

      where thread a14 is trying to terminate all threads and shutdown the VM:

      0:000> kb
      ChildEBP RetAddr  Args to Child              
      0028fc38 76d315e9 00000001 0028fc88 00000001 ntdll!NtWaitForMultipleObjects+0x15
      0028fcd4 75b71a2c 0028fc88 0028fcfc 00000000 KERNELBASE!WaitForMultipleObjectsEx+0x100
      0028fd1c 75b74220 00000001 7efde000 00000000 kernel32!WaitForMultipleObjectsExImplementation+0xe0
      0028fd38 62e5f913 00000001 0028fe1c 00000000 kernel32!WaitForMultipleObjects+0x18
      WARNING: Stack unwind information not available. Following frames may be wrong.
      0028fda8 62e6288b 00000000 62e5e128 0062c668 msvcrt_ruby191!rb_mutex_unlock+0x152b
      0028fe38 62d425d9 0062c78c 00000000 0028fec8 msvcrt_ruby191!rb_thread_terminate_all+0x187
      0028fec8 62d42710 00000006 006223c8 0062235c msvcrt_ruby191!ruby_cleanup+0x2c1
      0028fef8 0040136f 015ad9b8 00622b70 0028ff28 msvcrt_ruby191!ruby_run_node+0x50
      0028ff28 004010b9 00000005 00622b70 006214e8 ruby+0x136f
      0028ff68 00401284 00000001 00000000 00000000 ruby+0x10b9
      0028ff88 75b733aa 7efde000 0028ffd4 774b9f72 ruby+0x1284
      0028ff94 774b9f72 7efde000 74b8f904 00000000 kernel32!BaseThreadInitThunk+0xe
      0028ffd4 774b9f45 0040126c 7efde000 ffffffff ntdll!__RtlUserThreadStart+0x70
      0028ffec 00000000 0040126c 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b
      

      The call to WaitForMultipleObjects has the following parameters:

      DWORD nCount = 00000001
      HANDLE *lpHandles = 0028fe1c
      BOOL bWaitAll = 00000000
      DWORD dwMilliseconds = ffffffff
      

      Address 0028fe1c contains the value 1c8 corresponding to the handle value, and timeout ffffffff corresponds to INFINITE. We see that handle 1c8 is not locked:

      0:000> !handle 1c8 f
      Handle 000001c8
        Type         	Mutant
        Attributes   	0
        GrantedAccess	0x1f0001:
               Delete,ReadControl,WriteDac,WriteOwner,Synch
               QueryState
        HandleCount  	2
        PointerCount 	4
        Name         	<none>
        Object specific information
          Mutex is Free
          Mutant Owner 0.0
      

      So I think that there is a cycle in the thread's mutex list causing it to repeatedly try to unlock the already unlocked mutex:

      void
      rb_threadptr_unlock_all_locking_mutexes(rb_thread_t *th)
      {
          const char *err;
          rb_mutex_t *mutex;
          rb_mutex_t *mutexes = th->keeping_mutexes;
       
          while (mutexes) {
      	mutex = mutexes;
      	/* rb_warn("mutex #<%p> remains to be locked by terminated thread",
      		mutexes); */
      	mutexes = mutex->next_mutex;
      	err = rb_mutex_unlock_th(mutex, th);
      	if (err) rb_bug("invalid keeping_mutexes: %s", err);
          }
      }
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              ethan Ethan Brown
              Reporter:
              josh Josh Cooper
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Zendesk Support