[PUP-3064] Prevent race condition in Windows service code from prematurely exiting win32-service Service_Main Created: 2014/08/14 Updated: 2020/03/04
|Labels:||daemon, platform-os, windows|
|Remaining Estimate:||Not Specified|
|Time Spent:||Not Specified|
|Original Estimate:||Not Specified|
A workaround has been put in place as part of
This should resolve deadlocks experienced when terminating the service.
However, this works only because Service_Main now contains an ensure block that calls SetTheServiceStatus.call(SERVICE_STOPPED, NO_ERROR, 0, 0) if the thread aborts prematurely or an exception is otherwise raised.
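The effect of such an ensure block can be sketched in plain Ruby without the Win32 API. This is a minimal sketch, not the actual win32-service code: SetTheServiceStatus is a hypothetical stand-in lambda that only records its arguments, service_main_sketch is an illustrative name, and the constant values are assumed to mirror the Win32 definitions.

```ruby
# Assumed to mirror the Win32 values; defined here so the sketch is self-contained.
SERVICE_STOPPED = 0x00000001
NO_ERROR = 0

# Hypothetical stand-in for the real status callback: it only records calls.
REPORTED = []
SetTheServiceStatus = lambda do |state, exit_code, check_point, wait_hint|
  REPORTED << [state, exit_code, check_point, wait_hint]
end

# Illustrative stand-in for Service_Main: whatever happens in the body,
# the ensure clause reports SERVICE_STOPPED on the way out.
def service_main_sketch
  yield
ensure
  SetTheServiceStatus.call(SERVICE_STOPPED, NO_ERROR, 0, 0)
end

# Simulate the thread being aborted while it is still waiting.
worker = Thread.new { service_main_sketch { sleep } }
sleep 0.01 until worker.status == 'sleep'
worker.kill
worker.join
```

Thread#kill still runs ensure clauses, so the stopped status is recorded even though the body never returns normally. That masks the premature abort rather than explaining it.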
In local testing under windbg, as soon as the SCM signals the service (inside of Service_CtrlEx) that it has requested a stop, Service_Main appears to abort completely, even though it is still waiting on while(WaitForSingleObject(@@hStopEvent, 1000) != WAIT_OBJECT_0) do.
It should continue to wait on while(WaitForSingleObject(@@hStopCompletedEvent, 1000) != WAIT_OBJECT_0) do, which allows for the service code to perform proper clean up, but this never happens.
Instead, the service stops abruptly without following the orchestration laid out by the signaled events.
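The intended two-event orchestration described above can be illustrated in plain Ruby. This is a sketch under assumptions: ManualEvent is a hypothetical stand-in for a Win32 manual-reset event, built on Mutex and ConditionVariable, and the thread bodies are simplified placeholders rather than the real Service_Main or mainloop.

```ruby
# Hypothetical stand-in for a Win32 manual-reset event.
class ManualEvent
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @signaled = false
  end

  # Signal the event and wake every waiter.
  def set
    @mutex.synchronize do
      @signaled = true
      @cond.broadcast
    end
  end

  # Returns true once signaled, false if the timeout elapsed first.
  def wait(timeout)
    @mutex.synchronize do
      @cond.wait(@mutex, timeout) unless @signaled
      @signaled
    end
  end
end

stop_event = ManualEvent.new            # plays the role of @@hStopEvent
stop_completed_event = ManualEvent.new  # plays the role of @@hStopCompletedEvent
events = Queue.new

# Placeholder for the service's main loop: runs until asked to stop,
# cleans up, then signals that the stop has completed.
mainloop = Thread.new do
  until stop_event.wait(0.05)
    # service work would go here
  end
  events << :cleanup
  stop_completed_event.set
end

# Placeholder for Service_Main: waits for the stop request, then waits
# for cleanup to finish before it is allowed to report "stopped".
service_main = Thread.new do
  until stop_event.wait(0.05)
    # still running
  end
  until stop_completed_event.wait(0.05)
    # cleanup still in progress
  end
  events << :stopped
end

stop_event.set # what the control handler does on a stop request
[mainloop, service_main].each(&:join)
SHUTDOWN_ORDER = Array.new(events.size) { events.pop }
```

In this sketch the cleanup step always precedes the stopped report, which is the ordering the real service fails to honor.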
For our particular code, this might not be an issue – but should the daemon code ever become more complicated, it's possible that resources may not be freed and the situation could escalate quickly.
Some ideas of what to look at with this issue:
Other helpful tips:
|Comment by Ethan Brown [ 2014/08/20 ]|
During the most recent observation of service termination, the threads below were recorded before the SCM stop request and again after the stop request was initiated, with the order tracked as they shut down.
Those labelled with TOAST were active threads prior to the STOP request, but no longer running once threads began to exit. Those labelled DEAD were killed in the order given - there are notes about how each thread maps to the Ruby code, and any additional info. Therefore, this can be used as a key in any additional debugging. The important thing to note is that the thread exit order here is not what is expected.
TOAST - f3c
e7c - DEAD - 1
94c - DEAD - 2
edc – ThreadProc, which should be native thread based on using CreateThread to create it - DEAD - 3
850 - DEAD - 4
ea0 - thr object busyloop inside of Ruby thread in mainloop - DEAD - 5
8cc - DEAD - 6
ee4 – mainloop, owner of events, and thread handle - DEAD - 7
9fc – line 215 of Service_Main – part that ends up dying prematurely... - DEAD - 8