[PUP-1368] Puppet on Windows segfaulting Created: 2014/01/04  Updated: 2019/04/04  Resolved: 2014/02/04

Status: Closed
Project: Puppet
Component/s: Types and Providers
Affects Version/s: PUP 3.3.1, PUP 3.4.1
Fix Version/s: PUP 3.4.3, PUP 3.5.0

Type: Bug Priority: Major
Reporter: Geek Man Assignee: Kylo Ginsberg
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2 64-bit, Windows 2008 R2 SP1 64-bit
One agent with Puppet 3.3.1 and another with 3.4.1


Attachments: Text File puppet-segfault-output-stderr.txt     Text File puppet-test-manifest.txt    
Issue Links:
Blocks
Relates
relates to PUP-1389 Windows File: Error: ReplaceFile is u... Closed
Template:
Story Points: 3
Sprint: Week 2014-1-15 to 2014-1-21, Week 2014-1-22 to 2014-1-29, Week 2014-1-29 to 2014-2-05

 Description   

In the past day I've implemented some scheduled tasks which invoke an MSSQL stored procedure via SQLCMD. Today I've found that these scheduled tasks are causing a segfault on any agent which they run.

In almost all tests, I've been able to run the agent via "puppet agent --test --debug --trace". The first run will correctly create the scheduled task; but the second run will produce a segfault of the puppet agent, causing the catalog run to never finish. Sometimes a third run was needed to get a segfault.

The affected nodes so far are both Windows 2008 R2, but running different sets of updates (eg one is not SP1). Both were originally running Puppet 3.3.1, but I upgraded one of them to 3.4.1, which no effect.

I've created a new 'test' module to simplify reproducing the bug. To test this bug I am simply adding "include test" to the node definition for chosen nodes.

I've included the test module manifest for reproduction, as well as the stack strace I get when the segfault occurs.



 Comments   
Comment by Peter Huene [ 2014/01/27 ]

Waiting on upstream win32-api gem maintainer to see if we can just get a new version of the gem instead of having to patch our copy.

Comment by Josh Cooper [ 2014/01/29 ]

With application verifier and gflags setup, I'm seeing a very similar crash:

scheduled_task { 'Test':
  ensure => present,
  enabled => true,
  command => 'C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLCMD.exe',
  arguments => "-Q \"EXECUTE [master].[dbo].[DatabaseBackup] @Databases = 'ALL_DATABASES', @Directory = N'C:\\backups\\sql'\" -b -o C:\\backups\\sql\\logs\\DatabaseBackup.txt",
  trigger => {
    schedule => daily,
    start_date => '2013-12-27',
    start_time => '20:00',
  }
}

And taking care not to use an older win32-api gem installed via bundler:

C:\work\puppet>ext\envpuppet.bat puppet apply ..\manifests\task.pp

I get:

0:000:x86> kb
ChildEBP RetAddr  Args to Child              
WARNING: Stack unwind information not available. Following frames may be wrong.
0028ebb0 775d3c4a 0afe5670 0b2ae428 02c60150 ntdll32!RtlSetUserValueHeap+0x278
0028ec9c 775d3cee 00000054 00000068 0afe560a ntdll32!RtlImageNtHeader+0xac6
0028ed20 77670d8b 02c60000 1100110b 00000054 ntdll32!RtlImageNtHeader+0xb6a
0028ed6c 7762ae7e 02c60000 1100110b 00000054 ntdll32!RtlpNtEnumerateSubKey+0x41ed
0028ee50 775d3cee 00000054 00000000 00000000 ntdll32!RtlUlonglongByteSwap+0xda2e
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\syswow64\verifier.dll - 
0028eed4 696da682 02c60000 0100100a 00000054 ntdll32!RtlImageNtHeader+0xb6a
0028eef8 696d8f6e 05781000 02c60000 0100100a verifier_696d0000!VerifierDisableFaultInjectionExclusionRange+0x4732
0028ef2c 77670d06 05780000 0100100a 0000002c verifier_696d0000!VerifierDisableFaultInjectionExclusionRange+0x301e
0028ef78 7762ae7e 05780000 0100100a 0000002c ntdll32!RtlpNtEnumerateSubKey+0x4168
0028f05c 775d3cee 0000002c 00000000 00000000 ntdll32!RtlUlonglongByteSwap+0xda2e
0028f0e0 696ecb62 05780000 0100100a 0000002c ntdll32!RtlImageNtHeader+0xb6a
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\syswow64\msvcrt.dll - 
0028f170 7522ade8 05780000 00000008 0000002c verifier_696d0000!VerifierGetProviderHelper+0xd302
0028f1b0 7522c470 0000002c 0000002c 0028f1c8 msvcrt!towlower+0x96
0028f1cc 696ed39f 00000001 0000002c 6cdb9304 msvcrt!calloc+0x1a
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for c:\work\puppet-win32-ruby\ruby\bin\msvcrt-ruby191.dll - 
0028f208 62d55858 00000001 0000002c 0028f2a8 verifier_696d0000!VerifierGetProviderHelper+0xdb3f
0028f228 62df68d7 0000000b 00000004 00902805 msvcrt_ruby191!ruby_xcalloc+0x60
0028f248 62e31488 0799d738 09a26d38 0028f288 msvcrt_ruby191!st_init_numtable+0x43
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for c:\work\puppet-win32-ruby\ruby\lib\ruby\gems\1.9.1\gems\win32-api-1.4.8-x86-mingw32\lib\win32\ruby19\win32\api.so - 
0028f2a8 669415d3 09a26d38 00006539 09a26de0 msvcrt_ruby191!rb_ivar_set+0x110
0028f2f8 62e4a355 00000003 05786384 09a26d38 api+0x15d3
0028f368 62e5515e 00000003 05786384 0028f3f8 msvcrt_ruby191!rb_vm_mark+0x54d
0028f3f8 62d42dbe 00000003 05786384 079f82f8 msvcrt_ruby191!rb_vm_invoke_proc+0x2fe
0028f418 62d8d9b6 09a26d38 00000003 05786384 msvcrt_ruby191!rb_obj_call_init+0x42
0028f448 62e4a355 00000003 05786384 07a69718 msvcrt_ruby191!rb_class_new_instance+0x2a
0028f4b8 62e5906a 00000003 05786384 00000000 msvcrt_ruby191!rb_vm_mark+0x54d
0028f538 62e4f980 00000000 00000000 000003f0 msvcrt_ruby191!rb_vm_call+0x2b2
0028f5e8 62e53e99 0028f674 0028f64c 0b27aae8 msvcrt_ruby191!rb_vm_localjump_error+0x1f84
0028f6d8 62e54a7a 696ecbb6 75229d45 05780000 msvcrt_ruby191!rb_vm_localjump_error+0x649d
0028f778 62e59b0f 00000001 0028f7a0 00000000 msvcrt_ruby191!rb_eval_string+0x246
0028f798 62d061ea 07ddfdf8 05786324 0028f7c8 msvcrt_ruby191!rb_yield+0x43
0028f7c8 62e5906a 099facf8 05786314 00000018 msvcrt_ruby191!rb_ary_each+0x46
0028f848 62e4f980 05805908 00000000 000001a8 msvcrt_ruby191!rb_vm_call+0x2b2
0028f8f8 62e53e99 0028f984 6b7c5583 0b09d630 msvcrt_ruby191!rb_vm_localjump_error+0x1f84
0028f9e8 62e55299 00000001 00000000 00000000 msvcrt_ruby191!rb_vm_localjump_error+0x649d
0028fa78 62d472e4 00000001 05786270 0b725058 msvcrt_ruby191!rb_vm_invoke_proc+0x439
0028fb08 62d47450 00000001 05786270 09be3200 msvcrt_ruby191!rb_method_call_with_block+0x11c
0028fb48 62e4a355 00000001 05786270 09be3200 msvcrt_ruby191!rb_method_call+0x34
0028fbb8 62e5906a 00000001 05786270 00000000 msvcrt_ruby191!rb_vm_mark+0x54d
0028fc38 62e4f980 00000000 00000000 00001780 msvcrt_ruby191!rb_vm_call+0x2b2
0028fce8 62e53e99 0028fd74 00000010 696ed364 msvcrt_ruby191!rb_vm_localjump_error+0x1f84
0028fdd8 62e5a02b 02c68e90 00000000 0799d230 msvcrt_ruby191!rb_vm_localjump_error+0x649d
0028fe48 62d405a5 078863d0 00000000 ffffffff msvcrt_ruby191!rb_iseq_eval_main+0xd7
0028fec8 62d42708 0028feec 02c63268 02c6337c msvcrt_ruby191!rb_check_frozen+0x2a85
*** ERROR: Module load completed but symbols could not be loaded for image00000000`00400000
0028fef8 0040136f 078863d0 02c63dc8 0028ff28 msvcrt_ruby191!ruby_run_node+0x48
0028ff28 004010b9 00000005 02c63dc8 02c618e8 image00000000_00400000+0x136f
0028ff68 00401284 00000001 00000000 00000000 image00000000_00400000+0x10b9
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for C:\Windows\syswow64\kernel32.dll - 
0028ff88 754033aa 7efde000 0028ffd4 775d9ef2 image00000000_00400000+0x1284
0028ff94 775d9ef2 7efde000 72def870 00000000 kernel32!BaseThreadInitThunk+0x12
0028ffd4 775d9ec5 0040126c 7efde000 ffffffff ntdll32!RtlInitializeExceptionChain+0x63
0028ffec 00000000 0040126c 7efde000 00000000 ntdll32!RtlInitializeExceptionChain+0x36

Comment by Peter Huene [ 2014/01/29 ]

I've discovered the issue (again). There's a bunch of heap corruption bugs in our (old) version of win32-taskscheduler. This particular repro manifest has over 127 characters in the parameter property, causing a buffer overrun when the existing task's parameters are read on subsequent runs.

Here's the annotated code for getting the parameters of a task in the win32-taskscheduler gem:

...
lpVtbl = 0.chr * 4
table  = 0.chr * 144
 
memcpy(lpVtbl, @pITask, 4)
memcpy(table, lpVtbl.unpack('L').first, 144)
table = table.unpack('L*')
 
getParameters = Win32::API::Function.new(table[35], 'PP', 'L')
ptr = 0.chr * 4
hr = getParameters.call(@pITask, ptr)
 
if hr >= S_OK
    str = 0.chr * 256                     <---- Issue #1: 256 *bytes* not wide characters
    wcscpy(str, ptr.unpack('L').first)    <---- Issue #2: unbounded string copy; what are we, C programmers?
    param = wide_to_multi(str)
    CoTaskMemFree(ptr.unpack('L').first)
else
    raise Error, get_last_error
end
 
param

This will require a patch to 0.2.2 version of the gem, since 0.3.0 does not seem to be a drop-in replacement.

Comment by Peter Huene [ 2014/01/29 ]

Proposed fix to gem, wherever there is a wcscpy call:

...
length = wcslen(value) # always has null
str = 0.chr * 2 * (length + 1)
wcsncpy(str, value, length)
....

Comment by Josh Cooper [ 2014/01/30 ]

Patches to win32-api and win32-taskscheduler gems were merged to the 1.9.3 branch of puppet-win32-ruby in commit 4acb2b7.

To verify this patch, obtain a recent puppet MSI build. Confirm the sizes of the .so libraries:

C:\>dir "C:\Program Files (x86)\Puppet Labs\Puppet\sys\ruby\lib\ruby\gems\1.9.1\gems\win32-api-1.4.8-x86-mingw32\lib\win32\ruby19\win32\api.so"
...
01/06/2014  04:17 PM            27,648 api.so

And similarly for the task scheduler

C:\>dir "C:\Program Files (x86)\Puppet Labs\Puppet\sys\ruby\lib\ruby\gems\1.9.1\gems\win32-taskscheduler0.2.2\lib\win32\taskscheduler.rb"
...
01/06/2014  04:17 PM            48,837 taskscheduler.rb

Next apply the following manifest several times, making changes to various properties each time. Note the taskscheduler bug was triggered if the existing task's argument or other string based parameters were longer than 127 characters, such as when puppet tries to compare the existing tasks' properties with the desired values in the manifest.

scheduled_task { 'Test':
  ensure => present,
  enabled => true,
  command => 'C:\Program Files\Microsoft SQL Server\100\Tools\Binn\SQLCMD.exe',
  arguments => "-Q \"EXECUTE [master].[dbo].[DatabaseBackup] @Databases = 'ALL_DATABASES', @Directory = N'C:\\backups\\sql'\" -b -o C:\\backups\\sql\\logs\\DatabaseBackup.txt",
  trigger => {
    schedule => daily,
    start_date => '2013-12-27',
    start_time => '20:00',
  }
}

Comment by Peter Huene [ 2014/02/04 ]

Correction: the new size of api.so is 78.336 kb.

Comment by Kylo Ginsberg [ 2014/02/04 ]

Correction: the new size of taskscheduler.rb is 48,143.

Comment by Kylo Ginsberg [ 2014/02/04 ]

I followed the review steps Josh gave above, verifying the file sizes matched the file sizes given in the following two comments, and verifying that I could apply that manifest repeatedly, varying a number of the properties including some strings longer than 127 characters.

Comment by Josh Cooper [ 2014/03/19 ]

This was fixed in 3.4.3 (as part of the puppet-win32-ruby repo which is bundled into the MSI)

Generated at Wed Jan 22 04:26:53 PST 2020 using JIRA 7.7.1#77002-sha1:e75ca93d5574d9409c0630b81c894d9065296414.