Background
Recently, I ran into a few incidents where a .NET application hosting the web browser control hung during startup. When I investigated the dump file, I found that the hang had nothing to do with .NET, JavaScript, or the web browser control. In fact, it was related to a bug introduced by one of the MSXML security fixes, MS15-084. To mitigate the issue, you may want to install KB3076895.
Analysis
Open the dump file using WinDbg (ships with Debugging Tools for Windows).
- Run the !locks command and you'll get output similar to the following:
CritSec ntdll!LdrpLoaderLock+0 at 77af20c0
WaiterWoken No
LockCount 50
RecursionCount 1
OwningThread 2ea8
EntryCount 0
ContentionCount bd
*** Locked
CritSec bcrypt!g_csLoaderLock+0 at 74844060
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread d5c
EntryCount 0
ContentionCount 1
*** Locked
CritSec +18631a44 at 18631a44
WaiterWoken No
LockCount 0
RecursionCount 2
OwningThread 2ea8
EntryCount 0
ContentionCount 0
*** Locked
Scanned 4335 critical sections
- Run the ~~[threadid]kb command to get the call stack of each owning thread. In our case, that is ~~[d5c]kbL and ~~[2ea8]kbL.
Output of ~~[d5c]kbL (call stack for the thread with id d5c):
CritSec +18631a44 at 18631a44
# ChildEBP RetAddr Args to Child
00 0dffee7c 77a2b4b4 00000078 00000000 00000000 ntdll!NtWaitForSingleObject+0x15
01 0dffeee0 77a2b398 00000000 00000000 0dffef48 ntdll!RtlpWaitOnCriticalSection+0x13e
02 0dffef08 77a202a9 77af20c0 7a06f0f2 7483275c ntdll!RtlEnterCriticalSection+0x150
...
Output of ~~[2ea8]kbL (call stack for the thread with id 2ea8):
# ChildEBP RetAddr Args to Child
00 1502def4 77a2b4b4 00001790 00000000 00000000 ntdll!NtWaitForSingleObject+0x15
01 1502df58 77a2b398 00000000 00000000 00000000 ntdll!RtlpWaitOnCriticalSection+0x13e
02 1502df80 74832ff8 74844060 00000000 177ccde4 ntdll!RtlEnterCriticalSection+0x150
...
- As you can see, thread d5c owns critical section 74844060 (step #1) and is waiting for critical section 77af20c0 (step #2). The first argument to ntdll!RtlEnterCriticalSection in its call stack is the critical section it is trying to enter.
- Similarly, thread 2ea8 owns critical section 77af20c0 (step #1) and is waiting for critical section 74844060 (step #2).
- In other words, the two threads are deadlocked. Each holds the critical section the other needs, and a critical section can be owned by only one thread at a time.
- The entire process freezes because one of the two critical sections is a special one: the loader lock (ntdll!LdrpLoaderLock).
- A short and simple explanation is that the loader lock is acquired during DLL initialization to make sure global and static variables are initialized properly, among a few other reasons. You can read more about what exactly the loader lock does in the MSDN article "Causes of loader lock".
- If you want to go one step further, take one more dump after a few seconds or minutes and you will see that the threads are not consuming any CPU.
- To do so, run the !runaway command on both dumps and compare the time consumed by each thread. It lists the threads in descending order of CPU time, with the thread that consumed the most at the top.
User Mode Time
Thread Time
0:28dc 0 days 0:00:03.744
10:2b84 0 days 0:00:00.390
9:2408 0 days 0:00:00.156
48:2d28 0 days 0:00:00.124
37:2728 0 days 0:00:00.109
33:2538 0 days 0:00:00.109
15:1a34 0 days 0:00:00.109
31:1ca8 0 days 0:00:00.093
20:2298 0 days 0:00:00.078
17:287c 0 days 0:00:00.062
44:17b4 0 days 0:00:00.046
40:888 0 days 0:00:00.046
28:23e4 0 days 0:00:00.046
27:d5c 0 days 0:00:00.046
18:2c30 0 days 0:00:00.046
56:dd4 0 days 0:00:00.031
54:2940 0 days 0:00:00.031
38:2db4 0 days 0:00:00.031
22:2698 0 days 0:00:00.031
14:2a08 0 days 0:00:00.031
13:12dc 0 days 0:00:00.031
51:2d2c 0 days 0:00:00.015
49:22f4 0 days 0:00:00.015
46:c10 0 days 0:00:00.015
41:4e4 0 days 0:00:00.015
34:1de8 0 days 0:00:00.015
32:22ac 0 days 0:00:00.015
19:2878 0 days 0:00:00.015
16:2ee4 0 days 0:00:00.015
99:2d4c 0 days 0:00:00.000
98:1aec 0 days 0:00:00.000
97:2df4 0 days 0:00:00.000
96:2920 0 days 0:00:00.000
...
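If you'd rather not compare two !runaway listings by eye, the comparison can be sketched in a few lines of Python. This is only a sketch under some assumptions: the function names parse_runaway and cpu_delta are mine, the listings are pasted in as plain text, and the second listing below is made up for illustration (only the first one comes from the dump above).

```python
import re
from datetime import timedelta

# Matches lines such as "27:d5c 0 days 0:00:00.046" from a !runaway listing.
# Assumes the fractional part is always three digits (milliseconds).
LINE = re.compile(
    r"^\s*(\d+):([0-9a-fA-F]+)\s+(\d+) days (\d+):(\d+):(\d+)\.(\d{3})\s*$"
)

def parse_runaway(text):
    """Return {thread id (hex string): user mode time} for one listing."""
    times = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip headers such as "User Mode Time" and "Thread Time"
        _, tid, days, hours, minutes, seconds, millis = m.groups()
        times[tid.lower()] = timedelta(
            days=int(days), hours=int(hours), minutes=int(minutes),
            seconds=int(seconds), milliseconds=int(millis),
        )
    return times

def cpu_delta(first, second):
    """Return the CPU time each thread consumed between the two dumps."""
    return {tid: second[tid] - first[tid] for tid in second if tid in first}

# Two lines taken from the first dump shown above.
first_dump = """User Mode Time
Thread Time
0:28dc 0 days 0:00:03.744
27:d5c 0 days 0:00:00.046
"""

# Hypothetical second listing, taken some time later (values invented).
second_dump = """User Mode Time
Thread Time
0:28dc 0 days 0:00:03.744
27:d5c 0 days 0:00:00.046
"""

deltas = cpu_delta(parse_runaway(first_dump), parse_runaway(second_dump))
for tid, delta in deltas.items():
    print(tid, delta)
```

A delta of zero for every thread between the two dumps is what you would expect from a hung process.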
- Check the process uptime, which is displayed when you open the dump file:
Debug session time: Fri Oct 23 11:56:12.000 2015 (UTC - 5:00)
System Uptime: 0 days 1:48:54.485
Process Uptime: 0 days 0:14:37.000
- The process has been running for 14 minutes and 37 seconds. The application is still initializing, yet it hasn't used more than 4-5 seconds of CPU time. So the application must be hung; otherwise it would keep consuming CPU until it finished initializing.
- Verify the version of the MSXML files installed on the machine. To do so, run the lmvm msxml* command in WinDbg.
73630000 73788000 msxml6 (deferred)
Image path: C:\Windows\System32\msxml6.dll
Image name: msxml6.dll
Timestamp: Tue Jul 14 21:52:06 2015 (55A5CAD6)
CheckSum: 0015F444
ImageSize: 00158000
File version: 6.30.7601.18923
Product version: 6.30.7601.18923
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0000.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft(R) MSXML 6.0 SP3
InternalName: MSXML6.dll
OriginalFilename: MSXML6.dll
ProductVersion: 6.30.7601.18923
FileVersion: 6.30.7601.18923
FileDescription: MSXML 6.0 SP3
LegalCopyright: Copyright (C) Microsoft Corporation. 1981-2008
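To make the deadlock reasoning from the !locks output and the two call stacks explicit, here is a small Python sketch that treats them as a wait-for graph and looks for a cycle. The function name find_deadlock and the two dictionaries are my own illustration, not something WinDbg produces. The owners come from the OwningThread fields in the !locks output, and the waits come from the first argument to ntdll!RtlEnterCriticalSection in each call stack.

```python
def find_deadlock(owners, waits):
    """Look for a cycle of threads that are waiting on each other.

    owners: {critical section address: thread that owns it}
    waits:  {thread: critical section address it is blocked on}
    Returns the list of threads in the cycle, or None if there is none.
    """
    for start in waits:
        path = []
        thread = start
        # Follow thread -> section it waits for -> owner of that section.
        while thread in waits and thread not in path:
            path.append(thread)
            thread = owners.get(waits[thread])
        if thread in path:
            return path[path.index(thread):]
    return None

# Values copied from the !locks output (OwningThread per critical section).
owners = {
    "77af20c0": "2ea8",  # ntdll!LdrpLoaderLock
    "74844060": "d5c",   # bcrypt!g_csLoaderLock
    "18631a44": "2ea8",
}

# Values copied from frame 02 of each call stack (first argument).
waits = {
    "d5c": "77af20c0",
    "2ea8": "74844060",
}

print(find_deadlock(owners, waits))
```

With more than two threads involved the same walk still applies, since it simply keeps following owner to waiter until it either runs out of waiting threads or comes back to one it has already seen.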
So what is the fix?
The issue was first introduced in MS15-084 and was fixed in KB3076895.