Local News Site Crashes, Part 2

As mentioned in my last post, a local news site was crashing on me and I wanted to learn more about what was causing it.  I had the HTTP request records from Fiddler, but I didn’t think its results were conclusive enough for me.  What could I find out in the debugger?  It’s the first tool I run for a kernel crash (bluescreen) but I had never tried to analyze an application crash with it.

First, I tried !analyze -v.  This command is the title of an internals blog I regularly read, but it is also the command that automatically analyzes a dump and tries to determine the cause of a crash.  It is often the first command given in a kernel debugging session.  What does it say here?

***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information.  Contact the group that      ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: jscript!FncInfo                               ***
***                                                                   ***
*************************************************************************
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for msidcrl40.DLL - 

OK.  I’m not at Microsoft, so I won’t get those symbols.  I probably wouldn’t even have posted this if I were at MS.  However, WinDbg helpfully tells me that “an exception of interest can be accessed via .ecxr”.  Let’s see this exception record:

0:005> .ecxr
eax=00000000 ebx=00000000 ecx=04fef280 edx=0301a424 esi=04f934a0 edi=00000000
eip=695981f6 esp=0301a4f0 ebp=0301a500 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
mshtml!CMarkup::DetachElemCtxStream+0x64:
695981f6 8b07 mov eax,dword ptr [edi] ds:002b:00000000=????????

We’re getting someplace.  This is almost certainly where IE went boom.  The faulting instruction reads from the address held in the EDI register, but EDI is all zeroes, so the read is a null pointer dereference.
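
To make that reading of the register dump concrete, here is a minimal Python sketch that parses the .ecxr output above and looks up the register used as the memory operand of the faulting instruction. The helper names (parse_registers, memory_operand_register) are made up for this illustration and are not part of WinDbg; the regular expression only handles a simple one-register operand like [edi].

```python
import re

# Register dump and faulting instruction copied from the .ecxr output above.
ECXR_OUTPUT = """\
eax=00000000 ebx=00000000 ecx=04fef280 edx=0301a424 esi=04f934a0 edi=00000000
eip=695981f6 esp=0301a4f0 ebp=0301a500 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
"""
FAULTING_INSTRUCTION = "mov eax,dword ptr [edi]"


def parse_registers(text):
    """Return {name: value} for every name=hex pair in the dump (hypothetical helper)."""
    pairs = re.findall(r"\b([a-z]+)=([0-9a-f]+)\b", text)
    return {name: int(value, 16) for name, value in pairs}


def memory_operand_register(instruction):
    """Return the register inside [...] for a one-register operand, else None."""
    match = re.search(r"\[([a-z]{2,3})\]", instruction)
    return match.group(1) if match else None


regs = parse_registers(ECXR_OUTPUT)
base = memory_operand_register(FAULTING_INSTRUCTION)

# The instruction loads a dword from the address in `base`.
print(f"{base} = {regs[base]:#010x}")
if regs[base] == 0:
    print("read from address 0: null pointer dereference")
```

This only automates what the eye already does on one dump; it is not a substitute for !analyze.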

 
What else was it doing?  My next step was a stack trace (kv).  This is just part of it:
0:005> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
0301a500 695981c2 00000000 00000000 00010100 mshtml!CMarkup::DetachElemCtxStream+0x64
0301a520 69575a5e 00000000 00000000 09e34b40 mshtml!CMarkup::DetachElemCtxStream+0x30
0301a554 694b7f43 04fd6c30 10e49194 04fc3830 mshtml!CAPProcessor::Evaluate+0x21d
0301a59c 69598299 00000000 00000000 09e34b40 mshtml!CDoc::SubmitForAntiPhishProcessing+0x1c4
0301a5b4 694c4e81 0301a628 125d82b8 00000000 mshtml!CMarkup::CheckCtxInfoThreshold+0x4c
0301a5c8 694250c2 09e34b40 00000002 00000001 mshtml!CElement::AddCtxInfoHelper+0xa5
0301a5e8 69478a42 00000002 69478a4c 125d82b8 mshtml!CAnchorElement::AddCtxInfoToStream+0x1e
0301a5f0 69478a4c 125d82b8 0301a778 00000000 mshtml!CImgElement::ExitTree+0xa (FPO: [0,0,0])
0301a614 693565e0 0301a628 09e34b40 00000000 mshtml!CAnchorElement::Notify+0x142
0301a768 693559f2 0301a874 002a7ea0 00000001 mshtml!CSpliceTreeEngine::RemoveSplice+0x2eb

The full trace was over 80 entries deep!  The usual strategy is to look at the topmost 5 or 10 entries in the stack, since they’re “near” the problem area.  The crash happened in CMarkup::DetachElemCtxStream.  The three columns after the return address (Args to Child) show the first arguments passed to each function.  Several of them are zero in the top frames, which suggests that the function got the bad pointer from one of its callers.
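
With a trace that deep, it can help to pull the columns apart mechanically. The following is a rough Python sketch, not a WinDbg feature: it splits a few of the kv lines above into frame pointer, return address, the three "Args to Child" values and the symbol, then reports which argument slots are zero. The Frame type and parse_kv function are names invented for this example. One caveat: the "Args to Child" columns are simply the first three dwords on the stack after the return address, so for frames marked FPO, or for methods that take this in a register, they may not be real arguments.

```python
from typing import NamedTuple

# A few frames copied from the kv output above.
KV_OUTPUT = """\
ChildEBP RetAddr Args to Child
0301a500 695981c2 00000000 00000000 00010100 mshtml!CMarkup::DetachElemCtxStream+0x64
0301a520 69575a5e 00000000 00000000 09e34b40 mshtml!CMarkup::DetachElemCtxStream+0x30
0301a554 694b7f43 04fd6c30 10e49194 04fc3830 mshtml!CAPProcessor::Evaluate+0x21d
0301a5f0 69478a4c 125d82b8 0301a778 00000000 mshtml!CImgElement::ExitTree+0xa (FPO: [0,0,0])
"""


class Frame(NamedTuple):
    child_ebp: int
    ret_addr: int
    args: tuple
    symbol: str


def parse_kv(text):
    """Parse kv lines into Frame tuples, skipping headers (hypothetical helper)."""
    frames = []
    for line in text.strip().splitlines():
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # too short to be a frame line, e.g. the column header
        try:
            child_ebp, ret_addr, *args = (int(p, 16) for p in parts[:5])
        except ValueError:
            continue  # first five fields are not all hex, e.g. a "***" banner
        frames.append(Frame(child_ebp, ret_addr, tuple(args), parts[5]))
    return frames


frames = parse_kv(KV_OUTPUT)
for frame in frames:
    zero_slots = [i for i, arg in enumerate(frame.args) if arg == 0]
    print(f"{frame.symbol}: zero in arg slots {zero_slots}")
```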

I disassembled the code of DetachElemCtxStream and traced through it:

0:005> u @eip
mshtml!CMarkup::DetachElemCtxStream+0x64:
695981f6 8b07 mov eax,dword ptr [edi]
695981f8 57 push edi
695981f9 ff5004 call dword ptr [eax+4]
695981fc 8b8680000000 mov eax,dword ptr [esi+80h]
69598202 8b08 mov ecx,dword ptr [eax]
69598204 50 push eax
69598205 ff5108 call dword ptr [ecx+8]
69598208 899e80000000 mov dword ptr [esi+80h],ebx

While it seems to involve jumping to a previously-constructed dispatch table, I don’t know what else to make of it.  I did trace through its callers for a bit but didn’t know what I was looking for. (I am familiar with x86 assembly code but do not code in it or look at it regularly.)  Instead, I wanted to look at some registers and some stack arguments to see if they pointed to interesting data.  Now you know why I wanted a full user dump.
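
One hedged way to read the two indirect calls: the pattern "load a pointer from the object, then call through an offset in it" is how virtual calls typically look on 32-bit x86, and the offset divided by the pointer size gives the slot in the table. The Python sketch below does that arithmetic for the two offsets in the listing (4 and 8). The mapping to method names is an assumption on my part: it only holds if these objects are COM interfaces laid out like IUnknown (QueryInterface, AddRef, Release), which the dump does not prove without private symbols.

```python
POINTER_SIZE = 4  # the dump is from a 32-bit process (8-digit registers)

# Assumption: the objects are COM interfaces, whose first three vtable
# slots follow the IUnknown order.  Nothing in the dump confirms this.
IUNKNOWN_SLOTS = ("QueryInterface", "AddRef", "Release")


def vtable_slot(byte_offset, pointer_size=POINTER_SIZE):
    """Convert a byte offset such as the 4 in [eax+4] to a table slot index."""
    slot, remainder = divmod(byte_offset, pointer_size)
    if remainder:
        raise ValueError("offset is not a multiple of the pointer size")
    return slot


# Offsets taken from: call dword ptr [eax+4] and call dword ptr [ecx+8]
for offset in (0x4, 0x8):
    slot = vtable_slot(offset)
    print(f"offset {offset:#x} -> slot {slot} ({IUNKNOWN_SLOTS[slot]}, if IUnknown-like)")
```

If that guess is right, the code takes a reference on the object in EDI, releases whatever was stored at [esi+80h], and then overwrites that field with EBX, which is zero here.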

We’ll see if some of the registers or stack arguments point to interesting text.

Most of the registers and the arguments to the first five entries at the top of the stack weren’t interesting.  In earlier debugging sessions with different dumps of IE, I once found a long list of URLs in Unicode.  A very long list.  I wasn’t able to find that in this dump without spending all week on it.  I did find one interesting piece of text pointed to by the ECX register, about 672 bytes in:

0:005> db @ecx + 0n672
04fef520 1f 00 00 00 00 00 00 00-68 00 74 00 74 00 70 00 ........h.t.t.p.
04fef530 3a 00 2f 00 2f 00 77 00-77 00 77 00 2e 00 73 00 :././.w.w.w...s.
04fef540 61 00 6c 00 65 00 6d 00-6e 00 65 00 77 00 73 00 a.l.e.m.n.e.w.s.
04fef550 2e 00 63 00 6f 00 6d 00-2f 00 00 00 00 00 00 00 ..c.o.m./.......
04fef560 00 00 00 00 00 00 00 00-37 aa c0 36 00 00 00 8c ........7..6....
04fef570 2f 00 61 00 6a 00 61 00-78 00 2f 00 6c 00 69 00 /.a.j.a.x./.l.i.
04fef580 62 00 73 00 2f 00 73 00-77 00 66 00 6f 00 62 00 b.s./.s.w.f.o.b.
04fef590 6a 00 65 00 63 00 74 00-2f 00 32 00 2e 00 32 00 j.e.c.t./.2...2.

It appears to be a list of links on that page.  Somewhere in the dump is the text of the current web page;  I’d seen it in other dumps, but not this time.
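
Reading UTF-16 text out of the ASCII column of a db dump gets tedious, so here is a small Python sketch that does it for the rows shown above. It assumes every row carries a full 16 bytes and that the text is little-endian UTF-16, which matches the "h.t.t.p." pattern in the dump. The function names are invented for this example. (Inside WinDbg itself, the du command displays memory as a Unicode string, which is usually the quicker route.)

```python
import re

# Rows copied from the db output above: address, 16 hex bytes, ASCII column.
DB_OUTPUT = """\
04fef520 1f 00 00 00 00 00 00 00-68 00 74 00 74 00 70 00 ........h.t.t.p.
04fef530 3a 00 2f 00 2f 00 77 00-77 00 77 00 2e 00 73 00 :././.w.w.w...s.
04fef540 61 00 6c 00 65 00 6d 00-6e 00 65 00 77 00 73 00 a.l.e.m.n.e.w.s.
04fef550 2e 00 63 00 6f 00 6d 00-2f 00 00 00 00 00 00 00 ..c.o.m./.......
04fef560 00 00 00 00 00 00 00 00-37 aa c0 36 00 00 00 8c ........7..6....
04fef570 2f 00 61 00 6a 00 61 00-78 00 2f 00 6c 00 69 00 /.a.j.a.x./.l.i.
04fef580 62 00 73 00 2f 00 73 00-77 00 66 00 6f 00 62 00 b.s./.s.w.f.o.b.
04fef590 6a 00 65 00 63 00 74 00-2f 00 32 00 2e 00 32 00 j.e.c.t./.2...2.
"""


def parse_db_rows(dump):
    """Collect the 16 data bytes from each row (assumes full rows)."""
    data = bytearray()
    for line in dump.strip().splitlines():
        # Token 0 is the address; tokens 1..16 are the bytes once the
        # mid-row '-' separator is turned into a space.
        tokens = line.replace("-", " ").split()
        data.extend(int(tok, 16) for tok in tokens[1:17])
    return bytes(data)


def ascii_runs_from_utf16(raw, min_len=4):
    """Decode as UTF-16LE and return runs of printable ASCII characters."""
    text = raw.decode("utf-16-le", errors="replace")
    return re.findall(r"[\x20-\x7e]{%d,}" % min_len, text)


raw = parse_db_rows(DB_OUTPUT)
strings = ascii_runs_from_utf16(raw)
print(strings)
# → ['http://www.salemnews.com/', '/ajax/libs/swfobject/2.2']
```

The second string is cut off only because the dump excerpt ends there.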
 
At this point, it’s perfectly acceptable to just take the top stack entries, throw them in a search engine and see who else has seen this problem.  That’s what I’m doing next.
