Postmortem: Vista Hangs after Three Days due to Taskeng spawning

I’ve finally found the resolution for my Vista hangs after four months.  Did I do the right things diagnosing it?  What could I have done better, or really, quicker?

From the start, it’s obvious in hindsight that the original symptoms—hanging on wakeup and on PDA docking—were very misleading and pointed me towards hardware problems.  Other than the bad memory I replaced (with still no idea why it was bad), my hardware was fine.

I did turn away from hardware towards performance issues as the root cause.  I would have been better off going to Performance Monitor first;  it’s much improved in Vista and very useful.  I should have brought out the kernel tools only if perfmon didn’t turn anything up, but that’s what happens when you start with hardware as an assumption.

Even when I used kernel tools like WinDbg, I missed a big clue.  Here’s the output of !vm from my most recent crash dump, taken several weeks ago.  It lists memory in use and, most importantly, the memory in use by each process:

*** Virtual Memory Usage ***
    Physical Memory:      851636 (   3406544 Kb)
    Page File: \??\C:\pagefile.sys
      Current:   3713744 Kb  Free Space:   1774576 Kb
      Minimum:   3713744 Kb  Maximum:     10219632 Kb
    Available Pages:      176293 (    705172 Kb)
    ResAvail Pages:       618754 (   2475016 Kb)
    Locked IO Pages:           0 (         0 Kb)
    Free System PTEs:      58252 (    233008 Kb)
    Modified Pages:          103 (       412 Kb)
    Modified PF Pages:        56 (       224 Kb)
    NonPagedPool Usage:    22548 (     90192 Kb)
    NonPagedPool Max:     523072 (   2092288 Kb)

Nothing wrong with nonpaged pool usage.

    PagedPool 0 Usage:     18814 (     75256 Kb)
    PagedPool 1 Usage:      7492 (     29968 Kb)
    PagedPool 2 Usage:      4200 (     16800 Kb)
    PagedPool 3 Usage:      3870 (     15480 Kb)
    PagedPool 4 Usage:      3702 (     14808 Kb)
    PagedPool Usage:       38078 (    152312 Kb)
    PagedPool Maximum:    523264 (   2093056 Kb)

Nor with paged pool. Whatever was happening, it wasn’t from my drivers.  Now on to the processes:

    Total Private:       1033896 (   4135584 Kb)
         11310 iexplore.exe     25548 (    102192 Kb)
         0484 svchost.exe      25456 (    101824 Kb)
         111a0 iexplore.exe     25304 (    101216 Kb)
         0f18 explorer.exe     21786 (     87144 Kb)
         03cc svchost.exe      20625 (     82500 Kb)
         11f24 OUTLOOK.EXE      19878 (     79512 Kb)
    

Lots of memory in use, but the processes themselves seem normal so far…

         4d6c taskeng.exe        409 (      1636 Kb)
         43bc taskeng.exe        392 (      1568 Kb)
         9178 taskeng.exe        388 (      1552 Kb)
         3f24 taskeng.exe        388 (      1552 Kb)
         1123c taskeng.exe        386 (      1544 Kb)
         2dd0 taskeng.exe        384 (      1536 Kb)
         2d80 taskeng.exe        384 (      1536 Kb)
         0314 taskeng.exe        383 (      1532 Kb)
         b9cc taskeng.exe        382 (      1528 Kb)
         7184 taskeng.exe        382 (      1528 Kb)
         58a8 taskeng.exe        382 (      1528 Kb)
         1988 taskeng.exe        382 (      1528 Kb)
         [and nearly 1000 more instances!!!…]

That should have tipped me off right there, but it didn’t.
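In hindsight, a few lines of script would have done the counting for me. What follows is only a minimal sketch, assuming the process lines in the !vm output keep the layout shown above (hex process ID, image name, pages, then kilobytes in parentheses). The names PROCESS_LINE, summarize_processes and SAMPLE are my own illustration, not anything WinDbg provides, and the sample is just a handful of lines copied from the dump above.

```python
import re
from collections import defaultdict

# One process line from the "Total Private" part of !vm output, assumed to be:
#   <hex pid> <image name> <pages> ( <kilobytes> Kb)
# The image name may not contain ':' so statistic lines such as
# "Total Private:" or "PagedPool Usage:" are never mistaken for processes.
PROCESS_LINE = re.compile(
    r"^\s*([0-9a-fA-F]+)\s+([^\s:]+)\s+(\d+)\s+\(\s*(\d+)\s+Kb\)\s*$"
)

def summarize_processes(vm_output):
    """Return {lowercased image name: (instance count, total private KB)}."""
    totals = defaultdict(lambda: [0, 0])
    for line in vm_output.splitlines():
        match = PROCESS_LINE.match(line)
        if match is None:
            continue  # header, pool statistic, or blank line
        _pid, image, _pages, kb = match.groups()
        entry = totals[image.lower()]
        entry[0] += 1
        entry[1] += int(kb)
    return {image: (count, kb) for image, (count, kb) in totals.items()}

# A few lines copied from the dump above (not the full listing).
SAMPLE = """
    PagedPool Usage:       38078 (    152312 Kb)
    Total Private:       1033896 (   4135584 Kb)
         11310 iexplore.exe     25548 (    102192 Kb)
         111a0 iexplore.exe     25304 (    101216 Kb)
         4d6c taskeng.exe        409 (      1636 Kb)
         43bc taskeng.exe        392 (      1568 Kb)
         9178 taskeng.exe        388 (      1552 Kb)
"""

# Print images with the most instances first.
summary = summarize_processes(SAMPLE)
for image, (count, kb) in sorted(summary.items(), key=lambda item: -item[1][0]):
    print(f"{image:<16}{count:>6} instances{kb:>10} Kb")
```

Run against the full listing, any image name with hundreds of instances would sort to the top, which is exactly the clue I overlooked. Each taskeng.exe is only about 1.5 MB, so no single line looks alarming; it is the count and the summed total that give it away.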

For months, I had run Process Explorer and had seen multiple instances of taskeng.  I had always assumed Task Scheduler, rewritten for Vista, had a pool of processes to run tasks, just as IIS does.

I never thought to count the instances!   Had I only realized that Task Scheduler didn’t work that way!  Remember that Windows Internals covers much more than the kernel and drivers themselves; it is also about the many services and administration mechanisms inside Windows.

I hadn’t been as familiar with the newer Vista administration tools, since many of them depend on Windows Server 2008, which I’ve only really gotten used to over the summer on my own SBS box.  Familiar tools like the Event Log console are a little different.  At SATV we’re still on Windows XP/Windows Server 2003 (and even Windows 2000) so part of me is still accustomed to the older platforms.

Before I congratulate myself on sticking with the problem for five months, I should note that most people would have reformatted and reinstalled by now and it would be just another story in the “bad Vista” narrative that’s been in the IT press for three years.  I shouldn’t expect anyone to have that kind of patience with a computer problem in this fast-paced world.  I’m perhaps too patient.

The other story is that Windows, for all that we bitterly condemn its faults, is remarkably resilient.  I have seen workstations that I was convinced were trashed when I couldn’t reach them from the network, only to find that their users were working away, not realizing anything was wrong with their machines (refreshing Group Policy fixed that).

There are many, many, many Windows installations that are seriously messed up, yet their users have no idea anything’s amiss.  Workstations and laptops shipped with crapware almost certainly qualify as “messed up”, sadly.

We have Macs at SATV that are quirky too.  It’s the reality of using a very complex machine;  do we reformat and rebuild our cars when they don’t start in the morning?

(Obligatory Linux comment:  I can’t see spending the same time with an Ubuntu distribution and having the same success;  most Ubuntu users have to reformat and reinstall regularly when new versions come out, and there is no concept of running your 5-year-old program on Linux.  It just isn’t done.)

But, as you can see from the image at the top of this post, everything’s fine.  I have lots of programs open (and lots of tabs in IE) but only about 2 GB of memory in use.  Looks and runs fine now.

What’s the next problem?
