Vista Hangs After Three Days Uptime: Too Many Processes!

At long last, I get to post a solution!  A few posts ago, I alluded to a bad problem I’ve been having with my Vista workstation involving a memory leak.  I have been working this problem for five months with much frustration along the way.  I want to describe how I found the problem—this will be a long post.  Next post will have the solution, and I’ll have a postmortem too.

Several months ago, my Vista workstation started hanging on me.  This would happen when I woke the machine from sleep for the day, when I connected to a VPN, and when I put my Windows Mobile PDA in its cradle to sync.  These circumstances guided, or I should say misled, my troubleshooting efforts.

First I did the obvious:  pull hardware (my Firewire card that saw little use, my USB card reader) and software that I didn’t use.  Unsurprisingly, no change. 

I focused on USB, thinking that trouble on the USB bus was causing the hangs.  I applied a patch from Microsoft and a registry fix (KB953367) that purported to improve USB reliability.  No change.

Drivers:  Eventually, over the course of several months, I refreshed all the drivers in my machine.  Good for my machine, but not for my problem.

I tried every debugging tool I knew of (WinDbg, Kernrate, Xperf) and a few that I didn’t.  When my system hung on wakeup, I had to do the “Crash on Ctrl-Scroll” trick to bluescreen the machine on purpose so I could get a dump.  The dumps I got didn’t yield much information, or rather yielded too much.  Sometimes I wouldn’t get a dump at all.  Hardware?

I had gotten several real bluescreens throughout.  Memory?  I got new memory for my SBS box and put the old memory in the workstation.  I had 3G in the machine (2 1G and 2 512M sticks) and swapped it out for 4 1G sticks making 4G (or 3.5G since I still run 32-bit Vista).  No crashes.  I put off that motherboard I was about to buy.  But still hangs.

I couldn’t get anything meaningful from the flood of data from the kernel tools that I tried.  I couldn’t use Process Explorer—it crashed too!

I tried a different tack:  Performance Monitor (perfmon.msc).  I monitored some counters, and found an interesting trend.  You can see it at the top of this post.  Over the course of 12 hours or so, my Process Total Virtual Bytes went from 17,990,000,000 bytes, an already very high value, up to 48,067,000,000 bytes.  (!!)
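For anyone who would rather log this counter from a prompt than watch the Performance Monitor graph, here is a rough sketch.  It is not what I ran at the time.  The function names are made up for illustration, Get-Counter is Windows-only and needs PowerShell 2.0 or later, and the [pscustomobject] syntax needs PowerShell 3.0 or later.  The last line works out the average growth from the two figures above, taking "12 hours or so" as exactly 12.

```powershell
# Hypothetical helper: sample the system-wide Virtual Bytes counter.
# Windows only. Defined here but not invoked.
function Get-VirtualBytesSample {
    param(
        [int] $IntervalSeconds = 300,  # seconds between samples
        [int] $Count = 12              # number of samples to collect
    )
    Get-Counter -Counter '\Process(_Total)\Virtual Bytes' -SampleInterval $IntervalSeconds -MaxSamples $Count |
        ForEach-Object {
            [pscustomobject]@{
                Time  = $_.Timestamp
                Bytes = $_.CounterSamples[0].CookedValue
            }
        }
}

# Hypothetical helper: average growth in bytes per hour between two readings.
function Get-GrowthPerHour {
    param(
        [double] $StartBytes,
        [double] $EndBytes,
        [double] $Hours
    )
    ($EndBytes - $StartBytes) / $Hours
}

# The two readings from the graph, assuming exactly 12 hours between them.
Get-GrowthPerHour -StartBytes 17990000000 -EndBytes 48067000000 -Hours 12
```

That comes to roughly 2.5 billion bytes of new virtual address space per hour, which is far more than an idle desktop should be asking for.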

I had other graphs (unfortunately not saved) that showed a very high ski-slope of virtual memory usage over three days.  It’s probably best I don’t remember the exact values, but when I could get Process Explorer to work, it reported 4.0G of virtual space in use during a “normal” session where I had a few tabs of IE and Outlook open.

I had a memory leak. 

That was one of my first breaks in the case.  The second came when I was looking at user profiles.

I live alone.  Despite having a full SBS 2008 server in the house, it only has one user, and my workstation has only one user.  I do, as is good practice, run as a regular user and have another admin account. 

I logged in as admin and let the system idle overnight while monitoring virtual memory. 

The memory graph looked reasonably flat.

That ruled out, for the most part, the kernel, the drivers, and nearly every service.  Windows itself was mostly not corrupted.  Was it my profile?

I logged back on as my regular user (with an elevated command prompt, as I usually do) and got my last and biggest break.  I had been using PowerShell to take snapshots of my process activity, since I couldn’t run Process Explorer.  I suspected that Explorer or some other process was leaking, and I wanted before and after snapshots to compare in Excel.  This is the command I used:

get-process | sort VirtualMemorySize | export-csv vmsnap.csv

After my system had been up for a while, I took my second snapshot and noted something strange:

PS C:\temp> dir vmsnap*


    Directory: Microsoft.PowerShell.Core\FileSystem::C:\temp


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        10/27/2008   9:28 PM      94920 vmsnap.csv
-a---        10/28/2008  10:11 AM     236901 vmsnap2.csv

WTF?!  One snapshot’s that much bigger than the other? 

Only way that could happen is if there were way more rows in the second snapshot, meaning more processes.  I logged off and back on to my “good” admin account and counted processes, like this:

(get-process).Count
71

This is the normal number of running processes in a “good” system.  I restarted my machine and logged back on to my regular user and repeated that command.  Same result.  Now I waited. 
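In hindsight, the two CSV snapshots already held the answer, and I could have compared them without Excel.  The sketch below is one way to do it, not what I ran at the time.  The function name Compare-ProcessSnapshot is made up, it assumes both files came from get-process piped to export-csv (so each row has a ProcessName column), and it needs PowerShell 3.0 or later.

```powershell
# Hypothetical helper: compare two Export-Csv snapshots of Get-Process output
# and report how the number of instances of each process name changed.
function Compare-ProcessSnapshot {
    param(
        [Parameter(Mandatory)] [string] $Before,
        [Parameter(Mandatory)] [string] $After
    )

    $old = @(Import-Csv -Path $Before)
    $new = @(Import-Csv -Path $After)

    # Instance count per process name in the earlier snapshot.
    $oldCounts = @{}
    foreach ($group in ($old | Group-Object -Property ProcessName)) {
        $oldCounts[$group.Name] = $group.Count
    }

    # One row per process name in the later snapshot, biggest increase first.
    $new | Group-Object -Property ProcessName | ForEach-Object {
        $previous = 0
        if ($oldCounts.ContainsKey($_.Name)) { $previous = $oldCounts[$_.Name] }
        [pscustomobject]@{
            ProcessName = $_.Name
            Before      = $previous
            After       = $_.Count
            Delta       = $_.Count - $previous
        }
    } | Sort-Object -Property Delta -Descending
}

# Usage with the snapshot files from this post:
# Compare-ProcessSnapshot -Before vmsnap.csv -After vmsnap2.csv | Select-Object -First 10
```

Names that appear only in the earlier snapshot are not listed, which is fine when the goal is to find what multiplied.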

Two days later, I repeated the command:

(get-process).Count
880
 

(!!!!!!)  Mommy!

Was it a virus?  I know of “fork bombs”, and was about to try RootkitRevealer to see if my machine was infected (I have never had a malware infection on any machine I’ve owned), but I looked at my process list and saw something else:  A zillion instances of taskeng, the Microsoft Task Scheduler, rewritten for Vista.  This command line shows it nicely:

(get-process | where {$_.Name -eq "taskeng"}).Count
817

Eeeepp!
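A more general way to spot this kind of runaway, without knowing the culprit’s name in advance, is to group the process list by name and sort by instance count.  This is a small sketch rather than the command I used.  The function name Get-ProcessCountByName is made up for illustration.

```powershell
# Hypothetical helper: list the process names with the most running instances.
function Get-ProcessCountByName {
    param([int] $First = 10)  # how many names to show

    Get-Process |
        Group-Object -Property ProcessName |
        Sort-Object -Property Count -Descending |
        Select-Object -First $First -Property Count, Name
}

Get-ProcessCountByName -First 10
```

On a healthy machine the top entries are usually things like svchost with a dozen or so instances.  Anything in the hundreds deserves a closer look.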

Task Scheduler was spawning tasks almost as fast as it could.  I didn’t see this when I looked at memory per-process since I’d been thinking of a leak within a process and not a spawning process, but it was a perfect explanation!  No wonder Process Explorer crashed—it was starved for memory to begin with, never mind when it had to reserve memory to display all those processes!

It explains the hang on wakeup from sleep, since Vista has to notify all the processes upon wake.  It doesn’t explain the hangs when I sync to my PDA or connect to a VPN, but I’m assuming these were caused by a low resource condition. 

I did some digging to find out exactly what Task Scheduler was doing.  That’s my next post.
