When a blue screen is really hardware

The SBS Diva has a bluescreen problem.  Several commenters on her blog suggested that it might be hardware-related.  Let me join the chorus.
 
I’ve been fighting bluescreens on my 5-year-old MSI Athlon XP-based motherboard for almost two months.  The problem was hardware.  Here’s how I found it.
 
To reiterate, the diagnostic criteria for hardware-related problems are:
 
1) Symptoms are intermittent.
2) Symptoms happen after the machine has been powered on for a while.
3) Bluescreen diagnostics point to many different causes.
4) Memory tests may or may not indicate a problem.
 
In my case, the bluescreens were happening seemingly at random.  Each and every time it crashed, I would do bluescreen analysis with the debugging tools, as I described in an earlier post.  I found a wide variety of errors, many of them happening in win32k.sys.
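If you keep a note of the faulting module after each crash analysis, a few lines of Python can make criterion 3 a little less subjective.  This is only a rough sketch: the crash log below is a made-up sample (not my actual crash history), the function name is my own invention, and the 60% cutoff is an arbitrary assumption rather than any official rule.  The idea is simply that when one driver accounts for most of the crashes, software is the likelier suspect, and when the blame is spread around, hardware deserves a closer look.

```python
from collections import Counter


def looks_scattered(faulting_modules, dominance_threshold=0.6):
    """Return True when no single module accounts for most of the crashes.

    faulting_modules: module names noted by hand after each crash analysis.
    dominance_threshold: arbitrary cutoff (an assumption, not a standard).
    """
    if not faulting_modules:
        return False  # nothing recorded yet, so nothing to conclude

    # Count crashes per module, ignoring differences in letter case.
    counts = Counter(name.lower() for name in faulting_modules)
    _, top_count = counts.most_common(1)[0]

    # Scattered means the most frequent module is still below the cutoff.
    return top_count / len(faulting_modules) < dominance_threshold


# Hypothetical sample log, for illustration only.
crash_log = [
    "win32k.sys", "ntoskrnl.exe", "win32k.sys",
    "ntfs.sys", "tcpip.sys", "win32k.sys",
]

print(looks_scattered(crash_log))
```

In this sample, win32k.sys shows up in only half of the crashes, so the log counts as scattered.  Treat a result like that as a nudge to open the case and look, not as a diagnosis.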
 
I removed as many third-party drivers as I could (I own a Kensington trackball) and reverted to the "safe" settings in the BIOS.  I checked the ventilation on my system, cleaned and replaced one of the fans and the power supply, and even suspected the keyboard.  After all this work, it still bluescreened.
 
I put in a new stick of memory from Crucial and performed a memory test with Microsoft’s tester, running through two passes.  No change.  Bluescreens still.
 
One cause of defective hardware is not always recognized outside of service centers or the hobbyist PC community.  Look at these pictures of my motherboard.  The round objects are capacitors.  They help to regulate voltages in electronic equipment, including my motherboard.  They’re often in a hot, dusty environment, and they sometimes fail.  (This problem is all too common.  Google "bad caps" for a sampler.)
 
Normal capacitors have flat tops.  (The triangular score lines on the top are normal and do not indicate any defect.)
 
Bad capacitors generate heat and pressure inside, which cause the top of the cap to bulge.  Eventually, the cap may "pop" (the reason for the score lines on the top) or simply short out.  In either case, the device it’s wired into–hello, motherboard!–may fail.
 
My 5-year-old MSI board was all but certainly brought down by the two caps in the photo, which just happen to be near the memory sockets (obscured by wiring).  It’s little wonder I was having problems.
 
If you have a flashlight, a magnifier, and a few minutes, you can check your motherboard for bad caps yourself.  Just turn off your computer and unplug it, remove the cover (following the directions in your hardware manual), and have a look.
 
Caps can be anywhere on your motherboard, but they are most commonly located around the DIMM (memory) sockets and around the CPU, where you’ll find a large cluster of them.
 
Normal caps will have flat tops–again, the triangular score lines at the top are normal.  Bad caps will have bulging tops, or they’ll "leak":  there may be junk oozing out of the cap and running over the motherboard.  Badcaps.net has more photos.
 
If you find a bad cap, there’s not much you can do other than replace the motherboard or call for service.  But at least you won’t be running in circles with crashdumps.
 
Take care,
 
Dave
 
 
P.S.  Happy ending for me:  one new motherboard and processor (Athlon 64), and I was back in business a day later.
 
 
 
 
 
 