Vista finally stable for me, (no) thanks to a cable!

For the past six months, I’d been scraping by with Vista. Every few days, my machine would lock up. I tried many hardware combinations. Was it my cheap USB hub? I removed it. My FireWire? Removed. I wasn’t happy with the performance, but it was never bad enough for me to consider going back to XP (or switching to Linux, as some would suggest). I added 2 gigs of Crucial memory to the box when I renovated my server, but the lockups didn’t go away.

Sometimes the machine would simply lock up, but other times the symptoms were more interesting: I’d wake the machine from sleep in the morning, and over a minute or so it would slowly grind to a halt. Pressing Control-Alt-Delete often produced an error message saying the security dialog could not be brought up.

I found an interesting event log entry:

Log Name:      System
Source:        EventLog
Date:          7/23/2007 1:56:46 PM
Event ID:      6008
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      *********
The previous system shutdown at 1:50:50 PM on 7/23/2007 was unexpected.

The interesting part is the timing. The event log entry was written at nearly 1:57 PM, and my machine takes about 2 minutes to reboot. Remember that Windows writes a timestamp to the registry every 5 seconds (see "The Heartbeat of Windows"). Normally, the timestamp in the event’s description tells you, to within 5 seconds, when a system bluescreened. Here the last heartbeat was at nearly 1:51 PM, almost six minutes before the entry was logged. Subtract 2 minutes for the reboot, and it appears that Windows went nearly four minutes without being able to write a heartbeat.
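To make the arithmetic concrete, here is a small Python sketch that pulls the last heartbeat time out of the event 6008 description quoted above and estimates how long Windows went without writing one. The function names (`last_heartbeat`, `heartbeat_gap`), the regular expression, and the fixed 2-minute reboot allowance are my own assumptions for illustration; the regex only matches the English-language description format shown in this entry.

```python
import re
from datetime import datetime, timedelta

# Hypothetical pattern for the English event 6008 description, e.g.
# "The previous system shutdown at 1:50:50 PM on 7/23/2007 was unexpected."
SHUTDOWN_RE = re.compile(
    r"shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) on (\d{1,2}/\d{1,2}/\d{4})"
)
# Matches both the Date field ("7/23/2007 1:56:46 PM") and the rebuilt
# "date time" string from the description.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def last_heartbeat(description: str) -> datetime:
    """Return the last heartbeat time recorded in an event 6008 description."""
    match = SHUTDOWN_RE.search(description)
    if match is None:
        raise ValueError("not an event 6008 shutdown description")
    clock, date = match.groups()
    return datetime.strptime(f"{date} {clock}", TIME_FORMAT)


def heartbeat_gap(description: str, logged: str,
                  reboot: timedelta = timedelta(minutes=2)) -> timedelta:
    """Estimate how long Windows ran without writing a heartbeat.

    logged is the event's own Date field; reboot is an assumed allowance
    for the restart before the event log records the unexpected shutdown.
    """
    logged_at = datetime.strptime(logged, TIME_FORMAT)
    return logged_at - last_heartbeat(description) - reboot


if __name__ == "__main__":
    desc = ("The previous system shutdown at 1:50:50 PM on 7/23/2007 "
            "was unexpected.")
    print(heartbeat_gap(desc, "7/23/2007 1:56:46 PM"))  # 0:03:56
```

With the values from this entry, the gap is 5 minutes 56 seconds between heartbeat and log entry; minus the assumed 2-minute reboot, that leaves 236 seconds, or just under four minutes with no heartbeat.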

I’ve seen this behavior before. We have a Dell CERC 6-channel SATA RAID controller (an Adaptec OEM part) in our Dell 1800 server. I once misconfigured it so that it checked a RAID 5 array for consistency on every disk access. (Dell does not recommend this.)

The array worked, but it slowed down dramatically under heavy use and sometimes hung. I chased network problems for a long time, since we use SMB to transfer big video files between that server and our Macs. I found the same pattern in that machine’s event logs, only more extreme: once, the last heartbeat timestamp came some six hours before I finally had to come in and reboot the system. In hindsight, my mistaken setting turned our SATA RAID array into a giant floppy disk!

My conclusion: for whatever reason, my Vista machine wasn’t able to access its hard drive, and this was happening at a very low level. In other words, it wasn’t Vista’s fault. I always thought Windows would bluescreen if you removed the hard disk, but apparently, if enough resources are available when this happens, it simply slows to a halt. That is just what our server did, and what my workstation was doing.

I tried new nVidia chipset drivers, and they worked, but the real breakthrough came when I thought of swapping the SATA cable. I had spare cables left over from my work on the server, so I put them in.

It’s been five days with no blips or hangs, and all my USB devices are hooked up again. I can only conclude that the SATA cable was bad. Its contacts were probably making intermittent connection and failed through repeated temperature cycles. (I put my machine to sleep nightly.) As a bonus, my home SBS box no longer drops its RAID 1 mirror, since I swapped the cables in that machine, too.

I should have caught this when I built the machine last year, back when it was running XP. The system ground to a halt a few times that year, but I assumed nVidia had simply shipped a bad driver. Hardware is often the most underestimated cause in Windows troubleshooting, and I proved it personally!

I found one other Vista patch that helped: "Cumulative update rollup for Windows Vista," a USB-related fix. (You have to call Microsoft for that one, or fill out its Hotfix Request form online.)

More fixes are coming: new Vista fix packs deliver updates that were promised for SP1. Rumor has it they’ll arrive on Patch Tuesday, August 14th.

Update:  The patches are out.

I don’t like getting upset over computer problems, or joining the online echo chambers that rage over the slightest flaw in Vista, OS X, or Linux; I have too many more important things to worry about. But I am very happy with Vista today, and very happy to have found the root cause of my troubles.

Update: Ed Bott has some kind words, as does Serdar.

7 Comments on “Vista finally stable for me, (no) thanks to a cable!”

  1. Bill says:

    I too have suffered from similar problems for the past few weeks. My system would either freeze up completely or reboot itself. I saw this post, as well as the picture of the aforementioned SATA cable. Since it looked like the same brand of cable I had in my system, I took 3 minutes and swapped it for one that came with my new MOBO. So far the system has been fault-free for the past 30 hours.
