Vista finally stable for me, (no) thanks to a cable!
Posted: August 2, 2007
For the past six months, I’d been scraping by with Vista. Every few days, my machine would lock up. I tried many combinations of hardware. Was it my cheap USB hub? Removed. My Firewire? Removed. I wasn’t happy with the performance, but it was never really bad enough for me to consider going back to XP (or switching to Linux, as some would suggest). I added 2 gigs of Crucial memory to the box when I renovated my server, but the lockups didn’t go away.
Sometimes the machine would simply lock up, but other times the symptoms were very interesting: I’d wake the machine from sleep in the morning, and over a minute or so, the computer would slowly grind to a halt. Control-Alt-Delete would often yield an error message about being unable to bring up the security dialog.
I found an interesting event log entry:
Log Name: System
Source: EventLog
Date: 7/23/2007 1:56:46 PM
Event ID: 6008
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: *********
Description: The previous system shutdown at 1:50:50 PM on 7/23/2007 was unexpected.
The interesting part is the time. The event log entry was written at nearly 1:57 PM, and my machine takes about 2 minutes to reboot. Remember that Windows writes a timestamp to the registry every 5 seconds (see "The Heartbeat of Windows"). Normally, the timestamp in the description of the event is a good indication of when a system bluescreened (to within 5 seconds). Here, the last heartbeat is at nearly 1:51 PM, about 6 minutes earlier. Subtract 2 minutes for the reboot, and it appears that Windows wasn’t able to write a heartbeat for roughly 4 minutes.
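For anyone who wants to run the same check on their own 6008 events, here is a minimal Python sketch of the arithmetic. It assumes the US-style date format shown in the entry above and an English locale for the AM/PM marker, and it treats my rough 2-minute reboot time as a given. The function names are my own, not part of any Windows tool.

```python
import re
from datetime import datetime, timedelta

STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"

# Values copied from the event log entry above.
EVENT_WRITTEN = "7/23/2007 1:56:46 PM"
DESCRIPTION = ("The previous system shutdown at 1:50:50 PM on 7/23/2007 "
               "was unexpected.")

# Assumption: a rough estimate of how long this machine takes to reboot.
ASSUMED_REBOOT = timedelta(minutes=2)


def last_heartbeat(description):
    """Pull the last recorded heartbeat out of an Event ID 6008 description."""
    match = re.search(
        r"shutdown at (\d{1,2}:\d{2}:\d{2} [AP]M) on (\d{1,2}/\d{1,2}/\d{4})",
        description,
    )
    if match is None:
        raise ValueError("description does not look like an Event 6008 message")
    clock, date = match.groups()
    return datetime.strptime(f"{date} {clock}", STAMP_FORMAT)


def stall_estimate(event_written, description, reboot=ASSUMED_REBOOT):
    """Return (gap between heartbeat and log entry, gap minus reboot time)."""
    logged = datetime.strptime(event_written, STAMP_FORMAT)
    gap = logged - last_heartbeat(description)
    return gap, gap - reboot


gap, stalled = stall_estimate(EVENT_WRITTEN, DESCRIPTION)
print(gap, stalled)  # 0:05:56 0:03:56
```

A healthy machine that bluescreens should show a gap close to its reboot time. Anything well beyond that suggests the system was alive but unable to write to disk for a while before it went down.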
I’ve seen this behavior before: We have a Dell CERC 6-channel SATA RAID controller (OEM’ed by Adaptec) in our Dell 1800 server. I once had it misconfigured so that it would scan a RAID 5 array for consistency with every disk access. (Dell does not recommend this.)
The array would work but it would slow down dramatically during heavy use and even hang. I chased network problems for a long time since we use SMB to transfer big video files between that server and our Macs. I found the same pattern in the event logs in that machine, but more extreme. Once, the heartbeat timestamp occurred some six hours before I eventually had to come in to reboot the system. In hindsight, my mistaken setting turned our SATA RAID array into a giant floppy disk!
A conclusion: My Vista machine, for whatever reason, wasn’t able to access the hard drive, and this was happening at a very low level. In other words, it wasn’t due to Vista itself. I always thought Windows would bluescreen if you removed the hard disk, but apparently if there are enough resources available when this happens, it simply slows to a halt. That is just what our server did and what my workstation was doing.
I tried new nVidia chipset drivers and they worked, but the real breakthrough came when I thought of swapping the SATA cable; I had extra cables after my work on the server and put them in.
It’s been five days with no blips or hangs. All my USB ports are hooked up. I can only conclude that the SATA cable was bad. The contacts were probably intermittent and failed through temperature cycles. (I put my machine to sleep nightly.) As a bonus, my home SBS box doesn’t drop its RAID 1 mirror anymore since I swapped cables in that machine, too.
I should have seen this when I built my machine last year, back when it was running XP. The system had ground to a halt a few times that year, but I had assumed nVidia had simply shipped a bad driver. Hardware problems are usually the most underestimated problems in Windows troubleshooting, and I proved it personally!
There is more to come: New Vista fix packs provide updates promised for SP1. They’re coming on Patch Tuesday, August 14th, or so goes the rumor.
Update: The patches are out.
I don’t like getting upset over computer problems, or joining the online echo chambers where rage happens over the slightest thing with Vista/OS X/Linux since I’ve had too many more important things to worry about. But I am very happy with Vista today and very happy to find the root cause of my troubles.