The long-rumored reliability patches for Vista are out. KB938979 and KB938194 promise to make Vista a bit faster and a bit more stable. The chatter about these patches is generally good, although one person on the Yahoo SBS2K list had trouble with Outlook; I hope he’s able to share more information about it.
I’m running the patches now. So far, so good. At the same time, nVidia released a SATA driver update through Windows Update, which works fine, but also a NIC update–which does not!
(For some reason, on my MSI K9N Neo-F, I can only use the nVidia NIC driver that comes with Vista. None of the later drivers work. Symptoms are a network icon with an X (meaning no connection/disabled) and if I attempt to change the NIC parameters in Device Manager, say, speed and duplex settings, Windows hangs. But the driver works in safe mode with networking. Good luck finding this one but you’ll hear about it on this blog first if I do!)
For the past six months, I’d been scraping by with Vista. Every few days, my machine would lock up. I tried many combinations of hardware. Was it my cheap USB hub? Removed. My Firewire? Removed. I wasn’t happy with the performance, but it was never really bad enough for me to consider going back to XP (or going to Linux, as some would suggest). I added 2 gigs of Crucial memory to the box when I renovated my server, but the lockups didn’t go away.
Sometimes the machine would simply lock up, but other times the symptoms were very interesting: I’d wake the machine from sleep in the morning, and over a minute or so, the computer would slowly grind to a halt. Control-Alt-Delete would often yield an error message about being unable to bring up the security dialog.
I found an interesting event log entry:
Log Name: System
Source: EventLog
Date: 7/23/2007 1:56:46 PM
Event ID: 6008
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: *********
Description: The previous system shutdown at 1:50:50 PM on 7/23/2007 was unexpected.
The interesting part: notice the time. The event log entry was made at nearly 1:57 PM. My machine has a reboot cycle of about 2 minutes. Remember that Windows writes a timestamp to the registry every 5 seconds (see "The Heartbeat of Windows"). Normally, the timestamp in the description of the event is a good indication (within 5 seconds) of when a system bluescreens. Here, the last heartbeat is at nearly 1:51 PM, about 6 minutes before the entry was written. Subtract 2 minutes for the reboot, and it appears that Windows wasn’t able to write a heartbeat for about 4 minutes.
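For anyone who wants to check the arithmetic on their own logs, here is a rough sketch in Python. It is only an illustration: the function name and the 2-minute reboot figure are my own assumptions, and the 5-second heartbeat interval is the one mentioned above. With the exact timestamps from the entry rather than rounded ones, the estimate comes out at a little under four minutes.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the event log entry above, e.g. "7/23/2007 1:56:46 PM".
# Parsing "PM" with %p assumes an English/C locale.
FMT = "%m/%d/%Y %I:%M:%S %p"

# Windows writes its heartbeat timestamp every 5 seconds (per the post).
HEARTBEAT_INTERVAL = timedelta(seconds=5)


def silent_gap(logged_at, last_heartbeat, reboot_time):
    """Estimate how long the system was up but unable to write a heartbeat.

    logged_at      -- when event 6008 was written (after the reboot)
    last_heartbeat -- the shutdown time quoted in the event description
    reboot_time    -- rough time the machine needs to reboot (an assumption)
    """
    logged = datetime.strptime(logged_at, FMT)
    last = datetime.strptime(last_heartbeat, FMT)
    gap = logged - last - reboot_time
    # A negative result only means the reboot estimate was too generous.
    return max(gap, timedelta(0))


gap = silent_gap("7/23/2007 1:56:46 PM", "7/23/2007 1:50:50 PM",
                 reboot_time=timedelta(minutes=2))
print(gap)                       # 0:03:56
print(gap > HEARTBEAT_INTERVAL)  # True: far longer than one missed heartbeat
```

A gap of only a few seconds would point to an ordinary crash or power loss; a gap of minutes suggests the machine was still running but could not write to disk.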
I’ve seen this behavior before: We have a Dell CERC 6-channel SATA RAID controller on our Dell 1800 server (OEM’ed by Adaptec). I once had it misconfigured so that it would scan a RAID 5 array for consistency with every disk access. (Dell does not recommend this.)
The array would work but it would slow down dramatically during heavy use and even hang. I chased network problems for a long time since we use SMB to transfer big video files between that server and our Macs. I found the same pattern in the event logs in that machine, but more extreme. Once, the heartbeat timestamp occurred some six hours before I eventually had to come in to reboot the system. In hindsight, my mistaken setting turned our SATA RAID array into a giant floppy disk!
A conclusion: My Vista machine, for whatever reason, wasn’t able to access the hard drive, and this was happening at a very low level. In other words, it wasn’t due to Vista itself. I always thought Windows would bluescreen if you removed the hard disk, but apparently if there are enough resources available when this happens, it simply slows to a halt. Just what our server did and what my workstation was doing.
I tried new nVidia chipset drivers and they worked, but the real breakthrough came when I thought of swapping the SATA cable; I had extra cables after my work on the server and put them in.
It’s been five days with no blips or hangs. All my USB ports are hooked up. I can only conclude that the SATA cable was bad. The contacts were probably intermittent and failed through temperature cycles. (I put my machine to sleep nightly.) As a bonus, my home SBS box doesn’t drop its RAID 1 mirror anymore since I swapped cables in that machine, too.
I should have seen this when I built my machine last year, since it was running XP then. The system had ground to a halt a few times that year, but I had assumed nVidia had simply shipped a bad driver. Hardware problems are usually the most underestimated problems in Windows troubleshooting, and I proved it personally!
There is more to come: New Vista fix packs provide updates promised for SP1. They’re coming on Patch Tuesday, August 14th, or so goes the rumor.
Update: The patches are out.
I don’t like getting upset over computer problems, or joining the online echo chambers where rage happens over the slightest thing with Vista/OS X/Linux since I’ve had too many more important things to worry about. But I am very happy with Vista today and very happy to find the root cause of my troubles.