Exchange 2007 SP3: Easier service pack with wrinkles

Exchange 2007 SP3 is now available.  Unlike SP2, it needs no wrapper to install under SBS 2008.

However, you may see a few wrinkles in the update process.  You will need to stop the Windows SBS Manager service (datacollectorsvc) and, if you have File Server Resource Manager installed, srmsvc.  In PowerShell, run:

stop-service datacollectorsvc

stop-service srmsvc

The service pack setup should then proceed normally.  It takes about a half-hour, and to be safe you should reboot the server after the update completes.

Diagnosing Hardware Bluescreens

Screenshot from NirSoft’s BlueScreenView

This morning I was waking my computer before breakfast to check on a FedEx shipment (much needed cooling fans for my apartment!) and when my machine woke up this is what I got.

I restarted it and the BIOS told me, “could not read disk, press Ctrl-Alt-Del to restart”.  I power-cycled the machine and got Windows to boot.

I checked my hard drive with Crystal Disk Info, but found nothing out of line in the SMART data—in fact, my terabyte HD, nearly a year old, has never had an error or a remapped sector or anything odd.  Had my partition table been truly corrupted, that would usually cause another bluescreen when I tried to boot.

OK, Windbg:

0: kd> !analyze -v
[banner omitted]
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa800435c038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b2000010, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000000010c0f, Low order 32-bits of the MCi_STATUS value.

I’d already guessed when the error happened, but to be sure, here’s the stack:

Child-SP          RetAddr           Call Site
fffff800`00ba8ac8 fffff800`02e2b917 nt!KeBugCheckEx
fffff800`00ba8ad0 fffff800`02fe84d3 hal!HalBugCheckSystem+0x1e3
fffff800`00ba8b10 fffff800`02e2b5dc nt!WheaReportHwError+0x263
fffff800`00ba8b70 fffff800`02e2af2e hal!HalpMcaReportError+0x4c
fffff800`00ba8cc0 fffff800`02e1ee8f hal!HalpMceHandler+0x9e
fffff800`00ba8d00 fffff800`02ed0eac hal!HalHandleMcheck+0x47
fffff800`00ba8d30 fffff800`02ed0d13 nt!KxMcheckAbort+0x6c
fffff800`00ba8e70 fffff880`03dd11f2 nt!KiMcheckAbort+0x153
fffff800`00b9cc98 fffff800`02ee013a amdk8!C1Halt+0x2
fffff800`00b9cca0 fffff800`02edadcc nt!PoIdle+0x53a
fffff800`00b9cd80 00000000`00000000 nt!KiIdleLoop+0x2c

The machine woke up to Windows, started running, and did its normal CPU idle procedure; in all modern machines, the CPU halts when it is not otherwise running user or kernel code.  It’s possible the exception actually happened during the transition to sleep when I put the machine to bed the night before, which would fit this event log entry:

The previous system shutdown at 11:27:54 PM on ‎6/‎28/‎2010 was unexpected.

OK, so it’s hardware.  What is the WHEA_ERROR_RECORD?

WHEA stands for Windows Hardware Error Architecture, found in Vista, Server 2008, Windows 7 and Server 2008 R2.  It replaces the Machine Check Architecture support in earlier versions of Windows.

Parameter #2 of the bugcheck points to the hardware error record:

0: kd> dd fffffa800435c038
fffffa80`0435c038  52455043 ffff0210 0003ffff 00000001
fffffa80`0435c048  00000002 000003a0 000c1114 140a061d
fffffa80`0435c058  00000000 00000000 00000000 00000000
fffffa80`0435c068  00000000 00000000 00000000 00000000
fffffa80`0435c078  cf07c4bd 4e18b789 731fc4b3 3171b52c
fffffa80`0435c088  e8f56ffe 4cc5919c ab6588ba bb1349e1
fffffa80`0435c098  0ced40e1 01cb1314 00000000 00000000
fffffa80`0435c0a8  00000000 00000000 00000000 00000000

Right.  That’s clear.  Fortunately there are debugging extension commands for WHEA in the latest debugger.  I’ll try them.

0: kd> !whea
Error Source Table @ fffff80003062b38
0 Error Sources

 

OK, not much info there, I’ll try one of the others.

0: kd> !errrec fffffa800435c038
===============================================================================
Common Platform Error Record @ fffffa800435c038
-------------------------------------------------------------------------------
Record Id     : 01cb13140ced40e1
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 6/29/2010 12:17:20
Flags         : 0x00000000

===============================================================================
Section 0     : Processor Generic
-------------------------------------------------------------------------------
Descriptor    @ fffffa800435c0b8
Section       @ fffffa800435c190
Offset        : 344
Length        : 192
Flags         : 0x00000001 Primary
Severity      : Fatal

Proc. Type    : x86/x64
Instr. Set    : x64
Error Type    : BUS error
Operation     : Generic
Flags         : 0x00
Level         : 3
CPU Version   : 0x0000000000060fb1
Processor ID  : 0x0000000000000000

===============================================================================
Section 1     : x86/x64 Processor Specific
-------------------------------------------------------------------------------
Descriptor    @ fffffa800435c100
Section       @ fffffa800435c250
Offset        : 536
Length        : 128
Flags         : 0x00000000
Severity      : Fatal

Local APIC Id : 0x0000000000000000
CPU Id        : b1 0f 06 00 00 08 02 00 - 01 20 00 00 ff fb 8b 17
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

Proc. Info 0  @ fffffa800435c250

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa800435c148
Section       @ fffffa800435c2d0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : BUSLG_OBS_ERR_*_NOTIMEOUT_ERR (Proc 0 Bank 4)
  Status      : 0xb200001000010c0f
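
The header fields that !errrec prints can be cross-checked by hand against the raw dd output shown earlier.  What follows is a minimal Python sketch, not anything the debugger itself runs.  It assumes the record begins with the common platform error record header as laid out in the UEFI specification, and that the timestamp bytes are plain binary values, which is what this particular dump appears to contain.  The function name and dictionary keys are made up for illustration.

```python
import struct

# First eight dwords of the dd output, in the order WinDbg printed them.
DWORDS = [0x52455043, 0xffff0210, 0x0003ffff, 0x00000001,
          0x00000002, 0x000003a0, 0x000c1114, 0x140a061d]


def parse_record_header(dwords):
    """Decode the first 32 bytes of an error record (assumed CPER layout)."""
    # dd prints little-endian dwords, so repack them into the raw byte stream.
    raw = struct.pack("<8I", *dwords)

    # Signature(4) Revision(2) SignatureEnd(4) SectionCount(2)
    # Severity(4) ValidationBits(4) RecordLength(4) = 24 bytes, no padding.
    (signature, revision, signature_end, section_count,
     severity, validation_bits, record_length) = struct.unpack_from(
        "<4sHIHIII", raw, 0)

    # Timestamp bytes: seconds, minutes, hours, flags, day, month, year, century.
    sec, minute, hour, _flags, day, month, year, century = struct.unpack_from(
        "<8B", raw, 24)

    return {
        "signature": signature,
        "revision": revision,
        "signature_end": signature_end,
        "section_count": section_count,
        "severity": severity,
        "validation_bits": validation_bits,
        "record_length": record_length,
        "timestamp": (century * 100 + year, month, day, hour, minute, sec),
    }


header = parse_record_header(DWORDS)
for key, value in header.items():
    print(f"{key:15}: {value}")
```

With the dwords from this dump, the signature comes out as the ASCII bytes CPER, the record length as 928, the section count as 3, and the timestamp as 2010-06-29 12:17:20, all of which agree with the !errrec output.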

 

We’re getting somewhere.  The Processor Generic section categorizes this as a bus error.  Section 2 gets a bit more detailed:

Error         : BUSLG_OBS_ERR_*_NOTIMEOUT_ERR (Proc 0 Bank 4)
  Status      : 0xb200001000010c0f

I have no information on how to decode the status word.  Doing a search on BUSLG turns up a few hits related to memory errors in FreeBSD.  The “Bank 4” wording implies memory hardware; this machine has four sticks of 1 GB each for a 4 GB system.

MSDN has a description of WHEA error events, though none of them describe my error at all.

A likely scenario is that the system was put to sleep and entered sleep normally, but there was a power glitch during sleep, or on wakeup, that affected the standby power that keeps the RAM alive.  If you put the system to sleep and turn off the power, Windows will complain about it in the event logs.
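
For what it is worth, the status word can be picked apart by hand using the architectural machine-check status layout that Intel and AMD document for x86 processors.  The sketch below is a hedged illustration in Python: the function and table names are invented here, it only handles the bus/interconnect form of the error code (binary 0000 1PPT RRRR IILL), and anything model-specific is left undecoded.  One caution it suggests: the “Bank 4” in the debugger output is the number of the machine-check register bank that logged the error, which is not necessarily a memory slot.  On AMD processors of this generation, bank 4 is, as far as can be told from AMD's documentation, the northbridge bank, so that is worth verifying before suspecting a particular DIMM.

```python
# Arg3 (high 32 bits) and Arg4 (low 32 bits) of the bugcheck, joined.
STATUS = 0xb200001000010c0f

# Architectural flag bits at the top of MCi_STATUS.
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Field mnemonics for the bus/interconnect error code form.
PARTICIPATION = {0: "SRC", 1: "RES", 2: "OBS", 3: "GEN"}
REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and error-code fields."""
    code = status & 0xFFFF  # bits 15:0, the MCA error code
    info = {
        "flags": [name for name, bit in FLAG_BITS.items()
                  if (status >> bit) & 1],
        "mca_code": code,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }
    if (code & 0xF800) == 0x0800:
        # Bus/interconnect form: 0000 1PPT RRRR IILL
        info["kind"] = "bus"
        info["participation"] = PARTICIPATION[(code >> 9) & 0x3]
        info["timeout"] = bool((code >> 8) & 0x1)
        info["request"] = REQUEST.get((code >> 4) & 0xF, "?")
        info["level"] = LEVEL[code & 0x3]
    else:
        info["kind"] = "other"  # other code forms are not handled in this sketch
    return info


for key, value in decode_mci_status(STATUS).items():
    print(f"{key:20}: {value}")
```

For this status word the flags come out as VAL, UC, EN and PCC (a valid, uncorrected error with processor context corrupt), and the low 16 bits, 0x0c0f, decode to a generic-level, observed, no-timeout bus error.  That lines up with the pieces of the debugger's BUSLG_OBS_ERR_*_NOTIMEOUT_ERR mnemonic.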

I’ve seen a lot of quirks with this particular system but this is a new one.   I’ve often had BIOS messages that tell me,

A HyperTransport sync flood occurred on last boot
Hit F1 to Resume

Sure, I see that message and think, OMG the sync flooded, get a mop!  It’s not a very actionable error.

Unless this happens again, I’m not going to do anything about this.  The motherboard is elderly, 4 years old, and I plan to get a new board when it hits its fifth birthday this time next year.   If it happens every day for the next month, though…

Ten Years of IT at SATV: Onward still

This past spring, we had our annual rite, the Annual Meeting.  This is a board function required by law where we explain our finances, and our past year, and our future plans.  Kevin Walker has a few kind words for me in this video.

The mistake with the cable was forgotten very quickly.  Ten years ago, I wrote a strategic plan for SATV in which I explained that IT would be a very important part of our success in serving the community of Salem.  It was received politely, but it was only taken seriously by our new, and current, executive director, Sal Russo.  After he was hired in 2002, he and I worked to realize many elements of my plan.

Then, we were in the final years of our contract with Comcast and we had limited funds;  we also had to tally up our assets and figure out what needed to be improved.

It was a long list and SATV was a tired place in many ways.  A caretaker director and an unaware board had made life very discouraging;  in 2002 you could be forgiven for thinking our best years were behind us.  Our first executive director had unknowingly contributed to this decline when he was first hired;  in a very technological business like ours, he had no knowledge of technology, not even superficially.

While our first director, Bob Miot, was second to none in making the contacts SATV needed to thrive in its early life, the technology was neglected to our detriment.  Members like myself who had expertise were not really encouraged to be involved in that area.

Very soon after Sal came on board, I got the money to run new phone wiring for DSL.   Six months after that I dusted off my plans for our new network and in early 2004, we had our new Category 6 gigabit network, which has been nearly unchanged to this day.  (In 2006, we renovated our studio space so we performed another round of network improvements.)

More gradually, in recent years, our server technology has been updated;  we had SBS 2000 on our first Dell, followed by SBS 2003 and now, as of a year ago, SBS 2008.  We installed a video server in 2008 and our network is busy with Mac editing workstations and a public WiFi network.

We have modern Dell PCs running Windows 7, and a new VoIP PBX that replaced a very elderly Panasonic key system.  We are as progressive as our budget will allow us to be.

Gone are the days when I didn’t know something was down until I came in or staff would call.  We have an environmental monitor, our SuperGoose, and numerous notification systems that go out to my cell phone.

We now have a modern IT-centric system that sits in the background and does its work for us, helping our government connect with its community, our community connect with its government, and our citizens connect with each other, over video and now online.

Now, as it was 10 years ago, we are nearing the end of another contract cycle with Comcast.  The monies we get from them will be smaller and our budget tighter.

Now, as then, I have been asked to write another strategic plan for SATV.  Using what we have learned over the years and what I have learned over 10 years, I am once again responsible for helping SATV navigate the next few years in IT.

We’ve seen many changes, too many to list. Ten years ago, I was on a tour of WHDH-TV, Channel 7.  We were shown two large robotic tape machines.

They were each used to cue and play commercials, one acting as backup for the other.

It is safe to say those tape machines are gone now.

Five years ago, the first video servers came into use at high-end broadcast facilities.  We’ll not get these for a while, we thought.

Nearly two years ago we got our first video server and it has forced a change in how we deal with video, so much of a change that I have had to dedicate a section of our plan to recommend policies on how long a member’s video needs to be in our servers, for example.

It has changed backup procedures;  five years ago we were on DDS, then DLT tape.  Then came external hard drives, just as you would find at Best Buy.  How do we back up a terabyte of video?

Can we afford to?  Afford not to?

What more will we see over a few short years?

Whatever happens, I have been given a mandate and an endorsement to continue, perhaps for another ten years.

More errors, more mistakes, but also more triumphs, and successes.

And, one hopes, the continuation of SATV’s mission.

All one can ask for.

UPDATE: Phil Elder has some kind words.


Ten Years of IT at SATV: Ups and Downs

There have been ups and downs over the years at SATV, and not a few of the downs have been due to my own mistakes.  This is one of them.

This is, or was, a wiring harness used in our video server.  Because PC cases don’t have a lot of room for connectors, particularly not for BNC video connectors, it is very common in the broadcast and professional audio fields to have cable snakes: a series of BNC or XLR pigtails that go into a single D-sub connector, which can be a DB-15 (like the old IBM joystick connectors) or a DB-25 (like a serial port).

This is one I broke.  I was sliding out the video server for cleaning—I had wanted to check the machines for dust ingestion, which was a problem when we renovated our studio area a few years ago.  The slide rail got stuck and I pulled a little harder.

It snagged the cable without my realizing it and broke off several cables in the snake which we only found out about later on.  Fortunately, our program director, Dave Gauthier, had been able to work around the problem.

I offered to pay for the cable, but was declined.  “You do more good than bad”, I was told.

Another amusing picture:

When we got our new Dell server, it opened, as most Dell tower servers do, with a key inserted into the lock on its bezel.  The power button and case release button and screws are all under the bezel so you must unlock it and take it off from time to time.  This key was in the bezel while it was propped up against a cabinet while I worked on the server.  As you may know from the history of that machine, it had spent a lot of time being open.

I knocked the bezel down and it landed flat on the floor on the “good” side.  The key snapped off in the lock.  Lucky I had another key and a pair of pliers to remove the stub.  Keeping honest people honest is all this does.

We use Spiceworks to manage our IT assets and I’m a regular in the community there.  There was a thread going a little while ago, “What is Your Most Recent IT Screwup?  Be Honest”, and I contributed two posts I will reproduce here:

DCM_SATV

David Moisan

Network/Systems Administrator at Salem Access Television

Salem, MA

My worst mistakes are in the home. You know, you’re more likely to trip on a cat toy and fall down the stairs and die at home, that sort of thing.

I have an SBS box at home with backups. Good thing too. I’ve done these:

1) Changed security inheritance in c:\Windows in such a way that the next step was to get out the restore disk and the backup drive–or break down crying. I no longer change security on Windows directories, ever. (and I did have that backup!)

2) Torn apart my new server motherboard five times on a dead system indication, not realizing the BIOS has a blank screen for the first 30 seconds and there was nothing wrong with the board.

3) In the days of hard sectored floppies, inadvertently inserted a CP/M distribution disk as the "destination" in an old NorthStar machine and realized it only after hearing clickclickclick. (Fortunately I had a copy of that disk itself so I copied it back over…)

3A) Fired up a very, very loud daisywheel printer on that NorthStar for testing–during a staff meeting in that same room! It would have been a good way to drown out boss ranting had I thought of it…

4) Flashed the firmware on a managed switch through the serial port and didn’t take into account that 9600 baud is not fast to transfer an 8 meg image. Switched over to backup switch very very quickly and quietly. Configured TFTP the next day.

5) Rebooted video server in the middle of a program during Patch Tuesday. Embarrassing. We run from the satellite in the mornings for exactly this reason.

6) Ignored a call on the cell during afternoon sleepytime thinking it was a wrong number. Nope. My boss was on the line and he never calls me just because, and it WAS something I needed to have acted on 15 minutes ago.

Jun 10, 2010 at 10:24 AM

DCM_SATV

Oh does that bite! I was on the beta for SBS 2008 and every time a refresh came out I would save a PST as I put all my personal mail on Exchange. I migrated the mailstore twice in the course of the beta.

The RTM comes out and I have a copy.

I say, bleep it, I won’t bother with a PST, I migrated the store twice how hard could it be?

Hahaha

Could not migrate the mailstore from my beta machine for love nor money.

No PST.

I go and buy some program to read OST’s. It doesn’t get all my mail but just enough to triage it and get the important stuff. I try yet another OST reader to get one piece of mail that has an important voucher code in it, the one email I most need.

Ain’t going to do that again…

There you have it!  Those are my mistakes, and I don’t expect to stop making them anytime soon, even though one never wakes up in the morning with the intent to screw up.  But there have been good moments too; those are next.


Ten Years of IT At SATV

The end of May marked a milestone quietly passed:  I have been managing the IT at Salem Access Television since the spring of 2000.

I had been at SATV as a member since 1994, and served on its board as one of the three member representatives from 1999 through 2002.   I had learned, on my own and with the help of SATV, video production, editing, and my personal favorite skill, broadcast graphics design.

But I was never involved with our Macs (we were a Mac shop when we opened), nor our Amigas (which we used for TV production and graphics), except that I had to deal with the Amigas because our graphics machine was very, very flaky.  I learned AmigaOS, Broadcast Titler, SCALA (which was, and is still, a superb digital presentation/signage program) and the Video Toaster, the legendary video production machine, but only out of necessity.

The executive director at the time was not so much interested in my help, or even aware of it, being very nontechnical and more of a social networker than anything.  I never bothered to be involved.  The only reason I paid attention to computers at all was to deal with the cranky Amigas;  despite the mystique around these nice machines, they were still computers all the same and I applied my computer skills no differently to them than to IBM’s or DEC’s.  There wasn’t anyone else in the building who could help when the Amiga CG went down before a show so…

In 1998 SATV moved to a Windows network and Small Business Server 4.0.  (BackOffice Server 4.0, Small Business Edition, was the name for it.)  I didn’t know it at the time, but the consultant’s expertise, or lack of it, would make me more involved with our network whether I liked it or not.

When I served on the board in 1999, we were dealing with the financial repercussions of the new network—our director wasn’t really happy with the process nor the consultant and there were some disputed bills.

The staff wasn’t happy either:  Machines would crash and the network would often stall.

We hired a new executive director after our first director, the nontechnical one, felt burned out and wanted to do something else.   After this happened, I spent a little more time on staff machines, essentially diagnosing slowdowns from crapware that people would install and finding out just how flaky the client machines were.  Many of them had fans that failed quickly (a staffer told me once, “the machine sounds like it’s in pain!”) and networking problems that were all too often solved by removing the NIC and the drivers and reinstalling.

These clients were all Windows 95 machines.  The consultant didn’t want Windows 98; “it was unproven”.  (Given our good experience with Win98 machines later on, I was livid remembering this, but that is beside the point…)

We had used a third party NAT program to access our (then dialup) internet connection and it was flaky.  The network continued being intermittent; it was a 10 Mbit shared Ethernet hub.  (Long-time Ethernet experts may know where this is going but hold those thoughts for the moment.)

Sometime in May 2000, I don’t know the exact date, our executive director, Jen Casco, had had enough.  I was babysitting a graphics project, or fixing the CG machine, can’t recall which, when she asked me that afternoon if I would look at the server.  “If you’re comfortable doing that.  It’s driving us crazy.”

Me:  “OK, give me the password…”

It took me two years to get all the problems out and get us a good system.

The network problem, I much later found out, was due to the same junky NICs in the clients;  if you are that Ethernet expert, you probably guessed our problem was due to a duplex mismatch, which can and does affect everything on a hub.

The server had never been backed up.  It was built with a Travan backup drive, an abomination in itself.  It had never seen a service pack.  It had never seen any routine checkups.  So much of my time over that first year was spent learning about our particular network and configuration.

Over that year, these matters were slowly resolved.  NT 4.0 Service Pack 6a was applied and when the upgrade to SBS 4.5 was published, I applied that.  I borrowed tools and parts from home to fix things.  I lent out, for a time, a 10/100 Ethernet switch that I had used to connect computers in my home network.  I even managed to fix the noisy fans by finding surplus CPU cooling units that would fit the (then-slotted) Pentium II processors in the workstations.

Eventually, I had to replace that Frankenstein server our consultant had built, in one final insult.  Over the year that I managed it, it would bluescreen continually.  That’s probably where my interest in Windows internals came from.  It was not a magic fix that I could do, but rather a bad motherboard.

A cheap workstation motherboard. 

Not a server board.

I would be lying if I said I never used a client board as a server, because I have.  But for a paying customer?  This consultant had the brass to suggest he had built and sold us a “Mercedes Network”!

Facepalm.

It was nothing like that.  There’s really no such thing as “Mercedes networks”, I told the board, only larger and smaller, simpler and more complex depending on the needs of the facility.

The network that guy left us wasn’t even that good—just five runs into the Engineering room, where my network core sits now.

(Years later, I would rip out that old network literally with my bare hands, and watch coils of cheap cable fall out of the ceiling, not even secured…)

The end of my year’s work (really year and a half) came when Jen listened to my request to refurbish the server with a new motherboard, and instead authorized the purchase of a new Dell server, the first of three Dell machines we’ve owned.

That is when I knew I had been a success.  (I also knew I loved this sort of work more than I do board work—no disrespect intended!)

Next:  Ups and downs, mistakes and revelations at SATV.