Tuesday, September 8, 2009

Darkest Before the Dawn

Who'll Stop the Rain...

On Friday, September 4, I set out again to rebuild my system. I tried to be a little more careful this time and not miss any steps. Previously I had forgotten to install the back-plate on the I/O panel of the motherboard. Still, installation had its bumps. I got one of the water blocks on the CPU and then noticed that one of the fan cables was stuck under the motherboard and I could not get it out. So I had to pull the water block off, clean off all the thermal compound, pull the motherboard and put it back in while holding the fan connector out of the way.

Finally I had enough things rigged up that I could power on the system again - I pushed the power button and everything powered up fine. Pretty much the first thing I did was to boot the Intel Deployment CD and upgrade the BIOS (and friends) to the latest version. However, when this was finished my system powered down instead of restarting - this was very odd (but a sign of things to come).

After that a lot of things were behaving oddly, but mostly the system just kept powering down. For the longest time I could not get into the BIOS setup; the system just kept trying to boot something, but there was nothing to boot. Eventually I got into the system BIOS again and started trying to configure the RAID, but I was having the same problem as last time: I could not configure a second Virtual Disk.

Eventually I got on the phone with Intel Technical Support. The support person was asking me about what disks I had connected and I said five 2 TB disks. He said, "oh, that's your problem; your controller can only support 1 TB disks." After that he was not very willing to help any more so I said I would see what I could figure out on my own.

A while later I tried booting the Intel Deployment CD, but this would no longer boot. I was confused as I did not have this problem with the previous motherboard. Also, my system kept powering down, which was annoying. Eventually I was back on the phone with Intel Technical Support (a different person this time) and told him about the system shutting down. I asked if this was because some component was getting too hot and he said likely. He asked me what Intel server case I was using and I said it wasn't Intel. He started giving me the same song and dance that it was not compatible with the motherboard and there was little he could do. I asked him how I could identify which part was overheating and he told me how to run a utility to view the System Event Log.

Now this part gets interesting because my system is not a normal BIOS, it's an EFI system so you can boot to the EFI shell, which is a little like an old style DOS or Unix system. At any rate I had to download the selview utility to a USB drive, then run the EFI shell on my system, and run the selview utility - but it would not run. At that point the guy from Intel said he could not do any more.

After a bit of a break and a chance to think I started reading about the BMC (Baseboard Management Controller). This is basically a 32-bit computer, separate from the main CPUs, that controls the motherboard and is presumably where the EFI shell lives. I read up on the EFI shell commands and how it works. It's actually pretty powerful and all computers should have EFI instead of the old BIOS. Eventually I found that I could run selview and dump the logs to a file, so I tried that and it worked. The best I can tell is that selview could not work on my display for some reason because it was a full-screen application.

After looking through the System Event Log I could see all kinds of warnings about fans not working. Back when I upgraded the BIOS it asked me which fans were connected in my chassis and I said all were (without thinking). Anyway, this was one of the things the system was complaining about. Also, there is a status LED at the back of the computer. At first I didn't realize this was a status signal because the Intel decal at the back was labeled incorrectly, but once I realized this was the status LED things became clearer. The light was always blinking amber which means there is a serious problem, and then when my system would power down the LED would be solid amber, meaning a critical problem.

The other thing the System Event Log showed was that the IOH (I/O Hub) was overheating, but it only said 10.0 degrees C - which is not hot, so I was confused. Up until now I had been running my system with the sides off the case because I had not finished all the wiring. On a hunch I went and got a big room fan and pointed it at the side of my computer on full. This time the status LED on my system stayed solid green, meaning everything was operating correctly.

Eventually I gave up on getting the RAID to work and just tried to install Windows Vista on a single disk normally. To my amazement it worked and I was able to get Vista running. Unfortunately after Vista was running there was no network connection. I finally fixed this by using the Intel Deployment CD to install the network drivers. It is interesting to note that when installing Windows 7 it already has the network drivers. I ran Task Manager and noted with some satisfaction that there was no pausing problem like I had seen before. Next I installed Second Life and it ran beautifully - no pausing - very smooth. Unfortunately my microphone was not working. I eventually fixed that by installing the audio drivers from the Intel Deployment CD and then fiddling around with the Realtek audio utilities.

After a satisfying couple of nights of running Second Life with my friends I went back to fiddling with the RAID setup again. Eventually I learned that if I enabled the SW-RAID in the BIOS, the Intel Deployment CD would not boot. It would only boot if this setting was not enabled. But I needed that setting enabled to configure the RAID. I had not seen this problem with my previous motherboard. Finally I went back to the BIOS and instead of configuring two virtual disks (which seems to be buggy) I configured a single large 8 TB RAID 5 array. I was able to install Windows 7 and get it running. I solved the pausing problem in Windows 7 by running two network connections - a trick I had learned on DriverHeaven. Again, everything seems to perform well, except my RAID performance is not what I had hoped - but I have no direct experience with RAID 5. Also, my RAID got formatted with the MBR (Master Boot Record) layout and I wanted it formatted with a GUID Partition Table.
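
For anyone wondering where the 8 TB comes from and why the partition layout matters, here is a rough back-of-the-envelope sketch in Python. It is not anything from Intel's tools: the function names are made up for illustration, and it assumes decimal terabytes and 512-byte sectors. RAID 5 gives up one disk's worth of space to parity, and MBR stores sector counts in 32-bit fields, so a single MBR partition tops out at about 2.2 TB.

```python
# Rough capacity check; helper names are hypothetical, not from any Intel utility.
# Assumptions: 1 TB = 10**12 bytes, and the drives use 512-byte sectors.

TB = 10**12
SECTOR_BYTES = 512


def raid5_usable_bytes(disk_count, disk_bytes):
    """RAID 5 spends one disk's worth of space on parity."""
    if disk_count < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disk_count - 1) * disk_bytes


def mbr_limit_bytes(sector_bytes=SECTOR_BYTES):
    """MBR uses 32-bit sector counts, which caps the addressable size."""
    return (2**32) * sector_bytes


usable = raid5_usable_bytes(5, 2 * TB)   # five 2 TB disks
limit = mbr_limit_bytes()

print(usable // TB, "TB usable in RAID 5")
print(round(limit / TB, 2), "TB addressable under MBR")
print("needs GPT to use the whole array:", usable > limit)
```

So with five 2 TB disks the array comes out at 8 TB, well past what an MBR layout can address under these assumptions, which is why I wanted GPT.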

At any rate I am impressed that I was able to create a functioning RAID 5 system with 2 TB disks after Intel told me it was impossible.

Thursday, September 3, 2009

Labor Pains - Part 2

Bad Luck is sometimes like rain - when it rains it pours!

I've been taking Fridays off from work this summer, and by Friday August 21 I had thought I had everything figured out.
  • I realized the problem I had getting Windows 7 to install on my RAID was that I had not set a system disk - and the Windows installer is too stupid to let you set one from the UI. But this was something I could set in the S5520SC BIOS.
  • I had heard from DriverHeaven that other people had the same CPU spiking problem with Windows 7 and that I should use Vista in the meantime.
  • I confirmed that one of my disk drives was defective.
I went to NCIX and returned the disk drive - they confirmed it was defective too and ordered a replacement for me. I also bought a copy of Windows Vista Ultimate. When I got home I finally hooked up the rest of the front panel connectors on the computer case, and even managed to get the sides back on so everything looked nice and tidy.

I really took my time and wanted everything to go well for once. I got everything ready and then went into the BIOS to set up the RAID. For some reason the BIOS setup was not working properly this time: it would not let me finish configuring my second virtual disk - it kept freezing and forcing me to reboot the computer - grumble grumble grumble.

Next I thought I would try using the Intel RAID Web Console 2 from the utility CD. I booted the CD, and then upgraded to the latest version of the utility from the network, but I could not get the Web Console 2 to work - nothing would happen. Next I rebooted the CD again, but this time I did not do the network upgrade. Finally I was able to get into the RAID Web Console 2 user interface. This application was pretty crummy too, confusing to use, and buggy in some places. Eventually I managed to define the two RAID 5 virtual disks I wanted and started to initialize them. After 30 minutes I was wondering what was taking so long and then a progress bar finally popped up to show that it was only 10% done. I wish I had selected the fast initialization instead of the full initialization. I was getting tired of waiting so I went off to do some reading for a while.

I came back 15 or 20 minutes later to see how things were and found the screen blank, and the graphics card fan was on full (something that had never happened before). I tried powering the system off and on, but nothing happened. In fact, when powering the system on, the Power On Self Test (POST) LEDs would not even light up at all. That was a very bad sign.

Eventually I found a phone number for Intel technical support and someone talked me through some tests. Mostly it was removing stuff from the motherboard and powering the system back on. Nothing helped and nothing changed so the support person conceded that the board was dead and sent me instructions for returning the board for a replacement.

By this time I felt pretty crushed - the morning had started off so well, and by mid afternoon it looked like I was finally going to get everything working - when BAM - the worst happens. I suppose this is what someone feels like after a terrible childbirth and they discover that their child is not only retarded, but blind and deaf too. Of course this was just a computer and could never be the same as a child, but I just felt really depressed and angry. Why me?

The next day I set out to return the motherboard. First of all Intel required that there be some sort of commercial invoice for customs purposes so it took me an hour or so to fabricate something that looked official. Next was the process of removing all the connectors from the motherboard. Taking the water blocks off of the CPUs was interesting - but it was good to see that the thermal compound I had used had spread out nice and evenly across the CPU heat spreader. Of course I had to clean everything off and put the CPUs away safely, then prepare the motherboard for shipping. It took me almost 45 minutes at the UPS store to get all the information right because I was shipping across the border. I selected the least expensive shipping method, and that took over a week and cost me $85.00.

Anyway, I've had two weeks waiting for a replacement and today I'm supposed to get my replacement motherboard...