Server Wrangling

Failed Hard Drive

I’m not a big hardware guy. I use computers as a tool, sometimes for work, sometimes for hobbies, but I generally only get under the hood when something is wrong, and only then when whatever went wrong is seriously getting in my way.  Take the recent gallery outage for example.

I run an old Linux box with an unimpressive CPU, unremarkable storage and memory, in a seriously-oversized 4U rackmount chassis that I’ve had parked at my employer’s data center for the past half-dozen years.  Being not-particularly-inclined to fiddle with my operating system just for the heck of it, I let things get pretty seriously outdated, to the point where upgrading software like my WordPress install here was troublesome and required odd work-arounds. So I poked at it a bit, patching here and there, and at some point I managed to deny myself SSH terminal access. I could still get in via SCP (which is odd because SCP uses SSH to connect, but hey), so for the most part this didn’t matter to me.

I let it sit for a few months, taking a poke at it now and again when the mood struck me, but eventually I managed to totally screw it up to the point where I couldn’t connect via my HTTPS control panel (Webmin), SSH, or SCP. I don’t let anybody else use any of the above, so nobody else was affected. People who wanted to use my gallery could, the handful of web robots and actual people that visit this blog could still do so as normal, and my Riddle of Steel wiki was functioning, so no harm, no foul.

Then I bit the bullet and opted for the nuclear option. In the back of my mind I was already concerned about disk usage and my decrepit old IDE drive, so I pulled the old 4U from its rack to pop a fresh disk in, install a modern operating system, and migrate the old data over. No problem, right? I’d have it done in an evening.  Of course not.

My first challenge was my server’s neighbors.  It turned out that the five servers above mine in the datacenter cabinet were not properly mounted. They weren’t even improperly mounted. They were just resting atop my box. Four little bolts in the face of my server were the only thing between them and plummeting to the floor. Oh, and one slightly-short CAT-5 cable that just about gave up the ghost as everything sagged.  A metal rack shelf and a couple of 4x4s later (I kid you not, they’re still there) my lazy neighbor’s equipment was reasonably secure again and I had my server out and ready to service.

This is when I realize for the first time in half a dozen years that my server has no CD drive. It had been a network install and never needed one before. But I didn’t have anything set up ready to serve a network install, so I’d have to cannibalize one from one of my retired desktop machines at home. Glorious. While I was at it, I figured I would use the hard drive from the same machine. This will come back to haunt me later.

I finally get the CD drive and new HDD installed and promptly discover that the tray won’t open to take my freshly-burned install CD. Great.  Cue another pause in getting the gallery back up.  I cannibalize another drive from another retired machine, and lo this one opens and is bootable.  I get my install disk up and running, and 23% of the way through the process it declares I’ve got a corrupted file. Outstanding. I burn another install disk and get 11% in before getting the same error.  Two disks later it finally completes, and I’ve got a simple LAMP server to migrate my data to. Or do I?
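
Corrupted-file errors like these can come from a bad download as well as a bad burn, so one cheap check is to hash the image before burning another disk. Here is a minimal Python sketch of that check, assuming the distribution publishes a SHA-256 checksum for its image. The function names are made up for illustration, and nothing here says which of the two actually bit me.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large disk image never sits in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches(path, published_checksum):
    """Compare the file's digest with a published one, ignoring case and stray whitespace."""
    return sha256_of_file(path) == published_checksum.strip().lower()
```

Calling image_matches() with the path to the downloaded image and the checksum string from the distribution's download page returns True only when the two agree. A match rules out the download, which leaves the burner, the media, or the drive reading it.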

I power down, put in the old server’s HDD, and turn it back on. ATAPI errors out the butt. I fiddle with the jumper configurations, but they persist. Finally I discover that one of my ATA cables wasn’t set up correctly, and perhaps never had been. I get that sorted out and find out that my cannibalized HDD won’t boot, for reasons unknown.  Off to the store, get a new HDD, install the operating system from scratch, get the old HDD back into the picture, figure out how to mount it, and get the files copied over.

Then comes wrestling with the network configuration.  Linux likes to play coy about things like network interfaces and whether it’s got link on which port.  I get it up and running by DHCP on my office LAN to make sure I’ve got Apache and MySQL up and configured properly, import the gallery database from backup, square away a couple of php.ini and virtualhost issues, and it’s time to re-rack the sucker.

There is currently a metal shelf and two 4x4s where my box used to be, but there’s a convenient 4U opening elsewhere in the cabinet, so no problem there. I slide it in, secure it properly out of deference to the folks whose gear is below mine, and go to plug back into the switch.  The CAT5 cable I had left when I pulled the box is now gone. I grab a spare and plug it into the port I used to be on and see my link indicator light up. I fire up a console and manually enter my static IP information. Which is, naturally, kept in a different place than my DNS resolver information. Thanks, Linux.
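
For anyone who hasn't hit the two-places problem, here is a rough Python sketch that renders both pieces. It assumes a Debian-style setup where the static address lives in /etc/network/interfaces and the resolver lives in /etc/resolv.conf. The function names are invented for illustration, and the addresses in the usage lines are documentation placeholders, not my real ones.

```python
import ipaddress


def interfaces_stanza(interface, address, netmask, gateway):
    """Render a static ifupdown stanza for /etc/network/interfaces."""
    # Parsing the values catches typos before they reach the config file.
    network = ipaddress.ip_interface(f"{address}/{netmask}").network
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not inside {network}")
    return (
        f"auto {interface}\n"
        f"iface {interface} inet static\n"
        f"    address {address}\n"
        f"    netmask {netmask}\n"
        f"    gateway {gateway}\n"
    )


def resolv_conf(nameservers):
    """Render the nameserver lines for /etc/resolv.conf."""
    return "".join(f"nameserver {server}\n" for server in nameservers)


# Placeholder values from the 192.0.2.0/24 documentation range.
print(interfaces_stanza("eth0", "192.0.2.10", "255.255.255.0", "192.0.2.1"))
print(resolv_conf(["192.0.2.53"]))
```

The gateway check is the only bit of cleverness: a gateway outside the interface's own subnet is an easy mistake to make when typing the numbers in by hand at a console.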

I discover that I can’t ping out. Not by name, not by IP. I check for link. I check to make sure I’m connected to the port I used to be on. No good.  I double-check the syntax on /etc/network/interfaces and restart the network services. No good.  I check the records of which port I’m supposed to be on. The record says I’m supposed to be four ports off what I had noted when I had removed my gear. I try that port. I get link. I reset my network. I still cannot ping. I try each of the other open ports on the switch in turn, getting link each time and finding myself unable to ping each time.  So I check for other loose cables. I find three in all, one of which I think was my old cable. I try it. No dice.  I try the others, and lo! what used to be port 10 is now port 2 and I’m back up and running.
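
The lesson buried in that hunt is that a link light only proves the cable is live, not that the port leads anywhere useful. The same checks can be run in order, from the cable outward, so the first failure names the layer to look at. What follows is a rough Python sketch of that ladder, assuming a Linux box with the usual ping utility. The function names and the messages are illustrative, not a record of the commands I actually typed.

```python
import socket
import subprocess
from pathlib import Path


def has_link(interface):
    """True if the kernel reports carrier on the interface (Linux sysfs)."""
    try:
        carrier = Path(f"/sys/class/net/{interface}/carrier").read_text()
    except OSError:
        return False  # interface missing or administratively down
    return carrier.strip() == "1"


def can_ping(address):
    """True if a single ping to the address gets a reply within two seconds."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "2", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        return False  # ping is not installed or could not be started
    return result.returncode == 0


def can_resolve(hostname):
    """True if the system resolver can turn the hostname into an address."""
    try:
        socket.gethostbyname(hostname)
    except OSError:
        return False
    return True


def diagnose(link, gateway_ok, outside_ok, dns_ok):
    """Name the first layer that fails, working outward from the cable."""
    if not link:
        return "no link: check the cable and the switch port"
    if not gateway_ok:
        return "link but no gateway: wrong switch port or wrong static IP settings"
    if not outside_ok:
        return "gateway only: check the default route"
    if not dns_ok:
        return "IP works, names do not: check the resolver configuration"
    return "network is up"


def run_checks(interface, gateway, outside_ip, hostname):
    """Run every probe and return the diagnosis string."""
    return diagnose(
        has_link(interface),
        can_ping(gateway),
        can_ping(outside_ip),
        can_resolve(hostname),
    )
```

Calling run_checks() with the interface name, the gateway address, some address beyond the gateway, and a hostname returns a single line. My situation, link on every port and no ping on any of them, lands on the second message.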

I configured my DNS and hostname, buttoned down the cabinet, and popped the gallery link back up atop my blog.

The moral of the story here is that getting it to work is half the fun. If you can’t enjoy a process like the one described above, hire a professional.