
I'm always tinkering with my PCs. One day, I stumbled across an inconvenient bug. It seems that in RedHat 6.2 (or so), if you used fdisk to create your logical partitions (i.e., the ones on the extended partition) in just the wrong order, it would lose track of some of them. Sadly, this was back in my reckless days before I became zealous about backups. (Just a few days before I became zealous, in fact.)
What does a guy do when he's lost some partitions with all his precious data? Everything he can think of. For me, that initially meant mostly web research, since Barnes and Noble doesn't carry a lot of books on this sort of stuff. I certainly didn't want to do anything hasty, lest I destroy my last chance at getting my data back. I searched for anything I could find about how partitions were laid out and how partitioning programs worked. It really came down to two questions: (1) How do I get the partition table back in order? and (2) Will I overwrite the filesystem doing it?
Background: I'll explain my terminology, since some people refer to things differently. (FreeBSD users, keep your slices to yourselves for the moment.) On an Intel/x86/standard-PC sort of box, a hard drive can have 4 "primary" partitions. Of these primary partitions, one can be an "extended" partition. The extended partition is dedicated to being divided up further into "logical" partitions.
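For readers who want to poke at this themselves, here is a minimal Python sketch of the standard MBR layout (general format knowledge, nothing specific to my disk): the primary partition table sits at byte 446 of the first sector as four 16-byte entries, and the sector ends with the signature bytes 0x55 0xAA. The names `parse_table` and `Entry` are hypothetical, and the set of "extended" type codes is an assumption covering the common ones (0x05, 0x0F, 0x85).

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
TABLE_OFFSET = 446                   # the partition table starts at byte 446
ENTRY_SIZE = 16                      # four 16-byte entries follow
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # common "extended partition" type codes (assumed set)


class Entry(NamedTuple):
    boot_flag: int
    ptype: int
    start_lba: int
    num_sectors: int

    @property
    def is_empty(self) -> bool:
        return self.ptype == 0

    @property
    def is_extended(self) -> bool:
        return self.ptype in EXTENDED_TYPES


def parse_table(sector: bytes) -> List[Entry]:
    """Decode the four primary-partition entries in an MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for slot in range(4):
        off = TABLE_OFFSET + slot * ENTRY_SIZE
        # Byte 0: boot flag; byte 4: partition type; bytes 8-11: starting
        # LBA; bytes 12-15: length in sectors (both little-endian). Bytes
        # 1-3 and 5-7 hold legacy CHS addresses, ignored here.
        start_lba, num_sectors = struct.unpack_from("<II", sector, off + 8)
        entries.append(Entry(sector[off], sector[off + 4], start_lba, num_sectors))
    return entries
```

It could be fed the 512 bytes saved by the dd command below, e.g. `parse_table(open("somefile", "rb").read())`.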
After piecing together all the tidbits of information I could find, I came up with a workable plan. The key information I found (which I didn't fully trust) was:
The 3 tools I had in my arsenal were dd, fdisk, and a hex editor. (I know, it sounds like the start of a joke.) Dd just lets you copy raw bits around anywhere you want. Fdisk modifies the partition tables. And a hex editor lets you manually edit any bit of data.
The first thing I did after booting my rescue disk was back up my master boot record, which contains the primary partition table. It went something like this:
dd if=/dev/hda of=somefile bs=512 count=1
Run "man dd" to see what those mean. Basically, I copied the first 512 bytes of my hard drive to a file (on a floppy). I also backed up the first 512 bytes of my remaining logical partitions (using hda4 or hda5, etc. instead of hda) and, somehow, the 512 bytes at the beginning of the empty space. (I figured there was still a valid partition table there, which just wasn't being linked to.)
Then I ran fdisk and changed my partition table. I think I had started off with 4 logical partitions (pre-cataclysm) and was now down to 2. I don't recall exactly, but I suspect I just deleted one of the remaining logical partitions, or maybe messed with a spare drive. (I was missing at least 2 adjacent partitions, and didn't know where the divisions between them were.) Next, I used the same old dd command as before to dump the changed sectors, and fired up the hex editor to see what had changed. Then I copied the original partition tables back with dd (just swapping the if= and of= arguments). Lo and behold, fdisk reported that I had, in effect, not changed anything. (It worked! I hadn't broken anything new yet!)
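The dd backup-and-restore routine can also be sketched in Python. This is an illustration, not what I actually ran: `read_sector` and `write_sector` are hypothetical helpers that mimic `dd bs=512 count=1` with `skip=` (input side) or `seek=` (output side) selecting the sector, and the demo only touches a throwaway image file. Pointing them at a real device such as /dev/hda needs root and carries exactly the risks described here.

```python
import os
import tempfile

SECTOR_SIZE = 512


def read_sector(path: str, lba: int = 0) -> bytes:
    """Roughly: dd if=PATH of=FILE bs=512 count=1 skip=LBA"""
    with open(path, "rb") as f:
        f.seek(lba * SECTOR_SIZE)
        data = f.read(SECTOR_SIZE)
    if len(data) != SECTOR_SIZE:
        raise OSError(f"short read at sector {lba}")
    return data


def write_sector(path: str, data: bytes, lba: int = 0) -> None:
    """Roughly: dd if=FILE of=PATH bs=512 count=1 seek=LBA conv=notrunc"""
    if len(data) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    with open(path, "r+b") as f:  # r+b overwrites in place and never truncates
        f.seek(lba * SECTOR_SIZE)
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    # Demo on a throwaway 4-sector image file, never on a real disk.
    with tempfile.TemporaryDirectory() as d:
        image = os.path.join(d, "disk.img")
        with open(image, "wb") as f:
            f.write(b"\xab" * SECTOR_SIZE * 4)
        saved = read_sector(image, 0)               # back up the "MBR"
        write_sector(image, bytes(SECTOR_SIZE), 0)  # simulate an fdisk change
        write_sector(image, saved, 0)               # put the original back
        assert read_sector(image, 0) == saved
        print("sector 0 restored")
```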
After staring at hex diffs for a while, I had a pretty good feel for what was going on. The first cluster of bits I saw changing held the information about the partition that table belonged to. The second cluster of changing bits was the link to the next partition (if any). The problem was that I couldn't just recreate the missing partitions, because I didn't know how big they were supposed to be.
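Those two clusters line up with the standard extended boot record (EBR) layout: each logical partition is preceded by a sector whose table uses only its first two entries. Entry 1 describes the logical partition itself (start relative to that EBR), entry 2 is the link to the next EBR (start relative to the beginning of the extended partition), and entries 3 and 4 are unused. A hex diff like the one I was eyeballing could be automated roughly as follows. This is a sketch; `changed_runs` and `describe` are hypothetical names, and a run that straddles two entries is labeled only by where it starts.

```python
from typing import List, Tuple

TABLE_OFFSET = 446  # start of the four 16-byte table entries
ENTRY_SIZE = 16


def changed_runs(before: bytes, after: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, of each run of differing bytes."""
    if len(before) != len(after):
        raise ValueError("sectors must be the same length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b and start is None:
            start = i                      # a new cluster of changes begins
        elif a == b and start is not None:
            runs.append((start, i))        # the cluster just ended
            start = None
    if start is not None:
        runs.append((start, len(before)))  # cluster runs to the end of the sector
    return runs


def describe(offset: int) -> str:
    """Name the part of an EBR sector that a byte offset falls in."""
    if TABLE_OFFSET <= offset < TABLE_OFFSET + 4 * ENTRY_SIZE:
        return f"table entry {(offset - TABLE_OFFSET) // ENTRY_SIZE + 1}"
    if offset >= 510:
        return "boot signature"
    return "boot code / unused"
```

Given two dumps `old` and `new` of the same sector, `for s, e in changed_runs(old, new): print(s, e, describe(s))` lists each cluster and the entry it lives in.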
I figured the partition info was still on the lost partitions, just waiting to be linked to. So I created a single logical partition in the empty space formerly occupied by the two missing partitions (thus not overwriting any filesystems in the middle). This created the link from the known partition to the unknown partition, without writing anything into unknown territory. (After all, I knew it started at the beginning of the empty space, just not where it ended.) But it also overwrote the size information at the beginning of the first missing partition. So I copied the original 512 bytes back to the beginning of the empty space. Remember, that's a valid element in a linked list, which just didn't have anything pointing to it. Now it was back to its old self, with something finally pointing at it, and all the other links should still have been intact!
In theory, it was now fixed. I had inserted the missing link without changing the rest of the data. I fired up fdisk, and discovered I had created an infinite loop with my linked list. Fortunately, either fdisk or cfdisk was smart enough to offer to blow away the duplicates. After it did, I had my system back!!!
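Because the logical partitions form a linked list, anything that walks them needs a cycle check; without one, a table like the one I ended up with would send it around forever. The sketch below assumes the whole disk image is already in memory as bytes (a real tool would read sector by sector), skips 0x55AA signature checks, and uses hypothetical names (`walk_logicals`, `max_links`). It follows entry 2 of each EBR and refuses to visit the same EBR twice.

```python
import struct
from typing import List, Tuple

SECTOR_SIZE = 512
TABLE_OFFSET = 446
EXTENDED_TYPES = {0x05, 0x0F, 0x85}  # assumed set of link type codes


def entry(sector: bytes, slot: int) -> Tuple[int, int, int]:
    """Return (type, relative_start, num_sectors) for table slot 0-3."""
    off = TABLE_OFFSET + 16 * slot
    start, count = struct.unpack_from("<II", sector, off + 8)
    return sector[off + 4], start, count


def walk_logicals(disk: bytes, ext_start: int, max_links: int = 128) -> List[Tuple[int, int]]:
    """Follow the EBR linked list that starts at sector ext_start.

    Returns (absolute_start_lba, num_sectors) for each logical partition and
    raises ValueError if a link loops back or points off the end of the disk.
    """
    found = []
    seen = set()
    ebr_lba = ext_start
    while True:
        if ebr_lba in seen:
            raise ValueError(f"EBR chain loops back to sector {ebr_lba}")
        if len(seen) >= max_links:
            raise ValueError("EBR chain is implausibly long")
        seen.add(ebr_lba)
        sector = disk[ebr_lba * SECTOR_SIZE:(ebr_lba + 1) * SECTOR_SIZE]
        if len(sector) != SECTOR_SIZE:
            raise ValueError(f"link points past the end of the disk ({ebr_lba})")
        ptype, start, count = entry(sector, 0)
        if ptype != 0:
            # Entry 1 describes this logical partition, relative to THIS EBR.
            found.append((ebr_lba + start, count))
        link_type, link_start, _ = entry(sector, 1)
        if link_type not in EXTENDED_TYPES:
            return found  # no "next" pointer: end of the list
        # Entry 2 points at the next EBR, relative to the extended partition.
        ebr_lba = ext_start + link_start
```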
There was much thinking involved, but the final solution was pretty simple. To summarize: the problem was that the link in the partition table just before the empty space had been destroyed, so the fix was to recreate that one link while leaving the orphaned table it should point to exactly as it had been.
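For comparison, the same repair could in principle be done directly, without the fdisk-then-restore trick: rewrite entry 2 of the EBR before the gap so it points at the orphaned EBR, and touch nothing else. This is a hypothetical sketch, not what I did and not tested on real hardware. It assumes you already know the absolute sector of the orphaned EBR and of the start of the extended partition, takes the link's length from the orphan's own entry 1 (its EBR plus its partition), and leaves the legacy CHS fields zeroed on the assumption that the OS reads only the LBA fields. `relink` is an illustrative name.

```python
import struct

SECTOR_SIZE = 512
ENTRY1_OFFSET = 446     # EBR entry 1: this logical partition
LINK_OFFSET = 446 + 16  # EBR entry 2: pointer to the next EBR
EXTENDED_TYPE = 0x05


def relink(prev_ebr: bytes, orphan_ebr: bytes, orphan_lba: int, ext_start: int) -> bytes:
    """Return a copy of prev_ebr whose entry 2 points at the orphaned EBR.

    orphan_lba is the absolute sector of the orphaned EBR and ext_start the
    absolute first sector of the extended partition. Only entry 2 changes.
    """
    if len(prev_ebr) != SECTOR_SIZE or len(orphan_ebr) != SECTOR_SIZE:
        raise ValueError("expected 512-byte sectors")
    if orphan_ebr[510:512] != b"\x55\xaa":
        raise ValueError("orphan sector lacks the 0x55AA signature; not an EBR?")
    if orphan_lba <= ext_start:
        raise ValueError("orphaned EBR must lie after the start of the extended partition")
    # The orphan's entry 1 gives its partition's start (relative to the orphan
    # EBR) and length, so the link can span the EBR plus that partition.
    part_start, part_count = struct.unpack_from("<II", orphan_ebr, ENTRY1_OFFSET + 8)
    link = bytearray(16)  # CHS bytes stay zero (assumes an LBA-only reader)
    link[4] = EXTENDED_TYPE
    struct.pack_into("<II", link, 8, orphan_lba - ext_start, part_start + part_count)
    fixed = bytearray(prev_ebr)
    fixed[LINK_OFFSET:LINK_OFFSET + 16] = link
    return bytes(fixed)
```

The returned sector would then be written back over the previous EBR, ideally only after saving the original copy with dd.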