A Day in the Life of DeepThought


Newsgroups: armory.general
From: spcecdt@armory.com (John DuBois)
Subject: A day in the life of deeptht
Organization: The Armory
Date: Sat, 3 Sep 1994 04:38:24 PDT

     TODAY deeptht ran out of space on the u filesystem again.  I decided that
the only thing to do about it was to get rid of the online backups, and give
the 113M of space that was allocated to them to u.  The only "supported" way
of changing the size of a filesystem is to back it up onto tape, recreate the
filesystem with the larger size, and restore from tape.  But that takes a
while, so I generally take the shortcut of directly patching the field in the
superblock that sets the filesystem size.  This is a somewhat dangerous
operation (I've hosed filesystems in the past by doing it incorrectly) so I did
a backup of u first.  Because I always have to poke around a bit to figure out
where the size field is, this time I actually wrote a little utility to change
the size.
     So, after the backups were done, I modified the divvy table to get rid of
the backup fs and extend the u fs.  But, this brought up a problem.  Because
my SCSI drives map their internal geometry to a geometry in which there is one
cylinder per megabyte, the increase in the size of u from 1000M to 1113M caused
it to extend beyond the first 1024 cylinders of the drive.  This puts the
filesystem in danger of being unbootable, because all of the boot stages up
until the kernel is executing use the system BIOS to read data from the drive,
and the PC BIOS call that does the read has only a 10 bit field for the
cylinder, meaning that data beyond the 1024th cylinder cannot be read.
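
     The arithmetic behind that limit can be sketched as follows.  This is
only an illustration: the starting offset of u on the drive (START_MB) is an
assumption, not something stated in the post, and the function name is
made up for the example.

```python
# Illustration of the BIOS cylinder limit described above.
CYLINDER_BITS = 10                    # width of the cylinder field in the BIOS read call
MAX_CYLINDERS = 1 << CYLINDER_BITS    # 1024 addressable cylinders, numbered 0..1023
MB_PER_CYLINDER = 1                   # the drive's translated geometry, per the post


def bios_readable(start_mb, size_mb):
    """Return True if the whole filesystem lies within the BIOS-readable cylinders."""
    # First cylinder past the end of the filesystem.
    end_cylinder = (start_mb + size_mb) // MB_PER_CYLINDER
    return end_cylinder <= MAX_CYLINDERS


START_MB = 0  # ASSUMPTION: u starts at the beginning of the drive

print(bios_readable(START_MB, 1000))  # old size of u
print(bios_readable(START_MB, 1113))  # new size of u
```

With the assumed offset of 0, the 1000M filesystem fits below cylinder 1024
and the 1113M one does not, which is the situation described above.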
     Normally, this wouldn't be an issue for anything but the root filesystem,
but it happens that what is now the u filesystem on deeptht used to be the root
filesystem.  At some point, I needed to move the root filesystem off of the
boot drive to make room to expand u.  When doing that, the usual thing to do
would be to change the SCSI IDs of the first & second drives so that the system
could boot from the root filesystem.  But that would require renaming lots of
divisions, because changing the SCSI ID of a drive changes the minor numbers of
all of the filesystems on the drive.  So, instead I had just copied the minimum
files necessary to have on a boot filesystem (/boot and /etc/default/boot) to
u, and changed the boot parameters to tell the boot program to boot from a
nonstandard device (hd(105) instead of the usual hd(40)).  That worked fine
until the 1024 cylinder limit was exceeded by u.  As long as /boot and
/etc/default/boot were never touched, neither would be liable to be allocated a
block beyond the 1024th cylinder, but I do make modifications to
/etc/default/boot occasionally, and I might even load a new /boot some day, so I
decided at this point that the safe thing to do would be to finally get around
to changing the drives' SCSI IDs.
     To avoid the hassle of renaming the divisions, it occurred to me that I
could change the kernel's SCSI-ID-to-drive-number map at the same time.  So I
swapped the order of the lines in the mapping file and relinked the kernel.
Having the boot drive be anything other than the 0th drive is also an unusual
thing to do, but it *should* work, so I figured I'd give it a try.
     Changing the drive IDs turned out to be a bit difficult since I couldn't
find the manuals for either of the drives.  I took a guess at which jumpers on
the drives were the ID jumpers and what their orientation was (i.e., which
jumpers were for each of the 1, 2, and 4 bits of the SCSI ID), changed them,
disconnected the 2nd drive so that I could try them one at a time, and rebooted
to see if the host adapter would find the first drive.  It did.  Then I
re-connected the 2nd drive.  The host adapter didn't find it.  So I tried the
opposite orientation.  It worked.
      With the drive IDs set, I tried booting with parameters for booting off
of the second drive (because, remember, I had told the kernel that the drive at
ID 0 should be considered the 2nd drive).  The kernel booted but then hung.  I
tried various things, getting strange results, before realizing that although I
changed the kernel's ID mapping, the boot program has its own, fixed mapping,
which is supposed to be the same as the kernel's.  Of course, it isn't if you
change the kernel's mapping.  Some of the strange results were because I had an
old kernel on u (put there to allow me to boot if the root filesystem was
damaged) which I was accidentally booting in some cases.
     So, I tried booting with parameters that told /boot to read the kernel
from hd(40) (the first division of the first drive, which is the normal boot
device), but to "root" off of hd(104) (the first division of the second drive),
since this parameter is interpreted by the kernel after it has booted.  This
gave me even more peculiar results.  
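
     The device numbers mentioned so far (hd(40), hd(41), hd(104), hd(105))
are consistent with a minor-number layout of 64 numbers per drive and 8 per
partition, with 40 selecting the first division of the active partition.  That
layout is an assumption inferred from the numbers in the post, not something
the post states, and the function names below are made up for the sketch.

```python
# ASSUMED minor-number layout, inferred from the hd(...) values in the post:
#   minor = 64 * drive + 8 * partition + division
DRIVE_STRIDE = 64
PARTITION_STRIDE = 8
ACTIVE_PARTITION = 5  # assumption: 8 * 5 = 40 selects the active partition


def hd_minor(drive, division, partition=ACTIVE_PARTITION):
    """Compose a minor number from drive, partition and division indexes."""
    return drive * DRIVE_STRIDE + partition * PARTITION_STRIDE + division


def decode_minor(minor):
    """Split a minor number back into (drive, partition, division)."""
    drive, rest = divmod(minor, DRIVE_STRIDE)
    partition, division = divmod(rest, PARTITION_STRIDE)
    return drive, partition, division


print(hd_minor(0, 0))     # first division of the first drive
print(hd_minor(1, 0))     # first division of the second drive
print(decode_minor(105))  # the nonstandard boot device
```

Under this layout, moving a drive from one SCSI ID to the other shifts every
minor number on it by 64, which is why swapping IDs would mean renaming all of
the divisions.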
     I finally remembered that when I had moved the root filesystem over to the
2nd drive, I had put it on the second division instead of the first, because it
happened to be free and it didn't really matter when it wasn't on the boot
drive anyway.  In fact, I wouldn't have been able to boot at all except that I
happened to have the boot files on the filesystem on the 2nd division of that
drive, too.  Fortunately, it's pretty easy to change the minor numbers.  I
edited the divvy table for the drive to swap the start and end block numbers
for the divisions on hd(40) and hd(41), which were the local and root
filesystems respectively.  Then I changed their names to be root and local, the
net effect being to swap the minor numbers of the two filesystems.  
     I booted again.  It didn't work, because the default boot parameters I had
in /etc/default/boot were still telling it to root off of the second division.
Before I realized this, I tried a few other things, including booting using an
old kernel that did not have its ID mapping swapped.  That kernel panicked and
tried to write an image to hd(41), the standard "dump" device, but appeared to
fail.  Panicking when there is no root filesystem available is normal, and the
failure to write to hd(41) seemed normal because my swap/dump device is not
there; it's on what the kernel thinks is the second drive.
     I eventually successfully booted, but decided that all this was really too
much hassle, and I should just change the ID mapping back to the normal one and
rename the divisions.  So I did.  It only took a few minutes.  I made sure the
boot parameters were correct for the final configuration, and let the system
autoboot, which brings it into multiuser mode.
    The system came up, tried to mount u, and found it dirty (probably due to
my having to power the system off at some point while it was mounted... things
had begun to blur together).  Cleaning a large filesystem takes a long time so
I went off and let it do its thing.  You eventually learn to recognize the
sound of a normal fsck, and I came back to the console when I heard things
going wrong.  There were errors streaming by... serious errors.  It seemed that
somewhere along the line I had munged u somehow.  That didn't worry me too much
because I had just backed it up.  The fsck had reached the pitiful point where
it looked like it was probably doing more harm than good, and it occurred to me
that it might even be cleaning the wrong device with all the mucking about I'd
done, so I powered the system off and brought it back up in single-user mode.
     My suspicion was reinforced when I ran fsck on u and it came up clean.  I
checked all the minor numbers, made sure I was booting the right kernel, etc.
Nothing seemed to be wrong so I went multiuser again, keeping an eye on the
console this time.  Then I saw what the real problem was: it was local, not u,
that was corrupt.  During the last boot it had gone past cleaning u and on to
local without my noticing.  I power-cycled the system again and went single
user.  I ran fsck on local manually, and it looked really grim.  Thousands of
bad inodes... maybe even all of them; it was hard to tell because it gave up
after a while.  "root inode unallocated" (a very bad sign).  I was beginning to
get a bit worried because I hadn't made a backup of local before starting all
this (since I hadn't expected to even be touching the drive it was on).  The
last backup of local was three weeks ago, meaning three weeks of work would be
lost.  Normally, the daily online backups save any files touched each day, but
I had just gotten rid of the backup fs!  I usually do a complete system backup
(all filesystems) before junking the data on the backup fs, but I didn't today
because u was full and I wanted to do something about it quickly.
     After trying fsck a couple of times, it was obvious that there was no
hope.  It may have been the kernel panic, writing over part of local's inode
table.  I edited the divvy table again, moving local from division 1 to 2 to
make sure it couldn't happen again, and began resigning myself to restoring
from the three week old backup.  But, it occurred to me that although I had
mounted, cleaned, etc. the u fs, I hadn't added anything to it, so although I
had given the backup fs's space to it, the old backup data might still be
untouched.  I even
remembered what the old start and end block numbers were.  So I restored the
old divvy parameters.  That put u in an invalid state, because it now thought
it had 113M more than its divvy entry gave it, but I didn't intend to mount u
until it was back to the larger size.
     After rebooting (to make sure the kernel wasn't terminally confused by all
this), I tried mounting the old backup fs, and succeeded.  Happy Happy!  I
quickly recreated local and issued a command that copied all of the files from
local that had been backed up to the backup fs (all of the files that had been
touched in the last three weeks), keeping the most recent copies of each.
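
     The post does not give the actual command.  A minimal sketch of the same
idea, copying a tree while never replacing a file with an older copy, might
look like this.  The function name and the use of modification times to decide
which copy is "most recent" are assumptions made for the example.

```python
import os
import shutil


def copy_newer(src_root, dst_root):
    """Copy files from src_root into dst_root, keeping whichever copy is newer.

    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            # Skip the file if the destination already has a copy at least as new.
            if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
                continue
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

If the backup fs held one directory per day (a hypothetical layout), calling
copy_newer once per directory, in any order, would leave the newest copy of
each file in place.  The same check also covers the later tape restore, where
nothing already written to local should be overwritten by an older copy.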
      Then, I unmounted the backup fs (in case there was corruption in it that
I hadn't run into), and started a restore from my tape backups of local, with
instructions to not overwrite anything that I had already written to local. 
There were two 150M tapes.  The first one restored without problems.  In the
middle of the second, though, I got a media error and the restore aborted. 
That was a bit odd since I had used these tapes to copy the local data to my
machine at work.  I tried again, this time starting with the second tape and
with an option that tells cpio (the backup/restore program) to skip over any
corruption.  It got to the same spot on the tape, gave a media error, and hung.
It refused to go past that spot.
      I had been getting all-too-frequent errors with the tape drive lately, so
I decided that it might be time to clean it, something I had only done once
before in the 3 years I'd been using it.  I dug up the tape drive instructions
to find how to clean it, got out some head cleaner & a long swab, and did the
job.  After letting it dry, I put the tape back in and tried to read it.  It
hung at the same spot.  I started thinking about going to work and making a
backup of the local data from the machine I had installed it on.  But, I had
removed all of the stuff that was only relevant on deeptht, and changed lots of
other stuff.  Still, I tried making a file list from the tape, so I could get
a list of those files that were not successfully read and restore only them
from work.  But (as I pretty much expected) it wouldn't let me do that either.
It still hung at the bad spot.
      Finally, I thought I'd try skipping over the bad spot on the tape by
using the "no-rewind" tape device, which tells the drive to not rewind the tape
when it's done reading.  Since the tape was stopped at the bad spot, I removed
it and aborted the cpio process.  Then I reset the tape drive, put the tape
back in, and tried reading from the no-rewind device using cpio.  But it
rewound the tape anyway.  It didn't surprise me since the no-rewind device
often doesn't seem to behave the way it should.  I tried a few other
combinations, putting other tapes in at times to let them be rewound instead of
the one I was trying to read, but it didn't work.  My last attempt was to read
from the no-rewind device, let it hang, abort the cpio process, and then just
start another one, also reading from the no-rewind device.  It worked.  cpio
skipped the bad file (part of the webster dictionary database) and read the
rest of the tape.  As soon as it started, I knew I was home free at last. 
     As far as I know, all I lost was the webster file (which I can get from
another backup tape when I get around to it), and the filesystem size changing
program.  Since I was doing other stuff all day, I didn't do any other work on
local in the last day that I can recall.  When the restore was done, I changed
the divvy table a final time to give the backup space back to u.
     And that's why deeptht was down for 7 hours this evening.
-- 
John DuBois    spcecdt@armory.com    KC6QKZ    http://www.armory.com/~spcecdt/