[GLLUG] Unreadable partition
Richard Houser
rick at divinesymphony.net
Sat Jul 14 23:26:27 EDT 2012
> If the sdc9 drive is part of a RAID array and you stick it in another PC,
> what kind of "reading" were you able to do? Did you move all 4 drives over
> and recreate the array? If not, that might be worth trying, though perhaps
> a lot of trouble. If another machine can read it, and the cables, etc. in
> the original computer are OK, it kinda sounds like it must be a software
> issue. Did you run /proc/mdstat?
I have only moved one drive at a time, and only tried accessing the
partition in raw mode (with dd and cat). Other machines can read it,
just not that one. The other partitions on that drive that contain
RAID sets work fine.
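For reference, the raw-mode check described above amounts to something like the following sketch. The device path /dev/sdc9 is the one from this thread; the script guards the read so it is safe to run on a machine that doesn't have that device.

```shell
#!/bin/sh
# Raw-read check: try to read the first MiB of the partition directly,
# bypassing the filesystem and the md layer. On the failing box this
# read errors out immediately; on other machines it succeeds.
DEV=/dev/sdc9
if [ -b "$DEV" ]; then
    if dd if="$DEV" of=/dev/null bs=1M count=1 2>/dev/null; then
        echo "raw read OK: $DEV"
        RESULT=ok
    else
        echo "raw read FAILED: $DEV"
        RESULT=fail
    fi
else
    echo "no such block device: $DEV (skipping)"
    RESULT=absent
fi
```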
I have looked at /proc/mdstat, and it acts as if the sdc9 partition
isn't present. Attempting to add it back results in errors, and I see
media errors in dmesg, but only on that logical partition. It's
definitely a software problem, though.
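The two checks above look roughly like this. The grep pattern assumes the kernel logs the errors against the sdc device, as described in the thread; adjust if your device names differ.

```shell
#!/bin/sh
# Inspect md state and recent kernel messages for the suspect member.
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat          # per-array status; a missing member shows as [_UUU] etc.
    STATE=present
else
    echo "/proc/mdstat not available (md module not loaded?)"
    STATE=absent
fi
# Recent kernel I/O errors mentioning the drive (sdc is from this thread):
dmesg 2>/dev/null | grep -i 'sdc' | tail -n 20
true
```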
> I assume you've done an fsck.
I have not. I can't read from the partition at all on that machine,
so it didn't seem relevant. I would need to get it back into the
RAID set before I'd expect fsck to be of any use.
> I have had drives previously used in RAID arrays have partition table issues
> when getting redeployed as a non-RAID drive. Just recently I had to dd
> if=/dev/zero a drive because the Linux install program saw no drive at all.
> So, maybe there is a problem with the sdc9 drive at that level.
The installers, etc. see the drive. The partition table is identical
between all 4 drives.
Update: Additionally, mdadm sees the drive, but it's showing up as a
spare. I can't find a way to get it into an active state. The only
other difference I can see is that mdadm shows the state on that
drive as "AAA." while the other three show "AAAA". This drive was
actually up and running, went down (presumably from this) while I was
away, and I haven't been able to bring that RAID array up since.
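The "AAA." string comes from the per-member superblock, which mdadm --examine prints; the usual way to retry activating a member stuck as a spare is to drop it and re-add it so md re-evaluates it. A hedged sketch, using the device names from this thread (requires root and a real array; do not run this against an array you haven't backed up):

```shell
#!/bin/sh
# Inspect the array and the suspect member, then try a remove/re-add.
MD=/dev/md9
PART=/dev/sdc9
if command -v mdadm >/dev/null 2>&1 && [ -b "$MD" ]; then
    mdadm --detail "$MD"       # array-level view: which slots are active vs. spare
    mdadm --examine "$PART"    # member superblock; shows the "AAA." device state
    # Drop the member and add it back; if md accepts it as active it
    # will start a resync rather than parking it as a spare.
    mdadm "$MD" --fail "$PART" --remove "$PART"
    mdadm "$MD" --add "$PART"
    DONE=ran
else
    echo "mdadm or $MD not available; nothing attempted"
    DONE=skipped
fi
```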
> You could try removing the drive from the array (or all three arrays),
> running badblocks on it, and then reinserting it into the array(s) and
> rebuilding.....
I'm currently trying to back up as much as possible from the
other arrays on the drive, then see if I can find enough space for
gzipped images of the raw partitions in question. Unfortunately, the
backup is taking forever; I could really use that multi-core-capable
version of ssh about now.
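The gzipped-image step mentioned above is a one-liner in practice. Shown here against a scratch file so it is safe to run anywhere; substitute the real partition (e.g. /dev/sdc9, as root) for SRC when doing it for real.

```shell
#!/bin/sh
# Image a raw source through gzip, then verify the image round-trips.
SRC=$(mktemp)            # stand-in for the raw partition
IMG=$SRC.img.gz
dd if=/dev/zero of="$SRC" bs=1024 count=64 2>/dev/null   # 64 KiB of sample data
dd if="$SRC" bs=1M 2>/dev/null | gzip -c > "$IMG"
# Decompress and byte-compare against the source to verify the image:
if gzip -dc "$IMG" | cmp -s - "$SRC"; then
    echo "image verified"
    OK=yes
else
    echo "image mismatch"
    OK=no
fi
rm -f "$SRC" "$IMG"
```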
Once the backup is done, I was thinking of trying to access the raid
set via a newer live cd. If that doesn't work, I'll try moving the
entire array to another machine (hopefully the secondary sata chip in
my new box will have gained support since kernel 3.1).
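Assembling the set from a live CD or on another box is usually just an mdadm --assemble; this sketch assumes the members keep their /dev/sd[abcd]9 names on the other machine, which may not hold if the controller enumerates differently.

```shell
#!/bin/sh
# Try to assemble existing arrays from their superblocks.
if command -v mdadm >/dev/null 2>&1; then
    # --scan reads superblocks from all partitions and assembles what it finds:
    mdadm --assemble --scan || true
    # or name the members explicitly:
    # mdadm --assemble /dev/md9 /dev/sda9 /dev/sdb9 /dev/sdc9 /dev/sdd9
    cat /proc/mdstat 2>/dev/null
    TRIED=yes
else
    echo "mdadm not installed; skipping"
    TRIED=no
fi
true
```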
> You could try removing the drive from the array (or all three arrays),
> running badblocks on it, and then reinserting it into the array(s) and
> rebuilding.....
I tried the equivalent at the drive level with smartctl.
Additionally, other machines can read every sector of the drive
without error. The RAID subsystem still sees the drive, so I think I
would have to force it into a failed state first. To do that, I think
I would have to zero the partition. I'll do that if I have to, but
only after a full backup of the partitions belonging to that RAID
set. I might have to give my optical drives a few days' workout
before I can get to that point.
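For the drive-level check, the usual smartctl invocations are below. Note also that mdadm offers --zero-superblock, which clears only the md metadata on the member, a narrower alternative to zeroing the whole partition; it is still destructive to the RAID metadata, so only after a verified backup. Device names are the ones from this thread.

```shell
#!/bin/sh
# SMART self-test at the drive level, plus the narrower metadata wipe.
DRIVE=/dev/sdc
if command -v smartctl >/dev/null 2>&1 && [ -b "$DRIVE" ]; then
    smartctl -t long "$DRIVE"   # kick off an offline long self-test
    smartctl -a "$DRIVE"        # full attribute and self-test report
    # mdadm --zero-superblock /dev/sdc9   # uncomment only after a verified backup
    STATUS=ran
else
    echo "smartctl or $DRIVE not available; skipping"
    STATUS=skipped
fi
```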
I'll keep everyone updated. Thanks for the advice.
On Sat, Jul 14, 2012 at 10:34 PM, Stanley Mortel <mortel at cyber-nos.com> wrote:
> Rick,
>
> If the sdc9 drive is part of a RAID array and you stick it in another PC,
> what kind of "reading" were you able to do? Did you move all 4 drives over
> and recreate the array? If not, that might be worth trying, though perhaps
> a lot of trouble. If another machine can read it, and the cables, etc. in
> the original computer are OK, it kinda sounds like it must be a software
> issue. Did you run /proc/mdstat?
>
> I assume you've done an fsck.
>
> I have had drives previously used in RAID arrays have partition table issues
> when getting redeployed as a non-RAID drive. Just recently I had to dd
> if=/dev/zero a drive because the Linux install program saw no drive at all.
> So, maybe there is a problem with the sdc9 drive at that level.
>
> You could try removing the drive from the array (or all three arrays),
> running badblocks on it, and then reinserting it into the array(s) and
> rebuilding.....
>
> Good luck on this one.
>
> Stan
>
>
>
> On 07/14/2012 01:10 AM, Richard Houser wrote:
>>
>> A few days ago, I had a couple year old software raid set fail in a
>> most peculiar way.... I'm hoping someone here will have a suggestion
>> other than back everything up and wipe the drives.
>>
>>
>> I'm running three different raid sets on this array of 4 disks, with
>> each in its own partition. The md8 set uses /dev/sd[abcd]8, md9 uses
>> /dev/sd[abcd]9, etc. Well, only the md9 set failed, and I tracked it
>> back to the kernel seeing IO errors on /dev/sdc9. The strange part is
>> it sees errors only on that partition, and the problems are exactly at
>> the logical boundary (I can dd all of the previous partition, but
>> immediately get read errors on /dev/sdc9). Smartctl tests all
>> indicate the drive is fine, and if I connect the drive to another
>> machine, that partition is readable. I've also tried moving that
>> drive to a different cable and port on the motherboard, but no change.
>>
>> It's running an old 2.6.33.7-desktop-2mnb kernel from Mandriva, and I
>> was trying to get at the data to start moving stuff over to another
>> box.
>>
>> The other raid sets are running a resync now as a result of my
>> diagnostics, so I should be able to poke around a bit more in a couple
>> days. Any suggestions you may have would be appreciated.
>>
>> Thanks!
>>
>> -Rick
>> _______________________________________________
>> linux-user mailing list
>> linux-user at egr.msu.edu
>> http://mailman.egr.msu.edu/mailman/listinfo/linux-user
>
>
>