Linux RAID6 recovery

Yesterday I retired my old home server, a Fujitsu-Siemens Primergy Econel200S2 with four 1TB Western Digital hard drives configured as a Linux mdadm software RAID6 array.

The server had been running for 5 years and was showing its age: it was too loud and used too much power. The replacement server had already been in place for about a year – I just didn’t get around to moving the last virtual server over until now…

At the time the server was installed I decided not to use hardware RAID and went with Linux software RAID6 instead. My reasons for using RAID at all are reliability and ease of maintenance: it is nice to just replace a hard drive when it fails, instead of having to go through the re-install/recover-from-backup process (hard drives tend to fail at the most inconvenient of times). Using RAID6 means any two drives may fail without downtime – and as hard drives get bigger, the probability of another drive failing while the replacement is rebuilding grows.
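For reference, creating a four-drive RAID6 array like this with mdadm looks roughly as follows – a minimal sketch, assuming the member partitions are /dev/sda2 through /dev/sdd2 (placeholder names, not necessarily the real layout) and a Debian-style location for mdadm.conf:

# Create a 4-drive RAID6 array (device names are examples)
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2

# Watch the initial sync progress
cat /proc/mdstat

# Record the array so it assembles at boot (path varies by distribution)
mdadm --detail --scan >> /etc/mdadm/mdadm.conf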

My reasons for using Linux software RAID instead of hardware RAID are:

  1. A hardware RAID card with battery backup adds a considerable cost to the system.
  2. If the card fails and a compatible replacement cannot be found, the data are gone. This means a second card should be bought and put on a shelf “just in case”.
  3. Using Linux RAID it is possible to take the hard drives from the server, connect them to e.g. a laptop, mount the drives and recover the data.

When I installed the server I did several tests – hot-plugging hard drives, unplugging them and re-adding them to the RAID6 array – to make sure everything worked as it should, as well as practicing recovery/maintenance before any real data was put on the system.
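A drill like that can be done with mdadm’s --manage mode – a sketch, assuming the array is /dev/md0 and the drive being exercised is /dev/sdb2:

# Mark a member as failed and remove it from the array
mdadm --manage /dev/md0 --fail /dev/sdb2
mdadm --manage /dev/md0 --remove /dev/sdb2

# The array keeps running, now degraded
cat /proc/mdstat

# Re-add the drive and watch it resync
mdadm --manage /dev/md0 --add /dev/sdb2
cat /proc/mdstat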

During the five-year life span of the server only one hard drive failed. The server kept running and I ordered a replacement drive from Western Digital. BTW, I really like the way Western Digital handles replacements: you register the failed drive on their website (RMA) and use a credit card to “order a new drive”. The replacement arrives by mail a few days later, and the failed drive can then be shipped back to Western Digital using the packaging from the new one. If Western Digital receives the failed drive within 30 days, your credit card is not charged.
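For the RMA form you need the model and serial number of the failed drive. Assuming the failed member is /dev/sdb and smartmontools is installed, something like this digs them out:

# Show which member mdadm considers failed
mdadm --detail /dev/md0

# Read the drive’s identity (model, serial) for the RMA form
smartctl -i /dev/sdb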

Using RAID6 with only 4 drives means it should be possible to unplug 2 drives from the server and mount them somewhere else – leaving the server running and at the same time having a complete copy of the data elsewhere…

In theory, anyway… until now :-) Today I mounted two of the drives from the original four-drive set on my laptop, and it worked perfectly – I could mount and copy data from the drives. Here are the commands I used:

# Check the RAID superblocks on the two members
mdadm --examine /dev/sdc2
mdadm --examine /dev/sdd2

# Assemble the array from the two available drives (2 of 4)
mdadm --assemble /dev/md0 /dev/sdc2 /dev/sdd2
cat /proc/mdstat

# With only half the members present, assemble leaves the array
# inactive; --run forces it to start in degraded mode
mdadm --run /dev/md0
cat /proc/mdstat

# Find and activate the LVM volume group sitting on top of the array
lvmdiskscan
lvm pvdisplay /dev/md0
lvm vgscan
lvm lvscan
lvm vgchange -a y vg0

# Mount the logical volumes
mount /dev/vg0/bigdisk /mnt/bigdisk/
mount /dev/vg0/home /mnt/home/
mount /dev/vg0/vmware1 /mnt/vmware1/

And when finished:

# Unmount the logical volumes
umount /mnt/bigdisk
umount /mnt/home
umount /mnt/vmware1

# Deactivate the volume group and stop the array
lvm vgchange -an vg0
mdadm --stop /dev/md0

Of course the new server is running Linux RAID6 as well :-)
