Linux RAID6 recovery

Yesterday I retired my old home server, a Fujitsu-Siemens Primergy Econel200S2 with four 1TB Western Digital hard drives configured as Linux mdadm Software RAID6.

The server had been running for 5 years and was showing its age: it was too loud and used too much power. The replacement server has been in place for about a year; I just didn’t get around to moving the last virtual server to it until now…

At the time the server was installed I decided not to use hardware RAID but Linux software RAID6 instead. The reasons for using RAID are better reliability and easier maintenance. It is nice to just replace a hard drive when it fails, instead of having to do the re-install/recover-from-backup process (hard drives tend to fail at the most inconvenient of times). Using RAID6 means any 2 drives may fail without downtime. That matters because as hard drives get bigger, rebuilding onto a replacement drive takes longer, so the probability of another drive failing during the rebuild grows.

My reasons for using Linux software RAID instead of hardware RAID are:

  1. A hardware RAID card with battery backup adds a considerable cost to the system.
  2. If the card fails and a compatible replacement cannot be found the data are gone. This means a second card should be bought and put on a shelf “just in case”.
  3. Using Linux RAID it is possible to take the hard drives from the server, connect them to e.g. a laptop, mount the drives and recover the data.

When I installed the server I did several tests hot-plugging hard drives, unplugging and re-adding them to the RAID6 array to make sure everything worked as it should – as well as practicing recovery/maintenance before any real data was put on the system.
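Such a drill could look roughly like the sketch below. It is not the exact procedure used at the time, only a dry-run illustration: `/dev/sdb2` is a placeholder member name, and `run` and `raid_drill` are made-up helper names. Nothing is executed unless `RUN=1` is set.

```shell
#!/bin/bash
# Dry-run sketch of a fail / remove / re-add drill on a software RAID array.
# ARRAY and MEMBER are placeholders (assumptions); adjust them to your system.
ARRAY=${ARRAY:-/dev/md0}
MEMBER=${MEMBER:-/dev/sdb2}

# Print the command by default; really execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

raid_drill() {
    run mdadm "$ARRAY" --fail "$MEMBER"     # mark the member as faulty
    run mdadm "$ARRAY" --remove "$MEMBER"   # detach it from the array
    run mdadm "$ARRAY" --add "$MEMBER"      # add it back; a resync starts
    run cat /proc/mdstat                    # check the rebuild progress
}

raid_drill
```

Printing the commands first makes it easy to double-check the device names before touching a real array.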

During the 5-year life-span of the server only one hard drive failed. The server kept running and I ordered a replacement drive from Western Digital. By the way, I really like the Western Digital way of handling replacements: you register the failed drive at their website (RMA) and use a credit card to “order a new drive”. The replacement drive arrives by mail a few days later. The failed drive can then be shipped to Western Digital using the packaging from the new drive. If Western Digital receives the failed drive within 30 days your credit card will not be charged.

Using RAID6 with only 4 drives means it should be possible to unplug 2 drives from the server then go and mount them somewhere else – leaving the server running and at the same time having a complete copy of the data elsewhere…
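The arithmetic behind that idea, as a minimal sketch: RAID6 spends two drives’ worth of space on parity, so an array of n drives has the capacity of n-2 drives and can still be read when any two members are missing. The function names are made up for illustration and are not part of mdadm.

```shell
#!/bin/bash
# Hypothetical helpers illustrating RAID6 arithmetic; not part of mdadm.

# Usable capacity: two drives' worth of space goes to parity.
raid6_usable_tb() {
    local drives=$1 size_tb=$2
    echo $(( (drives - 2) * size_tb ))
}

# Minimum number of member drives needed to read all data.
raid6_min_drives() {
    local drives=$1
    echo $(( drives - 2 ))
}

echo "usable capacity: $(raid6_usable_tb 4 1) TB"
echo "drives needed to read the data: $(raid6_min_drives 4)"
```

For four 1TB drives this gives 2TB usable and 2 drives needed, which is why either half of the set holds a complete copy.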

In theory… until now :-) Today I mounted 2 drives from the original 4 drive set on my laptop. It worked perfectly – I could mount and copy data from the drives. Here are the commands I used:

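# inspect the RAID superblock on each member partition
mdadm --examine /dev/sdc2
mdadm --examine /dev/sdd2
# assemble the array from the 2 available members
mdadm --assemble /dev/md0 /dev/sdc2 /dev/sdd2
cat /proc/mdstat
# start the array even though it is degraded
mdadm --run /dev/md0
cat /proc/mdstat
# find the LVM physical volume, volume group and logical volumes on the array
lvmdiskscan
lvm pvdisplay /dev/md0
lvm vgscan
lvm lvscan
# activate the volume group and mount its logical volumes
lvm vgchange -a y vg0
mount /dev/vg0/bigdisk /mnt/bigdisk/
mount /dev/vg0/home /mnt/home/
mount /dev/vg0/vmware1 /mnt/vmware1/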
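
To confirm how many members the array is actually running with, the `[expected/present]` field in `/proc/mdstat` can be checked. The helper below is only a sketch: `md_members` is a made-up name, and it looks at the first array listed only.

```shell
#!/bin/bash
# Report how many members an md array has, reading mdstat-style text on stdin.
# md_members is a hypothetical helper, not an mdadm command.
md_members() {
    local field expected present
    # The status line has a field like [4/2]: 4 members expected, 2 present.
    field=$(grep -o '\[[0-9][0-9]*/[0-9][0-9]*\]' | head -n 1 | tr -d '[]')
    if [ -z "$field" ]; then
        echo "no array status found" >&2
        return 1
    fi
    expected=${field%%/*}
    present=${field##*/}
    echo "present=$present expected=$expected"
    # Exit status is 0 only when every member is present.
    [ "$present" -eq "$expected" ]
}

# Usage on a live system:
#   md_members < /proc/mdstat || echo "array is degraded"
```

With 2 of the 4 drives connected, the function should report a degraded array, which is expected in this setup.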

And when finished:

umount /mnt/bigdisk
umount /mnt/home
umount /mnt/vmware1
# deactivate the volume group, then stop the array
lvm vgchange -an vg0
mdadm --stop /dev/md0

Of course the new server is running Linux RAID6 as well :-)

Image backup of libvirt/kvm guests

Here is a bash script to back up all libvirt (kvm) guests on my server. It iterates over all guests, shutting each one down if necessary before copying its image file. After the copy completes, the guest is restarted if it was running:

#!/bin/bash
# get a list of all instances
INSTANCES=$(virsh -q list --all | awk '{print $2}')
echo "Backup of: ${INSTANCES}"

for each in ${INSTANCES}; do
    echo "--- ${each}"
    IS_RUNNING=0
    # shut down if running
    if [[ $(virsh domstate "${each}") == 'running' ]]; then
        IS_RUNNING=1
        echo "shutting down ${each}"
        virsh shutdown "${each}"
        # note: this loop waits forever if the guest ignores the shutdown request
        while [[ $(virsh domstate "${each}") == 'running' ]]; do
            sleep 1
            echo "waiting for shutdown to complete ${each}"
        done
    fi
    sleep 1
    # make backup (assumes the guest has exactly one file-backed disk)
    IMAGEFILE=$(virsh dumpxml "${each}" | awk '/source file=/{print $2}' | cut -d "'" -f 2)
    DESTFILE="/mnt/bigdisk/Backup_virtual_machines/$(basename "${IMAGEFILE}")"
    echo "${each} has image file ${IMAGEFILE} which will be copied to ${DESTFILE}"
    # alternative: rsync -v --progress -a --sparse "${IMAGEFILE}" "${DESTFILE}"
    nice qemu-img convert -p -O qcow2 -o preallocation=metadata "${IMAGEFILE}" "${DESTFILE}"
    # start again if it was running
    if (( IS_RUNNING )); then
        echo "starting ${each}"
        virsh start "${each}"
        sleep 20
    fi
    sleep 1
    echo "done"
done
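
One possible hardening, sketched under assumptions: the wait loop can hang if a guest never reacts to the shutdown request, so a timeout helps. `wait_for_shutdown` is a made-up helper name and the 300 second default is arbitrary; only `virsh domstate` is taken from the script above.

```shell
#!/bin/bash
# Wait for a guest to leave the 'running' state, giving up after a timeout.
# wait_for_shutdown is a hypothetical helper; the default timeout is an assumption.
wait_for_shutdown() {
    local domain=$1 timeout=${2:-300} waited=0
    while [ "$(virsh domstate "$domain")" = "running" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for ${domain}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage inside the backup loop (skip the guest rather than hang):
#   virsh shutdown "${each}"
#   wait_for_shutdown "${each}" 300 || continue
```

Skipping a stubborn guest means it gets no backup in that run, but the remaining guests are still processed.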

My new WordPress blog

My first website went online in 1997. It has hardly changed since then (apart from the hourly rate being updated) and consists entirely of static HTML pages.

But thanks to Søren Holm, yesterday I got WordPress installed in a Debian virtual guest on my new Debian KVM server:

HP Proliant MicroServer N40L

It is an N40L MicroServer from HP which I have expanded with an SSD (Intel 320 Series 120GB), 8GB RAM and an extra network card. It runs brilliantly.

So now I have finally caught up with the times and have a personal blog. That is probably about 13 years after everyone else. It will be interesting to see whether more posts than this one follow :-)