[ t h e f r a g g l e . c o m ]

technology, photography and anything else that springs to mind.

Replacing a failed disk in an SVM root mirror

So SVM (Solaris Volume Manager) isn’t something I seem to have to fiddle with much, as it just works ™ once it’s all set up. But what if a disk fails on an older system that uses SVM to mirror its root/boot devices? Well, it’s pretty simple and can all be done online…

First you need to identify which disk has actually failed. In our case it’s c0t0d0.

Before anything else you’ll need to remove the stale state database replicas that lived on the failed disk. Start by listing the replicas with metadb:

metadb

The output shows which slices hold replicas. To remove the ones on our failed disk, use the following:

metadb -d /dev/dsk/c0t0d0s7
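If the failed disk holds replicas on more than one slice, you can pull the slice list out of the metadb output rather than reading it by eye. Here is a minimal sketch, assuming the slice path is the last field on each replica line and that you have a POSIX shell and awk to hand (on Solaris that means nawk or /usr/xpg4/bin/awk). The helper name replica_slices and the sample text are made up for illustration, and the script only prints the delete commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: read `metadb` output on stdin and print each slice on
# the given disk that holds a state database replica (one line per slice).
replica_slices() {
    disk=$1
    awk -v d="$disk" '$NF ~ ("^/dev/dsk/" d "s[0-9]+$") { print $NF }' | sort -u
}

# Illustrative sample only; real `metadb` output on your system will differ.
sample='        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s7
     a    p  luo        16              8192            /dev/dsk/c1t0d0s7
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s7'

# Dry run: print the delete commands instead of executing them.
printf '%s\n' "$sample" | replica_slices c0t0d0 | while read -r slice; do
    echo "metadb -d $slice"
done
```

On a live system you would feed it real output, something like metadb | replica_slices c0t0d0, and check the printed commands before running any of them.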

Now, a lot of the time the machine you’re working on needs to stay up whilst you’re repairing it. Thankfully Solaris lets us remove and replace the disk while the system is still running. My advice would be to simply unconfigure the failed disk and leave its contents alone: even if we make a mistake, it might still be possible to recover some data from the disk we’re removing, depending on how long it’s been left in an errored state.

cfgadm -al

This shows us where the disk lives. Identify the correct disk in the list (look at the disks attached to c0 and you should be able to work it out from there), then unconfigure it:

cfgadm -c unconfigure c0::dsk/c0t0d0
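The attachment point used above can also be picked out of the cfgadm -al listing with a little awk. This is only a sketch: ap_id_for_disk is a hypothetical helper, the sample listing is illustrative, and it assumes a POSIX shell and awk (nawk on Solaris). It prints the unconfigure command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: read `cfgadm -al` output on stdin and print the
# attachment point (Ap_Id) whose name ends in the given disk.
ap_id_for_disk() {
    disk=$1
    awk -v d="$disk" '$1 ~ ("::dsk/" d "$") { print $1 }'
}

# Illustrative sample only; controllers and disks will differ per machine.
sample='Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 disk         connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown'

# Dry run: show the command that would take the failed disk offline.
ap=$(printf '%s\n' "$sample" | ap_id_for_disk c0t0d0)
echo "cfgadm -c unconfigure $ap"
```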

Now that the disk is removed, we need to get the list of the failed submirrors:

metastat -c

This gives us the list of submirrors that have failed because of c0t0d0. Next, detach each one from the mirror it belongs to (the mirror is named first, then the submirror):

metadetach -f d0 d1
metadetach -f d10 d11
metadetach -f d35 d31
metadetach -f d40 d41
metadetach -f d50 d51
metadetach -f d60 d61
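The mirror/submirror pairs above can be worked out from the metastat -c listing by looking for submirrors built on the failed disk. A rough sketch follows, assuming the concise layout of name, type (m for mirror, s for stripe/concat), size, then components, with submirrors listed under their mirror. The helper submirrors_on_disk, the sample text and the d2/d12 names for the surviving halves are all made up for illustration, and state tags are left out of the sample. It assumes a POSIX shell and awk (nawk on Solaris) and only prints commands.

```shell
#!/bin/sh
# Hypothetical helper: read `metastat -c` output on stdin and print
# "mirror submirror" pairs for every submirror built on the given disk.
submirrors_on_disk() {
    disk=$1
    awk -v d="$disk" '
        $2 == "m" { mirror = $1 }    # remember the most recent mirror line
        $2 == "s" {
            # components start at field 4; match any slice of the disk
            for (i = 4; i <= NF; i++)
                if ($i ~ ("^" d "s[0-9]+$")) { print mirror, $1; break }
        }'
}

# Illustrative sample only; names and sizes are not from a real system.
sample='d0               m   10GB d1 d2
    d1           s   10GB c0t0d0s0
    d2           s   10GB c1t0d0s0
d10              m  4.0GB d11 d12
    d11          s  4.0GB c0t0d0s1
    d12          s  4.0GB c1t0d0s1'

# Dry run: print the detach commands instead of executing them.
printf '%s\n' "$sample" | submirrors_on_disk c0t0d0 | while read -r mirror sub; do
    echo "metadetach -f $mirror $sub"
done
```

Treat the printed list as a starting point and compare it against metastat before detaching anything. A standalone concat that follows a mirror in the listing would be paired with the wrong mirror by this sketch.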

Once they’ve been detached we need to clear the concats:

metaclear d1
metaclear d11
metaclear d31
metaclear d41
metaclear d51
metaclear d61

Now that’s cleaned up we can insert the new disk and give it the same partition table as the good disk:

#insert the new hard disk...
cfgadm -c configure c0::dsk/c0t0d0
#get the vtoc configuration from the good disk
prtvtoc /dev/rdsk/c1t0d0s2 > /tmp/format.out
#and write it to the new disk
fmthard -s /tmp/format.out /dev/rdsk/c0t0d0s2
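Before building anything on the new disk it’s worth checking that the partition table really did copy across. A small sketch of one way to do that: strip the "*" comment header from two prtvtoc dumps (the header names the device, so it always differs) and compare what’s left. The helper vtoc_rows and the sample tables are invented for illustration, and a POSIX shell and awk are assumed (nawk on Solaris).

```shell
#!/bin/sh
# Hypothetical helper: reduce `prtvtoc` output (on stdin) to its partition
# rows: partition, tag, flags, first sector, sector count, last sector.
vtoc_rows() {
    awk '$1 !~ /^\*/ && NF >= 6 { print $1, $2, $3, $4, $5, $6 }'
}

# Illustrative samples only; the numbers are not from a real disk.
good='* /dev/rdsk/c1t0d0s2 partition map
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0  20971520  20971519
       2      5    00          0 143349312 143349311
       7      0    00  143244288    105024 143349311'

new='* /dev/rdsk/c0t0d0s2 partition map
*
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0  20971520  20971519
       2      5    00          0 143349312 143349311
       7      0    00  143244288    105024 143349311'

if [ "$(printf '%s\n' "$good" | vtoc_rows)" = "$(printf '%s\n' "$new" | vtoc_rows)" ]; then
    echo "partition tables match"
else
    echo "partition tables differ"
fi
```

On the real system the two inputs would come from prtvtoc /dev/rdsk/c1t0d0s2 and prtvtoc /dev/rdsk/c0t0d0s2.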

Now that we’ve done that, it’d be a good idea to re-create the state database replicas on the new disk (-c 2 puts two replicas on the slice):

metadb -a -c 2 /dev/dsk/c0t0d0s7

Once this is done we can move on to recreating the concats:

metainit d1 1 1 c0t0d0s0
metainit d11 1 1 c0t0d0s1
metainit d31 1 1 c0t0d0s3
metainit d41 1 1 c0t0d0s4
metainit d51 1 1 c0t0d0s5
metainit d61 1 1 c0t0d0s6

And then attach these to the mirrors they’re meant to be in:

metattach d0 d1
metattach d10 d11
metattach d35 d31
metattach d40 d41
metattach d50 d51
metattach d60 d61
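With this many metadevices it’s easy to mistype one, so it can help to keep the layout in one table and generate the commands from it. A minimal sketch, using the mirror, submirror and slice names from the commands above: the layout variable and the rebuild_commands helper are just illustrative names, a POSIX shell is assumed, and the commands are printed rather than run.

```shell
#!/bin/sh
# One row per mirror: "mirror submirror slice", copied from the steps above.
layout='d0 d1 c0t0d0s0
d10 d11 c0t0d0s1
d35 d31 c0t0d0s3
d40 d41 c0t0d0s4
d50 d51 c0t0d0s5
d60 d61 c0t0d0s6'

# Hypothetical helper: read the table on stdin and print every metainit
# command first, then every metattach command, in the order used above.
rebuild_commands() {
    table=$(cat)
    printf '%s\n' "$table" | while read -r mirror sub slice; do
        echo "metainit $sub 1 1 $slice"
    done
    printf '%s\n' "$table" | while read -r mirror sub slice; do
        echo "metattach $mirror $sub"
    done
}

# Dry run: review the output, then paste or pipe it to a shell if it looks right.
printf '%s\n' "$layout" | rebuild_commands
```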

Each metattach kicks off a re-sync of that mirror, and you can monitor the progress using metastat:

metastat -c

This gives you a concise listing of all the configured metadevices and shows the “Resync %” status of each mirror that is being re-synced.
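Rather than re-running metastat by hand, the output can be filtered down to just the mirrors that are still syncing. The sketch below assumes the word "resync" appears somewhere on the line for a mirror that is mid-sync; the exact tag in the sample, (resync-42%), is my assumption rather than copied from a real system. The helper resync_lines is a made-up name and a POSIX shell is assumed.

```shell
#!/bin/sh
# Hypothetical helper: read `metastat -c` output on stdin and print only the
# lines that mention a resync. Exit status is 0 while at least one line matches.
resync_lines() {
    grep -i 'resync'
}

# Illustrative sample only; the status tag format is an assumption.
sample='d0               m   10GB d1 d2 (resync-42%)
    d1           s   10GB c0t0d0s0
    d2           s   10GB c1t0d0s0
d10              m  4.0GB d11 d12
    d11          s  4.0GB c0t0d0s1
    d12          s  4.0GB c1t0d0s1'

if printf '%s\n' "$sample" | resync_lines; then
    echo "still resyncing"
else
    echo "all mirrors in sync"
fi

# On a live system, a simple polling loop could look like this:
#   while metastat -c | resync_lines; do sleep 60; done
```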

A fairly easy task once you know all the steps. Just be careful that if you need to reboot at any point after removing the bad concats, you boot from the correct disk. I usually set up devaliases in the OBP called bootdiska (c0t0d0) and bootdiskb (c1t0d0), and the devalias ‘disk’ is usually just an alias to the same place as bootdiska. So, given that c0t0d0 is the disk that has failed on us, make sure you boot from the secondary disk rather than the default ‘disk’ alias:

ok boot bootdiskb

If you’re not sure which aliases were created, just issue:

ok devalias

to show the list of configured aliases. If none of them look useful, use:

ok probe-scsi-all

to give you a list of the disks that are available on the system. From there you should be able to figure out where the disk you want to boot from is.

November 27, 2010 at 5:13 pm