Posted by: kezhong | July 10, 2009

Replacing a failed disk in RAID 1

I had installed a Fedora 11 system on software RAID 1 in my virtual machine. If you want to know how to do that, you can read my last article, "Installing Linux on Software RAID 1". The question now is how to replace a disk when it fails or has to be swapped out for some other reason.

In my test, I first simulated a failure of the second disk and replaced it with a new one. Then I simulated a failure of the first disk and replaced it as well.

Replacing the second disk
I removed the second disk from my virtual machine before turning on the system. After starting the system, I checked the arrays; /proc/mdstat showed that only one disk was working:
[joker@localhost ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0]
      204736 blocks [2/1] [U_]
md1 : active raid1 sda2[0]
      1048512 blocks [2/1] [U_]
md2 : active raid1 sda3[0]
      7132416 blocks [2/1] [U_]
unused devices: <none> 
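If you cannot pull a disk out of the virtual machine, roughly the same degraded state can be produced in software. This is only a sketch using mdadm's fail/remove options rather than the physical removal I did above:

[root@localhost ~]# mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
[root@localhost ~]# mdadm /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2
[root@localhost ~]# mdadm /dev/md2 --fail /dev/sdb3 --remove /dev/sdb3

After that, /proc/mdstat should show [U_] for each array, just like the output above.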

Partition the new disk
I turned off the system and added a new disk that was a little bit bigger than the old one, then partitioned it with the same layout as the first disk. When that was done, "fdisk -l /dev/sdb" reported:
Disk /dev/sdb: 9126 MB, 9126805504 bytes
255 heads, 63 sectors/track, 1109 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x29dbaad1

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1          26      208813+  83  Linux
/dev/sdb2              27         158     1060290   83  Linux
/dev/sdb3             159        1109     7638907+  83  Linux
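If you do not want to recreate the partitions by hand with fdisk, the layout can usually be copied from the surviving disk. This is just a sketch, assuming /dev/sda is the healthy disk and /dev/sdb is the new one:

[root@localhost ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb

The first sfdisk dumps the partition table of sda, and the second writes that layout onto sdb. Afterwards "fdisk -l /dev/sdb" should show the same partitions as the old disk.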

Add the partitions of the new disk to the existing arrays
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
[root@localhost ~]# mdadm /dev/md1 --add /dev/sdb2
mdadm: added /dev/sdb2
[root@localhost ~]# mdadm /dev/md2 --add /dev/sdb3
mdadm: added /dev/sdb3

Checking /proc/mdstat again, I found the arrays were resynchronizing.
[root@localhost ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      204736 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
      1048512 blocks [2/2] [UU]
md2 : active raid1 sdb3[2] sda3[0]
      7132416 blocks [2/1] [U_]
      [>....................]  recovery =  0.8% (61312/7132416) finish=5.7min speed=20437K/sec
unused devices: <none> 
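If you want to follow the rebuild without retyping the command, the progress can be watched; nothing in this sketch is specific to my setup:

[root@localhost ~]# watch -n 5 cat /proc/mdstat
[root@localhost ~]# mdadm --detail /dev/md2

watch refreshes the /proc/mdstat output every 5 seconds, and mdadm --detail prints the state of the array, including a rebuild percentage while the recovery is running.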

Make both disks bootable
Check /proc/mdstat again, and once the synchronization has finished, make both disks bootable by installing GRUB on each of them.
[root@localhost ~]# grub
grub> root (hd1,0)
root (hd1,0)
 Filesystem type is ext2fs, partition type 0x83
grub> setup (hd1)
setup (hd1)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd1)"... 28 sectors are embedded.
succeeded
 Running "install /grub/stage1 (hd1) (hd1)1+28 p (hd1,0)/grub/stage2 /grub/grub.
conf"... succeeded
Done.
Done.
grub> root (hd0,0)
 … …
grub> setup (hd0)
 … … 
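GRUB's (hd0)/(hd1) numbers do not always match the kernel's sda/sdb order, so it is worth confirming the mapping before running setup. On Fedora with GRUB legacy the mapping is kept in a plain text file, and the grub shell itself can locate the boot files; the output below is only what I would expect, not copied from my machine:

[root@localhost ~]# cat /boot/grub/device.map
(hd0)   /dev/sda
(hd1)   /dev/sdb
grub> find /grub/stage1
 (hd0,0)
 (hd1,0)

find lists every partition containing /grub/stage1, so after both setups it should report both disks.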

Replace the first disk
Shut down the system, remove the first disk, and turn the system back on. Checking the arrays, the surviving disk (formerly sdb) now shows up as sda:
[joker@localhost ~]$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[1]
      204736 blocks [2/1] [_U]
md1 : active raid1 sda2[1]
      1048512 blocks [2/1] [_U]
md2 : active raid1 sda3[1]
      7132416 blocks [2/1] [_U]
unused devices: <none> 

Shut down the system, add a new disk, partition it, add its partitions to the arrays so they can resynchronize, and make both disks bootable, just as above; the whole sequence is sketched below.
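For reference, here is the whole procedure condensed into one listing. It assumes the surviving disk now shows up as /dev/sda and the new disk as /dev/sdb; check /proc/mdstat and "fdisk -l" first, because the names may be different on your system, and let the recovery finish before reinstalling GRUB:

[root@localhost ~]# sfdisk -d /dev/sda | sfdisk /dev/sdb
[root@localhost ~]# mdadm /dev/md0 --add /dev/sdb1
[root@localhost ~]# mdadm /dev/md1 --add /dev/sdb2
[root@localhost ~]# mdadm /dev/md2 --add /dev/sdb3
[root@localhost ~]# cat /proc/mdstat
[root@localhost ~]# grub
grub> root (hd0,0)
grub> setup (hd0)
grub> root (hd1,0)
grub> setup (hd1)
grub> quit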

When I did this test, I ran into a curious problem: the new disk should be a little bit bigger than the old one. If it is not, mdadm refuses the partition with the message "mdadm: /dev/sdb3 not large enough to join array". I have not found a solution for that. Another thing I ran into is that the machine booted straight into the installer when I started the system after replacing the first disk; after I swapped the SCSI sequence of the two disks, the problem was solved.
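Before adding a partition you can at least compare its size against the existing member to see whether mdadm will accept it. A rough check, assuming sda3 is the surviving member and sdb3 the new partition; the exact labels in the mdadm output depend on the metadata version:

[root@localhost ~]# blockdev --getsize64 /dev/sda3
[root@localhost ~]# blockdev --getsize64 /dev/sdb3
[root@localhost ~]# mdadm --examine /dev/sda3 | grep -i size

blockdev prints the size of each partition in bytes, and mdadm --examine shows how much space the array expects from each member, so the new partition must be at least that big.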

