How to: Replacing a failed drive in a Linux Software RAID1 configuration (mdraid)

Scenario: 
A drive has failed in your Linux RAID1 configuration and you need to replace it.

Solution: 
Use mdadm to fail the drive partition(s) and remove it from the RAID array.
Physically replace the drive in the system.
Create the same partition table on the new drive that existed on the old drive.
Add the drive partition(s) back into the RAID array.

In this example I have two drives named /dev/sdi and /dev/sdj. Each drive has three partitions, and each pair of matching partitions is configured as a RAID1 array (md0, md1, and md2). We will assume that /dev/sdi has failed and needs to be replaced.
Note that Linux software RAID mirrors individual partitions rather than entire disks, which is why each drive contributes a member to three separate arrays.
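
One quick way to see this layout, including which md device each partition belongs to, is lsblk:

# lsblk /dev/sdi /dev/sdj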

4 Steps total

Step 1: Identify the faulty drive and array

Identify which RAID arrays have failed:

To identify whether a RAID array has failed, look at the status string such as [UU]. Each "U" represents a healthy member of the array: [UU] means the array is healthy, while a missing "U", such as [U_], means the array is degraded or has a faulty member.

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdj1[0] 
102336 blocks super 1.0 [2/1] [U_] 

md2 : active raid1 sdj3[0] 
233147200 blocks super 1.1 [2/1] [U_] 
bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active (auto-read-only) raid1 sdj2[0] 
1048000 blocks super 1.1 [2/1] [U_]

From the above output we can see that RAID arrays md0, md1, and md2 are missing a "U" and are degraded or faulty.
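
For more detail on a particular array, including which member device is marked faulty or has been removed, mdadm can report the array state directly, for example:

# mdadm --detail /dev/md0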

Step 2: Remove the failed partition(s) and drive

Before we can physically remove the failed drive from the system, we must first mark its partition(s) as failed in every RAID array to which the drive belongs, and then remove them from those arrays. In our example /dev/sdi is a member of all three RAID arrays; even if only one array were showing as degraded, the drive's partitions would still have to be failed and removed from all three arrays before the drive is pulled.

To fail the partitions we issue the following command:

# mdadm --manage /dev/md0 --fail /dev/sdi1 
# mdadm --manage /dev/md1 --fail /dev/sdi2 
# mdadm --manage /dev/md2 --fail /dev/sdi3

To remove the partitions from the RAID arrays: 
# mdadm --manage /dev/md0 --remove /dev/sdi1 
# mdadm --manage /dev/md1 --remove /dev/sdi2 
# mdadm --manage /dev/md2 --remove /dev/sdi3
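
A quick re-check of /proc/mdstat should now show each array running on a single member, with no /dev/sdi partitions listed:

# cat /proc/mdstat

It also helps to note which physical disk to pull before shutting down. Assuming the failed drive still responds, its serial number can be read with smartctl (from the smartmontools package) or found in the persistent device names under /dev/disk/by-id:

# smartctl -i /dev/sdi | grep -i serial 
# ls -l /dev/disk/by-id/ | grep sdi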

Now you can power off the system and physically replace the defective drive: 
# shutdown -h now

Step 3: Adding the new disk to the RAID arrays

Now that the new hard drive is installed, we can add it to the RAID arrays. The new drive first needs exactly the same partition table structure as the old drive, and the easiest way to get it is to copy the partition table from the surviving drive. There is a simple command to do this:

# sfdisk -d /dev/sdj | sfdisk /dev/sdi
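
The sfdisk copy above assumes an MBR/DOS partition table (recent versions of sfdisk also understand GPT). If the drives use GPT and your sfdisk is older, sgdisk from the gdisk package can do the same copy; the second command gives the new disk its own random GUIDs so the two disks do not end up with identical identifiers:

# sgdisk -R=/dev/sdi /dev/sdj 
# sgdisk -G /dev/sdi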

Note that after removing and replacing drives, device names can change. For this example, confirm that the new drive really is /dev/sdi and that no partitions exist on it yet:
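
# fdisk -l /dev/sdi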

Now that the partitions exist on the newly installed hard drive, we can add them to the RAID arrays.

# mdadm --manage /dev/md0 --add /dev/sdi1 
mdadm: added /dev/sdi1

Repeat this command for each remaining partition, changing /dev/md# and /dev/sdi# accordingly:

# mdadm --manage /dev/md1 --add /dev/sdi2 
mdadm: added /dev/sdi2

# mdadm --manage /dev/md2 --add /dev/sdi3 
mdadm: added /dev/sdi3

Now we can check that the partitions are being synchronized by issuing:

# cat /proc/mdstat 
Personalities : [raid1] 
md0 : active raid1 sdi1[2] sdj1[0] 
102336 blocks super 1.0 [2/2] [UU] 

md2 : active raid1 sdi3[2] sdj3[0] 
233147200 blocks super 1.1 [2/1] [U_] 
[>....................] recovery = 0.4% (968576/233147200) finish=15.9min speed=242144K/sec 
bitmap: 2/2 pages [8KB], 65536KB chunk

md1 : active raid1 sdi2[2] sdj2[0] 
1048000 blocks super 1.1 [2/2] [UU]

Once all of the arrays have finished synchronizing, the RAID configuration will be back to normal.
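
While the arrays rebuild, watch can refresh the status automatically every few seconds:

# watch -n 5 cat /proc/mdstat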

Step 4: Install GRUB to the new hard drive MBR

We need to install GRUB (short for GNU GRand Unified Bootloader) on the new drive's MBR (short for Master Boot Record) so that, if the other drive ever fails, the system can still boot from the new drive.

Enter the GRUB command line:

# grub

Locate grub setup files:

grub> find /grub/stage1 
(hd8,0) 
(hd9,0)

Install grub on the MBR:

grub> device (hd8) /dev/sdi 
grub> root (hd8,0) 
grub> setup (hd8) 
grub> quit

Making sure the GRUB bootloader is installed on both drives ensures your system will boot regardless of which drive fails.
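
Note that the grub shell commands above apply to legacy GRUB (0.9x). On a system that boots with GRUB 2, the equivalent is a single command, named grub2-install on RHEL/CentOS-based distributions:

# grub-install /dev/sdi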
