Recovering a RAID5 mdadm array with two failed devices

Update
Before reading this article you should know that it is now quite old and that there is a better method: ‘mdadm --assemble --force’ (it may have been there all along). This will try to assemble the array by marking previously failed drives as good. From the man page:

If mdadm cannot find enough working devices to start the array, but can find some devices that are recorded as having failed, then it will mark those devices as working so that the array can be started.

I would however strongly suggest that you first disconnect the drive that failed first. If you need to discover which device failed first, or assemble doesn’t work and you need to manually recreate the array, then read on.
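As a minimal sketch of that first attempt, the commands would look something like the following. It assumes the device names from the example later in this article: /dev/md0 for the original array, and sdb1, sdd1 and sde1 as members, with the first-failed sdc1 deliberately left out. The print_force_assemble helper is made up for illustration; it only prints the commands, so nothing runs by accident.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the commands for a forced assemble.
# Usage: print_force_assemble ARRAY MEMBER...
print_force_assemble() {
    array=$1
    shift
    # The array has to be stopped before it can be re-assembled.
    echo "mdadm --stop $array"
    # --force lets mdadm mark members recorded as failed as working again.
    echo "mdadm --assemble --force $array $*"
}

# Device names are assumptions taken from the example in this article;
# the first-failed drive (sdc1) is intentionally not listed.
print_force_assemble /dev/md0 /dev/sdb1 /dev/sdd1 /dev/sde1
```

Review the printed commands and run them by hand as root, ideally against clones of the disks rather than the originals.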

I found myself in an interesting situation with my parents’ home server today (Ubuntu 10.04). Hardware-wise it’s not the best setup: two of the drives are in an external enclosure connected with eSATA cables. I did encourage Dad to buy a proper enclosure, but was unsuccessful. This is a demonstration of why eSATA is a very bad idea for RAID devices.

What happened was that one of the cables had been bumped, disconnecting one of the drives. The array had therefore been running in a degraded state for over a month, which is not good. I noticed this when logging in one day to fix something else. The device wasn’t visible, so I told Dad to check the cable, but unfortunately when he went to secure it he must have somehow disconnected another one. This caused a second drive to fail, so the array immediately stopped.

Despite having no hardware failure, the situation is similar to someone replacing the wrong drive in a raid array. Recovering it was an interesting experience, so here I’ve documented the process.

YOU CAN PERMANENTLY DAMAGE YOUR DATA BY FOLLOWING THIS GUIDE, DO NOT PERFORM THIS OPERATION ON THE ORIGINAL DISKS UNLESS THE DATA IS BACKED UP ELSEWHERE.

Gathering information

The information you’ll need should be contained in the superblocks of the raid devices. First you need to find out which drive failed first, with the mdadm --examine command. My example was a raid5 array of 4 devices, sdb1, sdc1, sdd1 and sde1:

root@server:~# mdadm --examine /dev/sdb1
mdadm: metadata format 01.02 unknown, ignored.
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 87fa9a4d:d26c14f1:01f9e43d:ac30fbff (local to host server)
  Creation Time : Mon Oct 11 00:13:02 2010
     Raid Level : raid5
  Used Dev Size : 625128960 (596.17 GiB 640.13 GB)
     Array Size : 1875386880 (1788.51 GiB 1920.40 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 0

    Update Time : Mon Mar 21 00:03:26 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 2
  Spare Devices : 0
       Checksum : 713f331d - correct
         Events : 3910

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       0        0        1      faulty removed
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       0        0        3      faulty removed

Look at the last part. Here we can see that this drive is in sync with /dev/sdd1 but out of sync with the other two (sdc1 and sde1) – the data indicates that sdc1 and sde1 have failed. These drives are the two in the external enclosure… but I digress.

Performing an examine on sdc1 shows “active sync” for all the other drives; clearly this disk has no idea what’s going on. Also note the update time of February 5 (it is now March!!):

root@server:~# mdadm --examine /dev/sdc1
[...]
    Update Time : Sat Feb  5 11:22:29 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 7105b39b - correct
         Events : 218

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     1       8       33        1      active sync   /dev/sdc1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8       65        3      active sync   /dev/sde1

This indicates that it was the first drive to be disconnected, as the drives were all in sync the last time this drive was part of the array. That leaves sde1:

root@server:~# mdadm --examine /dev/sde1
[...]
    Update Time : Sun Mar 20 23:53:07 2011
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 713f30d1 - correct
         Events : 3904

         Layout : left-symmetric
     Chunk Size : 512K

      Number   Major   Minor   RaidDevice State
this     3       8       65        3      active sync   /dev/sde1

   0     0       8       17        0      active sync   /dev/sdb1
   1     1       0        0        1      faulty removed
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       8       65        3      active sync   /dev/sde1

When this drive was last part of the array, sdc1 was faulty but the other two were fine. This indicates that it was the second drive to be disconnected.
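The same ordering can be read off the Events counters: the member with the lowest count dropped out first. The following is a rough sketch of that comparison, not part of the original recovery. The events_of and rank_by_events helpers are hypothetical names, and the numbers are the ones shown in the --examine output above (sdd1 is left out because its output isn’t shown).

```shell
#!/bin/sh
# Print the "Events" counter from `mdadm --examine` output read on stdin.
events_of() {
    awk -F' : ' '/^[ \t]*Events/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Sort "device events" lines so the member that dropped out first comes first.
rank_by_events() {
    sort -k2,2n
}

# Event counts copied from the --examine output in this article.
printf '%s\n' '/dev/sdb1 3910' '/dev/sdc1 218' '/dev/sde1 3904' | rank_by_events

# On a live system the list could be built like this (needs root, not run here):
#   for d in /dev/sd[bcde]1; do
#       printf '%s %s\n' "$d" "$(mdadm --examine "$d" | events_of)"
#   done | rank_by_events
```

Here sdc1 sorts first (218), then sde1 (3904), then sdb1 (3910), which matches the order worked out from the update times.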

Scary stuff

Despite being marked as faulty, we have to assume that the data on /dev/sde1 is crash-consistent with sdb1 and sdd1 as the array immediately stopped upon failure. The original array won’t start because it only has two active devices. But we can create a new array with 3/4 of the drives as members and one missing.

This sounds scary and it should. If you have critical data that you’re trying to recover from this situation I would honestly be buying a whole new set of drives, cloning the data across to them and working from those. Having said that, the likelihood of permanently erasing the data is low if you’re careful and don’t trigger a rebuild with an incorrectly configured array (like I almost did).

Important information to note is the configuration of the array, in particular device order, layout and chunk size. If you’re using defaults (in hindsight probably a good idea to lessen the chance of something going wrong in situations like this), you don’t need to specify them. However you’ll note that in my example the chunk size is 512K, which differs from the default of 64K.

Update 2012/01/04

When reading the following notes, be aware that the default chunk size in more recent versions of mdadm is 512K. In addition, ensure that you use the same metadata (superblock) version as the original array by specifying -e 0.90 or -e 1.2. If you are using the same version of mdadm that the array was created with, and didn’t manually specify a different metadata version, you should be safe. However, when dealing with raid arrays it always pays to double-check. The metadata version should be in the output of mdadm --examine or in mdadm.conf. Thanks to Neal Walfield for the info!
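Rather than trusting defaults, the parameters can be pulled straight out of the --examine output before recreating anything. This is only a sketch: examine_field and raid5_array_size are hypothetical helper names, and the sample text is copied from the sdb1 output earlier in this article.

```shell
#!/bin/sh
# Print the value of one "Name : value" field from `mdadm --examine` output on stdin.
examine_field() {
    awk -F' : ' -v name="$1" '
        { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == name { print $2; exit }'
}

# RAID5 keeps one device's worth of parity, so usable size = (devices - 1) * per-device size.
raid5_array_size() {
    echo $(( ($1 - 1) * $2 ))
}

# Sample lines copied from the sdb1 --examine output in this article.
sample='        Version : 00.90.00
     Raid Level : raid5
   Raid Devices : 4
         Layout : left-symmetric
     Chunk Size : 512K'

for f in Version "Raid Level" "Raid Devices" Layout "Chunk Size"; do
    printf '%s=%s\n' "$f" "$(printf '%s\n' "$sample" | examine_field "$f")"
done

# Sanity check against the reported Array Size (1875386880 in the example).
raid5_array_size 4 625128960
```

If the computed size doesn’t match the Array Size reported by the old superblock, one of the parameters is probably wrong and it’s worth stopping to check before creating anything.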

Creating a new array with old data

Here is the command I used to recreate the array:

root@server:~# mdadm --verbose --create /dev/md1 --chunk=512 --level=5 --raid-devices=4 /dev/sdb1 /dev/sdd1 /dev/sde1 missing
mdadm: metadata format 01.02 unknown, ignored.
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Oct 11 00:13:02 2010
mdadm: size set to 625128960K
Continue creating array? y
mdadm: array /dev/md1 started.

Oops.

Can you see what I did there? I created the array with the missing drive at position [3], when in fact the missing drive is [1] (the device numbering starts at 0). Thus when I tried to mount:

root@server:/# mount -r /dev/md1p1 /mnt -t ext4
mount: wrong fs type, bad option, bad superblock on /dev/md1p1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

!!

Upon realising this I looked at mdstat then stopped the array:

root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sde1[2] sdd1[1] sdb1[0]
      1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
root@server:/# mdadm -D /dev/md1
mdadm: metadata format 01.02 unknown, ignored.
/dev/md1:
        Version : 00.90
  Creation Time : Mon Mar 21 02:00:54 2011
     Raid Level : raid5
     Array Size : 1875386880 (1788.51 GiB 1920.40 GB)
  Used Dev Size : 625128960 (596.17 GiB 640.13 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Mar 21 02:00:54 2011
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : e469103f:2ddf45e9:01f9e43d:ac30fbff (local to host server)
         Events : 0.1

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       49        1      active sync   /dev/sdd1
       2       8       65        2      active sync   /dev/sde1
       3       0        0        3      removed
root@server:/# mdadm --stop /dev/md1

I then recreated the array with the missing drive in the correct position:

root@server:/#  mdadm --verbose --create /dev/md1 --chunk=512 --level=5 --raid-devices=4 /dev/sdb1 missing /dev/sdd1 /dev/sde1
mdadm: metadata format 01.02 unknown, ignored.
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Mon Mar 21 02:00:54 2011
mdadm: size set to 625128960K
Continue creating array? y
mdadm: array /dev/md1 started.

And examined the situation:

root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sde1[3] sdd1[2] sdb1[0]
      1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
root@server:/# fdisk /dev/md1
GNU Fdisk 1.2.4
Copyright (C) 1998 - 2006 Free Software Foundation, Inc.
This program is free software, covered by the GNU General Public License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Using /dev/md1
Command (m for help): p                                                   

Disk /dev/md1: 1920 GB, 1920389022720 bytes
255 heads, 63 sectors/track, 233474 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System 
/dev/md1p1               1      233475  1875387906   83  Linux 
Warning: Partition 1 does not end on cylinder boundary.                   
Command (m for help): q                                                   
root@server:/# mount -r /dev/md1p1 /mnt
root@server:/# ls /mnt
Alex  Garth  Hamish  Jenny  lost+found  Public  Simon
root@server:/# umount /mnt

Phew!

So despite creating a bad array I was still able to stop it and create a new array with the correct configuration. I don’t believe there is any corruption as no writes occurred, and the array didn’t rebuild.
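The slot mistake is easy to make by hand, so here is a small sketch of building the member list from the slot number instead. members_with_missing is a hypothetical helper, not an mdadm feature, and the command is only printed. The --metadata and --layout options weren’t in my original command; they’re spelled out here on the assumption that being explicit is safer than relying on defaults.

```shell
#!/bin/sh
# Hypothetical helper: print the member list for --create with "missing"
# inserted at RAID slot SLOT (slots count from 0).
# Usage: members_with_missing SLOT DEVICE...   (surviving devices in slot order)
members_with_missing() {
    slot=$1
    shift
    i=0
    out=
    for dev in "$@"; do
        if [ "$i" -eq "$slot" ]; then
            out="$out missing"
        fi
        out="$out $dev"
        i=$((i + 1))
    done
    # The missing slot may be the last one.
    if [ "$i" -eq "$slot" ]; then
        out="$out missing"
    fi
    echo "${out# }"
}

# sdc1 was RaidDevice 1 in the --examine output, so slot 1 is the missing one.
# The command is printed for review, not executed.
echo "mdadm --verbose --create /dev/md1 --metadata=0.90 --chunk=512 --level=5" \
     "--layout=left-symmetric --raid-devices=4" \
     "$(members_with_missing 1 /dev/sdb1 /dev/sdd1 /dev/sde1)"
```

Passing 3 instead of 1 reproduces my original mistake, with missing at the end of the list.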

Adding the first-disconnected drive back in

The array is of course still in a degraded state at this point and no more secure than RAID0. We still need to add the disk that was disconnected first back in to the array. Compared to the rest of the saga this is straightforward:

root@server:/# mdadm -a /dev/md1 /dev/sdc1
mdadm: metadata format 01.02 unknown, ignored.
mdadm: added /dev/sdc1
root@server:/# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid5 sdc1[4] sde1[3] sdd1[2] sdb1[0]
      1875386880 blocks level 5, 512k chunk, algorithm 2 [4/3] [U_UU]
      [>....................]  recovery =  0.0% (442368/625128960) finish=164.7min speed=63196K/sec

unused devices: <none>

Here we can see a happily rebuilding RAID5 array. Note that you will need to update the /etc/mdadm/mdadm.conf file with the new UUID; the line can be generated with:

root@server:/# mdadm --detail --scan
mdadm: metadata format 01.02 unknown, ignored.
ARRAY /dev/md1 level=raid5 num-devices=4 metadata=00.90 spares=1 UUID=7271bab9:23a4b554:01f9e43d:ac30fbff
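Swapping the old ARRAY line for the new one could be scripted roughly as below. This is a sketch run against a throwaway copy: replace_array_by_uuid is a hypothetical helper, and the contents of the sample config (including the MAILADDR line and the exact form of the old ARRAY line) are assumptions; only the two UUIDs come from the output in this article. The spares=1 in the generated line probably reflects the rebuild in progress, so it is left out here and it may be worth regenerating the line once recovery has finished.

```shell
#!/bin/sh
# Hypothetical helper: drop the ARRAY line carrying OLD_UUID and append NEW_LINE.
# Usage: replace_array_by_uuid CONF OLD_UUID NEW_LINE
replace_array_by_uuid() {
    r_conf=$1
    r_uuid=$2
    r_new=$3
    r_tmp=$(mktemp) || return 1
    # Keep every line except ARRAY entries that mention the old UUID.
    awk -v uuid="$r_uuid" '!($1 == "ARRAY" && index($0, "UUID=" uuid))' "$r_conf" > "$r_tmp" || return 1
    printf '%s\n' "$r_new" >> "$r_tmp"
    cat "$r_tmp" > "$r_conf"   # overwrite contents, keeping the file's permissions
    rm -f "$r_tmp"
}

# Demo on a temporary file standing in for /etc/mdadm/mdadm.conf.
demo_conf=$(mktemp)
printf '%s\n' \
    'MAILADDR root' \
    'ARRAY /dev/md0 level=raid5 num-devices=4 UUID=87fa9a4d:d26c14f1:01f9e43d:ac30fbff' \
    > "$demo_conf"

replace_array_by_uuid "$demo_conf" '87fa9a4d:d26c14f1:01f9e43d:ac30fbff' \
    'ARRAY /dev/md1 level=raid5 num-devices=4 metadata=00.90 UUID=7271bab9:23a4b554:01f9e43d:ac30fbff'

cat "$demo_conf"
rm -f "$demo_conf"
```

On Debian-style systems such as this Ubuntu box, the initramfs usually carries its own copy of mdadm.conf, so running update-initramfs -u after editing the real file is probably wise.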

You can keep an eye on the rebuild with ‘watch cat /proc/mdstat’.

30 thoughts on “Recovering a RAID5 mdadm array with two failed devices”

  1. Steven F

    I’d broaden a bit and say eSATA is a risky choice for any permanent use – RAID or not.

    As someone who used much of your prior ubuntu server post as reference, I decided to go with RAID6 instead. Even though I’m only running 4 drives at the moment and RAID6 causes me to sacrifice 2 of 4, the redundancy of RAID5 is not sufficient for me. Given most RAIDs are built with drives of the same model, similar age, often the same Lot #, and experience nearly identical usage, multiple simultaneous failures are not that farfetched. I believe the odds of two drives dying at the exact same moment are low, but a full rebuild will stress-test the remaining drives at a time that I can least afford to have a second drive go.

    I do like eSATA for performing back-ups. I’d be interested in a solution that can back up the entire RAID. Is it reasonable to run a tape drive at home?

    1. Alex

      Totally agree re eSATA.

      For me however the security of the daily backup offsets the risk of multiple drives failing, so while one failing might indicate that an additional failure from the same batch is more likely, at most you lose a day’s worth of data.

      IMHO, the only reasons to go with tape are portability and durability of the media. You can get more storage on a hard drive these days for much lower cost, and the speed and flexibility of the backups is incomparable (can’t rsync to a tape…). If you need to keep your backups for a long time and your data set isn’t too large, tapes can make sense, but for someone who just wants to ensure their data is safe a couple of 2TB (or 3TB) hard drives on rotation with the aforementioned RAID array is hard to beat.

  2. Sam

    Exactly the reassurance and mdadm --create command I needed to get my RAID5 array back together. Thank you a ton for posting this.

  3. Pingback: Recovering Linux software RAID, RAID5 Array - MySQL Performance Blog

  4. DonaldVR

    This guide literally saved my life. (Ok, not literally.) I had a SATA controller with two drives attached die on me. When I replaced the controller the RAID would not start because both drives were marked as faulty. Using this guide, I thought of recreating the array with both drives since the --examine details were the same for both, but somehow I feared there would be inconsistency between the two, so I chickened out and created the array with just one of them. My data (family photos) was preserved. I added the other “faulty” drive later without issue. THANKS!!!!

  5. Neal Walfield

    This was a very helpful post. Thanks!

    I have two comments:

    First, mdadm’s default chunk size recently changed from 64k to 512k. This makes your warning even more relevant.

    Second, the default metadata format has changed! The new default is version 1.2. Many people are likely using 0.90. They need to specify -e 0.90 or risk losing data!

    1. Alex Post author

      Thanks for the information!

      The warning should be a general one to ensure the config of the new array matches the old… any one change could result in scrambled data.

      I’ll update the post now.

  6. Neal Walfield

    The following happened to me: I have a 4 disk RAID-5. A disk failed, which I promptly replaced. I added the new disk to the array, but it was not integrated because the recovery exposed a bad sector in a second disk. Double disk failure! Ouch. To recover, I had to recover one of the failed disks. To do this, I used gddrescue, a dd-like tool with error recovery capabilities. After copying one of the failed disks to the new disk, I was able to recover the array using this guide.

  7. TooMeeK

    This is very helpful, thank you for sharing.
    PS: I lost a whole 1TB volume a few years ago due to:
    - ext4 still being unstable
    - raid10 volume corruption
    - kernel panics
    - 3 x power failure during a rebuild
    Fortunately, it was just a test server.
    No drive failed, all were fine. It was just filesystem corruption.
    That is when I learned how to recover data from mdadm RAID arrays using TestDisk.

    1. TooMeeK

      Mdadm has mail notification built in.
      You just have to set up Exim or Sendmail as a smarthost (if you already have a mail server you can use) or as a sending-only system.
      Mdadm sends mail on every event, even a test!

  8. Jorus

    This post gave me the courage to pull the trigger on my failed array. Thank you, 16TB family RAID5 array rebuilding now!!! Now if I can only remember to keep the bloody server chassis locked with a two year old around…

  9. Josh A

    A 2nd follow up. You can use mdadm --assemble --force rather than --create to put the raid back together. With this method you don’t have to specify the missing drive. I also don’t think you need to specify the drives in order, though I did anyway. It will also mark failed drives as clean because of --force.

      1. PKIX

        Alex, could you add at the top of your post that “mdadm --assemble --force” should be tried first?
        I used the re-creation suggested in this post and damaged my data, because the sector offset differs between mdadm versions. This is very dangerous.

        1. Alex Post author

          Of course this is dangerous, I probably could have stressed this more but it worked for me, and the difference between versions IS mentioned (although that was a later update thanks to feedback). The assemble force method is clearly better however, so I’ve added a note to the top.

  10. Boudewijn Charite

    Hi I need some help.

    I got:

    root@server:~# mdadm --examine /dev/md0
    /dev/md1:
    Version : 00.90
    Creation Time : Sun Oct 12 18:16:02 2008
    Raid Level : raid5
    Used Dev Size : 722209984 (688.75 GiB 739.54 GB)
    Raid Devices : 4
    Total Devices : 2
    Preferred Minor : 1
    Persistance : Superblock is persistent

    Update Time : Fri Jan 17 12:11:13 2014
    State : active, FAILED, Not Started
    Active Devices : 2
    Working Devices : 2
    Failed Devices : 0
    Spare Devices : 0
    Events : 3910

    Layout : left-symmetric
    Chunk Size : 512K
    Events : 0.251432

    Number Major Minor RaidDevice State

    0 0 0 0 removed
    1 8 22 1 active sync /dev/sdb6
    2 8 38 2 active sync /dev/sdc6
    3 0 0 3 removed

    If I look at /dev/sda6 I got:

    root@server:~# mdadm --examine /dev/sda6
    /dev/sda6:
    Magic : a92b4efc
    Version : 00.90.00
    UUID :
    Creation Time : Sun Oct 12 18:16:02 2008
    Raid Level : raid5
    Used Dev Size : 722209984 (688.75 GiB 739.54 GB)
    Array Size : 2166629952 (2066.26 GiB 2218.63 GB)
    Raid Devices : 4
    Total Devices : 4
    Preferred Minor : 1

    Update Time : Sun Dec 30 11:35:30 2012
    State : active
    Active Devices : 4
    Working Devices : 4
    Failed Devices : 0
    Spare Devices : 0
    Checksum : f38f0a97 - correct
    Events : 127013

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 0 8 6 0 active sync /dev/sda6

    0 0 8 6 0 active sync /dev/sda6
    1 1 8 22 1 active sync /dev/sdb6
    2 2 8 38 2 active sync /dev/sdc6
    3 3 8 54 3 active sync /dev/sdd6

    If I look at /dev/sdd6 I got:

    root@server:~# mdadm --examine /dev/sdd6
    /dev/sdd6:
    Magic : a92b4efc
    Version : 00.90.00
    UUID :
    Creation Time : Sun Oct 12 18:16:02 2008
    Raid Level : raid5
    Used Dev Size : 722209984 (688.75 GiB 739.54 GB)
    Array Size : 2166629952 (2066.26 GiB 2218.63 GB)
    Raid Devices : 4
    Total Devices : 3
    Preferred Minor : 1

    Update Time : Fri Jan 10 17:43:59 2014
    State : clean
    Active Devices : 3
    Working Devices : 3
    Failed Devices : 0
    Spare Devices : 0
    Checksum : f584d13d - correct
    Events : 251427

    Layout : left-symmetric
    Chunk Size : 64K

    Number Major Minor RaidDevice State
    this 3 8 54 3 active sync /dev/sda6

    0 0 8 6 0 removed
    1 1 8 22 1 active sync /dev/sdb6
    2 2 8 38 2 active sync /dev/sdc6
    3 3 8 54 3 active sync /dev/sdd6

    But whatever I do, it will not rebuild. The raid0 I have on the same disks is working, because the computer boots.

    Please help, I’m lost and want my data back.

    1. Boudewijn Charite

      I did:
      mdadm --stop /dev/md1
      and:
      mdadm --verbose --create /dev/md1 --chunk=64 --level=5 --raid-devices=4 /dev/sd[a,b,c,d]6

      which gave me:

      root@server:/# cat /proc/mdstat
      Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
      md1 : active (auto-read-only) raid5 sdd6[4](S) sdc6[2] sdb6[1] sda6[0]
      2166626688 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

      md0 : active raid0 sdd5[3] sdc5[2] sdb5[1] sda5[0]
      39069696 blocks 64k chunks

      but fdisk /dev/md1 gives me “unknown partition table”

      Now what?

      1. Boudewijn Charite

        I stopped the array again and did

        mdadm --verbose --create /dev/md1 --chunk=64 --metadata=0.90 --level=5 --raid-devices=4 /dev/sd[a,b,c,d]6

        but still gives “unknown partition table”

        please help!!

      2. Alex Post author

        It’s been a while since I’ve worked with mdadm, so I’m not the best person to get advice from, but it looks like you’re trying to create a new array with all the devices of the old one present. The article above describes how to create a new array with only the drive that failed first missing, which should result in a readable array if your data hasn’t been corrupted somehow.

        1. Boudewijn Charite

          Thanks for your reply, Alex.

          It seems that I lost a drive in December 2012 without noticing, and lost a second one last weekend. That one I noticed because the data was only partly accessible. I shut down the server, disconnected all the HDDs and connected them again; all 4 were seen by the BIOS, and md0 worked again because it booted in safe mode. So I have good hope that most of my data is still fine if I make the right rebuild.

          Now I have to know what the right rebuild could / should be.

          1. Alex Post author

            In the article above I used:
            # mdadm --verbose --create /dev/md1 --chunk=512 --level=5 --raid-devices=4 /dev/sdb1 missing /dev/sdd1 /dev/sde1

            You’ll need to adapt this for your own scenario, it looks like /dev/sda6 failed first so I’d suggest creating it with that one missing.

          2. Alex Post author

            One thing I just noticed; your /dev/md1 says chunk size is 512K but your drives say 64K. I can only guess as to how this happened, but make sure you use whatever the array was originally created as. Figuring out the default for your version of mdadm might help with this.

  11. Boudewijn Charite

    I tried the following

    mdadm --verbose --create /dev/md1 --chunk=64 --metadata=0.90 --level=5 --raid-devices=4 missing /dev/sdb6 /dev/sdc6 /dev/sdd6

    which seemed to work

    Then I repaired the file system with fsck.ext4 -cDfty -C 0 /dev/md1

    I checked whether the data was recovered, and it was. Then I added the missing drive with

    mdadm -a /dev/md1 /dev/sda6

    which led to a rebuild of the array and a booting, working server, and all my data is back.

    Is it possible for mdadm to send a message when something happens to the array, or do I have to move to a hardware array? A RocketRAID 2720SGL costs around €180, so that is not as expensive as it used to be.

    1. Alex Post author

      Glad you got your data back.

      Mdadm can absolutely send email alerts, but you have to configure the address, and have a working MTA.
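As a rough sketch of what that configuration involves: the address goes on a MAILADDR line in mdadm.conf. The has_mailaddr helper below is a hypothetical convenience for checking a config file, the address is a placeholder, and the demo runs against a temporary file rather than the real /etc/mdadm/mdadm.conf.

```shell
#!/bin/sh
# Succeeds if the given mdadm.conf-style file has a MAILADDR line with an address.
# (Simplified: only matches the keyword written out in full, in upper case.)
has_mailaddr() {
    grep -Eq '^[[:space:]]*MAILADDR[[:space:]]+[^[:space:]]' "$1"
}

# Demo on a temporary file standing in for /etc/mdadm/mdadm.conf.
demo_conf=$(mktemp)
printf '%s\n' 'MAILADDR admin@example.com' > "$demo_conf"   # placeholder address

if has_mailaddr "$demo_conf"; then
    echo "alerts will be mailed to: $(awk '$1 == "MAILADDR" { print $2; exit }' "$demo_conf")"
else
    echo "no MAILADDR line found"
fi
rm -f "$demo_conf"

# With a working MTA, a test alert for each array can be requested with
# (needs root, not run here):
#   mdadm --monitor --scan --oneshot --test
```

If the test message never arrives, the MTA is the more likely culprit than mdadm itself.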
