Commit 6b06a0f0 authored by anarcat

show how to readd a hardware RAID device

parent 95de6b31
+57 −1
@@ -46,7 +46,7 @@ upstream RPM by Alien) installs it in a strange location:

This will confirm you are using hardware RAID:

    root@moly:/home/anarcat# lspci | grep -i raid
    root@moly:/home/anarcat# lspci | grep -i megaraid
    05:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

This will show the RAID levels of each enclosure, for example this is
@@ -55,6 +55,11 @@ RAID-10:
    root@moly:/home/anarcat# megacli -LdPdInfo -aALL | grep "RAID Level"
    RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0

This is an example of a simple RAID-1 setup:

    root@chi-node-04:~# megacli -LdPdInfo -aALL | grep "RAID Level"
    RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0

This lists a summary of all the disks; for example, the first disk has
failed here:

@@ -92,6 +97,57 @@ This will make the drive blink (slot number 0 in enclosure 252):

    megacli -PdLocate -start -physdrv[252:0] -aALL

Take the disk offline:

    megacli -PDOffline -PhysDrv '[252:0]' -a0

Mark the disk as missing:

    megacli -PDMarkMissing -PhysDrv '[252:0]' -a0

Prepare the disk for removal:

    megacli -PDPrpRmv -PhysDrv '[252:0]' -a0
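
The three steps above can be chained so that a failure stops the sequence. This is only a sketch: `prepare_removal`, `MEGACLI` and the two arguments are illustrative names, and it assumes `megacli` returns a non-zero exit status when a step fails, which has not been verified here. By default the commands are only printed, not run:

```shell
#!/bin/sh
# Sketch: run the three removal steps for one drive, in order.
# MEGACLI is an illustrative variable, not a megacli setting. It
# defaults to "echo megacli", so the commands are only printed;
# set MEGACLI=megacli to actually run them.
MEGACLI="${MEGACLI:-echo megacli}"

prepare_removal() {
    drive="$1"    # enclosure:slot, for example 252:0
    adapter="$2"  # adapter number, for example 0
    # Stop at the first failing step (assumes megacli exits non-zero
    # on failure).
    $MEGACLI -PDOffline -PhysDrv "[$drive]" "-a$adapter" &&
    $MEGACLI -PDMarkMissing -PhysDrv "[$drive]" "-a$adapter" &&
    $MEGACLI -PDPrpRmv -PhysDrv "[$drive]" "-a$adapter"
}
```

For the disk used in this page, that would be `prepare_removal 252:0 0`. Review the printed commands before running them for real with `MEGACLI=megacli`.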

Reboot the machine, replace the disk, then inspect the status again.
You may see "Unconfigured(good)" as a status:

    root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
    Enclosure Device ID: 252
    Slot Number: 0
    Firmware state: Unconfigured(good), Spun Up
    [...]
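
On servers with many disks, the listing above can be condensed to one line per disk. A minimal sketch, assuming the `Enclosure Device ID`, `Slot Number` and `Firmware state` lines always appear in that order for each disk, as they do in the output above; `pd_summary` is a made-up name:

```shell
# Sketch: condense "megacli -PDList -aALL" output, read on stdin,
# to one "[enclosure:slot] firmware state" line per disk.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}
```

Usage would be `megacli -PDList -aALL | pd_summary`.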

Then you need to re-add the disk to the array:

    megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
    megacli -PDRbld -Start -PhysDrv[252:0] -a0

Example output:

    root@moly:~# megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
                                         
    Adapter: 0: Missing PD at Array 0, Row 0 is replaced.

    Exit Code: 0x00
    root@moly:~# megacli -PDRbld -Start -PhysDrv[252:0] -a0
                                         
    Started rebuild progress on device(Encl-252 Slot-0)

    Exit Code: 0x00
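
Both commands end their output with an `Exit Code` line. A script could check for it before moving on to the next step. This sketch assumes success always shows up as `Exit Code: 0x00`, as in the example output above; `megacli_ok` is a made-up helper name:

```shell
# Sketch: read captured megacli output on stdin and succeed only if
# it contains the "Exit Code: 0x00" line seen in the example output.
megacli_ok() {
    grep -q 'Exit Code: 0x00'
}
```

For example, `megacli -PDRbld -Start -PhysDrv[252:0] -a0 | megacli_ok` would return success only when that line is present. Note that this hides the command's own output.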

Then the rebuild should have started:

    root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
    Enclosure Device ID: 252
    Slot Number: 0
    Firmware state: Rebuild
    [...]

To follow progress:

    watch /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[252:0] -a0
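
To block until the rebuild is over instead of watching it, the firmware state can be polled. This is a rough sketch, assuming the state reads exactly `Rebuild` while rebuilding, as in the listing above. `pd_state`, `wait_rebuild`, `MEGACLI` and `INTERVAL` are illustrative names, not part of MegaCli:

```shell
# Sketch: print the firmware state of one disk from
# "megacli -PDList -aALL" output read on stdin.
# Arguments: enclosure ID, slot number.
pd_state() {
    awk -F': *' -v enc="$1" -v slot="$2" '
        /^Enclosure Device ID/ { e = $2 }
        /^Slot Number/         { s = $2 }
        /^Firmware state/ && e == enc && s == slot { print $2 }
    '
}

# Poll until the disk is no longer in the "Rebuild" state.
# MEGACLI can point at /opt/MegaRAID/MegaCli/MegaCli64; the default
# 60 second INTERVAL is an arbitrary choice.
wait_rebuild() {
    while [ "$("${MEGACLI:-megacli}" -PDList -aALL | pd_state "$1" "$2")" = "Rebuild" ]; do
        sleep "${INTERVAL:-60}"
    done
}
```

For the disk above, that would be `wait_rebuild 252 0`. The loop also ends if the disk leaves the `Rebuild` state for any other reason, including a failure, so check the final state afterwards.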

## SMART monitoring

Some servers will fail to properly detect disk drives in their SMART