Software RAID

Replacing a drive

If a drive fails in a server, the procedure is essentially to open a ticket, wait for the drive to be replaced, then partition the new drive and re-add it to the RAID array. The following procedure assumes that sda failed and sdb is good in a RAID-1 array, but it can vary with other RAID configurations or drive models.

  1. file a ticket upstream

    Hetzner Support, for example, has an excellent service which asks you for the disk serial number (available in the SMART email notification) and the SMART log (the output of smartctl -x /dev/sda); see the sketch after this list for gathering both. They will then turn off the machine, replace the disk, and start it up again.

  2. wait for the server to return with the new disk

    Hetzner will send an email to the tpa alias when that is done.

  3. partition the new drive (sda) to match the old (sdb):

    sfdisk -d /dev/sdb | sfdisk --no-reread /dev/sda --force
  4. re-add the new disk to the RAID array (see below for how to follow the resync):

    mdadm /dev/md0 -a /dev/sda
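
To gather the information requested in step 1, something like the following should work (a sketch; adjust the device name to the failed drive):

# serial number of the failed drive, for the ticket
smartctl -i /dev/sda | grep -i 'serial number'
# full SMART log to attach to the ticket
smartctl -x /dev/sda > smartctl-sda.txt

Once the new disk is re-added in step 4, the array resynchronizes in the background; progress can be followed with, for example:

watch -n5 cat /proc/mdstat
mdadm --detail /dev/md0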

Note that Hetzner also has pretty good documentation on how to deal with SMART output.

Hardware RAID

MegaCLI operation

Some TPO machines -- particularly at Cymru -- have hardware RAID with MegaRAID controllers. Those are controlled with the MegaCLI command, which is ... rather hard to use.

First, alias the megacli command because the package (derived from the upstream RPM by Alien) installs it in a strange location:

alias megacli=/opt/MegaRAID/MegaCli/MegaCli

This will confirm you are using hardware RAID:

root@moly:/home/anarcat# lspci | grep -i megaraid
05:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

This will show the RAID level of each logical drive; for example, this is RAID-10:

root@moly:/home/anarcat# megacli -LdPdInfo -aALL | grep "RAID Level"
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
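
Note that this RAID Level line can look identical to plain RAID-1 (compare the next example), since MegaCLI apparently represents RAID-10 as a spanned RAID-1. To tell the two apart, the span information in the logical drive details should help (a sketch; the Span Depth field should be greater than 1 for RAID-10):

megacli -LDInfo -Lall -aALL | grep -i -e 'RAID Level' -e 'Span'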

This is an example of a simple RAID-1 setup:

root@chi-node-04:~# megacli -LdPdInfo -aALL | grep "RAID Level"
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0

This lists a summary of all the disks; in this example, the first disk has failed:

root@moly:/home/anarcat# megacli -PDList -aALL | grep -e '^Enclosure' -e '^Slot' -e '^PD' -e '^Firmware' -e '^Raw' -e '^Inquiry'
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Failed
Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS     [REDACTED]

This will make the drive blink (slot number 0 in enclosure 252):

megacli -PdLocate -start -physdrv[252:0] -aALL
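
To stop the blinking once the drive has been located, the same command with -stop should work:

megacli -PdLocate -stop -physdrv[252:0] -aALL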

Take the disk offline:

megacli -PDOffline -PhysDrv '[252:0]' -a0

Mark the disk as missing:

megacli -PDMarkMissing -PhysDrv '[252:0]' -a0

Prepare the disk for removal:

megacli -PDPrpRmv -PhysDrv '[252:0]' -a0

Reboot the machine and replace the disk, then inspect the status again; you may see "Unconfigured(good)" as the status:

root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
Enclosure Device ID: 252
Slot Number: 0
Firmware state: Unconfigured(good), Spun Up
[...]

Then you need to re-add the disk to the array:

megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
megacli -PDRbld -Start -PhysDrv[252:0] -a0

Example output:

root@moly:~# megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
                                     
Adapter: 0: Missing PD at Array 0, Row 0 is replaced.

Exit Code: 0x00
root@moly:~# megacli -PDRbld -Start -PhysDrv[252:0] -a0
                                     
Started rebuild progress on device(Encl-252 Slot-0)

Exit Code: 0x00

Then the rebuild should have started:

root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
Enclosure Device ID: 252
Slot Number: 0
Firmware state: Rebuild
[...]

To follow progress (note the full path: watch does not expand shell aliases):

watch /opt/MegaRAID/MegaCli/MegaCli64  -PDRbld -ShowProg -PhysDrv[252:0] -a0
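
Alternatively, MegaCLI can itself display the rebuild progress until it completes; something like this should work:

megacli -PDRbld -ProgDsply -PhysDrv[252:0] -a0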

Rebuilding the Debian package

The Debian package is based on a binary RPM provided by upstream (LSI Corporation). Unfortunately, upstream was acquired by Broadcom in 2014, after which their MegaCLI software development seems to have stopped. Since then, the lsi.com domain redirects to broadcom.com and those packages -- which were already hard to find -- are getting even harder to find.

It seems the Broadcom search page is the best place to find the MegaRAID stuff. From that link you should get "search results", and under "Management Software and Tools" there should be a link to some "MegaCLI". The latest is currently (as of 2021) 5.5 P2 (dated 2014-01-19!). Note that this version number differs from the actual version number of the megacli binary (8.07.14). A direct link to the package is currently:

https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip

Obviously, upstream does not seem to mind breaking those links at any time, so you might have to redo the search to find the package. In any case, it is based on an RPM buried in the ZIP file, so this should get you a package:

unzip 8-07-14_MegaCLI.zip
fakeroot alien Linux/MegaCli-8.07.14-1.noarch.rpm

This gives you a megacli_8.07.14-2_all.deb package which normally gets uploaded to the proprietary archive on alberti.
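
Before uploading, it may be worth sanity-checking and test-installing the result, for example:

# inspect the package metadata and contents
dpkg -I megacli_8.07.14-2_all.deb
dpkg -c megacli_8.07.14-2_all.deb
# install it on a test machine
dpkg -i megacli_8.07.14-2_all.deb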

An alternative is to use existing packages like the ones from le-vert.net. In particular, megactl is a free software alternative that works on chi-node-13, but it is not packaged in Debian and is therefore currently not in use:

root@chi-node-13:~# megasasctl
a0       PERC 6/i Integrated      encl:1 ldrv:1  batt:good
a0d0       465GiB RAID 1   1x2  optimal
a0e32s0     465GiB  a0d0  online   errs: media:0  other:819
a0e32s1     465GiB  a0d0  online   errs: media:0  other:819

Pager playbook

Nagios should be monitoring hardware RAID on servers that support it. This is normally auto-detected by Puppet (in the raid module/class), but otherwise grep around for megaraid, as sketched below. The raid module should have a good README file describing how it works.
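
For example, something like this from the root of a Puppet repository checkout (the exact layout is an assumption):

grep -ri megaraid .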

Failed disk

A normal RAID-1 Nagios check output looks like this:

OK: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2

A failed RAID-10 check output looks like this:

CRITICAL: 0:0:RAID-10:4 drives:1.089TB:Degraded Drives:3

Note that the check actually has the numbers backwards: in the above situation, there was only one degraded drive and three healthy ones. See above for how to restore a drive in a MegaRAID array.

Disks with "other" errors

The following warning may seem innocuous but actually reports that drives have "other" errors:

WARNING: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2 (1530 Errors: 0 media, 0 predictive, 1530 other) 

The 1530 Errors part is the key here. They are "other" errors. This can be confirmed with the megacli command:

# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' -e "Error Count"
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Other Error Count: 765
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 0
Other Error Count: 765
Firmware state: Online, Spun Up

The actual error should also be visible in the logs:

megacli -AdpEventLog -GetLatest 100 -f events.log -aALL

... then in events.log, the key part is:

Event Description: Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00

The Sense field is a Key Code Qualifier ("an error-code returned by a SCSI device") which, for 5/24/00, means "Illegal Request - invalid field in CDB (Command Descriptor Block)". According to this discussion, it seems that newer versions of the megacli binary trigger those errors when older drives are in use. Those errors can be safely ignored.

SMART monitoring

On some servers, smartd will fail to properly detect disk drives. In particular, smartd does not support:

  • virtual disks (e.g. /dev/nbd0)
  • MMC block devices (e.g. /dev/mmcblk0, commonly found on ARM devices)
  • CCISS RAID devices (e.g. /dev/cciss/c0d0), at least not out of the box

The latter can be configured with the following snippet in /etc/smartd.conf:

#DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
DEFAULT -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/cciss/c0d0 -d cciss,0
/dev/cciss/c0d0 -d cciss,1
/dev/cciss/c0d0 -d cciss,2
/dev/cciss/c0d0 -d cciss,3
/dev/cciss/c0d0 -d cciss,4
/dev/cciss/c0d0 -d cciss,5

Notice how the DEVICESCAN line is commented out and replaced by the CCISS configuration. One line per drive must be added (and no, unfortunately it does not autodetect all the drives). This hack was deployed on listera, which uses that hardware RAID.
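
To query one of those drives by hand, smartctl takes the same device type flag, for example:

smartctl -d cciss,0 -a /dev/cciss/c0d0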

Other hardware RAID controllers are better supported. For example, the megaraid controller on moly was correctly detected by smartd which accurately found a broken hard drive.

References

Here are some external documentation links: