[[_TOC_]]

# Software RAID

## Replacing a drive

If a drive fails in a server, the procedure is essentially to open a
ticket, wait for the drive to be replaced, then partition the new
drive and re-add it to the RAID array. The following procedure assumes
that `sda` failed and `sdb` is good in a RAID-1 array; details can vary
with other RAID configurations or drive models.

 1. file a ticket upstream

    [Hetzner Support](https://robot.your-server.de/support/), for example, has an excellent service which
    asks you for the disk serial number (available in the SMART email
    notification) and the SMART log (output of `smartctl -x
    /dev/sda`). They will then turn off the machine, replace the disk,
    and start it up again.

 2. wait for the server to return with the new disk

    Hetzner will send an email to the tpa alias when that is done.

 3. partition the new drive (`sda`) to match the old (`sdb`):

        sfdisk -d /dev/sdb | sfdisk --no-reread /dev/sda --force

 4. re-add the new disk to the RAID array:

        mdadm /dev/md0 -a /dev/sda

Note that Hetzner also has [pretty good documentation on how to deal
with SMART output](https://wiki.hetzner.de/index.php/Seriennummern_von_Festplatten_und_Hinweise_zu_defekten_Festplatten/en).
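
Once the disk is re-added, the array stays degraded until the resync completes, which is visible in `/proc/mdstat`. The following is a minimal sketch, not part of the original procedure: `md_degraded` is a hypothetical helper name, and it assumes the usual `/proc/mdstat` layout where a missing member shows up as `_` in a status like `[_U]`.

```shell
# Print the name of every md array that has at least one missing member.
# Reads /proc/mdstat-formatted text on standard input.
# (md_degraded is a hypothetical helper, not an existing tool.)
md_degraded() {
    awk '
        # Remember the array named on the current "mdN : ..." line.
        /^md/ { name = $1 }
        # A status such as [_U] or [U_] means a member is missing.
        /\[[U_]*_[U_]*\]/ { print name }
    '
}
```

Running `md_degraded < /proc/mdstat` should print nothing once all arrays are back in sync.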

# Hardware RAID

## MegaCLI operation

Some TPO machines -- particularly [at cymru](howto/new-machine-cymru) -- have hardware RAID with `megaraid`
controllers. Those are controlled with the `MegaCLI` command that is
... rather hard to use.

First, alias the megacli command because the package (derived from the
upstream RPM by Alien) installs it in a strange location:

    alias megacli=/opt/MegaRAID/MegaCli/MegaCli

This will confirm you are using hardware RAID:

    root@moly:/home/anarcat# lspci | grep -i megaraid
    05:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

This will show the RAID levels of each enclosure, for example this is
RAID-10:

    root@moly:/home/anarcat# megacli -LdPdInfo -aALL | grep "RAID Level"
    RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0

This is an example of a simple RAID-1 setup:

    root@chi-node-04:~# megacli -LdPdInfo -aALL | grep "RAID Level"
    RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0

This lists a summary of all the disks. In this example, the first disk
has failed:

    root@moly:/home/anarcat# megacli -PDList -aALL | grep -e '^Enclosure' -e '^Slot' -e '^PD' -e '^Firmware' -e '^Raw' -e '^Inquiry'
    Enclosure Device ID: 252
    Slot Number: 0
    Enclosure position: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Firmware state: Failed
    Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
    Enclosure Device ID: 252
    Slot Number: 1
    Enclosure position: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Firmware state: Online, Spun Up
    Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
    Enclosure Device ID: 252
    Slot Number: 2
    Enclosure position: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Firmware state: Online, Spun Up
    Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
    Enclosure Device ID: 252
    Slot Number: 3
    Enclosure position: 0
    PD Type: SAS
    Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
    Firmware state: Online, Spun Up
    Inquiry Data: SEAGATE ST3600057SS     [REDACTED]
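
On machines with many disks, that listing gets long. As a rough sketch, the output can be reduced to the drives that are not online. `megaraid_bad_drives` is a hypothetical helper name, and the parsing assumes the field names and ordering shown in the listing above (enclosure and slot appear before the firmware state).

```shell
# List drives whose firmware state is anything other than "Online".
# Reads `megacli -PDList -aALL` output on standard input and prints one
# "[enclosure:slot] state" line per such drive.
# (megaraid_bad_drives is a hypothetical helper, not part of MegaCLI.)
megaraid_bad_drives() {
    awk -F': *' '
        /^Enclosure Device ID/ { encl = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/ && $2 !~ /^Online/ {
            print "[" encl ":" slot "] " $2
        }
    '
}
```

With the example above, `megacli -PDList -aALL | megaraid_bad_drives` would print `[252:0] Failed`. Drives in other non-online states (rebuilding, unconfigured) are listed as well.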

This will make the drive blink (slot number 0 in enclosure 252):

    megacli -PdLocate -start -physdrv[252:0] -aALL

Take the disk offline:

    megacli -PDOffline -PhysDrv '[252:0]' -a0

Mark the disk as missing:

    megacli -PDMarkMissing -PhysDrv '[252:0]' -a0

Prepare the disk for removal:

    megacli -PDPrpRmv -PhysDrv '[252:0]' -a0
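
Since these three commands take the same drive address, a typo in the enclosure or slot number is easy to make. One cautious option is a helper that only prints the commands for review before they are pasted into a root shell. This is a sketch: `megaraid_retire_cmds` is a hypothetical name, and the adapter number defaults to 0 as in the examples above.

```shell
# Print, without running them, the commands that retire one drive.
# Arguments: enclosure ID, slot number, adapter number (defaults to 0).
# (megaraid_retire_cmds is a hypothetical helper, not part of MegaCLI.)
megaraid_retire_cmds() {
    encl=$1 slot=$2 adapter=${3:-0}
    # Same order as above: offline, mark missing, prepare for removal.
    for action in PDOffline PDMarkMissing PDPrpRmv; do
        printf "megacli -%s -PhysDrv '[%s:%s]' -a%s\n" \
            "$action" "$encl" "$slot" "$adapter"
    done
}
```

For the failed disk in this example, `megaraid_retire_cmds 252 0` prints the three commands shown above.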

Reboot the machine, replace the disk, then inspect the status again.
You may see "Unconfigured(good)" as a status:

    root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
    Enclosure Device ID: 252
    Slot Number: 0
    Firmware state: Unconfigured(good), Spun Up
    [...]

Then you need to re-add the disk to the array:

    megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
    megacli -PDRbld -Start -PhysDrv[252:0] -a0

Example output:

    root@moly:~# megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
                                         
    Adapter: 0: Missing PD at Array 0, Row 0 is replaced.

    Exit Code: 0x00
    root@moly:~# megacli -PDRbld -Start -PhysDrv[252:0] -a0
                                         
    Started rebuild progress on device(Encl-252 Slot-0)

    Exit Code: 0x00

Then the rebuild should have started:

    root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' 
    Enclosure Device ID: 252
    Slot Number: 0
    Firmware state: Rebuild
    [...]

To follow progress:

    watch /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[252:0] -a0

### Rebuilding the Debian package

The Debian package is based on a binary RPM provided by upstream ([LSI
corporation](https://en.wikipedia.org/wiki/LSI_Corporation)). Unfortunately, upstream was acquired by
[Broadcom](https://en.wikipedia.org/wiki/Broadcom_Inc.) in 2014, after which their MegaCLI software development
seems to have stopped. Since then the `lsi.com` domain redirects to
`broadcom.com` and those packages -- that were already hard to find --
are getting even harder to find.

It seems the [Broadcom search page](https://www.broadcom.com/support/download-search?pg=&pf=&pn=&pa=&po=&dk=megacli&pl=) is the best place to find the
megaraid stuff. That link should give you "search results" and, under
"Management Software and Tools", there should be a link to some
"MegaCLI". The latest is currently (as of 2021) 5.5 P2 (dated
2014-01-19!). Note that this version number differs from the actual
version number of the megacli binary (8.07.14). A direct link to the
package is currently:

    https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip

Upstream does not seem to mind breaking those links at any time, so
you might have to redo the search to find it. In any case, the package
is based on an RPM buried in the ZIP file, so this should get you a
package:

    unzip 8-07-14_MegaCLI.zip
    fakeroot alien Linux/MegaCli-8.07.14-1.noarch.rpm

This gives you a `megacli_8.07.14-2_all.deb` package which normally
gets uploaded to the proprietary archive on `alberti`.

An alternative is to use existing packages like the ones from
[le-vert.net](https://hwraid.le-vert.net/wiki/DebianPackages). In particular, `megactl` is a free software
alternative that works on `chi-node-13`, but it is not packaged in
Debian and is therefore not currently in use:

    root@chi-node-13:~# megasasctl
    a0       PERC 6/i Integrated      encl:1 ldrv:1  batt:good
    a0d0       465GiB RAID 1   1x2  optimal
    a0e32s0     465GiB  a0d0  online   errs: media:0  other:819
    a0e32s1     465GiB  a0d0  online   errs: media:0  other:819


## Pager playbook

Nagios should be monitoring hardware RAID on servers that support
it. This is normally auto-detected by Puppet (in the `raid`
module/class) but grep around for `megaraid` otherwise. The `raid`
module should have a good README file describing how it works.

### Failed disk

A normal RAID-1 Nagios check output looks like this:

    OK: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2

A failed RAID-10 check output looks like this:

    CRITICAL: 0:0:RAID-10:4 drives:1.089TB:Degraded Drives:3

It actually has the numbers backwards: in the above situation, there
was only *one* degraded drive, and 3 healthy ones. See above for how
to restore a drive in a MegaRAID array.

### Disks with "other" errors

The following warning may seem innocuous but actually reports that
drives have errors:

    WARNING: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2 (1530 Errors: 0 media, 0 predictive, 1530 other) 

The `1530 Errors` part is the key here. They are "other" errors. This
can be reproduced with the `megacli` command:

    # megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' -e "Error Count"
    Enclosure Device ID: 32
    Slot Number: 0
    Media Error Count: 0
    Other Error Count: 765
    Firmware state: Online, Spun Up
    Enclosure Device ID: 32
    Slot Number: 1
    Media Error Count: 0
    Other Error Count: 765
    Firmware state: Online, Spun Up
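
The 1530 in the Nagios warning matches the sum of the two per-drive counts above (765 + 765), which suggests the check adds them up across drives. A small sketch to compute that total from the `megacli` output, assuming the `Other Error Count` field name shown above (`megaraid_other_errors` is a hypothetical helper name):

```shell
# Sum the "Other Error Count" values across all drives.
# Reads `megacli -PDList -aALL` output on standard input.
# (megaraid_other_errors is a hypothetical helper, not part of MegaCLI.)
megaraid_other_errors() {
    awk -F': *' '
        /^Other Error Count/ { total += $2 }
        END { print total + 0 }   # "+ 0" prints 0 when no drive matched
    '
}
```

It can be used as `megacli -PDList -aALL | megaraid_other_errors` to compare against the number in the Nagios output.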

The actual error should also be visible in the logs:

    megacli -AdpEventLog -GetLatest 100 -f events.log -aALL

... then in `events.log`, the key part is:

    Event Description: Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00

The `Sense` field is a [Key Code Qualifier][] ("an error-code returned
by a SCSI device") which, for 5/24/00, means "Illegal Request - invalid
field in CDB (Command Descriptor Block)". According to [this
discussion][], it seems that *newer* versions of the `megacli` binary
trigger those errors when older drives are in use. Those errors can be
safely ignored.

[this discussion]: https://serverfault.com/questions/482705/megacli-causes-drive-other-error
[Key Code Qualifier]: https://en.wikipedia.org/wiki/Key_Code_Qualifier
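
Before dismissing the errors, it may be worth confirming that the log contains only that harmless sense code. The following sketch tallies the codes found in `events.log`. `sense_tally` is a hypothetical helper name, and the parsing assumes that every relevant line carries a `Sense: key/asc/ascq` field formatted like the example above.

```shell
# Count how often each sense code appears in a MegaCLI event log.
# Reads the log on standard input and prints "code count" lines.
# (sense_tally is a hypothetical helper, not part of MegaCLI.)
sense_tally() {
    awk '
        match($0, /Sense: [0-9A-Fa-f]+\/[0-9A-Fa-f]+\/[0-9A-Fa-f]+/) {
            # Skip the 7 characters of "Sense: " to keep only the code.
            code = substr($0, RSTART + 7, RLENGTH - 7)
            count[code]++
        }
        END { for (code in count) print code, count[code] }
    ' | sort
}
```

Running `sense_tally < events.log` prints one line per distinct code. Anything other than `5/24/00` deserves a closer look.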

# SMART monitoring

Some servers will fail to properly detect disk drives in their SMART
configuration. In particular, `smartd` does not support:

 * virtual disks (e.g. `/dev/nbd0`)
 * MMC block devices (e.g. `/dev/mmcblk0`, commonly found on ARM
   devices)
 * out of the box, CCISS RAID devices (e.g. `/dev/cciss/c0d0`)

The latter can be configured with the following snippet in
`/etc/smartd.conf`:

    #DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
    DEFAULT -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
    /dev/cciss/c0d0 -d cciss,0
    /dev/cciss/c0d0 -d cciss,1
    /dev/cciss/c0d0 -d cciss,2
    /dev/cciss/c0d0 -d cciss,3
    /dev/cciss/c0d0 -d cciss,4
    /dev/cciss/c0d0 -d cciss,5

Notice how the `DEVICESCAN` is commented out to be replaced by the
CCISS configuration. One line for each drive should be added (and no,
it does not autodetect all drives unfortunately). This hack was
deployed on `listera` which uses that hardware RAID.
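
Because each drive needs its own line, the per-drive entries can be generated instead of typed by hand. This is a sketch only: `cciss_smartd_lines` is a hypothetical helper name, and the drive count has to be supplied by the operator since nothing here detects it.

```shell
# Print one smartd.conf entry per physical drive behind a CCISS device.
# Arguments: device path, number of drives (indexes start at 0).
# (cciss_smartd_lines is a hypothetical helper, not part of smartmontools.)
cciss_smartd_lines() {
    device=$1 count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s -d cciss,%s\n' "$device" "$i"
        i=$((i + 1))
    done
}
```

For instance, `cciss_smartd_lines /dev/cciss/c0d0 6` prints the six device lines from the snippet above, which can then be reviewed and appended to `/etc/smartd.conf`.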

Other hardware RAID controllers are better supported. For example, the
`megaraid` controller on `moly` was correctly detected by `smartd`
which accurately found a broken hard drive.

## References

Here are some external documentation links:

 * <https://cs.uwaterloo.ca/twiki/view/CF/MegaRaid>
 * <https://raid.wiki.kernel.org/index.php/Hardware_Raid_Setup_using_MegaCli>
 * <https://sysadmin.compxtreme.ro/how-to-replace-an-lsi-raid-disk-with-megacli/>
 * <https://wikitech.wikimedia.org/wiki/MegaCli>