Software RAID
Replacing a drive
If a drive fails in a server, the procedure is essentially to open a
ticket, wait for the drive change, then partition the new drive and
re-add it to the RAID array. The following procedure assumes that sda
failed and sdb is good in a RAID-1 array, but the details can vary
with other RAID configurations or drive models.
- file a ticket upstream: Hetzner Support, for example, has an
  excellent service which asks for the disk serial number (available
  in the SMART email notification) and the SMART log (the output of
  smartctl -x /dev/sda). They will then turn off the machine, replace
  the disk, and start it up again.
- wait for the server to return with the new disk: Hetzner will send
  an email to the tpa alias when that is done.
- partition the new drive (sda) to match the old one (sdb):
  sfdisk -d /dev/sdb | sfdisk --no-reread /dev/sda --force
- re-add the new disk to the RAID array (the resync can then be
  verified as shown below):
  mdadm /dev/md0 -a /dev/sda
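Once the disk is re-added, the kernel should start resyncing the
array immediately. A quick way to verify this, assuming the array is
md0 as above:
cat /proc/mdstat
mdadm --detail /dev/md0
The first command shows overall sync progress and an estimated
completion time; the second shows the per-disk state, where the new
disk should appear as "spare rebuilding" until the resync completes.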
Note that Hetzner also has pretty good documentation on how to deal with SMART output.
Hardware RAID
MegaCLI operation
Some TPO machines, particularly at cymru, have hardware RAID with
megaraid controllers. Those are controlled with the MegaCLI command,
which is... rather hard to use.
First, alias the megacli command because the package (derived from the upstream RPM by Alien) installs it in a strange location:
alias megacli=/opt/MegaRAID/MegaCli/MegaCli
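Alternatively, a symlink makes the command available to all shells
and to tools like watch; this is just a convenience sketch, assuming
the 64-bit MegaCli64 binary shipped in the same directory:
ln -s /opt/MegaRAID/MegaCli/MegaCli64 /usr/local/sbin/megacli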
This will confirm you are using hardware RAID:
root@moly:/home/anarcat# lspci | grep -i megaraid
05:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
This will show the RAID level of each logical drive; for example, this is RAID-10:
root@moly:/home/anarcat# megacli -LdPdInfo -aALL | grep "RAID Level"
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
This is an example of a simple RAID-1 setup:
root@chi-node-04:~# megacli -LdPdInfo -aALL | grep "RAID Level"
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
This lists a summary of all the disks; in this example, the first disk has failed:
root@moly:/home/anarcat# megacli -PDList -aALL | grep -e '^Enclosure' -e '^Slot' -e '^PD' -e '^Firmware' -e '^Raw' -e '^Inquiry'
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Failed
Inquiry Data: SEAGATE ST3600057SS [REDACTED]
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS [REDACTED]
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS [REDACTED]
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: 0
PD Type: SAS
Raw Size: 558.911 GB [0x45dd2fb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: SEAGATE ST3600057SS [REDACTED]
This will make the drive blink (slot number 0 in enclosure 252):
megacli -PdLocate -start -physdrv[252:0] -aALL
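To stop the blinking once the drive has been physically located, the
matching -stop invocation can be used:
megacli -PdLocate -stop -physdrv[252:0] -aALL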
Take the disk offline:
megacli -PDOffline -PhysDrv '[252:0]' -a0
Mark the disk as missing:
megacli -PDMarkMissing -PhysDrv '[252:0]' -a0
Prepare the disk for removal:
megacli -PDPrpRmv -PhysDrv '[252:0]' -a0
Reboot the machine, replace the disk, then inspect the status again; you may see "Unconfigured(good)" as the status:
root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware'
Enclosure Device ID: 252
Slot Number: 0
Firmware state: Unconfigured(good), Spun Up
[...]
Then you need to re-add the disk to the array:
megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
megacli -PDRbld -Start -PhysDrv[252:0] -a0
Example output:
root@moly:~# megacli -PdReplaceMissing -PhysDrv[252:0] -Array0 -row0 -a0
Adapter: 0: Missing PD at Array 0, Row 0 is replaced.
Exit Code: 0x00
root@moly:~# megacli -PDRbld -Start -PhysDrv[252:0] -a0
Started rebuild progress on device(Encl-252 Slot-0)
Exit Code: 0x00
Then the rebuild should have started:
root@moly:~# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware'
Enclosure Device ID: 252
Slot Number: 0
Firmware state: Rebuild
[...]
To follow progress (the full binary path is used here because the shell alias is not visible inside watch):
watch /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[252:0] -a0
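Alternatively, MegaCLI can display a self-updating progress report on
its own, with the -ProgDsply variant of the same command:
megacli -PDRbld -ProgDsply -PhysDrv[252:0] -a0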
Rebuilding the Debian package
The Debian package is based on a binary RPM provided by upstream (the
LSI corporation). Unfortunately, upstream was acquired by Broadcom in
2014, after which MegaCLI software development seems to have
stopped. Since then, the lsi.com domain redirects to broadcom.com,
and those packages, which were already hard to find, are getting even
harder to find.
It seems the Broadcom search page is the best place to find the megaraid stuff. From that link you should get "search results", and under "Management Software and Tools" there should be a link to some "MegaCLI". The latest is currently (as of 2021) 5.5 P2 (dated 2014-01-19!). Note that this version number differs from the actual version number of the megacli binary (8.07.14). A direct link to the package is currently:
https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip
Obviously, it seems like upstream does not mind breaking those links at any time, so you might have to redo the search to find it. In any case, the package is based on an RPM buried in the ZIP file. So this should get you a package:
unzip 8-07-14_MegaCLI.zip
fakeroot alien Linux/MegaCli-8.07.14-1.noarch.rpm
This gives you a megacli_8.07.14-2_all.deb package which normally
gets uploaded to the proprietary archive on alberti.
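Before uploading, the package can be sanity-checked with standard
dpkg tools, assuming the filename above:
dpkg-deb -I megacli_8.07.14-2_all.deb
dpkg-deb -c megacli_8.07.14-2_all.deb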
An alternative is to use existing packages like the ones from
le-vert.net. In particular, megactl is a free software alternative
that works on chi-node-13, but it is not packaged in Debian and is
therefore not currently in use:
root@chi-node-13:~# megasasctl
a0 PERC 6/i Integrated encl:1 ldrv:1 batt:good
a0d0 465GiB RAID 1 1x2 optimal
a0e32s0 465GiB a0d0 online errs: media:0 other:819
a0e32s1 465GiB a0d0 online errs: media:0 other:819
Pager playbook
Nagios should be monitoring hardware RAID on servers that support
it. This is normally auto-detected by Puppet (in the raid
module/class), but grep around for megaraid otherwise. The raid
module should have a good README file describing how it works.
Failed disk
A normal RAID-1 Nagios check output looks like this:
OK: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2
A failed RAID-10 check output looks like this:
CRITICAL: 0:0:RAID-10:4 drives:1.089TB:Degraded Drives:3
It actually has the numbers backwards: in the above situation, there was only one degraded drive, and 3 healthy ones. See above for how to restore a drive in a MegaRAID array.
Disks with "other" errors
The following warning may seem innocuous but actually reports that drives have "other" errors:
WARNING: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2 (1530 Errors: 0 media, 0 predictive, 1530 other)
The 1530 Errors part is the key here: those are "other" errors. This
can be reproduced with the megacli command:
# megacli -PDList -aALL | grep -e '^Enclosure Device' -e '^Slot' -e '^Firmware' -e "Error Count"
Enclosure Device ID: 32
Slot Number: 0
Media Error Count: 0
Other Error Count: 765
Firmware state: Online, Spun Up
Enclosure Device ID: 32
Slot Number: 1
Media Error Count: 0
Other Error Count: 765
Firmware state: Online, Spun Up
The actual error should also be visible in the logs:
megacli -AdpEventLog -GetLatest 100 -f events.log -aALL
... then in events.log, the key part is:
Event Description: Unexpected sense: PD 00(e0x20/s0) Path 1221000000000000, CDB: 4d 00 4d 00 00 00 00 00 20 00, Sense: 5/24/00
The Sense field is the SCSI Key Code Qualifier ("an error-code
returned by a SCSI device"), which for 5/24/00 means "Illegal Request
- invalid field in CDB (Command Descriptor Block)". According to this
discussion, it seems that newer versions of the megacli binary
trigger those errors when older drives are in use. Those errors can
be safely ignored.
SMART monitoring
Some servers will fail to properly detect disk drives in their SMART
configuration. In particular, smartd does not support:
- virtual disks (e.g. /dev/nbd0)
- MMC block devices (e.g. /dev/mmcblk0, commonly found on ARM devices)
- CCISS RAID devices (e.g. /dev/cciss/c0d0), out of the box
The latter can be configured with the following snippet in /etc/smartd.conf:
#DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
DEFAULT -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/cciss/c0d0 -d cciss,0
/dev/cciss/c0d0 -d cciss,1
/dev/cciss/c0d0 -d cciss,2
/dev/cciss/c0d0 -d cciss,3
/dev/cciss/c0d0 -d cciss,4
/dev/cciss/c0d0 -d cciss,5
Notice how the DEVICESCAN directive is commented out and replaced by
the CCISS configuration. One line must be added for each drive (and
no, it does not autodetect all the drives, unfortunately). This hack
was deployed on listera, which uses that hardware RAID.
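To confirm that a drive actually answers through the controller,
smartctl accepts the same cciss drive syntax; for example, for the
first drive:
smartctl -a -d cciss,0 /dev/cciss/c0d0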
Other hardware RAID controllers are better supported. For example,
the megaraid controller on moly was correctly detected by smartd,
which accurately found a broken hard drive.
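On such controllers, individual drives behind the controller can also
be queried by hand; a sketch, assuming disk ID 0 sits behind
/dev/sda:
smartctl -a -d megaraid,0 /dev/sda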
References
Here are some external documentation links: