... | ... | @@ -148,6 +148,25 @@ To follow progress: |

    watch /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv[252:0] -a0

## Pager playbook

Nagios should be monitoring hardware RAID on servers that support
it. This is normally auto-detected by Puppet (in the `raid`
module/class) but grep around for `megaraid` otherwise. The `raid`
module should have a good README file describing how it works.

A normal RAID-1 Nagios check output looks like this:

    OK: 0:0:RAID-1:2 drives:465.25GB:Optimal Drives:2

A failed RAID-10 check output looks like this:

    CRITICAL: 0:0:RAID-10:4 drives:1.089TB:Degraded Drives:3

This output is misleading: it reads as if three drives were degraded,
but in the above situation only *one* drive was degraded and the other
3 were healthy. See above for how to restore a drive in a MegaRAID
array.
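
To make the status line less ambiguous, it can be reduced to explicit
counts. The following is only a sketch: `raid_summary` is a
hypothetical helper, not part of the Nagios check or the `raid`
module. It assumes the colon-separated layout of the two sample lines
above, and it assumes the trailing `Drives:N` number counts healthy
drives, which matches the RAID-10 example but has not been verified
against the check itself.

```shell
# Hypothetical helper: summarize a MegaRAID Nagios status line.
# Assumption: the line looks like the two samples in this section, and
# the trailing "Drives:N" number counts the healthy drives.
raid_summary() {
    line=$1
    # Number in front of " drives:" is the total drive count of the array.
    total=$(printf '%s\n' "$line" | sed -n 's/.*:\([0-9][0-9]*\) drives:.*/\1/p')
    # Number at the very end, after " Drives:", is taken as the healthy count.
    healthy=$(printf '%s\n' "$line" | sed -n 's/.* Drives:\([0-9][0-9]*\)$/\1/p')
    if [ -z "$total" ] || [ -z "$healthy" ]; then
        echo "unparsed: $line"
        return 1
    fi
    echo "total=$total healthy=$healthy missing=$((total - healthy))"
}

raid_summary 'CRITICAL: 0:0:RAID-10:4 drives:1.089TB:Degraded Drives:3'
```

A non-zero `missing` count is the number of drives to look at. Lines
that do not match the assumed layout are echoed back with an
`unparsed:` prefix rather than guessed at.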

# SMART monitoring

Some servers will fail to properly detect disk drives in their SMART
... | ... | |