This should be slightly more readable
anarcat · 1daba1fc
--- a/howto/raid.md
+++ b/howto/raid.md
@@ -6,7 +6,7 @@

 If a drive fails in a server, the procedure is essentially to open a
 ticket, wait for the drive change, partition and re-add it to the RAID
-array. The following procdure assumes that `sda` failed and `sdb` is
+array. The following procedure assumes that `sda` failed and `sdb` is
 good in a RAID-1 array, but can vary with other RAID configurations or
 drive models.

@@ -35,6 +35,12 @@ with SMART output](https://wiki.hetzner.de/index.php/Seriennummern_von_Festplatt

 # Hardware RAID

+Note: we do not have hardware RAID servers, nor do we want any in the
+future.
+
+This documentation is kept only for historical reference, in case we
+end up with hardware RAID arrays again.
+
 ## MegaCLI operation

 Some TPO machines --particularly [at cymru](howto/new-machine-cymru) -- have hardware RAID with `megaraid`
@@ -197,7 +203,47 @@ currently not in use:
    a0e32s0     465GiB  a0d0  online   errs: media:0  other:819
    a0e32s1     465GiB  a0d0  online   errs: media:0  other:819

-## Pager playbook
+## References
+
+Here are some external documentation links regarding hardware RAID setups:
+
+ * <https://cs.uwaterloo.ca/twiki/view/CF/MegaRaid>
+ * <https://raid.wiki.kernel.org/index.php/Hardware_Raid_Setup_using_MegaCli>
+ * <https://sysadmin.compxtreme.ro/how-to-replace-an-lsi-raid-disk-with-megacli/>
+ * <https://wikitech.wikimedia.org/wiki/MegaCli>
+
+# SMART monitoring
+
+Some servers will fail to properly detect disk drives in their SMART
+configuration. In particular, `smartd` does not support:
+
+ * virtual disks (e.g. `/dev/nbd0`)
+ * MMC block devices (e.g. `/dev/mmcblk0`, commonly found on ARM
+   devices)
+ * out of the box, CCISS raid devices (e.g. `/dev/cciss/c0d0`)
+
+The latter can be configured with the following snippet in
+`/etc/smartd.conf`:
+
+    #DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
+    DEFAULT -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
+    /dev/cciss/c0d0 -d cciss,0
+    /dev/cciss/c0d0 -d cciss,1
+    /dev/cciss/c0d0 -d cciss,2
+    /dev/cciss/c0d0 -d cciss,3
+    /dev/cciss/c0d0 -d cciss,4
+    /dev/cciss/c0d0 -d cciss,5
+
+Notice how the `DEVICESCAN` is commented out to be replaced by the
+CCISS configuration. One line for each drive should be added (and no,
+it does not autodetect all drives unfortunately). This hack was
+deployed on `listera` which uses that hardware RAID.
+
+Other hardware RAID controllers are better supported. For example, the
+`megaraid` controller on `moly` was correctly detected by `smartd`
+which accurately found a broken hard drive.
+
+# Pager playbook

 Prometheus should be monitoring hardware RAID on servers that support
 it. This is normally auto-detected by the Prometheus node exporter.
@@ -205,7 +251,7 @@ it. This is normally auto-detected by the Prometheus node exporter.
 NOTE: those instructions are out of date and need to be rewritten for
 Prometheus, see [tpo/tpa/prometheus-alerts#16](https://gitlab.torproject.org/tpo/tpa/prometheus-alerts/-/issues/16).

-### Failed disk
+## Failed disk

 A normal RAID-1 Nagios check output looks like this:

@@ -219,7 +265,7 @@ It actually has the numbers backwards: in the above situation, there
 was only *one* degraded drive, and 3 healthy ones. See above for how
 to restore a drive in a MegaRAID array.

-### Disks with "other" errors
+## Disks with "other" errors

 The following warning may seem innocuous but actually reports that
 drives have "errors:
@@ -259,42 +305,12 @@ safely ignored.
 [this discussion]: https://serverfault.com/questions/482705/megacli-causes-drive-other-error
 [Key Code Qualifier]: https://en.wikipedia.org/wiki/Key_Code_Qualifier

-# SMART monitoring
-
-Some servers will fail to properly detect disk drives in their SMART
-configuration. In particular, `smartd` does not support:
-
- * virtual disks (e.g. `/dev/nbd0`)
- * MMC block devices (e.g. `/dev/mmcblk0`, commonly found on ARM
-   devices)
- * out of the box, CCISS raid devices (e.g. `/dev/cciss/c0d0`)
-
-The latter can be configured with the following snippet in
-`/etc/smartd.conf`:
-
-    #DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
-    DEFAULT -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
-    /dev/cciss/c0d0 -d cciss,0
-    /dev/cciss/c0d0 -d cciss,1
-    /dev/cciss/c0d0 -d cciss,2
-    /dev/cciss/c0d0 -d cciss,3
-    /dev/cciss/c0d0 -d cciss,4
-    /dev/cciss/c0d0 -d cciss,5
-
-Notice how the `DEVICESCAN` is commented out to be replaced by the
-CCISS configuration. One line for each drive should be added (and no,
-it does not autodetect all drives unfortunately). This hack was
-deployed on `listera` which uses that hardware RAID.
+# Other documentation

-Other hardware RAID controllers are better supported. For example, the
-`megaraid` controller on `moly` was correctly detected by `smartd`
-which accurately found a broken hard drive.
-
-## References
-
-Here are some external documentation links:
+See also:

- * <https://cs.uwaterloo.ca/twiki/view/CF/MegaRaid>
- * <https://raid.wiki.kernel.org/index.php/Hardware_Raid_Setup_using_MegaCli>
- * <https://sysadmin.compxtreme.ro/how-to-replace-an-lsi-raid-disk-with-megacli/>
- * <https://wikitech.wikimedia.org/wiki/MegaCli>
+- [LVM](howto/lvm)
+- [RAID wiki](https://archive.kernel.org/oldwiki/raid.wiki.kernel.org/) (archived)
+- [md(4) manual page](https://manpages.debian.org/bookworm/mdadm/md.4.en.html)
+- [mdadm(8) manual page](https://manpages.debian.org/bookworm/mdadm/mdadm.8.en.html)
+- [md driver kernel documentation](https://docs.kernel.org/admin-guide/md.html)
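
The per-drive CCISS lines in the `/etc/smartd.conf` snippet moved by this change follow a regular pattern (`-d cciss,N`, one line per physical drive), so they could be generated rather than typed by hand. The following is only a minimal sketch, not something the commit itself contains: the helper name `cciss_smartd_lines` is hypothetical, and it assumes the number of drives behind the controller is already known, since the text notes that `smartd` does not autodetect them.

```python
def cciss_smartd_lines(device: str, drive_count: int) -> list[str]:
    """Build one smartd.conf line per physical drive behind a CCISS controller.

    `device` is the controller block device (e.g. /dev/cciss/c0d0) and
    `drive_count` is the number of physical drives, which must be supplied
    by the operator because smartd does not detect them on its own.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    # Drives are numbered from 0, matching the `-d cciss,N` lines in the snippet.
    return [f"{device} -d cciss,{index}" for index in range(drive_count)]


if __name__ == "__main__":
    # Six drives, as in the configuration described for `listera`.
    print("\n".join(cciss_smartd_lines("/dev/cciss/c0d0", 6)))
```

The output would still need to be placed after the `DEFAULT` line, with `DEVICESCAN` commented out, as described in the documentation.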