diff --git a/tsa/howto/drbd.mdwn b/tsa/howto/drbd.mdwn
index 4007ddf2a234b7d1ea3df89be413b17d8d9d9ced..e7ea5575f6bfa72af74ad8cdbd5cffdb1cda5950 100644
--- a/tsa/howto/drbd.mdwn
+++ b/tsa/howto/drbd.mdwn
@@ -16,8 +16,8 @@ created, so it's expected that new nodes are flagged until they host some
 content. The check is shipped as part of `tor-nagios-checks`, as
 `dsa-check-drbd`, see [dsa-check-drbd](https://gitweb.torproject.org/admin/tor-nagios.git/plain/tor-nagios-checks/checks/dsa-check-drbd).
 
-Common tasks
-============
+How-to
+======
 
 Checking status
 ---------------
@@ -76,6 +76,44 @@ Finding which host is associated with this device is easy: just call
 
 It's the host `gettor-01`.
 
+## Pager playbook
+
+### Resyncing disks
+
+In Nagios, if you see this warning:
+
+    DRBD CRITICAL: Device 10 WFConnection UpToDate, Device 9 WFConnection UpToDate
+
+It means that, on that host (in my case it was
+`fsn-node-04.torproject.org`), disks are desynchronized for some
+reason. In this case, those are disks 9 and 10. You can confirm that
+on the host:
+
+    # ssh fsn-node-04.torproject.org cat /proc/drbd
+    [...]
+     9: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
+        ns:13799284 nr:0 dw:272704248 dr:15512933 al:1331 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:8343096
+    10: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
+        ns:2097152 nr:0 dw:2097192 dr:2102652 al:9 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:40
+    [...]
+
+You need to find which instance this disk is associated with (see also
+above):
+
+    $ ssh fsn-node-01.torproject.org gnt-node list-drbd fsn-node-04
+    [...]
+    Node                       Minor Instance                           Disk   Role    PeerNode
+    [...]
+    fsn-node-04.torproject.org     9 onionoo-frontend-01.torproject.org disk/0 primary fsn-node-03.torproject.org
+    fsn-node-04.torproject.org    10 onionoo-frontend-01.torproject.org disk/1 primary fsn-node-03.torproject.org
+    [...]
+
+Then you can "reactivate" the disks simply by telling ganeti:
+
+    $ ssh fsn-node-01.torproject.org gnt-instance activate-disks onionoo-frontend-01.torproject.org
+
+And then the disk will resync.
+
 References
 ==========
 
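The manual `/proc/drbd` check in the playbook could also be scripted. The following is a minimal sketch, not part of the patch and not how `dsa-check-drbd` works: it assumes the DRBD 8.x `/proc/drbd` layout shown in the sample output above, and the names `parse_proc_drbd` and `not_connected` are hypothetical helpers introduced only for illustration.

```python
import re

# First line of a device entry in /proc/drbd, assuming the layout from the
# playbook sample, e.g.:
#   " 9: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----"
DEVICE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)
# The out-of-sync counter appears on the statistics line that follows.
OOS_RE = re.compile(r"\boos:(?P<oos>\d+)")


def parse_proc_drbd(text):
    """Return one dict per DRBD minor found in the given /proc/drbd text."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            device = match.groupdict()
            device["minor"] = int(device["minor"])
            device["oos"] = None  # filled in if a statistics line follows
            devices.append(device)
            continue
        oos = OOS_RE.search(line)
        if oos and devices:
            # Attach the counter to the most recently seen device.
            devices[-1]["oos"] = int(oos.group("oos"))
    return devices


def not_connected(devices):
    """Minors whose connection state is anything other than Connected.

    Note that this also reports devices that are in the middle of a resync.
    """
    return [d["minor"] for d in devices if d["cs"] != "Connected"]


# Sample copied from the playbook output above.
SAMPLE = """\
 9: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:13799284 nr:0 dw:272704248 dr:15512933 al:1331 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:8343096
10: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r-----
    ns:2097152 nr:0 dw:2097192 dr:2102652 al:9 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:40
"""

if __name__ == "__main__":
    for minor in not_connected(parse_proc_drbd(SAMPLE)):
        print(f"minor {minor} is not connected")
```

On a node, the text could come from reading `/proc/drbd` directly. The reported minors are then the ones to look up with `gnt-node list-drbd` before running `gnt-instance activate-disks`, as described in the playbook.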