cymru iSCSI storage setup
the iSCSI servers at cymru are now ready for use and configuration. i'm not sure from which end to take this one, but it seems we need to at least:
- reset passwords - @munsond will do this
- hardware health checks all around
  - make sure spare disks are set up on all storage units (check `show allPhysicalDisks`, one or more disks in the summary should be tagged as "hot spare")
  - replace failing drives - we won't be replacing drives for now, we have plenty of spares
  - make sure batteries work, apparently "when the internal battery dies those things revert to much slower write through mode" (@lavamind)
- change iscsi-inames on debian boxes to reflect hostname (done through puppet)
- make sure endpoints work: i can talk to the raid boxes over management and storage
- try to create a drive from a `chi-node-X` box
- rename SANs to chi-san-0X.torproject.org
- set up proper host groups and names for all ganeti nodes in the SAN
- set CHAP passwords? (https://dsa.debian.org/howto/iscsi/ has the debian magic side) in Puppet!
- puppetize at least basic tools (iscsiadm, lsscsi) but ideally also configs (some progress done in `profile::iscsi`, missing docs in the wiki page, and maybe multipath config, if necessary)
- try to hook it into a new ganeti instance (https://dsa.debian.org/howto/install-ganeti/ has ideas)
- make sure devices and configuration survive boot
- document how to do this in the ganeti/cymru docs? or an iscsi page? this should include:
  - how to install SMcli
  - how to create a disk
  - how to delete a disk
  - pager playbook? `lsscsi` is pretty useful!
  - how to hook devices into ganeti by hand (in howto/ganeti?)
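
for the initiator-name item in the list above, here is a minimal sketch of what "reflect hostname" could look like. the `iqn.1993-08.org.debian:01` prefix is the one Debian's open-iscsi package uses when it generates a name in `/etc/iscsi/initiatorname.iscsi`; using the short hostname as the suffix is an assumption for illustration, and the function name is hypothetical. the real naming scheme is whatever Puppet deploys.

```python
import socket

# Prefix used by Debian's open-iscsi when it generates an initiator name.
DEBIAN_IQN_PREFIX = "iqn.1993-08.org.debian:01"


def initiator_name_line(hostname: str) -> str:
    """Return an InitiatorName= line whose suffix is the short hostname.

    Assumption: the short (unqualified, lowercased) hostname is used as
    the suffix. The actual scheme deployed by Puppet may differ.
    """
    short = hostname.split(".")[0].lower()
    if not short:
        raise ValueError("empty hostname")
    return f"InitiatorName={DEBIAN_IQN_PREFIX}:{short}"


if __name__ == "__main__":
    # Compare this by hand with /etc/iscsi/initiatorname.iscsi on the node.
    print(initiator_name_line(socket.gethostname()))
```

the output is only meant for comparison with the file on a node, not for writing the file directly.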
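
for the "survive boot" item, one thing worth checking is the `node.startup` setting: with open-iscsi, a target is only logged into automatically at boot when it is `automatic`. the sketch below parses `key = value` text, such as `/etc/iscsi/iscsid.conf` or the record printed by `iscsiadm -m node -T <target> -p <portal>`, and reports that one setting. the function names are hypothetical and the inline sample is made up for the demo. this does not cover everything that has to survive a reboot (multipath, ganeti state), only the login part.

```python
def parse_settings(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # A later occurrence of the same key overrides an earlier one.
        settings[key.strip()] = value.strip()
    return settings


def logs_in_at_boot(text: str) -> bool:
    """True when node.startup is explicitly set to 'automatic'."""
    return parse_settings(text).get("node.startup") == "automatic"


# Illustrative input only, not copied from a real node.
sample = """
# node.startup = manual
node.startup = automatic
"""
print(logs_in_at_boot(sample))
```

on a real node the text would come from the config file or from `iscsiadm` instead of the inline sample.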
Postponed/canceled tasks:
- set up disk groups for different disk speeds: SAN 1 and 2 have 7200 RPM disks, 3 and 4 have 10k RPM, so no need for groups
- upload smcli debian packages to restricted repo: will upload when we automate, right now we do not need it on all nodes.
- monitoring? existing nagios check? wrapper around smcli? snmp traps? syslog? oh my... mostly want to check for RAID: see #40197 (closed)
- backups? covered by intra-VM backups
- integrate in ganeti so this is automated? when we find the current procedure too painful, we'll start automating. priority is to unblock #40100 (closed), see also https://gitlab.torproject.org/tpo/tpa/team/-/wikis/howto/new-machine-cymru#ganeti-iscsi-integration
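
if the monitoring idea is picked up again, the "wrapper around smcli" could start out as a Nagios-style plugin. the sketch below is not an existing check: it only looks for the "hot spare" tag mentioned in the health-check task, and takes the text of a `show allPhysicalDisks` summary as input. the exact SMcli output layout is not assumed, so it counts lines that mention the tag rather than disks. how the text is obtained from SMcli is left out on purpose, and the function name is hypothetical. the exit codes follow the Nagios plugin convention (0 is OK, 2 is CRITICAL).

```python
OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_hot_spares(summary: str, minimum: int = 1):
    """Return (exit_code, message) for a physical-disk summary.

    Counts lines mentioning "hot spare" (case-insensitive). This is a
    deliberate simplification: the real SMcli layout is not parsed, so
    the count is of matching lines, not necessarily of disks.
    """
    count = sum(
        1 for line in summary.splitlines() if "hot spare" in line.lower()
    )
    if count >= minimum:
        return OK, f"OK: {count} line(s) tagged hot spare"
    return (
        CRITICAL,
        f"CRITICAL: {count} line(s) tagged hot spare, expected at least {minimum}",
    )
```

a real plugin would print the message and pass the code to `sys.exit()`, and would probably want a RAID health check as well, which this sketch does not attempt.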
Edited by anarcat