... | ... | @@ -1690,6 +1690,29 @@ device: |
|
|
|
|
|
For this, see [DRBD: deleting a stray device](howto/drbd#deleting-a-stray-device).
|
|
|
|
|
|
### SSH key verification failures
|
|
|
|
|
|
Ganeti uses SSH to launch arbitrary commands (as root!) on other
|
|
|
nodes. It does this using a funky command, shown here as logged in `node-daemon.log`:
|
|
|
|
|
|
ssh -oEscapeChar=none -oHashKnownHosts=no \
|
|
|
-oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts \
|
|
|
-oUserKnownHostsFile=/dev/null -oCheckHostIp=no \
|
|
|
-oConnectTimeout=10 -oHostKeyAlias=chignt.torproject.org \
|
|
|
-oPort=22 -oBatchMode=yes -oStrictHostKeyChecking=yes -4 \
|
|
|
root@chi-node-03.torproject.org
|
|
|
|
|
|
This caused us some problems during the Ganeti buster to bullseye
|
|
|
upgrade, possibly because of changes in host verification routines in
|
|
|
OpenSSH. The problem was documented in [issue 1608 upstream](https://github.com/ganeti/ganeti/issues/1608) and
|
|
|
[tpo/tpa/team#40383](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40383).
|
|
|
|
|
|
A workaround is to synchronize Ganeti's `known_hosts` file:
|
|
|
|
|
|
grep 'chi-node-0[0-9]' /etc/ssh/ssh_known_hosts | grep -v 'initramfs' | grep ssh-rsa | sed 's/[^ ]* /chignt.torproject.org /' >> /var/lib/ganeti/known_hosts
|
|
|
|
|
|
Note that the above assumes a cluster with fewer than 10 nodes, since the pattern `chi-node-0[0-9]` only matches single-digit node numbers.
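For larger clusters, the same pipeline can be wrapped in a small function that takes the node pattern as a parameter. This is only a sketch: the function name `sync_known_hosts` and the pattern `chi-node-[0-9]+` are hypothetical, and it assumes the node keys are listed in the source file in the usual `known_hosts` format (host names, key type, key).

```shell
#!/bin/sh
# Sketch: print node host keys rewritten under the cluster alias.
# Usage: sync_known_hosts SRC HOST_ALIAS NODE_REGEX
sync_known_hosts() {
    src=$1          # source file, e.g. /etc/ssh/ssh_known_hosts
    host_alias=$2   # cluster alias, e.g. chignt.torproject.org
    pattern=$3      # extended regex for node names, e.g. 'chi-node-[0-9]+'
    # Same filters as the one-liner: keep matching nodes, drop initramfs
    # entries, keep only RSA keys, then replace the first field (the host
    # names) with the alias. sort -u drops duplicate output lines.
    grep -E "$pattern" "$src" \
        | grep -v 'initramfs' \
        | grep 'ssh-rsa' \
        | sed "s/[^ ]* /$host_alias /" \
        | sort -u
}

# Example (not run here): append the result to Ganeti's known_hosts file.
# sync_known_hosts /etc/ssh/ssh_known_hosts chignt.torproject.org \
#     'chi-node-[0-9]+' >> /var/lib/ganeti/known_hosts
```

The function only prints to standard output, so the result can be reviewed before it is appended. It does not check for entries already present in the target file, so running it repeatedly with `>>` adds duplicate lines.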
|
|
|
|
|
|
### Other troubleshooting
|
|
|
|
|
|
The [walkthrough](http://docs.ganeti.org/ganeti/2.15/html/walkthrough.html) also has a few recipes to resolve common
|
... | ... | @@ -1697,6 +1720,10 @@ problems. |
|
|
|
|
|
See also the [common issues page](https://github.com/ganeti/ganeti/wiki/Common-Issues) in the Ganeti wiki.
|
|
|
|
|
|
Look into logs on the relevant nodes (particularly
|
|
|
`/var/log/ganeti/node-daemon.log`, which shows all commands run by
|
|
|
Ganeti) when you have problems.
|
|
|
|
|
|
## Disaster recovery
|
|
|
|
|
|
If things get completely out of hand and the cluster becomes too
|
... | ... | @@ -1932,6 +1959,9 @@ address blocks reserved in the cluster. |
|
|
|
|
|
gnt-cluster verify
|
|
|
|
|
|
If the last step fails with SSH errors, you may need to re-synchronise
|
|
|
the SSH `known_hosts` file; see [SSH key verification failures](#ssh-key-verification-failures).
|
|
|
|
|
|
### gnt-chi cluster initialization
|
|
|
|
|
|
This procedure replaces the `gnt-node add` step in the initial setup
|
... | ... | |