... | ... | @@ -252,6 +252,101 @@ From here on, follow the [next steps](#next-steps) above. |

TODO: This would ideally be automated by an external storage provider,
see the [storage reference for more information](#storage).

### Troubleshooting

If a Ganeti instance install fails, Ganeti shows the end of the
install log, for example:

```
Thu Aug 26 14:11:09 2021  - INFO: Selected nodes for instance tb-pkgstage-01.torproject.org via iallocator hail: chi-node-02.torproject.org, chi-node-01.torproject.org
Thu Aug 26 14:11:09 2021  - INFO: NIC/0 inherits netparams ['br0', 'bridged', '']
Thu Aug 26 14:11:09 2021  - INFO: Chose IP 38.229.82.29 from network gnt-chi-01
Thu Aug 26 14:11:10 2021 * creating instance disks...
Thu Aug 26 14:12:58 2021 adding instance tb-pkgstage-01.torproject.org to cluster config
Thu Aug 26 14:12:58 2021 adding disks to cluster config
Thu Aug 26 14:13:00 2021 * checking mirrors status
Thu Aug 26 14:13:01 2021  - INFO: - device disk/0: 30.90% done, 3m 32s remaining (estimated)
Thu Aug 26 14:13:01 2021  - INFO: - device disk/2:  0.60% done, 55m 26s remaining (estimated)
Thu Aug 26 14:13:01 2021 * checking mirrors status
Thu Aug 26 14:13:02 2021  - INFO: - device disk/0: 31.20% done, 3m 40s remaining (estimated)
Thu Aug 26 14:13:02 2021  - INFO: - device disk/2:  0.60% done, 52m 13s remaining (estimated)
Thu Aug 26 14:13:02 2021 * pausing disk sync to install instance OS
Thu Aug 26 14:13:03 2021 * running the instance OS create scripts...
Thu Aug 26 14:16:31 2021 * resuming disk sync
Failure: command execution error:
Could not add os for instance tb-pkgstage-01.torproject.org on node chi-node-02.torproject.org: OS create script failed (exited with exit code 1), last lines in the log file:
Setting up openssh-sftp-server (1:7.9p1-10+deb10u2) ...
Setting up openssh-server (1:7.9p1-10+deb10u2) ...
Creating SSH2 RSA key; this may take some time ...
2048 SHA256:ZTeMxYSUDTkhUUeOpDWpbuOzEAzOaehIHW/lJarOIQo root@chi-node-02 (RSA)
Creating SSH2 ED25519 key; this may take some time ...
256 SHA256:MWKeA8vJKkEG4TW+FbG2AkupiuyFFyoVWNVwO2WG0wg root@chi-node-02 (ED25519)
Created symlink /etc/systemd/system/sshd.service \xe2\x86\x92 /lib/systemd/system/ssh.service.
Created symlink /etc/systemd/system/multi-user.target.wants/ssh.service \xe2\x86\x92 /lib/systemd/system/ssh.service.
invoke-rc.d: could not determine current runlevel
Setting up ssh (1:7.9p1-10+deb10u2) ...
Processing triggers for systemd (241-7~deb10u8) ...
Processing triggers for libc-bin (2.28-10) ...
Errors were encountered while processing:
linux-image-4.19.0-17-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
run-parts: /etc/ganeti/instance-debootstrap/hooks/ssh exited with return code 100
Using disk /dev/drbd4 as swap...
Setting up swapspace version 1, size = 2 GiB (2147479552 bytes)
no label, UUID=96111754-c57d-43f2-83d0-8e1c8b4688b4
Not using disk 2 (/dev/drbd5) because it is not named 'swap' (name: )
root@chi-node-01:~#
```

Here the failure that tripped the install is:

```
Errors were encountered while processing:
linux-image-4.19.0-17-amd64
E: Sub-process /usr/bin/dpkg returned an error code (1)
```

But the actual error occurred earlier in the install, and finding it
requires looking at the logs on the node. In this case, the real
problem shows up in
`chi-node-02:/var/log/ganeti/os/add-debootstrap+buster-tb-pkgstage-01.torproject.org-2021-08-26_14_13_04.log`:

```
Setting up linux-image-4.19.0-17-amd64 (4.19.194-3) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.19.0-17-amd64
W: Couldn't identify type of root file system for fsck hook
/etc/kernel/postinst.d/zz-update-grub:
/usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?).
run-parts: /etc/kernel/postinst.d/zz-update-grub exited with return code 1
dpkg: error processing package linux-image-4.19.0-17-amd64 (--configure):
installed linux-image-4.19.0-17-amd64 package post-installation script subprocess returned error exit status 1
```
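
Since the root cause is the *first* failure in the log rather than the last one, a small filter can save some scrolling. The following is a minimal sketch, not part of Ganeti: `first_errors` is a hypothetical helper name, and the patterns are only a guess based on the messages seen above, so adjust them as needed.

```shell
# Hypothetical helper: print the first few lines of an OS install log that
# look like failures, prefixed with their line numbers.
# Usage: first_errors LOGFILE [COUNT]   (COUNT defaults to 5)
first_errors() {
    log="$1"
    # Patterns are assumptions drawn from the log excerpt above.
    grep -n -E 'error|exited with return code' "$log" | head -n "${2:-5}"
}

# Example, to be run on the node that performed the install:
# first_errors /var/log/ganeti/os/add-debootstrap+buster-tb-pkgstage-01.torproject.org-2021-08-26_14_13_04.log
```

The line numbers in the output make it easy to jump to the surrounding context in the full log with a pager.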

In this case, oddly enough, even though Ganeti thought the install had
failed, the machine can actually start:

```
gnt-instance start tb-pkgstage-01.torproject.org
```

... and after a while, we can even get a console:

```
gnt-instance console tb-pkgstage-01.torproject.org
```

In *that* case, the procedure can continue from here on:
reset the root password, and make sure you finish the install:

```
apt install linux-image-amd64
```

In the above case, the `sources-list` post-install hook was buggy: it
wasn't mounting `/dev` and friends before launching the upgrades,
which was causing issues when a kernel upgrade was queued.
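
As an illustration of what such a hook needs to do, here is a rough sketch. It is not the actual hook: `chroot_mount_plan` is a hypothetical helper, the `apt-get -y upgrade` step is an assumption about what the hook runs, and the sketch assumes the hook receives the instance's root directory in `$TARGET`. It only prints the commands so they can be reviewed before anything is mounted.

```shell
# Hypothetical helper: print the commands that would bind-mount /dev, /proc
# and /sys into the target, run the upgrade inside it, then unmount in
# reverse order. Nothing is executed here.
chroot_mount_plan() {
    target="$1"
    for fs in dev proc sys; do
        echo "mount --bind /$fs $target/$fs"
    done
    # Assumption: the hook upgrades packages inside the chroot.
    echo "chroot $target apt-get -y upgrade"
    for fs in sys proc dev; do
        echo "umount $target/$fs"
    done
}

# Example, assuming $TARGET is set by the OS install scripts:
# chroot_mount_plan "$TARGET"
```

With `/dev` mounted in the target, `grub-probe` should be able to find the root device when the kernel's post-installation scripts run.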

## Modifying an instance

### CPU, memory changes
... | ... | |