Disk 0 failed to receive data: Exited with status 1 (recent output: socat: W ioctl(9, IOCTL_VM_SOCKETS_GET_LOCAL_CID, ...): Inappropriate ioctl for device\n0+0 records in\n0+0 records out\n0 bytes copied, 12.2305 s, 0.0 kB/s)
... is *probably* due to a certificate verification bug in Ganeti's
import-export daemon. Confirm it in the logs in
`/var/log/ganeti/os` on the relevant node. The log entry that confirms
it is:
Disk 0 failed to send data: Exited with status 1 (recent output: socat: E certificate is valid but its commonName does not match hostname "ganeti.example.com")
That is upstream bug [1681](https://github.com/ganeti/ganeti/issues/1681) that should have been fixed in [PR
WARNING: Could not snapshot disk/2 on node chi-node-10.torproject.org: Error while executing backend function: Not enough free space: required 20480, available 15364.0
That is because the volume group doesn't have enough room to make a
snapshot. In this case, there was a 300GB swap partition on the node
(!) that could easily be removed, but an alternative would be to
evacuate other instances off of the node (even as secondaries) to free
up some space.
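
To estimate how much space needs to be freed, the two figures in the warning can simply be subtracted. The following is a minimal sketch, assuming both numbers are in MiB (Ganeti's usual unit for disk sizes); the `snapshot_shortfall` helper is hypothetical and not part of Ganeti:

```python
import re

# Assumption: both figures are in MiB, as in the warning quoted above.
_SPACE_RE = re.compile(r"required (\d+(?:\.\d+)?), available (\d+(?:\.\d+)?)")


def snapshot_shortfall(message):
    """Return the missing space, or None if the message does not match."""
    match = _SPACE_RE.search(message)
    if match is None:
        return None
    required, available = (float(value) for value in match.groups())
    # Never report a negative shortfall if there is enough room.
    return max(0.0, required - available)


warning = (
    "Error while executing backend function: Not enough free space: "
    "required 20480, available 15364.0"
)
print(snapshot_shortfall(warning))  # 5116.0
```

In the example above, about 5 GiB would need to be freed in the volume group before the snapshot can succeed.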
#### Snapshot failure
If the procedure fails with:
ganeti.errors.OpExecError: Not all disks could be snapshotted, and you did not allow the instance to remain offline for a longer time through the --long-sleep option; aborting
... try again with the VM stopped.
#### Connectivity issues
If the procedure fails during the data transfer with:
pycurl.error: (7, 'Failed to connect to chi-node-01.torproject.org port 5080: Connection refused')
or:
Disk 0 failed to send data: Exited with status 1 (recent output: dd: 0 bytes copied, 0.996381 s, 0.0 kB/s\ndd: 0 bytes copied, 5.99901 s, 0.0 kB/s\nsocat: E SSL_connect(): Connection refused)
... make sure you have the firewalls opened. Note that Puppet or other
things might clear out the temporary firewall rules established in the
preparation step.
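
A quick way to tell a closed firewall from another failure is to attempt a plain TCP connection from the other node. This is a minimal sketch using only the Python standard library; the `port_is_open` helper is hypothetical, and port 5080 is simply the one from the `pycurl` error above (the `socat` data transfer may use other ports):

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("chi-node-01.torproject.org", 5080)` run from the source node should return `True` once the firewall rules are in place.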
#### DNS issues
This error:
ganeti.errors.OpPrereqError: ('The given name (metrics-psqlts-01.torproject.org.2.8.0.0.0.0.0.5.0.0.8.8.4.0.6.2.ip6.arpa) does not resolve: Name or service not known', 'resolver_error')
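... means the reverse DNS for the instance has not been properly
configured. In this case, the `PTR` target lacked a trailing dot, so
the name server treated it as a relative name and appended the zone
name to it. The fix was to add the trailing dot to the `PTR` record: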
```diff
--- a/2.8.0.0.0.0.0.5.0.0.8.8.4.0.6.2.ip6.arpa
+++ b/2.8.0.0.0.0.0.5.0.0.8.8.4.0.6.2.ip6.arpa
@@ -55,7 +55,7 @@ b.c.b.7.0.c.e.f.f.f.8.3.6.6.4.0 IN PTR ci-runner-x86-01.torproject.org.
; 2604:8800:5000:82:466:38ff:fe3c:f0a7
7.a.0.f.c.3.e.f.f.f.8.3.6.6.4.0 IN PTR dangerzone-01.torproject.org.
; 2604:8800:5000:82:466:38ff:fe97:24ac
-c.a.4.2.7.9.e.f.f.f.8.3.6.6.4.0 IN PTR metrics-psqlts-01.torproject.org
+c.a.4.2.7.9.e.f.f.f.8.3.6.6.4.0 IN PTR metrics-psqlts-01.torproject.org.
; 2604:8800:5000:82:466:38ff:fed4:51a1
1.a.1.5.4.d.e.f.f.f.8.3.6.6.4.0 IN PTR onion-test-01.torproject.org.
; 2604:8800:5000:82:466:38ff:fea3:7c78
```
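
The odd name in the error message follows from how zone files treat names without a trailing dot: they are relative, so the zone origin is appended. The sketch below reproduces that with a hypothetical `qualify` helper, and also derives the full reverse name of the instance's IPv6 address (taken from the comment in the diff above) with the standard `ipaddress` module:

```python
import ipaddress


def qualify(name, origin):
    """Expand a zone file name: names without a trailing dot are relative."""
    if name.endswith("."):
        return name
    return f"{name}.{origin}"


# Zone origin, taken from the file name in the diff above.
origin = "2.8.0.0.0.0.0.5.0.0.8.8.4.0.6.2.ip6.arpa."

broken = qualify("metrics-psqlts-01.torproject.org", origin)
fixed = qualify("metrics-psqlts-01.torproject.org.", origin)

# Full owner name of the PTR record for the instance's address.
owner = ipaddress.ip_address(
    "2604:8800:5000:82:466:38ff:fe97:24ac"
).reverse_pointer

print(broken)
print(fixed)
print(owner)
```

The `broken` value is the name Ganeti complained about (plus the final dot), while `fixed` is the intended hostname.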
#### Capacity issues
If the procedure fails with:
ganeti.errors.OpPrereqError: ('Instance allocation to group 64c116fc-1ab2-4f6d-ba91-89c65875f888 (default) violates policy: memory-size value 307200 is not in range [128, 65536]', 'wrong_input')
... it's because the VM is smaller or bigger than the cluster
configuration allows (here, a `memory-size` of 307200 exceeds the
maximum of 65536). You need to change the `--ipolicy-bounds-specs`
in the cluster, see, for example, the [gnt-dal cluster
WARNING: Failed to run rename script for tpa-bootstrap-01.torproject.org on node dal-node-02.torproject.org: OS rename script failed (exited with exit code 1), last lines in the log file:\nCannot rename from tpa-bootstrap-01.torproject.org to tpa-bootstrap-01.torproject.org:\nInstance has a different hostname (tpa-bootstrap-01)
It's probably a flaw in the `ganeti-instance-debootstrap` backend that
doesn't properly renumber the instance. We have our own renumbering
procedure in Fabric instead, but that could be merged inside
`ganeti-instance-debootstrap` eventually.
#### Tracing executed commands
Finally, to trace which commands are executed (which can be
challenging in Ganeti), the `execsnoop.bt` command (from the [bpftrace
...
...
@@ -1823,6 +1987,11 @@ The `execsnoop` command (from the [libbpf-tools package](https://tracker.debian.
work but it truncates the command after 128 characters ([Debian