failure to create SAN-backed VM in gnt-chi

@lavamind and I tried to create a VM in the gnt-chi cluster, backed by the SAN, and we couldn't figure it out. this was for #40683 (closed).

at first, the problem was this:

09:36:12 <lavamind> aaargh
09:36:21 <lavamind> it created a 150G root partition
09:36:31 <lavamind> yeah thats not good

then @anarcat tried to create GPT partitions on the device and have ganeti adopt them:

gnt-instance add \
      -n chi-node-01.torproject.org \
      -o debootstrap+bullseye \
      -t blockdev --no-wait-for-sync \
      --net 0:ip=pool,network=gnt-chi-01 \
      --no-ip-check \
      --no-name-check \
      --disk 0:adopt=/dev/disk/by-id/dm-name-telegram-bot-01 \
      --backend-parameters memory=8g,vcpus=2 \
      telegram-bot-01.torproject.org
gnt-instance shutdown --timeout=0 telegram-bot-01.torproject.org
gnt-instance reinstall telegram-bot-01.torproject.org

and that didn't work at all: it failed with

device-mapper: reload ioctl on telegram-bot-01-1  failed: No such device or address
create/reload failed on telegram-bot-01-1
mke2fs: No such file or directory while trying to determine filesystem size

that seems to be a problem in the patch lavamind submitted to work with the SAN. after hot-fixing this, the script would still fail with:

Re-reading the partition table failed.: Invalid argument

we found that the install script was recreating the partition, specifically in the create hook, because of the default PARTITION_STYLE=msdos.

then we tried with PARTITION_STYLE=none in /etc/default/ganeti-instance-debootstrap. @anarcat also ran partprobe because that was recommended by sgdisk, but that turned out to be a bad idea because it added a bunch of irrelevant mappings everywhere.
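for reference, a minimal sketch of that override. as far as we can tell the file is sourced as shell, so the setting is a plain variable assignment; anything else already in the file is left as-is:

```shell
# /etc/default/ganeti-instance-debootstrap
# msdos is the default; "none" makes the create hook skip partitioning
PARTITION_STYLE=none
```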

with PARTITION_STYLE=none, the VM does go through the install, but somehow fails silently. after the failed install, it's in the ADMIN_down state. it's unclear why it fails, because all hooks complete successfully. last lines of the install log:

I: swap configuration hook in /etc/ganeti/instance-debootstrap/hooks/swap
Only one disk found, creating a 512M /swapfile instead
512+0 records in
512+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.747554 s, 718 MB/s
mkswap: /tmp/tmp.seBabqJnRb/swapfile: insecure permissions 0644, 0600 suggested.
Setting up swapspace version 1, size = 512 MiB (536866816 bytes)
no label, UUID=ab94d001-901b-49e1-badf-8fc06966e554
I: make /tmp a tmpfs

notice how "only one disk" is "found" above... that's not a great sign. after the install, also, the partition table on the device is completely empty, which is reasonable because ... well, that's what we asked for.

a possible fix is to do multiple --disk adopt... stanzas. this is how web-chi-03 (#40193 (closed)) was set up (see #40131 (comment 2728663)) and it could serve as a simple workaround for now, which probably doesn't even require hacking at the PARTITION_STYLE. the downside, of course, is significant confusion, because you have a partition setup at the SAN layer and then, on the first partition, another MSDOS partition setup. a little odd.
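the multi-disk workaround could look roughly like this. it's only a sketch: the two device names are hypothetical (they assume the SAN volume was split into a root and a swap device, and that the second disk is the one used for swap), the other flags are copied from the failing command earlier in this ticket, and the adopt_instance wrapper is made up for illustration. it only prints the command unless RUN is set to an empty string.

```shell
#!/bin/sh
# hypothetical device names: adjust to whatever the SAN/multipath setup exposes
ROOT_DEV=/dev/disk/by-id/dm-name-telegram-bot-01-root
SWAP_DEV=/dev/disk/by-id/dm-name-telegram-bot-01-swap

# dry run by default: prints the command; run with RUN= (empty) to execute it
RUN=${RUN-echo}

# hypothetical wrapper: same flags as the single-disk attempt, plus a second
# adopted disk
adopt_instance() {
    $RUN gnt-instance add \
        -n chi-node-01.torproject.org \
        -o debootstrap+bullseye \
        -t blockdev --no-wait-for-sync \
        --net 0:ip=pool,network=gnt-chi-01 \
        --no-ip-check \
        --no-name-check \
        --disk "0:adopt=$ROOT_DEV" \
        --disk "1:adopt=$SWAP_DEV" \
        --backend-parameters memory=8g,vcpus=2 \
        telegram-bot-01.torproject.org
}

adopt_instance
```

review the printed command first, then re-run with `RUN=` in the environment to actually create the instance.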

there's an issue upstream about GPT support but in my opinion, it's kind of a distraction... regardless of the partition format, ganeti-instance-debootstrap should be able to handle partitioned drives correctly...

right now it looks like it just wipes whatever you give it, with nothing (if PARTITION_STYLE=none) or with an MSDOS partition (if PARTITION_STYLE=msdos). one has to wonder why it bothers with partitioning in the first place, in a sense...

update: we deliberated quite a bit on design here, and here's the checklist we came up with in the end.

  • rewrite tpo-create-san-disks in Python
  • add support for handling multipath configuration across the cluster
  • update the ganeti node creation docs to mention copying the config from a previous node in the gnt-chi cluster
  • update the gnt-chi instance creation docs to use a single disk by default (and warn that the swapfile should be resized after install if we worry about memory usage)
  • summarize this ticket and design decisions in a wiki page ... somewhere? possibly howto/new-machine-cymru?
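
for the swapfile item in the checklist, resizing after install could be sketched as follows. assumptions: the swapfile lives at /swapfile as in the install log above, the 2048 MiB size is just an example, and resize_swapfile is a hypothetical helper, not an existing tool. it is meant to run as root inside the VM, and it only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# dry run by default: prints the commands; run with RUN= (empty) to execute them
RUN=${RUN-echo}

# resize_swapfile PATH SIZE_MIB (hypothetical helper)
# stops at the first failing step thanks to the && chain
resize_swapfile() {
    swapfile=$1
    size_mib=$2
    $RUN swapoff "$swapfile" &&
    $RUN dd if=/dev/zero of="$swapfile" bs=1M count="$size_mib" &&
    # 0600 avoids the "insecure permissions" warning seen in the install log
    $RUN chmod 0600 "$swapfile" &&
    $RUN mkswap "$swapfile" &&
    $RUN swapon "$swapfile"
}

resize_swapfile /swapfile 2048
```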
Edited by anarcat