@anarcat this is temporary. I download the files and then delete them and download again. Maybe a little more space would make the alert go away. If we have space that would be great.
@hiro how much do you need? you only have a stock 10G for the system on this thing, i could easily attach another 20G SSD on there, would that be enough?
hiro confirmed a 20G would be okay. turns out this is on gnt-chi so it's not SSD at all, it's those SAS drives.
we also didn't have room on the primary, so i flipped to secondary and reallocated disks elsewhere:
root@chi-node-01:~# gnt-instance migrate metrics-psqlts-01.torproject.org
Instance metrics-psqlts-01.torproject.org will be migrated. Note that
migration might impact the instance if anything goes wrong (e.g. due
to bugs in the hypervisor). Continue?
y/[n]/?: y
Mon Oct 3 15:02:49 2022 Migrating instance metrics-psqlts-01.torproject.org
Mon Oct 3 15:02:50 2022 * checking disk consistency between source and target
Mon Oct 3 15:02:51 2022 * closing instance disks on node chi-node-10.torproject.org
Mon Oct 3 15:02:52 2022 * changing into standalone mode
Mon Oct 3 15:02:53 2022 * changing disks into dual-master mode
Mon Oct 3 15:02:54 2022 * wait until resync is done
Mon Oct 3 15:02:55 2022 * opening instance disks on node chi-node-11.torproject.org in shared mode
Mon Oct 3 15:02:55 2022 * opening instance disks on node chi-node-10.torproject.org in shared mode
Mon Oct 3 15:02:56 2022 * preparing chi-node-10.torproject.org to accept the instance
Mon Oct 3 15:02:56 2022 * migrating instance to chi-node-10.torproject.org
Mon Oct 3 15:02:57 2022 * starting memory transfer
Mon Oct 3 15:03:08 2022 * memory transfer progress: 14.56 %
Mon Oct 3 15:03:19 2022 * memory transfer progress: 29.38 %
Mon Oct 3 15:03:29 2022 * memory transfer progress: 44.02 %
Mon Oct 3 15:03:40 2022 * memory transfer progress: 58.63 %
Mon Oct 3 15:03:51 2022 * memory transfer progress: 73.26 %
Mon Oct 3 15:04:02 2022 * memory transfer progress: 87.76 %
Mon Oct 3 15:04:10 2022 * memory transfer complete
Mon Oct 3 15:04:10 2022 * closing instance disks on node chi-node-11.torproject.org
Mon Oct 3 15:04:11 2022 * wait until resync is done
Mon Oct 3 15:04:12 2022 * changing into standalone mode
Mon Oct 3 15:04:12 2022 * changing disks into single-master mode
Mon Oct 3 15:04:14 2022 * wait until resync is done
Mon Oct 3 15:04:14 2022 * done
root@chi-node-01:~# gnt-instance modify --disk add:size=20g metrics-psqlts-01.torproject.org
Failure: command execution error:
Can't create block device <LogicalVolume(/dev/vg_ganeti/8053521e-336a-42a2-967a-76f5b0b91032.disk2_data, not visible, size=20480m)> on node chi-node-10.torproject.org for instance metrics-psqlts-01.torproject.org: Can't create block device: Not enough free space: required 20480, available 15108.0
-node-01:~# gnt-instance replace-disks -I . metrics-psqlts-01.torproject.org
Mon Oct 3 15:10:40 2022 - INFO: Selected new secondary for instance '280b14af-c029-4255-873b-48d2e6ba716a': chi-node-03.torproject.org
Mon Oct 3 15:10:40 2022 Replacing disk(s) 0, 1 for instance 'metrics-psqlts-01.torproject.org'
Mon Oct 3 15:10:40 2022 Current primary node: chi-node-10.torproject.org
Mon Oct 3 15:10:40 2022 Current secondary node: chi-node-11.torproject.org
Mon Oct 3 15:10:40 2022 STEP 1/6 Check device existence
Mon Oct 3 15:10:40 2022 - INFO: Checking disk/0 on chi-node-10.torproject.org
Mon Oct 3 15:10:40 2022 - INFO: Checking disk/1 on chi-node-10.torproject.org
Mon Oct 3 15:10:41 2022 - INFO: Checking volume groups
Mon Oct 3 15:10:41 2022 STEP 2/6 Check peer consistency
Mon Oct 3 15:10:41 2022 - INFO: Checking disk/0 consistency on node chi-node-10.torproject.org
Mon Oct 3 15:10:42 2022 - INFO: Checking disk/1 consistency on node chi-node-10.torproject.org
Mon Oct 3 15:10:42 2022 STEP 3/6 Allocate new storage
Mon Oct 3 15:10:42 2022 - INFO: Adding new local storage on chi-node-03.torproject.org for disk/0
Mon Oct 3 15:10:44 2022 - INFO: Adding new local storage on chi-node-03.torproject.org for disk/1
Mon Oct 3 15:10:46 2022 STEP 4/6 Changing drbd configuration
Mon Oct 3 15:10:46 2022 - INFO: activating a new drbd on chi-node-03.torproject.org for disk/0
Mon Oct 3 15:10:49 2022 - INFO: activating a new drbd on chi-node-03.torproject.org for disk/1
Mon Oct 3 15:10:52 2022 - INFO: Shutting down drbd for disk/0 on old node
Mon Oct 3 15:10:53 2022 - INFO: Shutting down drbd for disk/1 on old node
Mon Oct 3 15:10:53 2022 - INFO: Detaching primary drbds from the network (=> standalone)
Mon Oct 3 15:10:54 2022 - INFO: Updating instance configuration
Mon Oct 3 15:10:54 2022 - INFO: Attaching primary drbds to new secondary (standalone => connected)
Mon Oct 3 15:10:56 2022 STEP 5/6 Sync devices
Mon Oct 3 15:10:56 2022 - INFO: Waiting for instance metrics-psqlts-01.torproject.org to sync disks
Mon Oct 3 15:10:57 2022 - INFO: - device disk/0: 0.20% done, 33m 35s remaining (estimated)
Mon Oct 3 15:10:57 2022 - INFO: - device disk/1: 0.60% done, 4m 8s remaining (estimated)
Mon Oct 3 15:11:57 2022 - INFO: - device disk/0: 17.40% done, 4m 22s remaining (estimated)
Mon Oct 3 15:11:57 2022 - INFO: - device disk/1: 82.50% done, 11s remaining (estimated)
Mon Oct 3 15:12:09 2022 - INFO: - device disk/0: 21.60% done, 3m 34s remaining (estimated)
Mon Oct 3 15:13:10 2022 - INFO: - device disk/0: 44.10% done, 2m 23s remaining (estimated)
Mon Oct 3 15:14:10 2022 - INFO: - device disk/0: 66.50% done, 1m 31s remaining (estimated)
Mon Oct 3 15:15:11 2022 - INFO: - device disk/0: 89.20% done, 28s remaining (estimated)
Mon Oct 3 15:15:39 2022 - INFO: - device disk/0: 99.80% done, 0s remaining (estimated)
Mon Oct 3 15:15:40 2022 - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Mon Oct 3 15:15:41 2022 - INFO: Instance metrics-psqlts-01.torproject.org's disks are in sync
Mon Oct 3 15:15:41 2022 STEP 6/6 Removing old storage
Mon Oct 3 15:15:41 2022 - INFO: Remove logical volumes for 0
Mon Oct 3 15:15:42 2022 - INFO: Remove logical volumes for 1
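the `gnt-instance modify` failure in the transcript above is plain arithmetic: the new volume needs 20480 MiB and chi-node-10 only had 15108 MiB free, so it was 5372 MiB (about 5.2 GiB) short. a minimal sketch of that check, assuming the error text keeps the `required N, available M` shape seen above. `shortfall_mib` is a hypothetical helper, not part of ganeti:

```shell
#!/bin/sh
# Hypothetical helper: read a "Not enough free space" error on stdin and
# print how many MiB are missing (0 if there is enough room).
shortfall_mib() {
    # Pull out the two numbers; the message format is assumed from the
    # transcript above.
    set -- $(sed -n 's/.*required \([0-9.]*\), available \([0-9.]*\).*/\1 \2/p')
    [ $# -eq 2 ] || return 1
    # Drop any decimal part ("15108.0" -> "15108"): sh arithmetic is integer-only.
    required=${1%.*}
    available=${2%.*}
    if [ "$available" -ge "$required" ]; then
        echo 0
    else
        echo $((required - available))
    fi
}

echo "Not enough free space: required 20480, available 15108.0" | shortfall_mib
```

checking free disk per node before adding the disk (for example with `gnt-node list -o name,dtotal,dfree`) would likely have caught this earlier.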
migrating to a new secondary:
root@chi-node-01:~# gnt-instance migrate metrics-psqlts-01.torproject.org
Instance metrics-psqlts-01.torproject.org will be migrated. Note that
migration might impact the instance if anything goes wrong (e.g. due
to bugs in the hypervisor). Continue?
y/[n]/?: y
Mon Oct 3 15:17:59 2022 Migrating instance metrics-psqlts-01.torproject.org
Mon Oct 3 15:17:59 2022 * checking disk consistency between source and target
Mon Oct 3 15:18:01 2022 * closing instance disks on node chi-node-03.torproject.org
Mon Oct 3 15:18:02 2022 * changing into standalone mode
Mon Oct 3 15:18:02 2022 * changing disks into dual-master mode
Mon Oct 3 15:18:04 2022 * wait until resync is done
Mon Oct 3 15:18:05 2022 * opening instance disks on node chi-node-10.torproject.org in shared mode
Mon Oct 3 15:18:05 2022 * opening instance disks on node chi-node-03.torproject.org in shared mode
Mon Oct 3 15:18:06 2022 * preparing chi-node-03.torproject.org to accept the instance
Mon Oct 3 15:18:06 2022 * migrating instance to chi-node-03.torproject.org
Mon Oct 3 15:18:06 2022 * starting memory transfer
Mon Oct 3 15:18:17 2022 * memory transfer progress: 14.61 %
Mon Oct 3 15:18:28 2022 * memory transfer progress: 29.51 %
Mon Oct 3 15:18:39 2022 * memory transfer progress: 44.07 %
Mon Oct 3 15:18:50 2022 * memory transfer progress: 58.71 %
Mon Oct 3 15:19:01 2022 * memory transfer progress: 73.45 %
Mon Oct 3 15:19:12 2022 * memory transfer progress: 88.33 %
Mon Oct 3 15:19:19 2022 * memory transfer has switched to postcopy
Mon Oct 3 15:19:20 2022 * memory transfer complete
Mon Oct 3 15:19:20 2022 * closing instance disks on node chi-node-10.torproject.org
Mon Oct 3 15:19:21 2022 * wait until resync is done
Mon Oct 3 15:19:22 2022 * changing into standalone mode
Mon Oct 3 15:19:22 2022 * changing disks into single-master mode
Mon Oct 3 15:19:24 2022 * wait until resync is done
Mon Oct 3 15:19:25 2022 * done
reallocating secondary:
ot@chi-node-01:~# gnt-instance replace-disks -I . metrics-psqlts-01.torproject.org
Mon Oct 3 15:25:27 2022 - INFO: Selected new secondary for instance '280b14af-c029-4255-873b-48d2e6ba716a': chi-node-11.torproject.org
Mon Oct 3 15:25:27 2022 Replacing disk(s) 0, 1 for instance 'metrics-psqlts-01.torproject.org'
Mon Oct 3 15:25:27 2022 Current primary node: chi-node-03.torproject.org
Mon Oct 3 15:25:27 2022 Current secondary node: chi-node-10.torproject.org
Mon Oct 3 15:25:27 2022 STEP 1/6 Check device existence
Mon Oct 3 15:25:27 2022 - INFO: Checking disk/0 on chi-node-03.torproject.org
Mon Oct 3 15:25:27 2022 - INFO: Checking disk/1 on chi-node-03.torproject.org
Mon Oct 3 15:25:27 2022 - INFO: Checking volume groups
Mon Oct 3 15:25:28 2022 STEP 2/6 Check peer consistency
Mon Oct 3 15:25:28 2022 - INFO: Checking disk/0 consistency on node chi-node-03.torproject.org
Mon Oct 3 15:25:28 2022 - INFO: Checking disk/1 consistency on node chi-node-03.torproject.org
Mon Oct 3 15:25:29 2022 STEP 3/6 Allocate new storage
Mon Oct 3 15:25:29 2022 - INFO: Adding new local storage on chi-node-11.torproject.org for disk/0
Mon Oct 3 15:25:31 2022 - INFO: Adding new local storage on chi-node-11.torproject.org for disk/1
Mon Oct 3 15:25:32 2022 STEP 4/6 Changing drbd configuration
Mon Oct 3 15:25:32 2022 - INFO: activating a new drbd on chi-node-11.torproject.org for disk/0
Mon Oct 3 15:25:35 2022 - INFO: activating a new drbd on chi-node-11.torproject.org for disk/1
Mon Oct 3 15:25:37 2022 - INFO: Shutting down drbd for disk/0 on old node
Mon Oct 3 15:25:38 2022 - INFO: Shutting down drbd for disk/1 on old node
Mon Oct 3 15:25:39 2022 - INFO: Detaching primary drbds from the network (=> standalone)
Mon Oct 3 15:25:40 2022 - INFO: Updating instance configuration
Mon Oct 3 15:25:40 2022 - INFO: Attaching primary drbds to new secondary (standalone => connected)
Mon Oct 3 15:25:42 2022 STEP 5/6 Sync devices
Mon Oct 3 15:25:42 2022 - INFO: Waiting for instance metrics-psqlts-01.torproject.org to sync disks
Mon Oct 3 15:25:42 2022 - INFO: - device disk/0: 0.40% done, 5m 22s remaining (estimated)
Mon Oct 3 15:25:42 2022 - INFO: - device disk/1: 2.00% done, 55s remaining (estimated)
Mon Oct 3 15:26:38 2022 - INFO: - device disk/0: 20.80% done, 3m 32s remaining (estimated)
Mon Oct 3 15:27:41 2022 - INFO: - device disk/0: 37.70% done, 4m 25s remaining (estimated)
Mon Oct 3 15:28:45 2022 - INFO: - device disk/0: 50.90% done, 4m 48s remaining (estimated)
Mon Oct 3 15:29:46 2022 - INFO: - device disk/0: 65.50% done, 3m 11s remaining (estimated)
Mon Oct 3 15:30:49 2022 - INFO: - device disk/0: 74.50% done, 3m 38s remaining (estimated)
Mon Oct 3 15:32:02 2022 - INFO: - device disk/0: 82.00% done, 6m 49s remaining (estimated)
Mon Oct 3 15:33:03 2022 - INFO: Instance metrics-psqlts-01.torproject.org's disks are in sync
Mon Oct 3 15:33:04 2022 STEP 6/6 Removing old storage
Mon Oct 3 15:33:04 2022 - INFO: Remove logical volumes for 0
Mon Oct 3 15:33:04 2022 - INFO: Remove logical volumes for 1
i rebooted the server for the change to take effect, formatted /dev/sdc in ext4+journaling, mounted it on /srv (after moving the parser thing out of there), and bind-mounted /var/lib/postgresql in there.
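for the record, the disk setup above could be sketched roughly like this. it is a dry run by default (it only prints the commands), and it makes assumptions the comment above does not spell out: the data directory on the new disk is called `/srv/postgresql` here (hypothetical name), and that directory is what gets bind-mounted onto `/var/lib/postgresql`. the `run` and `setup_srv_disk` helpers are made up for this example:

```shell
#!/bin/sh
# Sketch only. Prints the commands unless DRY_RUN=0 is set.
# With DRY_RUN=0 this is destructive: mkfs wipes the device.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_srv_disk() {
    dev=$1                    # the new disk, /dev/sdc in the comment above
    datadir=/srv/postgresql   # hypothetical directory name on the new disk
    run mkfs.ext4 "$dev"      # ext4 gets a journal by default
    run mount "$dev" /srv
    run mkdir -p "$datadir"
    # Existing data would have to be copied into $datadir while postgresql
    # is stopped; that step is not described above, so it is left out.
    run mount --bind "$datadir" /var/lib/postgresql
}

setup_srv_disk /dev/sdc
```

both mounts would also need matching entries in /etc/fstab (or puppet) to survive the next reboot.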
restarted postgresql, this seems resolved. let me know if you find any problem, @hiro
it looks like postgresql has a 4.5GB /var/log/postgresql/postgresql-13-main.log file... this might be related to the fact that DNS is hosed because unbound has crashed, possibly because we ran out of disk. fun times.
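a quick way to spot files like that one, as a sketch: list regular files above a size threshold without crossing into other filesystems. `big_files` is a hypothetical helper, and the size suffixes (`k`, `M`, `G`) assume GNU find:

```shell
#!/bin/sh
# List regular files under a directory that are larger than a threshold,
# staying on the same filesystem (-xdev).
big_files() {
    dir=$1
    min=$2    # GNU find size spec without the "+", e.g. 1G or 500M
    find "$dir" -xdev -type f -size +"$min"
}

# On the affected host this could be: big_files /var/log 1G
```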
unbound had crashed because it created empty anchor files in /var/lib/unbound. i removed them, put a dummy external nameserver in resolv.conf, reran puppet (so it restores the anchor files) and it restarted everything properly (including bacula).
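to find such empty files before removing them, something like the following would do. it is a sketch: `list_empty_files` is a hypothetical helper, and `-maxdepth` and `-empty` assume GNU find:

```shell
#!/bin/sh
# List zero-byte regular files directly inside a directory (no recursion).
list_empty_files() {
    find "$1" -maxdepth 1 -type f -empty
}

# On the affected host this could be: list_empty_files /var/lib/unbound
```

it only lists the files; after reviewing the output they can be removed by hand and puppet rerun to restore them, as described above.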
root@metrics-psqlts-01:/var/log/postgresql# logrotate -f /etc//logrotate.d/postgresql-common
error: error writing to /var/log/postgresql/postgresql-13-main.log.1: No space left on device
error: error copying /var/log/postgresql/postgresql-13-main.log to /var/log/postgresql/postgresql-13-main.log.1: No space left on device
i just cleared that log file for now.
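logrotate failed because it has to write the `.1` copy to the same full filesystem before it can shrink the original. how the file was cleared is not stated here. one common way, shown as a sketch with a hypothetical `clear_log` helper, is to truncate it in place, which keeps the same inode so a process that has the log open keeps writing to it:

```shell
#!/bin/sh
# Truncate a log file to zero bytes without removing it.
clear_log() {
    log=$1
    [ -f "$log" ] || { echo "no such file: $log" >&2; return 1; }
    # ":" produces no output, so the redirection empties the file in place.
    : > "$log"
}

# On the affected host this could be:
#   clear_log /var/log/postgresql/postgresql-13-main.log
```

removing the file with `rm` instead would generally not free the space until the process holding it open closes it.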
fwiw, that log file had a lot of warnings about being unable to connect to the backup server because DNS was down, which could be why it kept filling up the disk after the disk had filled up on its own...
now, it looks like everything is back in order. i did see a lot of queries logged in there, so this might happen again, but that would be @hiro's problem then. :)