15:39:32 <anarcat> our global disk usage went up by 5TiB in the last *day*
15:40:21 <anarcat> if i read this right: https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-7d&to=now
15:41:24 <anarcat> like wtf happened here https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-7d&to=now&viewPanel=1137
15:43:21 <anarcat> well fuck
15:47:13 <anarcat> i ran ncdu on bungei
15:47:18 <anarcat> archive-01 is using 27TiB
15:48:05 <anarcat> it could be the *rotation* on there, bacula would still have the older versions of the files and things got overwritten on archive-01, so we don't see it rising, but it's taking up more space in bacula
15:48:56 <anarcat> yeah, it's archive-01
15:49:13 <anarcat> it's been pounding one new 500GB volume after the other for 24h now
15:49:21 <anarcat> -rw-r----- 1 bacula bacula 500G Oct 17 23:33 torproject-archive-01.torproject.org-full.2023-10-17_20:18
15:49:24 <anarcat> -rw-r----- 1 bacula bacula 500G Oct 18 01:33 torproject-archive-01.torproject.org-full.2023-10-17_23:34
15:49:29 <anarcat> ...
15:49:29 <anarcat> -rw-r----- 1 bacula bacula 500G Oct 18 12:01 torproject-archive-01.torproject.org-full.2023-10-18_10:15
15:49:33 <anarcat> -rw-r----- 1 bacula bacula 349G Oct 18 13:13 torproject-archive-01.torproject.org-full.2023-10-18_12:01
15:51:02 <anarcat> i'm going to tune down the reserved block size
15:51:33 <anarcat> it seems we have 2.7TB reserved
15:51:52 <anarcat> which is half a percent. i think
15:52:07 <anarcat> Setting reserved blocks percentage to 0.01% (1530082 blocks)
15:53:56 <anarcat> Filesystem Size Used Avail Use% Mounted on
15:53:57 <anarcat> /dev/mapper/vg_bulk-backups--bacula 57T 55T 2.5T 96% /srv/backups/bacula
15:55:22 <anarcat> yep, so archive-01's /srv/archive.torproject.org/htdocs/tor-package-archive/torbrowser is 5TB
15:58:32 <anarcat> i ran the "mount" command to unblock the backup jobs for relay-01 and archive-01
15:58:36 <anarcat> we might run out of disk space again
16:00:56 <anarcat> i suspect this might work better with borg
16:01:04 <anarcat> but i've never scaled borg up to ~100TiB
16:03:10 <anarcat> oh dear
16:03:10 <anarcat> JobId Level Files Bytes Status Finished Name
16:03:10 <anarcat> ====================================================================
16:03:15 <anarcat> 246156 Full 684,887 4.665 T Other 18-Oct-23 13:15 archive-01.torproject.org
16:03:21 <anarcat> so it looks like the full failed
16:03:27 <anarcat> and it's going to RETRY IT
16:03:32 <anarcat> so we're going to full up *another* 5TB
16:07:22 <anarcat> i cleared the files from the previous, failed full backup
16:07:42 <anarcat> it's possible this damage the files from the current full backup, but that beats disabling backups for the entire infrastructure
16:07:52 <anarcat> we now have 6.7T left
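a quick sanity check of the reserved-block numbers quoted in the log, as a rough sketch only. it assumes 4 KiB filesystem blocks, which the log does not state; under that assumption the "0.01% = 1530082 blocks" line implies a filesystem of about 57 TiB, which matches the df output, and 2.7T reserved would be closer to 5% (the usual ext4 default) than to half a percent.

```python
# Sanity check of the reserved-block numbers quoted in the log.
# ASSUMPTION: 4 KiB filesystem blocks; the log does not state the block size.
BLOCK_SIZE = 4096
TIB = 2**40
GIB = 2**30

# From "Setting reserved blocks percentage to 0.01% (1530082 blocks)".
reserved_blocks_after = 1530082
reserved_pct_after = 0.01  # percent

# Total block count implied by "0.01% of the filesystem is 1530082 blocks".
total_blocks = reserved_blocks_after / (reserved_pct_after / 100)
total_tib = total_blocks * BLOCK_SIZE / TIB

# Share of the 57T filesystem (from df) that the 2.7T reserve represented.
reserved_before_pct = 2.7 / 57 * 100

# Size of the reserve once it was tuned down to 0.01%.
reserved_after_gib = reserved_blocks_after * BLOCK_SIZE / GIB

print(f"implied filesystem size: {total_tib:.1f} TiB")
print(f"reserve before tuning:   {reserved_before_pct:.1f}% of the filesystem")
print(f"reserve after tuning:    {reserved_after_gib:.1f} GiB")
```

if the block size is not 4 KiB, the absolute sizes change, but the "2.7T out of 57T" ratio does not.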
so tldr: it seems the Tor Browser (TB) archive is 5TB and was fully rotated with the 13 release, which meant the full backup added another 5TiB of storage on the backup server. the archive-01 backup job failed; then, when i freed up more disk space (by tuning down the reserved blocks), the job could resume (with the mount command), but in a new set of volumes, hoping that the new ~2TB of disk space would be sufficient.
... but maybe bacula is unhappy because there isn't enough free space to go around (@lavamind's idea). so let's just give that poor server more space, we still have some wiggle room.
JobId   Type  Level    Files    Bytes  Name                               Status
======================================================================
246219  Back  Full   723,866  5.763 T  archive-01.torproject.org          is running
246221  Back  Incr         0        0  onionoo-backend-01.torproject.org  is running
246222  Back  Incr         0        0  dangerzone-01.torproject.org       is waiting for a mount request
246223  Back  Incr         0        0  ns5.torproject.org                 is waiting for a mount request
246224  Back  Incr         0        0  tb-build-05.torproject.org         is waiting for a mount request
246225  Back  Incr         0        0  crm-ext-01.torproject.org          is waiting for a mount request
246226  Back  Incr         0        0  media-01.torproject.org            is waiting for a mount request
246227  Back  Incr         0        0  weather-01.torproject.org          is waiting for a mount request
246228  Back  Incr         0        0  neriniflorum.torproject.org        is waiting for a mount request
246229  Back  Incr         0        0  tb-build-02.torproject.org         is waiting for a mount request
246230  Back  Incr         0        0  survey-01.torproject.org           is waiting for a mount request
====

Terminated Jobs:
JobId   Level    Files    Bytes  Status  Finished         Name
====================================================================
246241  Incr     1,748  204.2 M  OK      18-Oct-23 22:37  fsn-node-02.torproject.org
246242  Incr       208  65.54 M  OK      18-Oct-23 22:43  tb-tester-01.torproject.org
246244  Incr       646  450.3 M  OK      18-Oct-23 22:48  rude.torproject.org
246245  Incr       302  249.2 M  OK      18-Oct-23 23:04  perdulce.torproject.org
246246  Incr     1,712  176.3 M  OK      18-Oct-23 23:09  fsn-node-05.torproject.org
246247  Incr       345  6.928 G  OK      18-Oct-23 23:23  probetelemetry-01.torproject.org
246248  Incr       407  6.120 G  OK      18-Oct-23 23:33  minio-01.torproject.org
246249  Incr       303  56.57 M  OK      18-Oct-23 23:34  dal-rescue-01.torproject.org
246250  Incr    29,385  62.60 G  OK      19-Oct-23 01:14  tb-build-06.torproject.org
246220  Incr       180  282.8 M  OK      19-Oct-23 15:16  ci-runner-x86-02.torproject.org
====
*mount jobid=246222
3001 OK mount requested. Device="FileStorage-dangerzone-01.torproject.org" (/srv/backups/bacula/dangerzone-01.torproject.org)
*mount jobid=246223
3001 OK mount requested. Device="FileStorage-ns5.torproject.org" (/srv/backups/bacula/ns5.torproject.org)
*mount jobid=246224
3001 OK mount requested. Device="FileStorage-tb-build-05.torproject.org" (/srv/backups/bacula/tb-build-05.torproject.org)
*mount jobid=246225
3001 OK mount requested. Device="FileStorage-crm-ext-01.torproject.org" (/srv/backups/bacula/crm-ext-01.torproject.org)
*mount jobid=246226
3001 OK mount requested. Device="FileStorage-media-01.torproject.org" (/srv/backups/bacula/media-01.torproject.org)
*mount jobid=246227
3001 OK mount requested. Device="FileStorage-weather-01.torproject.org" (/srv/backups/bacula/weather-01.torproject.org)
*mount jobid=246228
3001 OK mount requested. Device="FileStorage-neriniflorum.torproject.org" (/srv/backups/bacula/neriniflorum.torproject.org)
*mount jobid=246229
3001 OK mount requested. Device="FileStorage-tb-build-02.torproject.org" (/srv/backups/bacula/tb-build-02.torproject.org)
*mount jobid=246230
3001 OK mount requested. Device="FileStorage-survey-01.torproject.org" (/srv/backups/bacula/survey-01.torproject.org)
*
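typing one mount command per stuck job gets tedious. a minimal sketch of generating them from the status listing, assuming the text has been saved or pasted one job per line; the `mount_commands` helper and the `STATUS` sample are made up for illustration and only rely on the "is waiting for a mount request" wording shown above.

```python
import re

# A few lines in the shape of the running-jobs listing (one job per line).
STATUS = """\
246219  Back  Full   723,866  5.763 T  archive-01.torproject.org     is running
246222  Back  Incr         0        0  dangerzone-01.torproject.org  is waiting for a mount request
246223  Back  Incr         0        0  ns5.torproject.org            is waiting for a mount request
"""

# A leading job ID, then anything, then the "waiting for a mount" status.
WAITING = re.compile(r"^\s*(\d+)\s.*is waiting for a mount request\s*$")


def mount_commands(status_text):
    """Return one 'mount jobid=N' console command per job waiting on a mount."""
    commands = []
    for line in status_text.splitlines():
        match = WAITING.match(line)
        if match:
            commands.append(f"mount jobid={match.group(1)}")
    return commands


commands = mount_commands(STATUS)
print("\n".join(commands))
```

the printed lines can then be pasted at the console prompt, the same way the commands above were entered by hand.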
and it looks like that worked, suspiciously quickly:
JobId   Level    Files    Bytes  Status  Finished         Name
====================================================================
246250  Incr    29,385  62.60 G  OK      19-Oct-23 01:14  tb-build-06.torproject.org
246220  Incr       180  282.8 M  OK      19-Oct-23 15:16  ci-runner-x86-02.torproject.org
246226  Incr       237  83.55 M  OK      19-Oct-23 15:17  media-01.torproject.org
246222  Incr       259  208.7 M  OK      19-Oct-23 15:17  dangerzone-01.torproject.org
246227  Incr       284  150.1 M  OK      19-Oct-23 15:17  weather-01.torproject.org
246225  Incr     1,296  249.0 M  OK      19-Oct-23 15:17  crm-ext-01.torproject.org
246223  Incr       220  1.181 G  OK      19-Oct-23 15:17  ns5.torproject.org
246228  Incr       230  1.596 G  OK      19-Oct-23 15:17  neriniflorum.torproject.org
246230  Incr       304  80.13 M  OK      19-Oct-23 15:17  survey-01.torproject.org
246229  Incr       264  191.4 M  OK      19-Oct-23 15:17  tb-build-02.torproject.org
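to double-check the "that worked" claim, here is a small sketch that cross-references the job IDs that were waiting on a mount against the terminated-jobs listing above. the IDs and statuses are copied by hand from the two listings. one job, 246224 (tb-build-05), does not show up in the pasted listing; that may only mean it had not finished yet or fell outside the pasted excerpt, not that it failed.

```python
# Job IDs that were "waiting for a mount request" before the mount commands.
remounted = {
    246222, 246223, 246224, 246225, 246226,
    246227, 246228, 246229, 246230,
}

# (JobId, Status) pairs copied from the terminated-jobs listing above.
terminated = [
    (246250, "OK"), (246220, "OK"), (246226, "OK"), (246222, "OK"),
    (246227, "OK"), (246225, "OK"), (246223, "OK"), (246228, "OK"),
    (246230, "OK"), (246229, "OK"),
]

# Jobs that the listing reports as finished successfully.
ok = {job_id for job_id, status in terminated if status == "OK"}

confirmed = sorted(remounted & ok)     # remounted and listed as OK
unconfirmed = sorted(remounted - ok)   # remounted but absent from the listing

print("confirmed OK:  ", confirmed)
print("not in listing:", unconfirmed)
```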