backup failure: disk full on bungei

The scheduler started failing last night, at 08:09 UTC:

From: root@bacula-director-01.torproject.org
Subject: Cron <root@bacula-director-01> sleep $(( $RANDOM % 60 )); flock -w 0 -e /usr/local/sbin/dsa-bacula-scheduler /usr/local/sbin/dsa-bacula-scheduler
To: root@bacula-director-01.torproject.org
Date: Fri, 22 Oct 2021 08:09:57 +0000

Traceback (most recent call last):
  File "/usr/local/sbin/dsa-bacula-scheduler", line 199, in <module>
    conn = psycopg2.connect(args.db)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 130, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  remaining connection slots are reserved for non-replication superuser connections

we get a mail like this every 3 minutes. the cause is unclear.

we are also getting errors from individual jobs:

From: bacula-service@torproject.org
Subject: Bacula: Backup Fatal Error of static-master-fsn.torproject.org-fd Incremental
To: bacula-service@torproject.org
Date: Fri, 22 Oct 2021 13:34:48 +0000

22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Start Backup JobId 175779, Job=static-master-fsn.torproject.org.2021-10-22_05.33.44_59
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: There are no more Jobs associated with Volume "torproject-static-master-fsn.torproject.org-inc.2021-09-21_10:33". Marking it purged.
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: New Pool is: poolgraveyard-torproject-static-master-fsn.torproject.org
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: All records pruned from Volume "torproject-static-master-fsn.torproject.org-inc.2021-09-21_10:33"; marking it "Purged"
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Created new Volume="torproject-static-master-fsn.torproject.org-inc.2021-10-22_05:33", Pool="poolinc-torproject-static-master-fsn.torproject.org", MediaType="File-static-master-fsn.torproject.org" in catalog.
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Using Device "FileStorage-static-master-fsn.torproject.org" to write.
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Sending Accurate information to the FD.
22-Oct 05:33 bungei.torproject.org-sd JobId 175779: Fatal error: [SF0209] Out of freespace caused End of Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_05:33" at 0 on device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org). Write of 366 bytes got -1.
22-Oct 05:33 static-master-fsn.torproject.org-fd JobId 175779: Fatal error: job.c:3013 Bad response from SD to Append Data command. Wanted 3000 OK data
, got len=320 msg="3903 Error append data: Read label block failed: requested Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_05:33" on File device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org) is no"
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Error: Bacula bacula-director-01.torproject.org-dir 9.4.2 (04Feb19):
  Build OS:               x86_64-pc-linux-gnu debian 10.5
  JobId:                  175779
  Job:                    static-master-fsn.torproject.org.2021-10-22_05.33.44_59
  Backup Level:           Incremental, since=2021-10-21 05:12:44
  Client:                 "static-master-fsn.torproject.org-fd" 9.4.2 (04Feb19) x86_64-pc-linux-gnu,debian,10.5
  FileSet:                "Standard Set" 2014-09-06 20:30:19
  Pool:                   "poolinc-torproject-static-master-fsn.torproject.org" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File-static-master-fsn.torproject.org" (From Pool resource)
  Scheduled time:         22-Oct-2021 05:33:44
  Start time:             22-Oct-2021 05:33:50
  End time:               22-Oct-2021 05:33:59
  Elapsed time:           9 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         
  Volume Session Id:      906
  Volume Session Time:    1634072246
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    1
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Rescheduled Job static-master-fsn.torproject.org.2021-10-22_05.33.44_59 at 22-Oct-2021 05:33 to re-run in 14400 seconds (22-Oct-2021 09:33).
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Error: openssl.c:68 TLS shutdown failure.: ERR=error:14094123:SSL routines:ssl3_read_bytes:application data after close notify
22-Oct 05:33 bacula-director-01.torproject.org-dir JobId 175779: Error: openssl.c:68 TLS shutdown failure.: ERR=error:14094123:SSL routines:ssl3_read_bytes:application data after close notify
22-Oct 05:34 bacula-director-01.torproject.org-dir JobId 175779: Job static-master-fsn.torproject.org.2021-10-22_05.33.44_59 waiting 14400 seconds for scheduled start time.
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Start Backup JobId 175779, Job=static-master-fsn.torproject.org.2021-10-22_05.33.44_59
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: There are no more Jobs associated with Volume "torproject-static-master-fsn.torproject.org-inc.2021-09-22_09:00". Marking it purged.
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: New Pool is: poolgraveyard-torproject-static-master-fsn.torproject.org
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: All records pruned from Volume "torproject-static-master-fsn.torproject.org-inc.2021-09-22_09:00"; marking it "Purged"
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Created new Volume="torproject-static-master-fsn.torproject.org-inc.2021-10-22_09:34", Pool="poolinc-torproject-static-master-fsn.torproject.org", MediaType="File-static-master-fsn.torproject.org" in catalog.
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Using Device "FileStorage-static-master-fsn.torproject.org" to write.
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Sending Accurate information to the FD.
22-Oct 09:34 bungei.torproject.org-sd JobId 175779: Fatal error: [SF0209] Out of freespace caused End of Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_09:34" at 0 on device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org). Write of 366 bytes got -1.
22-Oct 09:34 static-master-fsn.torproject.org-fd JobId 175779: Fatal error: job.c:3013 Bad response from SD to Append Data command. Wanted 3000 OK data
, got len=561 msg="3903 Error append data: Read label block failed: requested Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_09:34" on File device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org) is no"
22-Oct 09:34 bungei.torproject.org-sd JobId 175779: Marking Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_09:34" in Error in Catalog.
22-Oct 09:34 bungei.torproject.org-sd JobId 175779: Job static-master-fsn.torproject.org.2021-10-22_05.33.44_59 canceled while waiting for mount on Storage Device ""FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org)".
22-Oct 09:34 bungei.torproject.org-sd JobId 175779: Fatal error: Too many errors trying to mount File device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org).
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Error: bsock.c:388 Wrote 4 bytes to Storage daemon:bungei.torproject.org:9103, but only 0 accepted.
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Error: Bacula bacula-director-01.torproject.org-dir 9.4.2 (04Feb19):
  Build OS:               x86_64-pc-linux-gnu debian 10.5
  JobId:                  175779
  Job:                    static-master-fsn.torproject.org.2021-10-22_05.33.44_59
  Backup Level:           Incremental, since=2021-10-21 05:12:44
  Client:                 "static-master-fsn.torproject.org-fd" 9.4.2 (04Feb19) x86_64-pc-linux-gnu,debian,10.5
  FileSet:                "Standard Set" 2014-09-06 20:30:19
  Pool:                   "poolinc-torproject-static-master-fsn.torproject.org" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File-static-master-fsn.torproject.org" (From Pool resource)
  Scheduled time:         22-Oct-2021 05:33:44
  Start time:             22-Oct-2021 09:34:08
  End time:               22-Oct-2021 09:34:22
  Elapsed time:           14 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         
  Volume Session Id:      941
  Volume Session Time:    1634072246
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    3
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Rescheduled Job static-master-fsn.torproject.org.2021-10-22_05.33.44_59 at 22-Oct-2021 09:34 to re-run in 14400 seconds (22-Oct-2021 13:34).
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Error: openssl.c:68 TLS shutdown failure.: ERR=error:14094123:SSL routines:ssl3_read_bytes:application data after close notify
22-Oct 09:34 bacula-director-01.torproject.org-dir JobId 175779: Job static-master-fsn.torproject.org.2021-10-22_05.33.44_59 waiting 14400 seconds for scheduled start time.
22-Oct 13:34 bacula-director-01.torproject.org-dir JobId 175779: Start Backup JobId 175779, Job=static-master-fsn.torproject.org.2021-10-22_05.33.44_59
22-Oct 13:34 bacula-director-01.torproject.org-dir JobId 175779: Created new Volume="torproject-static-master-fsn.torproject.org-inc.2021-10-22_13:34", Pool="poolinc-torproject-static-master-fsn.torproject.org", MediaType="File-static-master-fsn.torproject.org" in catalog.
22-Oct 13:34 bacula-director-01.torproject.org-dir JobId 175779: Using Device "FileStorage-static-master-fsn.torproject.org" to write.
22-Oct 13:34 bacula-director-01.torproject.org-dir JobId 175779: Sending Accurate information to the FD.
22-Oct 13:34 bungei.torproject.org-sd JobId 175779: Fatal error: [SF0209] Out of freespace caused End of Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_13:34" at 0 on device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org). Write of 366 bytes got -1.
22-Oct 13:34 static-master-fsn.torproject.org-fd JobId 175779: Fatal error: job.c:3013 Bad response from SD to Append Data command. Wanted 3000 OK data
, got len=320 msg="3903 Error append data: Read label block failed: requested Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_13:34" on File device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org) is no"
22-Oct 13:34 bacula-director-01.torproject.org-dir JobId 175779: Error: Bacula bacula-director-01.torproject.org-dir 9.4.2 (04Feb19):
  Build OS:               x86_64-pc-linux-gnu debian 10.5
  JobId:                  175779
  Job:                    static-master-fsn.torproject.org.2021-10-22_05.33.44_59
  Backup Level:           Incremental, since=2021-10-21 05:12:44
  Client:                 "static-master-fsn.torproject.org-fd" 9.4.2 (04Feb19) x86_64-pc-linux-gnu,debian,10.5
  FileSet:                "Standard Set" 2014-09-06 20:30:19
  Pool:                   "poolinc-torproject-static-master-fsn.torproject.org" (From Job IncPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File-static-master-fsn.torproject.org" (From Pool resource)
  Scheduled time:         22-Oct-2021 05:33:44
  Start time:             22-Oct-2021 13:34:30
  End time:               22-Oct-2021 13:34:48
  Elapsed time:           18 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  Comm Line Compression:  None
  Snapshot/VSS:           no
  Encryption:             no
  Accurate:               yes
  Volume name(s):         
  Volume Session Id:      52
  Volume Session Time:    1634896469
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    2
  SD Errors:              1
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

at least there the cause is clearer: bungei is full...

22-Oct 05:33 bungei.torproject.org-sd JobId 175779: Fatal error: [SF0209] Out of freespace caused End of Volume "torproject-static-master-fsn.torproject.org-inc.2021-10-22_05:33" at 0 on device "FileStorage-static-master-fsn.torproject.org" (/srv/backups/bacula/static-master-fsn.torproject.org). Write of 366 bytes got -1.
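
To confirm the diagnosis on the storage host, the directory can be pulled out of the SF0209 line and checked for free space. The following is only a sketch: the helper names `device_path` and `free_fraction` are made up for illustration, and the regular expression assumes the message layout seen in the log line above.

```python
import re
import shutil

# Assumption: the storage directory is printed in parentheses right after
# the quoted device name, as in the SF0209 message quoted above.
DEVICE_PATH = re.compile(r'Out of freespace .* on device "[^"]+" \((/[^)]+)\)')


def device_path(log_line):
    """Return the storage directory named in an SF0209 line, or None."""
    match = DEVICE_PATH.search(log_line)
    return match.group(1) if match else None


def free_fraction(path):
    """Return the free share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Run on bungei, `free_fraction(device_path(line))` should come out at or near zero if the disk really is full. It reports the same numbers as `df` for that path, so it adds nothing beyond a scriptable check.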
Edited Oct 22, 2021 by anarcat