improve large bacula backup duration
Large full backups with many files and a reasonably large dataset take an enormous amount of time to complete. Most of that time is spent in the SD despooling attributes state, where the Bacula director is inserting lots of data into PostgreSQL. During that phase, many temporary files are created and a large amount of I/O happens on those files. Because the director and the storage daemon (bungei) are locked together for so long, it adds sadness to the process of adding new machines (which requires a restart of bacula-sd) and to reboots (which can interrupt those long-running backups).
Right now the entire database is on HDD, including the temp files, so it's likely the bottleneck is due to slow storage. It could also be that memory allocation for PostgreSQL is not ideal, and we could tweak some setting to avoid or at least minimize the use of on-disk temporary files.
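To check how much on-disk temporary space PostgreSQL actually uses before sizing anything, one option is to set log_temp_files = 0 in postgresql.conf (this logs every temporary file with its size) and then summarize the server log. The following is only a sketch: the sample log lines and the helper name summarize_temp_files are made up for illustration, and the line prefix depends on the local log_line_prefix setting.

```python
import re

# Matches the message PostgreSQL writes when log_temp_files is enabled, e.g.:
#   LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp1234.0", size 1073741824
TEMP_FILE_RE = re.compile(
    r'temporary file: path "(?P<path>[^"]+)", size (?P<size>\d+)'
)


def summarize_temp_files(lines):
    """Count temporary files in log lines and add up their sizes in bytes."""
    count = 0
    total_bytes = 0
    largest_bytes = 0
    for line in lines:
        match = TEMP_FILE_RE.search(line)
        if match is None:
            continue  # not a temporary-file message
        size = int(match.group("size"))
        count += 1
        total_bytes += size
        largest_bytes = max(largest_bytes, size)
    return {
        "count": count,
        "total_bytes": total_bytes,
        "largest_bytes": largest_bytes,
    }


# Hypothetical sample lines, NOT taken from the real server log.
SAMPLE_LOG = [
    '2022-09-27 14:01:02 UTC [1234] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp1234.0", size 1073741824',
    '2022-09-27 14:01:05 UTC [1234] LOG:  checkpoint starting: time',
    '2022-09-27 14:02:10 UTC [1234] LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp1234.1", size 536870912',
]

summary = summarize_temp_files(SAMPLE_LOG)
print(summary)
```

Note that the total is cumulative over the log, not the peak space in use at any one moment, so it is an upper bound rather than a sizing figure. If the largest files are small, raising work_mem may be enough; if they are in the gigabyte range, faster storage for the temp directory is probably the more realistic fix.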
We could try one or more of these things:
- Tune PostgreSQL (e.g. with the help of postgresqltuner)
- Add a new 10-20G SSD volume and bind mount /srv/postgresql/13/main/base/pgsql_tmp to it
- Move /srv entirely to a new SSD volume
Example job report from a slow full backup (corsicum, a bit over 26 hours):
27-Sep 14:30 bacula-director-01.torproject.org-dir JobId 208960: Bacula bacula-director-01.torproject.org-dir 9.6.7 (10Dec20):
Build OS: x86_64-pc-linux-gnu debian bullseye/sid
JobId: 208960
Job: corsicum.torproject.org.2022-09-26_12.03.12_19
Backup Level: Full
Client: "corsicum.torproject.org-fd" 9.4.2 (04Feb19) x86_64-pc-linux-gnu,debian,10.5
FileSet: "Standard Set" 2014-09-06 20:30:19
Pool: "poolfull-torproject-corsicum.torproject.org" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "File-corsicum.torproject.org" (From Pool resource)
Scheduled time: 26-Sep-2022 12:03:12
Start time: 26-Sep-2022 12:03:18
End time: 27-Sep-2022 14:30:46
Elapsed time: 1 day 2 hours 27 mins 28 secs
Priority: 10
FD Files Written: 3,531,021
SD Files Written: 3,531,021
FD Bytes Written: 169,857,700,037 (169.8 GB)
SD Bytes Written: 170,586,082,199 (170.5 GB)
Rate: 1783.3 KB/s
Software Compression: None
Comm Line Compression: 27.0% 1.4:1
Snapshot/VSS: no
Encryption: no
Accurate: yes
Volume name(s): torproject-corsicum.torproject.org-full.2022-09-26_12:03
Volume Session Id: 810
Volume Session Time: 1663534105
Last Volume Bytes: 170,794,999,476 (170.7 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK
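As a sanity check on the report above, the numbers can be recomputed from the start and end times. This sketch assumes the reported Rate is FD bytes divided by total elapsed time, with 1 KB = 1000 bytes; that assumption reproduces the figure in the report, but it is inferred from the numbers, not taken from the Bacula documentation.

```python
from datetime import datetime

# Values copied from the job report above.
start = datetime(2022, 9, 26, 12, 3, 18)
end = datetime(2022, 9, 27, 14, 30, 46)
fd_bytes = 169_857_700_037
fd_files = 3_531_021

elapsed_s = (end - start).total_seconds()

# Assumption: Rate = FD bytes / elapsed seconds, in units of 1000 bytes.
rate_kb_s = fd_bytes / elapsed_s / 1000
files_per_s = fd_files / elapsed_s

print(f"elapsed: {elapsed_s:.0f} s")         # 95248 s, i.e. 1 day 2 h 27 min 28 s
print(f"rate:    {rate_kb_s:.1f} KB/s")      # 1783.3 KB/s, as reported
print(f"files:   {files_per_s:.1f} files/s")
```

Since the elapsed time covers the whole job, including the attribute despooling phase, this average of under 2 MB/s says little about the actual transfer speed. It is consistent with the job spending most of its time on catalog inserts rather than on moving data.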