title: Backup and restore procedures
How-to
Most work on Bacula happens on the director, which is where backups are coordinated. Actual data is stored on the storage daemon, but the director is where commands are issued and jobs are managed.
All commands below are run from the `bconsole` shell, which can be started on the director with:
    root@bacula-director-01:~# bconsole
    Connecting to Director bacula-director-01.torproject.org:9101
    1000 OK: 103 torproject-dir Version: 9.4.2 (04 February 2019)
    Enter a period to cancel a command.
    *
You then end up with a shell using `*` as a prompt, where you can issue commands.
Checking last jobs
To see the last jobs that ran, you can check the status of the director:
    *status director
    torproject-dir Version: 9.4.2 (04 February 2019) x86_64-pc-linux-gnu debian 9.7
    Daemon started 22-Jul-19 10:30, conf reloaded 23-Jul-2019 12:43:41
     Jobs: run=868, running=1 mode=0,0
     Heap: heap=7,536,640 smbytes=701,360 max_bytes=21,391,382 bufs=4,518 max_bufs=8,576
     Res: njobs=74 nclients=72 nstores=73 npools=291 ncats=1 nfsets=2 nscheds=2

    Scheduled Jobs:
    Level          Type     Pri  Scheduled          Job Name           Volume
    ===================================================================================
    Full           Backup    15  03-Aug-19 02:10    BackupCatalog      *unknown*
    ====

    Running Jobs:
    Console connected using TLS at 02-Aug-19 15:41
     JobId  Type Level     Files     Bytes  Name              Status
    ======================================================================
    107689  Back Full          0         0  chiwui.torproject.org is waiting for its start time (02-Aug 19:32)
    ====

    Terminated Jobs:
     JobId  Level      Files    Bytes   Status   Finished        Name
    ====================================================================
    107680  Incr      51,879    2.408 G  OK       02-Aug-19 13:16 rouyi.torproject.org
    107682  Incr         355    361.2 M  OK       02-Aug-19 13:33 henryi.torproject.org
    107681  Diff      12,864    715.9 M  OK       02-Aug-19 13:34 pauli.torproject.org
    107683  Incr         274    30.78 M  OK       02-Aug-19 13:50 forrestii.torproject.org
    107684  Incr       3,423    2.398 G  OK       02-Aug-19 13:55 meronense.torproject.org
    107685  Incr         288    32.24 M  OK       02-Aug-19 14:12 nevii.torproject.org
    107686  Incr         341    69.64 M  OK       02-Aug-19 14:51 getulum.torproject.org
    107687  Incr         289    26.24 M  OK       02-Aug-19 15:11 dictyotum.torproject.org
    107688  Incr         376    57.62 M  OK       02-Aug-19 15:22 kvm5.torproject.org
    107690  Incr         238    20.88 M  OK       02-Aug-19 15:32 opacum.torproject.org
    ====
Here we see that no backups are running, and the last ones succeeded correctly.
You can also check the status of individual clients with the `status client` command.

The `messages` command shows the latest messages on the `bconsole`. It's useful to run this command when you start your session, as it will flush the (usually quite long) buffer of messages. That way, the next time you call the command, you will only see the result of your latest jobs.
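For example (a sketch only; the client name is taken from the listing above, and any client known to the director can be used):

    *status client=chiwui.torproject.org-fd
    *messages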
Running a backup
Backups are run regularly by a cron job, but if you need to run a backup immediately, this can be done from `bconsole`.

The short version is to run the `run` command and pick the server. First, enter the console on the Bacula director:

    ssh -tt bacula-director-01.torproject.org bconsole
    *run
    A job name must be specified.
    The defined Job resources are:
         1: RestoreFiles
         2: alberti.torproject.org
         3: archive-01.torproject.org
    [...]
        59: peninsulare.torproject.org
Pick a server, for example number 59: enter `59` and confirm by entering `yes`:
    Select Job resource (1-77): 59
    Run Backup job
    JobName:  peninsulare.torproject.org
    Level:    Incremental
    Client:   peninsulare.torproject.org-fd
    FileSet:  Standard Set
    Pool:     poolfull-torproject-peninsulare.torproject.org (From Job resource)
    Storage:  File-peninsulare.torproject.org (From Pool resource)
    When:     2019-10-11 20:57:09
    Priority: 10
    OK to run? (yes/mod/no): yes
    Job queued. JobId=113225
Bacula confirms the job is queued. You can see the status of the job with `status director`, which should show a set of lines like this in the middle:
     JobId  Type Level     Files     Bytes  Name              Status
    ======================================================================
    113226  Back Incr          0         0  peninsulare.torproject.org is running
This will take more or less time depending on the size of the server. You can call `status director` repeatedly to follow progress (for example, with `watch "echo status director | bconsole"` in another shell), or run the `messages` command to see new messages as they arrive. When the backup completes, you should see something like this in the Terminated Jobs list:
    Terminated Jobs:
     JobId  Level      Files    Bytes   Status   Finished        Name
    ====================================================================
    113225  Incr          33    11.67 M  OK       11-Oct-19 20:59 peninsulare.torproject.org
That's it, new files were sucked in and you're good to do whatever nasty things you were about to do.
This section is more in-depth and will explain more concepts as we go. Relax, take a deep breath, it should go fine.
Configure backups on new machines
Backups for new machines should be automatically configured by Puppet, through the `bacula::client` class, which is included everywhere. There are special configurations required for MySQL and PostgreSQL databases; see the design section for more information on those.
    $ ssh -tt bacula-director-01.torproject.org bconsole
    *restore
... and follow instructions. Reminder: by default, backups are
restored on the originating server.
The `bconsole` program has a pretty good interactive restore mode, which you can call with `restore` (use `llist jobid=N` to inspect individual jobs). It needs to know which "jobs" you want to restore from. As a given backup job is typically an incremental job, you normally need multiple jobs to restore to a given point in time.
The first thing to know is that restores are done from the server to the client, i.e. they are restored directly on the machine that is backed up. Note that by default restored files will be owned by the `bacula` user, because the file daemon runs as `bacula` in our configuration. If that's a problem for large restores, the override (in `/etc/systemd/system/bacula-fd.service.d/override.conf`) can be temporarily disabled by simply removing the file and restarting the file daemon:
    rm /etc/systemd/system/bacula-fd.service.d/override.conf
    systemctl restart bacula-fd
... and then restarting the restore procedure. Note that this file will be re-created by Puppet the next time it runs, so you may also want to run `puppet agent --disable 'to respect the bacula-fd override'`. In this configuration, however, the file daemon can overwrite any file, so be careful.
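Putting it all together, a possible sequence on the client looks like this (a sketch only, not a tested procedure; the disable message is arbitrary):

    puppet agent --disable 'restoring files as root'
    rm /etc/systemd/system/bacula-fd.service.d/override.conf
    systemctl restart bacula-fd
    # ... perform the restore in bconsole ...
    puppet agent --enable
    puppet agent -t   # should re-create the override and restart the daemon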
A simple way of restoring a given client to a given point in time is to use the interactive `restore` command. So:

Enter `bconsole` in a shell on the director:
    *restore
    Automatically selected Catalog: MyCatalog
    Using Catalog "MyCatalog"

    First you select one or more JobIds that contain files
    to be restored. You will be presented several methods
    of specifying the JobIds. Then you will be allowed to
    select which files from those JobIds are to be restored.
You now have a list of possible ways of restoring; choose `5: Select the most recent backup for a client`:
    To select the JobIds, you have the following choices:
         1: List last 20 Jobs run
         2: List Jobs where a given File is saved
         3: Enter list of comma separated JobIds to select
         4: Enter SQL list command
         5: Select the most recent backup for a client
         6: Select backup for a client before a specified time
         7: Enter a list of files to restore
         8: Enter a list of files to restore before a specified time
         9: Find the JobIds of the most recent backup for a client
        10: Find the JobIds for a backup for a client before a specified time
        11: Enter a list of directories to restore for found JobIds
        12: Select full restore to a specified Job date
        13: Cancel
    Select item:  (1-13): 5
You will see a list of machines; pick the machine you want to restore from by entering its number:
    Defined Clients:
         1: alberti.torproject.org-fd
    [...]
       117: yatei.torproject.org-fd
    Select the Client (1-117): 87
You now get dropped into a file browser, where you use the `mark` and `unmark` commands to select files for restore. The commands support wildcards, like `mark *` to mark all files in the current directory; see also the full list of commands:
    Automatically selected FileSet: Standard Set
    +---------+-------+----------+-----------------+---------------------+-----------------------------------------------------------+
    | jobid   | level | jobfiles | jobbytes        | starttime           | volumename                                                |
    +---------+-------+----------+-----------------+---------------------+-----------------------------------------------------------+
    | 106,348 | F     |  363,125 | 157,545,039,843 | 2019-07-16 09:42:43 | torproject-full-perdulce.torproject.org.2019-07-16_09:42 |
    | 107,033 | D     |    9,136 |     691,803,964 | 2019-07-25 06:30:15 | torproject-diff-perdulce.torproject.org.2019-07-25_06:30 |
    | 107,107 | I     |    4,244 |     214,271,791 | 2019-07-26 06:11:30 | torproject-inc-perdulce.torproject.org.2019-07-26_06:11  |
    | 107,181 | I     |    4,285 |     197,548,921 | 2019-07-27 05:30:51 | torproject-inc-perdulce.torproject.org.2019-07-27_05:30  |
    | 107,257 | I     |    4,273 |     197,739,452 | 2019-07-28 04:52:15 | torproject-inc-perdulce.torproject.org.2019-07-28_04:52  |
    | 107,334 | I     |    4,302 |     218,259,369 | 2019-07-29 04:58:23 | torproject-inc-perdulce.torproject.org.2019-07-29_04:58  |
    | 107,423 | I     |    4,400 |     287,819,534 | 2019-07-30 05:42:09 | torproject-inc-perdulce.torproject.org.2019-07-30_05:42  |
    | 107,504 | I     |    4,278 |     413,289,422 | 2019-07-31 06:11:49 | torproject-inc-perdulce.torproject.org.2019-07-31_06:11  |
    | 107,587 | I     |    4,401 |     700,613,429 | 2019-08-01 07:51:52 | torproject-inc-perdulce.torproject.org.2019-08-01_07:51  |
    | 107,653 | I     |      471 |      63,370,161 | 2019-08-02 06:01:35 | torproject-inc-perdulce.torproject.org.2019-08-02_06:01  |
    +---------+-------+----------+-----------------+---------------------+-----------------------------------------------------------+
    You have selected the following JobIds: 106348,107033,107107,107181,107257,107334,107423,107504,107587,107653

    Building directory tree for JobId(s) 106348,107033,107107,107181,107257,107334,107423,107504,107587,107653 ...
    ++++++++++++++++++++++++++++++++++++++++++++++
    335,060 files inserted into the tree.

    You are now entering file selection mode where you add (mark) and
    remove (unmark) files to be restored. No files are initially added, unless
    you used the "all" keyword on the command line.
    Enter "done" to leave this mode.

    cwd is: /
    $ mark etc
    1,921 files marked.
Do not use the `estimate` command, as it can take a long time to run and will freeze the shell.
When done selecting files, call the `done` command. This will drop you into a confirmation dialog showing what will happen. Note the `Where` parameter, which shows where the files will be restored on the `RestoreClient`. Make sure that location has enough space for the restore to complete.
    Bootstrap records written to /var/lib/bacula/torproject-dir.restore.6.bsr

    The Job will require the following (*=>InChanger):
       Volume(s)                 Storage(s)                SD Device(s)
    ===========================================================================
       torproject-full-perdulce.torproject.org.2019-07-16_09:42 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-diff-perdulce.torproject.org.2019-07-25_06:30 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-07-26_06:11 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-07-27_05:30 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-07-29_04:58 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-07-31_06:11 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-08-01_07:51 File-perdulce.torproject.org FileStorage-perdulce.torproject.org
       torproject-inc-perdulce.torproject.org.2019-08-02_06:01 File-perdulce.torproject.org FileStorage-perdulce.torproject.org

    Volumes marked with "*" are in the Autochanger.

    1,921 files selected to be restored.

    Using Catalog "MyCatalog"
    Run Restore job
    JobName:         RestoreFiles
    Bootstrap:       /var/lib/bacula/torproject-dir.restore.6.bsr
    Where:           /var/tmp/bacula-restores
    Replace:         Always
    FileSet:         Standard Set
    Backup Client:   perdulce.torproject.org-fd
    Restore Client:  perdulce.torproject.org-fd
    Storage:         File-perdulce.torproject.org
    When:            2019-08-02 16:43:08
    Catalog:         MyCatalog
    Priority:        10
    Plugin Options:  *None*
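Before answering `yes`, it can be worth checking that the `Where` location has enough free space on the restore client (a quick sketch, using the example client and the default location shown above):

    ssh perdulce.torproject.org df -h /var/tmp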
This doesn't restore the backup immediately, but schedules a job that does so, like so:
    OK to run? (yes/mod/no): yes
    Job queued. JobId=107693
You can see the status of the jobs on the director with `status director`, but also check the status of that specific job with `llist jobid=N`:
    *llist JobId=107697
               jobid: 107,697
                 job: RestoreFiles.2019-08-02_16.43.40_17
                name: RestoreFiles
         purgedfiles: 0
                type: R
               level: F
            clientid: 9
          clientname: dictyotum.torproject.org-fd
           jobstatus: R
           schedtime: 2019-08-02 16:43:08
           starttime: 2019-08-02 16:43:42
             endtime:
         realendtime:
            jobtdate: 1,564,764,222
        volsessionid: 0
      volsessiontime: 0
            jobfiles: 0
            jobbytes: 0
           readbytes: 0
           joberrors: 0
     jobmissingfiles: 0
              poolid: 0
            poolname:
          priorjobid: 0
           filesetid: 0
             fileset:
             hasbase: 0
            hascache: 0
             comment:
The `jobstatus` column is an internal database field that will show `T` ("terminated normally") when completed, `R` when running, `C` when created but not yet started, and anything else if, well, anything else is happening. The full list of possible statuses is hidden deep in the developer documentation, obviously.
The `messages` command also provides a good way of showing the latest status, although it will flood your terminal if it hasn't been run for a long time. You can hit "enter" to see if there are new messages.
    *messages
    [...]
    02-Aug 16:43 torproject-sd JobId 107697: Ready to read from volume "torproject-inc-perdulce.torproject.org.2019-08-02_06:01" on File device "FileStorage-perdulce.torproject.org" (/srv/backups/bacula/perdulce.torproject.org).
    02-Aug 16:43 torproject-sd JobId 107697: Forward spacing Volume "torproject-inc-perdulce.torproject.org.2019-08-02_06:01" to addr=328
    02-Aug 16:43 torproject-sd JobId 107697: Elapsed time=00:00:03, Transfer rate=914.8 K Bytes/second
    02-Aug 16:43 torproject-dir JobId 107697: Bacula torproject-dir 9.4.2 (04Feb19):
      Build OS:               x86_64-pc-linux-gnu debian 9.7
      JobId:                  107697
      Job:                    RestoreFiles.2019-08-02_16.43.40_17
      Restore Client:         bacula-director-01.torproject.org-fd
      Where:                  /var/tmp/bacula-restores
      Replace:                Always
      Start time:             02-Aug-2019 16:43:42
      End time:               02-Aug-2019 16:43:50
      Elapsed time:           8 secs
      Files Expected:         1,921
      Files Restored:         1,921
      Bytes Restored:         2,528,685 (2.528 MB)
      Rate:                   316.1 KB/s
      FD Errors:              0
      FD termination status:  OK
      SD termination status:  OK
      Termination:            Restore OK
Once the job is done, the files will be present in the chosen location (`Where`) on the given server (`RestoreClient`).
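To confirm, you can list the restored tree on the restore client (a sketch, using the defaults from the example above):

    ssh perdulce.torproject.org ls -la /var/tmp/bacula-restores/etc | head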
Restore the director server
If the storage daemon disappears catastrophically, there's nothing we can do: the data is lost. But if the director disappears, we can still restore from backups. Those instructions should cover the case where we need to rebuild the director from backups. The director is, essentially, a PostgreSQL database. Therefore, the restore procedure is to restore that database, along with some configuration.
This procedure can also be used to rotate or replace a still running director.
If the old director is still running, start a fresh backup of the old database cluster from the storage server:

    ssh -tt bungei sudo -u torbackup postgres-make-base-backups dictyotum.torproject.org:5433 &
disable puppet on the old director:
    ssh dictyotum.torproject.org puppet agent --disable 'disabling scheduler -- anarcat 2019-10-10'
Disable the scheduler by commenting out the cron job, wait for jobs to complete, then shut down the old director:
    sed -i '/dsa-bacula-scheduler/s/^/#/' /etc/cron.d/puppet-crontab
    watch -c "echo 'status director' | bconsole "
    service bacula-director stop
TODO: this could be improved:
<weasel> it's idle when there are no non-idle 'postgres: bacula bacula' processes and it doesn't have any open tcp connections?
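That check could be scripted roughly like this (an untested sketch of the idea above; the port matches the catalog cluster used in this setup):

    # any non-idle bacula backend processes in PostgreSQL?
    ps aux | grep '[p]ostgres: bacula bacula' | grep -v idle
    # any established connections to the catalog's PostgreSQL port?
    ss -tn state established '( sport = :5433 )'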
Configure the new machine as a backup director in Puppet, with hiera data along these lines:

    classes:
      - roles::backup::director
    bacula::client::director_server: 'bacula-director-01.torproject.org'
This should restore a basic Bacula configuration with the director acting, weirdly, as its own director.
When you add the machine to Nagios, make sure to add it to the `postgres96-hosts` group so that the PostgreSQL cluster is correctly monitored.
Run Puppet by hand on the new director and the storage server a few times, so their manifests converge:
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
The Puppet manifests will fail because PostgreSQL is not installed. And even if it were, they would fail because they don't have the right passwords. For now, PostgreSQL is configured by hand.
TODO: Do consider deploying it with Puppet, as discussed in howto/postgresql.
Install the right version of PostgreSQL.
It might be the case that the backups of the director are from an earlier version of PostgreSQL than the version available on the new machine. In that case, an older `sources.list` needs to be added:
    cat > /etc/apt/sources.list.d/stretch.list <<EOF
    deb https://deb.debian.org/debian/ stretch main
    deb http://security.debian.org/ stretch/updates main
    EOF
    apt update
Actually install the server:
    apt install -y postgresql-9.6
Once the base backup from step one is completed (or if there is no old director left), restore the cluster on the new host; see the "Indirect restore procedure" in howto/postgresql.
You will also need to restore the file `/etc/dsa/bacula-reader-database` from backups (see "Get files without a director", below), as that file is not (currently) managed through howto/puppet (TODO). Alternatively, that file can be recreated by hand, using a syntax like this:

    user=bacula-dictyotum-reader password=X dbname=bacula host=localhost
The matching PostgreSQL user will need to have its password modified to match:

    sudo -u postgres psql -c '\password bacula-dictyotum-reader'
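To check that the credentials actually work, you can try a query with the same connection string (a sketch; substitute the real password for `X`):

    psql "user=bacula-dictyotum-reader password=X dbname=bacula host=localhost" -c 'SELECT count(*) FROM job;'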
reset the password of the bacula director, as it changed in puppet:
    grep dbpassword /etc/bacula/bacula-dir.conf | cut -f2 -d\"
    sudo -u postgres psql -c '\password bacula'
Do the same for the password stored on the storage server:

    ssh bungei.torproject.org grep director /home/torbackup/.pgpass
    ssh bacula-director-01 -tt sudo -u postgres psql -c '\password bacula'
Copy over the custom PostgreSQL configuration (e.g. `conf.d/tor.conf`) from the previous director's cluster configuration (e.g. `/var/lib/postgresql/9.6/main`) to the new one (TODO: put in howto/puppet). Make sure that:

- the cluster name (e.g. `bacula`) is correct in the configuration
- the SSL settings (e.g. `ssl_key_file`) point to valid SSL certs
Once you have the postgres database cluster restored, start the director:
    systemctl start bacula-director
Then everything should be fairies and magic and happiness all over again. Check that everything works with:
Run a few of the "Basic commands" above, to make sure we have everything. For example, `list jobs` should show the latest jobs run on the director. It's normal that `status director` does not show those, however.
Enable puppet on the director again.
    puppet agent -t
This involves (optionally) keeping a lock on the scheduler so it doesn't start immediately. If you're confident (not tested!), this step might be skipped:
    flock -w 0 -e /usr/local/sbin/dsa-bacula-scheduler sleep infinity
To switch a single node, configure its director in that node's Puppet (hiera) configuration, keyed on `$FQDN`, the fully qualified domain name of the machine.
Then run puppet on that node, the storage, and the director server:
    ssh perdulce.torproject.org puppet agent -t
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
Then test a backup job for that host: in `bconsole`, call `run` and pick that server, which should now show up.
Switch all nodes to the new director in the shared Puppet configuration, then run howto/puppet everywhere (or wait for it to run):
    cumin -b 5 -p 0 -o txt '*' 'puppet agent -t'
Then make sure the storage and director servers are also up to date:
    ssh bungei.torproject.org puppet agent -t
    ssh bacula-director-01.torproject.org puppet agent -t
If you held a lock on the scheduler, it can now be removed.
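One way to find what is holding the lock is to look at who has the scheduler file open (a sketch; kill the listed `sleep infinity` process to release the lock):

    fuser -v /usr/local/sbin/dsa-bacula-scheduler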
Switch the Nagios checks over to the new director: grep for the old director name in the Nagios configuration and fix up the relevant checks:
    git -C tor-nagios grep dictyotum
You will also need to restore the password file used by the Nagios check.
Switch the director in `/etc/postgresql-common/pg_service.conf` to point to the new host.
The new scheduler and director should now have completely taken over from the old one, and backups should resume. The old server can then be decommissioned, if it's still around, once you feel comfortable that the new setup is working.
15:19:55 <weasel> and once that's up and running, it'd probably be smart to upgrade it to 11. pg_upgradecluster -m upgrade --link
TODO: some psql users still refer to host-specific usernames like `bacula-dictyotum-reader`; maybe they should refer to role-specific names instead.
If you get this error:
    psycopg2.OperationalError: definition of service "bacula" not found
It's probably the scheduler failing to connect to the database server, because `/etc/dsa/bacula-reader-database` refers to a non-existent "service", as defined in `/etc/postgresql-common/pg_service.conf`. Either add something like:
    [bacula]
    dbname=bacula
    port=5433
to that file, or specify the `port` manually in the `/etc/dsa/bacula-reader-database` file.
If the scheduler is sending you an email every three minutes with this error:
    FileNotFoundError: [Errno 2] No such file or directory: '/etc/dsa/bacula-reader-database'
It's because you forgot to create that file, in step 8. Similar errors may occur if you forgot to change that password.
If the director takes a long time to start and ultimately fails with:
    oct 10 18:19:41 bacula-director-01 bacula-dir: bacula-dir JobId 0: Fatal error: Could not open Catalog "MyCatalog", database "bacula".
    oct 10 18:19:41 bacula-director-01 bacula-dir: bacula-dir JobId 0: Fatal error: postgresql.c:332 Unable to connect to PostgreSQL server. Database=bacula User=bac
    oct 10 18:19:41 bacula-director-01 bacula-dir: Possible causes: SQL server not running; password incorrect; max_connections exceeded.
It's because you forgot to reset the director password, in step 9.
Get files without a director
If you want to get to files stored on the Bacula storage host without involving the director, they can be accessed directly as well. Remember that to Bacula everything is a tape, and `/srv/backups/bacula` is full of directories of tapes. You can see the contents of a tape using `bls <file>`, with a fully qualified filename, i.e. the full path to the volume; `bls $(readlink -f <filename>)` is a handy way to get that.
    root@bungei:/srv/backups/bacula/dictyotum.torproject.org# bls `readlink -f torproject-inc-dictyotum.torproject.org.2019-09-25_11:53` | head
    bls: butil.c:292-0 Using device: "/srv/backups/bacula/dictyotum.torproject.org" for reading.
    25-Sep 13:48 bls JobId 0: Ready to read from volume "torproject-inc-dictyotum.torproject.org.2019-09-25_11:53" on File device "FileStorage-dictyotum.torproject.org" (/srv/backups/bacula/dictyotum.torproject.org).
    bls JobId 0: drwxr-xr-x   4 root     root     1024       2019-09-07 17:01:03  /boot/
    bls JobId 0: drwxr-xr-x  24 root     root     800        2019-09-25 11:33:53  /run/
    bls JobId 0: -rw-r--r--   1 root     root     12288      2019-09-25 11:51:17  /etc/postfix/debian.db
    bls JobId 0: -rw-r--r--   1 root     root     4732       2019-09-25 11:51:17  /etc/postfix/debian
    bls JobId 0: -r--r--r--   1 root     root     28161      2019-09-25 00:55:50  /etc/ssl/torproject-auto/crls/ca.crl
    ...
You can then extract files from there with `bextract`:
    bextract /srv/backups/bacula/dictyotum.torproject.org/torproject-inc-dictyotum.torproject.org.2019-09-25_11:53 /var/tmp/restore
This will extract the entire tape to `/var/tmp/restore`. If you want only a few files, put their names into a file such as `~/include` and call `bextract` with the `-i` flag:
    bextract -i ~/include /srv/backups/bacula/dictyotum.torproject.org/torproject-inc-dictyotum.torproject.org.2019-09-25_11:53 /var/tmp/restore
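The include file is simply a list of paths, one per line; for example (hypothetical paths, adjust to what you actually need):

    cat > ~/include <<EOF
    /etc/passwd
    /etc/ssh/sshd_config
    EOF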
Restore PostgreSQL databases
See howto/postgresql for restore instructions on PostgreSQL databases.
Restore MySQL databases
MySQL restoration should be fairly straightforward. Install MySQL:
    apt install mysql-server
Load each database dump:
    for dump in 20190812-220301-mysql.xz 20190812-220301-torcrm_prod.xz; do
        xzcat /var/backups/local/mysql/$dump | mysql
    done
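To confirm the databases were loaded, a quick check (a sketch):

    mysql -e 'SHOW DATABASES'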
Restore LDAP databases
See howto/ldap for LDAP-specific procedures.
Hint: see also the howto/postgresql documentation for the backup procedures specific to that database.
If a job is behaving strangely, you can inspect its job log to see what's going on. For example, today Nagios warned about the backups being too old on colchicifolium:
    10:02:58 <nsa> tor-nagios: [colchicifolium] backup - bacula - last full backup is WARNING: WARN: Last backup of colchicifolium.torproject.org/F is 45.16 days old.
Looking at the bacula director status, it says this:
    Console connected using TLS at 10-Jan-20 18:19
     JobId  Type Level     Files     Bytes  Name              Status
    ======================================================================
    120225  Back Full    833,079    123.5 G colchicifolium.torproject.org is running
    120230  Back Full  4,864,515    218.5 G colchicifolium.torproject.org is waiting on max Client jobs
    120468  Back Diff     30,694    3.353 G gitlab-01.torproject.org is running
    ====
Which is strange, because those JobId numbers are very low compared to (say) the GitLab backup job. To inspect the job log, you use the `list joblog` command:
    *list joblog jobid=120225
    +----------------------------------------------------------------------------------------------------+
    | logtext                                                                                             |
    +----------------------------------------------------------------------------------------------------+
    | bacula-director-01.torproject.org-dir JobId 120225: Start Backup JobId 120225, Job=colchicifolium.torproject.org.2020-01-07_17.00.36_03 |
    | bacula-director-01.torproject.org-dir JobId 120225: Created new Volume="torproject-colchicifolium.torproject.org-full.2020-01-07_17:00", Pool="poolfull-torproject-colchicifolium.torproject.org", MediaType="File-colchicifolium.torproject.org" in catalog. |
    [...]
    | bacula-director-01.torproject.org-dir JobId 120225: Fatal error: Network error with FD during Backup: ERR=No data available |
    | bungei.torproject.org-sd JobId 120225: Fatal error: append.c:170 Error reading data header from FD. n=-2 msglen=0 ERR=No data available |
    | bungei.torproject.org-sd JobId 120225: Elapsed time=00:03:47, Transfer rate=7.902 M Bytes/second |
    | bungei.torproject.org-sd JobId 120225: Sending spooled attrs to the Director. Despooling 14,523,001 bytes ... |
    | bungei.torproject.org-sd JobId 120225: Fatal error: fd_cmds.c:225 Command error with FD msg="", SD hanging up. ERR=Error getting Volume info: 1998 Volume "torproject-colchicifolium.torproject.org-full.2020-01-07_17:00" catalog status is Used, but should be Append, Purged or Recycle. |
    | bacula-director-01.torproject.org-dir JobId 120225: Fatal error: No Job status returned from FD. |
    [...]
    | bacula-director-01.torproject.org-dir JobId 120225: Rescheduled Job colchicifolium.torproject.org.2020-01-07_17.00.36_03 at 07-Jan-2020 17:09 to re-run in 14400 seconds (07-Jan-2020 21:09). |
    | bacula-director-01.torproject.org-dir JobId 120225: Error: openssl.c:68 TLS shutdown failure.: ERR=error:14094123:SSL routines:ssl3_read_bytes:application data after close notify |
    | bacula-director-01.torproject.org-dir JobId 120225: Job colchicifolium.torproject.org.2020-01-07_17.00.36_03 waiting 14400 seconds for scheduled start time. |
    | bacula-director-01.torproject.org-dir JobId 120225: Restart Incomplete Backup JobId 120225, Job=colchicifolium.torproject.org.2020-01-07_17.00.36_03 |
    | bacula-director-01.torproject.org-dir JobId 120225: Found 78113 files from prior incomplete Job. |
    | bacula-director-01.torproject.org-dir JobId 120225: Created new Volume="torproject-colchicifolium.torproject.org-full.2020-01-10_12:11", Pool="poolfull-torproject-colchicifolium.torproject.org", MediaType="File-colchicifolium.torproject.org" in catalog. |
    | bacula-director-01.torproject.org-dir JobId 120225: Using Device "FileStorage-colchicifolium.torproject.org" to write. |
    | bacula-director-01.torproject.org-dir JobId 120225: Sending Accurate information to the FD. |
    | bungei.torproject.org-sd JobId 120225: Labeled new Volume "torproject-colchicifolium.torproject.org-full.2020-01-10_12:11" on File device "FileStorage-colchicifolium.torproject.org" (/srv/backups/bacula/colchicifolium.torproject.org). |
    | bungei.torproject.org-sd JobId 120225: Wrote label to prelabeled Volume "torproject-colchicifolium.torproject.org-full.2020-01-10_12:11" on File device "FileStorage-colchicifolium.torproject.org" (/srv/backups/bacula/colchicifolium.torproject.org) |
    | bacula-director-01.torproject.org-dir JobId 120225: Max Volume jobs=1 exceeded. Marking Volume "torproject-colchicifolium.torproject.org-full.2020-01-10_12:11" as Used. |
    | colchicifolium.torproject.org-fd JobId 120225: /run is a different filesystem. Will not descend from / into it. |
    | colchicifolium.torproject.org-fd JobId 120225: /home is a different filesystem. Will not descend from / into it. |
    +----------------------------------------------------------------------------------------------------+
    +---------+-------------------------------+---------------------+------+-------+----------+---------------+-----------+
    | jobid   | name                          | starttime           | type | level | jobfiles | jobbytes      | jobstatus |
    +---------+-------------------------------+---------------------+------+-------+----------+---------------+-----------+
    | 120,225 | colchicifolium.torproject.org | 2020-01-10 12:11:51 | B    | F     |   77,851 | 1,759,625,288 | R         |
    +---------+-------------------------------+---------------------+------+-------+----------+---------------+-----------+
So that job failed three days ago, but now it's actually running. In this case, it might be safe to just ignore the Nagios warning and hope that the rescheduled backup will eventually go through. The duplicate job is also fine: worst case, it will just run after the first one does, resulting in a bit more I/O than we'd like.
This section documents how backups are set up at Tor. It should be useful if you wish to recreate or understand the architecture.
Backups are configured automatically by Puppet on all nodes, and use Bacula with TLS encryption over the wire.
Backups are pulled from machines to the backup server, which means a compromise on a machine shouldn't allow an attacker to delete backups from the backup server.
Bacula splits the different responsibilities of the backup system among multiple components, namely:
- storage daemon (`bacula::storage` in Puppet, currently bungei)
- director (`bacula::director` in Puppet, currently bacula-director-01, PostgreSQL configured by hand)
- file daemon (`bacula::client`, on all nodes)
In our configuration, the Admin workstation, Database server and Backup server are all on the same machine, the director (bacula-director-01).
Volumes are stored on the storage daemon, in `/srv/backups/bacula/`. Each client stores its volumes in a separate directory, which makes it easier to purge offline clients and evaluate per-client disk usage.
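For example, per-client disk usage on the storage host can be evaluated with something like (a sketch):

    du -sh /srv/backups/bacula/* | sort -h | tail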
We do not have a bootstrap file as advised by the upstream documentation, because we do not use tapes or tape libraries, which are what makes volumes hard to find. Instead, our catalog is backed up in `/srv/backups/bacula/Catalog` and each backup contains a single file, the compressed database dump, which is sufficient to re-bootstrap the director. See the introduction to Bacula for more information on those distinctions.
PostgreSQL backup system
Database backups are handled specially. We use PostgreSQL (postgres) everywhere apart from a few rare exceptions (currently only CiviCRM) and therefore use postgres-specific configurations to do backups of all our servers.
See howto/postgresql for that server's specific backup/restore instructions.
MySQL backup system
MySQL also requires special handling, and it's done in the
mariadb::server Puppet class. It deploys a script (
which runs every hour and calls
mysqldump to store plaintext copies
of all databases in
It also stores the SHA256 checksum of the backup file as a hardlink to the file, for example:
    1184448 -rw-r----- 2 root 154820 aug 12 21:03 SHA256-665fac68c0537eda149b22445fb8bca1985ee96eb5f145019987bdf398be33e7
    1184448 -rw-r----- 2 root 154820 aug 12 21:03 20190812-210301-mysql.xz
Those both point to the same file, inode 1184448.
Those backups then get included in the normal Bacula backups.
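To verify a dump against its recorded checksum, you can compare the file's hash with the hardlink name (a sketch, using the example files above):

    cd /var/backups/local/mysql
    sha256sum 20190812-210301-mysql.xz
    # the hash should match the SHA256-... hardlink name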