... | ... | @@ -767,6 +767,24 @@ one does, resulting in a bit more I/O than we'd like. |
### "waiting to reserve a device"

This can happen in two cases: if a job is hung and blocking the
storage daemon, or if the storage daemon is not aware of the host to
back up.

If the job is repeatedly outputting:

    waiting to reserve a device

It's the first, "hung job" scenario.

If you have the error:

    Storage daemon didn't accept Device "FileStorage-rdsys-test-01.torproject.org" command.

It's the second, "unavailable storage" scenario.
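
As a rough sketch, the distinction between the two scenarios can be written as a small shell function that matches the two messages quoted above. The function name `classify_reserve_error` is hypothetical and not part of Bacula or of our tooling; it only illustrates the decision.

```shell
# Hypothetical helper (not part of Bacula): classify a log message
# into one of the two scenarios, based on the strings quoted above.
classify_reserve_error() {
    case "$1" in
        *"Storage daemon didn't accept Device"*)
            echo "unavailable storage" ;;
        *"waiting to reserve a device"*)
            echo "hung job" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, `classify_reserve_error "$message"` prints `hung job` when `$message` contains the "waiting to reserve a device" text.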
#### hung job scenario

If a job is continuously reporting an error like:

    07-Dec 16:38 bungei.torproject.org-sd JobId 146833: JobId=146833, Job colchicifolium.torproject.org.2020-12-07_15.18.44_05 waiting to reserve a device.
... | ... | @@ -945,6 +963,23 @@ resolved the problem and the warning went away. It is assumed the |
problem will not return on the next job run. See [issue 40110](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40110) for
one example of this problem.
#### unavailable storage scenario

If you see an error like:

    Storage daemon didn't accept Device "FileStorage-rdsys-test-01.torproject.org" command.

It's because the storage server (currently `bungei`) doesn't know
about the host to back up. Restart the storage daemon on the storage
server to fix this:

    service bacula-sd restart

Normally, Puppet is supposed to take care of those restarts, but
the restarts sometimes don't work (presumably because the storage
server doesn't do a clean restart when there's a backup already
running).
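
Since the restart may not go through cleanly, it can help to check that the daemon actually came back. The following is a minimal sketch, assuming `service bacula-sd status` exits with 0 when the daemon is running (the usual convention for init scripts); the function name `restart_storage_daemon` is made up for this example.

```shell
# Hypothetical wrapper: restart the storage daemon, then confirm it is
# running. Assumes "service bacula-sd status" exits 0 when the daemon
# is up.
restart_storage_daemon() {
    service bacula-sd restart || return 1
    if service bacula-sd status >/dev/null 2>&1; then
        echo "bacula-sd is running"
    else
        echo "bacula-sd did not come back up" >&2
        return 1
    fi
}
```

Run it as root on the storage server. A non-zero exit status means the restart or the status check failed and the daemon needs a closer look.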
### Bacula GDB traceback / Connection refused / Cannot assign requested address: Retrying

If you get an email from the director stating that it can't connect
... | ... | |