Unverified commit 9c14b629 authored by anarcat

document today's bacula SNAFU

parent 112c4939

resolved the problem and the warning went away. It is assumed the
problem will not return on the next job run. See [issue 40110](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40110) for
one example of this problem.

### Bacula GDB traceback / Connection refused / Cannot assign requested address: Retrying

If you get an email from the director stating that it can't connect
to the file daemon (`bacula-fd`) on a machine:

    09-Mar 04:45 bacula-director-01.torproject.org-dir JobId 154835: Fatal error: bsockcore.c:209 Unable to connect to Client: scw-arm-par-01.torproject.org-fd on scw-arm-par-01.torproject.org:9102. ERR=Connection refused

You might also receive an email like this one:

> root@forrestii.torproject.org (1 mins. ago) (rapports root tor)
> Subject: Bacula GDB traceback of bacula-fd on forrestii
> To: root@forrestii.torproject.org
> Date: Thu, 26 Mar 2020 00:31:44 +0000
>
> /usr/sbin/btraceback: 60: /usr/sbin/btraceback: gdb: not found

In either case, log in to the affected server (in the first example,
`scw-arm-par-01.torproject.org`) and check the status of `bacula-fd.service`:

    service bacula-fd status
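
`service bacula-fd status` only shows the most recent log lines, so an older warning may not appear there. A minimal sketch for searching the unit's whole journal instead, assuming a systemd host; the `find_bind_errors` helper is hypothetical and only filters lines on stdin:

```shell
#!/bin/sh
# Hypothetical helper: keep only the bacula-fd "Cannot bind port"
# warnings from log lines read on stdin.
find_bind_errors() {
    grep -E 'Cannot bind port [0-9]+: ERR='
}

# Usage on the affected server (assumes systemd/journald):
#   journalctl -u bacula-fd.service --no-pager | find_bind_errors
```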

If you see an error like:

    Warning: Cannot bind port 9102: ERR=Cannot assign requested address: Retrying ...

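This is Bacula being a bit silly and failing to bind to the external
interface: `Cannot assign requested address` means the daemon tried to
listen on an IP address that is not configured on any local
interface. A likely cause is an incorrect `/etc/hosts`. This happens
particularly "in the cloud", where IP addresses are in the RFC1918
private range and change unpredictably.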
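
To confirm this before editing anything, you can compare the addresses the machine's FQDN resolves to (through NSS, which consults `/etc/hosts` with a default `nsswitch.conf`) against the addresses actually configured on its interfaces. This is a minimal sketch, assuming `getent` and iproute2's `ip` are installed and that `bacula-fd` binds to whatever `hostname -f` resolves to, which may not hold for every `bacula-fd.conf`. All function names are hypothetical, not part of Bacula:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not part of Bacula.
# Assumes getent (glibc) and iproute2's `ip`.

# Print every address configured on a local interface, one per line.
local_addrs() {
    # `ip -o addr show` prints one line per address; field 4 is ADDR/PREFIX.
    ip -o addr show | awk '{ split($4, a, "/"); print a[1] }'
}

# Print the distinct addresses that name $1 resolves to through NSS.
resolved_addrs() {
    getent ahosts "$1" | awk '{ print $1 }' | sort -u
}

# Given newline-separated resolved ($1) and local ($2) address lists,
# print each resolved address that is not configured locally.
missing_addrs() {
    printf '%s\n' "$1" | while IFS= read -r addr; do
        case $addr in
            '') continue ;;
            127.*) continue ;;  # all of 127.0.0.0/8 is bindable on lo, e.g. 127.0.1.1
        esac
        if ! printf '%s\n' "$2" | grep -qxF -- "$addr"; then
            printf '%s\n' "$addr"
        fi
    done
}

# Report on this host: exit status 0 if every resolved address is local,
# 1 if some are missing, 2 if the FQDN cannot be determined or resolved.
check_fd_bind_address() {
    fqdn=$(hostname -f) || return 2
    resolved=$(resolved_addrs "$fqdn")
    if [ -z "$resolved" ]; then
        echo "$fqdn does not resolve at all" >&2
        return 2
    fi
    missing=$(missing_addrs "$resolved" "$(local_addrs)")
    if [ -n "$missing" ]; then
        echo "$fqdn resolves to address(es) not configured on any interface:"
        printf '%s\n' "$missing" | sed 's/^/  /'
        return 1
    fi
    echo "$fqdn resolves only to locally configured addresses"
}
```

Source the file and run `check_fd_bind_address`; any address it prints is a candidate for a stale `/etc/hosts` entry.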

In the `scw-arm-par-01` case above, it was simply a matter of adding
the machine's IPv4 and IPv6 addresses to `/etc/hosts` and restarting
`bacula-fd`:

    vi /etc/hosts
    service bacula-fd restart
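
After the restart, you can check that the daemon is actually listening on port 9102 again. A minimal sketch, assuming iproute2's `ss` is available; `listeners_on_port` is a hypothetical helper that parses `ss -ltn` output read on stdin:

```shell
#!/bin/sh
# Hypothetical helper: print the local address:port of every TCP socket
# in LISTEN state on port $1, parsed from `ss -ltn` output on stdin.
listeners_on_port() {
    # Field 1 is the socket state, field 4 the local address:port,
    # which ends in ":PORT" for both IPv4 and IPv6 listeners.
    awk -v port="$1" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $4 }'
}

# Usage on the affected server, after restarting bacula-fd:
#   ss -ltn | listeners_on_port 9102
# No output means nothing is listening on the file daemon port yet.
```

If nothing shows up, look at `service bacula-fd status` again for the bind warning.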

The GDB errors were documented in [issue 33732](https://gitlab.torproject.org/tpo/tpa/team/-/issues/33732).

## Disaster recovery

<a name="Restoring-the-directory-server"></a>