so i tried to debug this problem on forrestii. i found the following:
1. bacula is unhappy because it can't bind on the IPv4 port. i fixed this by correcting /etc/hosts so that the entry points to the correct IP. good.
2. bacula then starts, but is still unhappy because it can't bind to the IPv6 port. i fixed this by adding the IPv6 address to /etc/hosts, but it's not clear to me why this is a problem on this specific host and not others.
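for what it's worth, "Cannot assign requested address" is EADDRNOTAVAIL: bind() was handed an address that isn't configured on any local interface, which fits a stale /etc/hosts entry. here's a minimal sketch (not bacula code; `probe_bind` and `BindProbe` are names made up for this example) that resolves a hostname the way a daemon would and tries to bind each resulting address, so the IPv4 and IPv6 cases both show up in one go:

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

#include <netdb.h>
#include <sys/socket.h>
#include <unistd.h>

// One entry per address the name resolves to (hypothetical helper type).
struct BindProbe {
    std::string address;  // numeric form of the resolved address
    int family;           // AF_INET or AF_INET6
    int error;            // 0 if bind() succeeded, otherwise the errno value
};

// Resolve `host` through the system resolver (so /etc/hosts is consulted)
// and try to bind a TCP socket on each resulting address.
std::vector<BindProbe> probe_bind(const std::string& host, const std::string& port) {
    assert(!host.empty());
    std::vector<BindProbe> results;

    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // ask for both IPv4 and IPv6
    hints.ai_socktype = SOCK_STREAM;  // one result per address, TCP only

    addrinfo* list = nullptr;
    if (getaddrinfo(host.c_str(), port.c_str(), &hints, &list) != 0) {
        return results;  // the name did not resolve at all
    }

    for (addrinfo* ai = list; ai != nullptr; ai = ai->ai_next) {
        char numeric[128] = "";
        getnameinfo(ai->ai_addr, ai->ai_addrlen, numeric, sizeof(numeric),
                    nullptr, 0, NI_NUMERICHOST);

        int error = 0;
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            error = errno;
        } else {
            if (bind(fd, ai->ai_addr, ai->ai_addrlen) != 0) {
                error = errno;  // EADDRNOTAVAIL means the address is not local
            }
            close(fd);
        }
        results.push_back({numeric, ai->ai_family, error});
    }
    freeaddrinfo(list);
    return results;
}
```

calling `probe_bind("forrestii.torproject.org", "9102")` on the affected host should list every resolved address next to its errno. stop bacula-fd first, otherwise a working address reports EADDRINUSE instead of 0.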
regarding bug 2, the "fun" thing with that is that it doesn't immediately crash. at first, it gives us this rather cryptic warning:
Mar 26 01:07:06 forrestii bacula-fd[2813]: forrestii.torproject.org-fd: Warning: Cannot bind port 9102: ERR=Cannot assign requested address: Retrying ...
and then eventually it will crash (with a SIGSEGV, no less), like so:
Mar 26 01:08:06 forrestii bacula-fd[2813]: forrestii.torproject.org-fd: ABORTING due to ERROR in bnet_server.c:132
Mar 26 01:08:06 forrestii bacula-fd[2813]: Cannot bind port 9102: ERR=Cannot assign requested address.
Mar 26 01:08:06 forrestii bacula-fd[2813]: 26-Mar 01:08 forrestii.torproject.org-fd: ABORTING due to ERROR in bnet_server.c:132
Mar 26 01:08:06 forrestii bacula-fd[2813]: Cannot bind port 9102: ERR=Cannot assign requested address.
Mar 26 01:08:06 forrestii bacula-fd[2813]: forrestii.torproject.org-fd: ABORTING due to ERROR in bnet_server.c:132 Cannot bind port 9102: ERR=Cannot assign requested address.
    if (--tmax <= 0) {
       Emsg2(M_ABORT, 0, _("Cannot bind port %d: ERR=%s.\n"), ntohs(fd_ptr->port),
             be.bstrerror());
       Pmsg2(000, "Aborting cannot bind port %d: ERR=%s.\n", ntohs(fd_ptr->port),
             be.bstrerror());
    }
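the `--tmax <= 0` check reads like the tail of a retry countdown, which would explain the "Retrying ..." warning followed by the abort a minute later. a minimal sketch of that pattern, assuming a fixed number of tries and a fixed delay (both made up here, as are the names `bind_with_retries`, `BindOutcome` and `make_ipv4`; this is not the bacula implementation), that hands the errno back to the caller instead of aborting:

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <thread>

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Result of the retry loop (hypothetical helper type).
struct BindOutcome {
    int attempts;  // how many times bind() was called
    int error;     // 0 on success, otherwise the errno of the last failure
};

// Build an IPv4 socket address from a dotted-quad string and a port.
sockaddr_in make_ipv4(const char* ip, unsigned short port) {
    sockaddr_in sa{};
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}

// Call bind() up to `max_tries` times, sleeping `delay_ms` between failures.
// When the countdown runs out, report the error rather than crash.
BindOutcome bind_with_retries(int fd, const sockaddr* addr, socklen_t len,
                              int max_tries, unsigned delay_ms) {
    assert(max_tries > 0);
    int tries_left = max_tries;
    int attempts = 0;
    for (;;) {
        ++attempts;
        if (bind(fd, addr, len) == 0) {
            return {attempts, 0};
        }
        int saved = errno;  // keep errno before anything else can clobber it
        if (--tries_left <= 0) {
            return {attempts, saved};
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
    }
}
```

with an address that is not configured locally, every attempt fails the same way, so retrying only delays the failure. the caller gets EADDRNOTAVAIL back and can log it and exit cleanly.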
from the error message, the seemingly intuitive thing to do is to install gdb. but that doesn't give us anything. instead of the reported error, we get this by email:
root@forrestii.torproject.org (4 mins. ago) (rapports root tor)
Subject: Bacula GDB traceback of bacula-fd on forrestii
To: root@forrestii.torproject.org
Date: Thu, 26 Mar 2020 01:14:42 +0000
ptrace: Operation not permitted.
/var/lib/bacula/2813: No such file or directory.
/etc/bacula/scripts/btraceback.gdb:1: Error in sourced command file:
No symbol table is loaded. Use the "file" command.
... because of course, bacula can't gdb itself because it dropped privileges. (this, by the way, is a regression of bug 556207 in debian.)
and besides, it segfaulted, so the process is possibly gone already. but even if it could have run, this is what it gets:
Thread 1 "bacula-fd" received signal SIGSEGV, Segmentation fault.
0x00007f7857199185 in e_msg(char const*, int, int, int, char const*, ...) () from /usr/lib/bacula/libbac-9.4.2.so
(gdb) bt
#0 0x00007f7857199185 in e_msg(char const*, int, int, int, char const*, ...) () from /usr/lib/bacula/libbac-9.4.2.so
#1 0x00007f785717d5c6 in bnet_thread_server(dlist*, int, workq_tag*, void* (*)(void*)) () from /usr/lib/bacula/libbac-9.4.2.so
#2 0x0000557966d9f62c in ?? ()
#3 0x00007f7856bfc09b in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000557966d9fb6a in ?? ()
(gdb) up
#1 0x00007f785717d5c6 in bnet_thread_server(dlist*, int, workq_tag*, void* (*)(void*)) () from /usr/lib/bacula/libbac-9.4.2.so
(gdb) list
No symbol table is loaded. Use the "file" command.
because of course there's no symbol table, because of course we're in production. why we would fire up a debugger, by default, in production code is just beyond me. i looked at the source code (the bnet_server.c excerpt quoted above).
I'm not sure what else to do with this. I mostly opened this bug report to document this odyssey, since I remember dealing with this before, but not what the outcome was.
TL;DR: look into systemctl status bacula-fd, the error message should be there, and is more informative than the backtrace could ever be. Consider trashing /usr/sbin/btraceback completely to stop it from sending us such warnings ever again.
Trac:
Status: assigned to closed
Resolution: N/A to wontfix