hs-v3: OnionBalance backend instances all fail after more introductions by hitting descriptor upload time limits
We fixed legacy/trac#33762 (moved) and thought that was the problem (which it also was) but after fixing that I have discovered that all onionbalance backend instances continue to die after testing and testing (establishing successful RP circuits with the frontend) and never heal unless Tor processes of the backends are restarted.
I was instructed by asn on IRC to look between the time of SIGHUP (logrotate log file change) and time of:
[warn] tor_bug_occurred_(): Bug: ../src/feature/hs/hs_circuit.c:987: get_subcredential_for_handling_intro2_cell: Non-fatal assertion !(!service->ob_subcreds) failed. (on Tor 0.4.4.0-alpha-dev )
for the message
log_info(LD_REND, "Hidden service %s next descriptor successfully "
"built. Now scheduled for upload.",
Which meant that it eventually recovered by itself. I can see two of those messages in ~24 hours, but the backend instances continue to be unavailable. Frontend does not work, and even if you try to connect to the backend address directly instead of the frontend one it does not work.
It looks like it is never able to upload further descriptors at some point, flooding the log file with:
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its current descriptor: Next upload time is 1585751537, it is now 1585748071. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its next descriptor: Next upload time is 1585751987, it is now 1585748071. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its current descriptor: Next upload time is 1585751537, it is now 1585748671. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its next descriptor: Next upload time is 1585751987, it is now 1585748671. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its current descriptor: Next upload time is 1585751537, it is now 1585749271. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its next descriptor: Next upload time is 1585751987, it is now 1585749271. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its current descriptor: Next upload time is 1585751537, it is now 1585749871. [598 similar message(s) suppressed in last 600 seconds]
[info] log_cant_upload_desc(): Service [scrubbed] can't upload its next descriptor: Next upload time is 1585751987, it is now 1585749871. [598 similar message(s) suppressed in last 600 seconds]
I have info log level files from 2 backend instances of like ~47 MB each.