Some LD_BUG logs don't come with any warn/err logs, leading to silent metrics port bursts of 2000+ bug counts

I've been working with @toralf who reported surprising counts on the METRICS_NAME(bug_reached_count) metrics port counter:

<toralf> and I observed 2000 increments within 3 min in the past [...]

They were a mystery because the warn-level logs remained empty.

I looked into it some more, and it looks like this counter increments every time there is a log entry of domain LD_BUG.
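To make the suspected mechanism concrete, here is a minimal sketch of a logger that bumps a bug counter based on the domain alone, before any severity filtering happens. Everything in it (the `sk_` names, the severity numbers, the domain bits, the filtering order) is a hypothetical stand-in for illustration, not Tor's actual logging code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT Tor's real definitions.
 * Lower number = more severe, as in syslog. */
enum sk_severity {
  SK_ERR = 3, SK_WARN = 4, SK_NOTICE = 5, SK_INFO = 6, SK_DEBUG = 7
};
#define SK_LD_CIRC (UINT64_C(1) << 0)
#define SK_LD_BUG  (UINT64_C(1) << 1)

/* Models the metrics port counter (assumption). */
static uint64_t sk_bug_count = 0;
/* Least severe level that still reaches the log file (assumption). */
static int sk_min_visible = SK_WARN;

/* Returns 1 if the message was written, 0 if it was filtered out. */
static int
sk_log(int severity, uint64_t domain, const char *msg)
{
  /* The counter depends only on the domain, not on the severity. */
  if (domain & SK_LD_BUG)
    ++sk_bug_count;

  /* Severity filtering happens afterwards. */
  if (severity > sk_min_visible)
    return 0;

  fprintf(stderr, "[%d] %s\n", severity, msg);
  return 1;
}
```

Under these assumptions, an info-level or debug-level message tagged with the bug domain raises the counter while writing nothing to a warn-level log, which would match the bursts described above.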

For the most part we only log with LD_BUG at severity warn or err, but there are some exceptions, e.g.

conflux_log_set() uses LD_BUG, but here it is told to log at LOG_PROTOCOL_WARN, which is LOG_INFO by default:

     log_fn(LOG_PROTOCOL_WARN, LD_CIRC,
            "Conflux set has too many legs to link. "
            "Rejecting this circuit.");
     conflux_log_set(LOG_PROTOCOL_WARN, unlinked->cfx, unlinked->is_client);

In or/circuitstats.c we have some lines of the form log_debug(LD_BUG, ...).

And on the client side:

       if (endreason != END_STREAM_REASON_RESOLVEFAILED) {
         log_info(LD_BUG,
                  "No origin circuit for successful SOCKS stream %"PRIu64

So... there appear to be quite a few edge cases like this.

I guess the hope was that we have an invariant where you only use LD_BUG when you are also (loudly) logging details about a real bug?
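Written as a predicate, that guessed invariant might look like the sketch below. The `inv_` names and the severity numbers are hypothetical and only illustrate the rule "LD_BUG implies warn or err"; nothing like this is claimed to exist in the tree.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT Tor's real definitions.
 * Lower number = more severe, as in syslog. */
enum inv_severity {
  INV_ERR = 3, INV_WARN = 4, INV_NOTICE = 5, INV_INFO = 6, INV_DEBUG = 7
};
#define INV_LD_CIRC (UINT64_C(1) << 0)
#define INV_LD_BUG  (UINT64_C(1) << 1)

/* True if a (severity, domain) pair respects the guessed invariant:
 * anything in the bug domain must be logged at warn or err. */
static bool
inv_ld_bug_is_loud(int severity, uint64_t domain)
{
  if (!(domain & INV_LD_BUG))
    return true; /* The invariant says nothing about other domains. */
  return severity <= INV_WARN;
}
```

The call sites quoted above (info-level and debug-level messages in the bug domain) are exactly the pairs for which this predicate would return false.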

The simple, robust fix would be to no longer increment the counter for LD_BUG.

The more pervasive and more brittle fix would be to audit all uses of LD_BUG and make sure we keep to our invariant.

Calling @dgoulet's name since maybe he can help us with what the original intent was for this metrics port counter.

Edited Jul 10, 2025 by Roger Dingledine
