Nonzero bandwidthcapacity requirement for active relays hinders bootstrapping a Tor network

Context: I'm working on configuring shadow test simulations that do not use TestingTorNetwork. This in turn is motivated by TestingTorNetwork causing undocumented and non-overridable changes in behavior, which can make it difficult to replicate production behavior inside simulations and vice versa.

A problem I'm running into is that when TestingTorNetwork is not set, router_is_active requires the relay's self-reported bandwidthcapacity to be non-zero. Relays that fail this check are never assigned the HSDir flag, which in turn prevents hidden services from working. (It could also be causing other issues.)

bandwidthcapacity roughly corresponds to the max observed bandwidth usage of the relay in some recent window.

Normally in shadow simulations, we use AssumeReachable 1. This causes relays to skip self-testing and, since they haven't yet transferred any data over Tor circuits, to upload a descriptor with bandwidthcapacity=0. Without TestingTorNetwork this causes router_is_active to return false for such relays (i.e. all of them). If we have some synthetic test traffic in the network (that doesn't depend on hidden services), relays that saw some of that traffic eventually upload descriptors with non-zero bandwidthcapacity, but this can take a while and isn't very dependable.
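
The behavior described above can be modelled with a small sketch. This is not Tor's actual code: `relay_model` and `model_router_is_active` are hypothetical names, the real router_is_active consults more state than shown here, and treating TestingTorNetwork as a simple bypass of the bandwidth check is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the descriptor fields discussed
 * in this issue. This is not Tor's real routerinfo_t. */
struct relay_model {
  bool is_hibernating;        /* relay reports that it is hibernating */
  uint32_t bandwidthcapacity; /* self-reported observed bandwidth */
};

/* Simplified model of the check. Assumption: with TestingTorNetwork set,
 * a zero bandwidthcapacity is tolerated; without it, the relay is
 * treated as inactive. */
static bool
model_router_is_active(const struct relay_model *ri, bool testing_tor_network)
{
  if (ri->is_hibernating)
    return false;
  if (ri->bandwidthcapacity == 0 && !testing_tor_network)
    return false;
  return true;
}
```

Under this model, a freshly started relay with bandwidthcapacity=0 is inactive whenever TestingTorNetwork is off, which matches the bootstrap problem: no relay is eligible for flags that depend on being active until it has carried some traffic and re-uploaded its descriptor.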

Conversely, if we don't set AssumeReachable 1, then relays never upload a descriptor, presumably because there are no relays in the consensus besides the authority, which prevents them from completing a self-test.

Some potential solutions, roughly in descending order of preference:

1. Remove the ri->bandwidthcapacity check from router_is_active. It's not clear why this is needed on top of e.g. the check for ri->is_hibernating. This condition was added in 962765a3. That commit references #13000 (closed), but it's not clear to me that it's actually needed to address that problem (i.e. whether the authority considers relays with bandwidthcapacity=0 active is orthogonal to whether relays choose to publish their descriptor before doing their bw self-test).
2. Add some option for relays to report a non-zero bandwidthcapacity initially, for network bootstrap purposes. This seems like potentially a can of worms, and would directly reverse what #13000 (closed) was trying to accomplish.
3. There might be a way to carefully orchestrate network bootstrap; e.g. start some "bootstrap relays" with AssumeReachable 1, then start a new set of relays without AssumeReachable 1 that can use the first set to perform their self-test. I haven't verified that this would work, and it would add nontrivial complexity and cost to shadow simulations.
4. Change the bandwidth self-test, or create some new alternate bandwidth self-test, that just uses a 1-hop circuit. I've confirmed that even if no relays use AssumeReachable 1, the authority is in the initial consensus, so presumably other relays could then use the authority to do their self-tests. This would make the authority a bottleneck for these tests, though that could be mitigated by staggering relay starts.

Edited Feb 29, 2024 by Jim Newsome
