A running Tor won't update the microdesc consensus
I am observing that my relay and bridge will update the microdesc consensus when they are restarted or catch SIGHUP, but not while they are running. In the case of the bridge, the consensus it serves eventually falls out of date, and clients that try to connect through it will hang on "I learned some more directory information, but not enough to build a circuit: We have no recent usable consensus" and never connect to the network.
The bridge and relay I have seen this happening on are running 0.2.9.4-alpha on OpenBSD. I also spun up a new bridge on Debian (also running 0.2.9.4-alpha) and it appears to have the same problem. This does not appear to happen with 0.2.8.9.
What it looks like is happening:
At startup (or reload) the relay fetches the microdesc consensus
1 minute later it tries to fetch it again (update_consensus_networkstatus_downloads() is called) and receives a 304 response as it hasn't been modified
download_status_increment_failure() gets called with a status_code of 304
update_consensus_networkstatus_downloads() gets called again; this time it stops at the call to connection_dir_count_by_purpose_and_resource(), which returns 1 (equal to max_in_progress_conns)
download_status_increment_failure() gets called again, this time with a status_code of 0 (as a result, each 304 response increases the fail count by 2)
The previous steps repeat every minute for a few minutes until the failure count reaches 10 (exceeding the max fail count of 8)
At this point it still retries every minute, but download_status_is_ready() no longer returns true because the failure count exceeds the max, so the fetch is skipped
Eventually the consensus falls out of date, but download_status_is_ready() still won't return true, so it never tries to fetch a new one
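To make the arithmetic in the steps above concrete, here is a rough simulation of the failure counter. This is only a sketch of the behaviour as I understand it, not Tor's code: is_ready() and simulate() are hypothetical helpers, and the only values taken from the observations above are the max fail count of 8 and the two increments per 304 response.

```python
# Sketch only: models the failure counter described above, not Tor's actual code.
MAX_FAILURES = 8  # max fail count observed in the report


def is_ready(failures, max_failures=MAX_FAILURES):
    """Hypothetical stand-in for the readiness check: ready while the
    failure count has not exceeded the maximum."""
    return failures <= max_failures


def simulate(minutes, increments_per_cycle=2):
    """Return the failure count after each one-minute retry cycle.

    increments_per_cycle=2 models the double increment seen on
    0.2.9.4-alpha (once with status 304, once with status 0).
    """
    failures = 0
    history = []
    for _ in range(minutes):
        if is_ready(failures):
            # A fetch is attempted, gets a 304, and is counted twice.
            failures += increments_per_cycle
        # Otherwise the fetch is skipped and the count never changes again.
        history.append(failures)
    return history


if __name__ == "__main__":
    print(simulate(8))
```

Under these assumptions the count goes 2, 4, 6, 8, 10 and then stays at 10, which matches the point where the relay stops fetching after about five minutes.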
On 0.2.8.9 it makes a couple of attempts that fail with a 304 response, but download_status_is_ready() eventually starts returning false because the value of dts->next_attempt_at is greater than the current time. It seems that on 0.2.8.9 next_attempt_at is increased much more aggressively (first by 1 minute, then 10 minutes, then an hour), so it accumulates a failure count of 6 but then waits long enough that the next attempt succeeds.
On 0.2.9.4-alpha, it looks like the value of next_attempt_at is increased more slowly, by only seconds at a time, so it reattempts every minute and quickly reaches the failure limit.
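The difference between the two versions can be illustrated with a small simulation. Again, this is a sketch under assumptions, not Tor's scheduling code: simulate() is a hypothetical helper, the 5-second delay stands in for "only seconds at a time", and the time at which a new consensus becomes available (one hour in) is made up for the example. The 1 minute / 10 minute / 1 hour delays, the one-minute retry tick, the max of 8 and the two increments per failed attempt come from the observations above.

```python
# Sketch only: compares the two backoff behaviours described above.
MAX_FAILURES = 8   # max fail count observed in the report
TICK = 60          # the download check runs once a minute


def simulate(delays, fresh_at, horizon, per_attempt=2):
    """Simulate retry ticks up to `horizon` seconds.

    delays      -- backoff delays in seconds; the last one repeats
    fresh_at    -- assumed time at which a new consensus exists
    per_attempt -- failure increments per failed attempt
    Returns (outcome, time_of_success_or_None, failure_count).
    """
    failures = 0
    next_attempt_at = 0
    attempt = 0
    for now in range(TICK, horizon + 1, TICK):
        # Readiness check: not over the limit and the backoff has expired.
        if failures > MAX_FAILURES or next_attempt_at > now:
            continue
        if now >= fresh_at:
            return ("fetched", now, failures)
        # Consensus not modified yet: the attempt counts as a failure.
        failures += per_attempt
        next_attempt_at = now + delays[min(attempt, len(delays) - 1)]
        attempt += 1
    return ("stuck", None, failures)


if __name__ == "__main__":
    old = simulate([60, 600, 3600], fresh_at=3600, horizon=4 * 3600)
    new = simulate([5], fresh_at=3600, horizon=4 * 3600)
    print("0.2.8.9-like schedule:      ", old)
    print("0.2.9.4-alpha-like schedule:", new)
```

With these numbers the 0.2.8.9-like schedule fails three attempts (failure count 6) and then succeeds once the long backoff expires, while the short-delay schedule burns through the failure limit within five minutes and never fetches again.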