Occasionally, the CPU load on my test machine will increase (or some other condition affecting the scheduler will occur), and a bootstrap race condition will cause the test to fail 50-100% of the time for a few hours. Then it will start working again. The commands run are exactly the same each time. I'll be excluding these results from the tests, because they happen with or without the changes.
Perhaps lengthening some of the default intervals chutney uses would solve this.
This appears to be due to the fact that MIN_VOTE_INTERVAL is set to 300, and all the chutney scripts I have are set to wait for 18, 60, and 300 seconds. So the authorities only have one chance to build a sufficiently comprehensive and consistent consensus at around the 4-6 second mark, and that's it.
If they miss it, the network won't function for the first 5 minutes.
This is now directly affecting #13718 (moved), because we need to run two consensuses to test it: one with no exits, and one after the exits have determined their own reachability using internal paths built on the first consensus.
I'd like to do this in under 5 minutes, so I've defined MIN_VOTE_INTERVAL_TESTING 10 (which is greater than (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 as required) and patched tor to use it based on TestingTorNetwork 1, or during direct comparisons to Testing* options.
I'll post a patch as part of the #13718 (moved) process.
Trying to run two consensuses in as short a period as possible is fraught with restrictions.
When trying to use the smallest allowable voting interval for testing, I have found the following restrictions for the new macro MIN_VOTE_INTERVAL_TESTING:
a minimum of (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 + 1 = 9 based on V3AuthVoteDelay + V3AuthDistDelay [<] V3AuthVotingInterval/2 "V3AuthVoteDelay plus V3AuthDistDelay must be less than half V3AuthVotingInterval" in options_validate()
a minimum of 16 based on min_sec_before_caching = interval/16 [> 0 = 16] "slop factor in case clocks get desynchronized a little" in update_consensus_networkstatus_fetch_time_impl()
a minimum of 18 based on (30*60) % TestingV3AuthInitialVotingInterval != 0 "[must] divide evenly into 30 minutes" in options_validate()
we may be able to get away with 9, 10, 12, 15 (which all divide 30 minutes) if we allow min_sec_before_caching's "slop factor" to equal 0, which should be fine if we're running on the same host/clock
So I have set:
#define MIN_VOTE_INTERVAL_TESTING 9
But I have set the vote interval to 18 in the chutney templates to play it a little safer.
(Other options are 20, 24, 25, 30, 36, 40, 45, 50, 60, ...)
This change successfully has the consensus run every 18 seconds on my machine in a chutney network.
I have not tested an interval of 9 seconds, but it should work as long as the clocks are strictly synchronised. See #13718 (moved) for further details and an (eventual) branch.
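For reference, the list of candidate intervals above can be reproduced with a short sketch. This is not code from tor or chutney: the function name and parameters are hypothetical, and it only models two of the rules quoted in this comment (the interval must divide evenly into 30 minutes, and interval/16 must be greater than 0 unless we accept a zero "slop factor"). The lower bound is passed in rather than derived.

```python
THIRTY_MINUTES = 30 * 60  # seconds; the interval must divide this evenly


def candidate_intervals(minimum, maximum, require_slop=True):
    """List voting intervals (in seconds) between minimum and maximum.

    Hypothetical helper, not part of tor or chutney. It keeps an interval
    only if it divides evenly into 30 minutes and, when require_slop is
    True, if interval // 16 is greater than zero.
    """
    result = []
    for interval in range(minimum, maximum + 1):
        if THIRTY_MINUTES % interval != 0:
            continue  # does not divide evenly into 30 minutes
        if require_slop and interval // 16 == 0:
            continue  # slop factor would be zero
        result.append(interval)
    return result


if __name__ == "__main__":
    # Allowing a zero slop factor (same host, same clock)
    print(candidate_intervals(9, 60, require_slop=False))
    # Requiring a non-zero slop factor
    print(candidate_intervals(9, 60, require_slop=True))
```

With a lower bound of 9, the first call includes 9, 10, 12 and 15, and the second starts at 18, which matches the reasoning above.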
Trac: Component: Chutney to Tor Keywords: lorax deleted, tor-auth chutney added
I have consensus intervals down to a minimum of 10 seconds, as the calculation is actually:
V3AuthVoteDelay + V3AuthDistDelay [<] V3AuthVotingInterval/2
(MIN_VOTE_SECONDS + MIN_DIST_SECONDS + 1) * 2 = 10
We won't be able to get it any lower without changing MIN_VOTE_SECONDS, MIN_DIST_SECONDS, or the V3AuthVoteDelay + V3AuthDistDelay [<] V3AuthVotingInterval/2 calculation.
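A rough sketch of why 10 rather than 9 comes out as the minimum, under two assumptions that are mine and not confirmed in this ticket: that MIN_VOTE_SECONDS and MIN_DIST_SECONDS are both 2 (the ticket only implies that they sum to 4), and that the V3AuthVotingInterval/2 term is evaluated with integer division. The function names are hypothetical.

```python
# Assumed values: the ticket only implies that these two sum to 4.
MIN_VOTE_SECONDS = 2
MIN_DIST_SECONDS = 2


def passes_half_interval_check(vote_delay, dist_delay, voting_interval):
    """Model of "vote delay plus dist delay must be less than half the interval".

    Assumption: the halving uses integer division, so 9 // 2 == 4.
    """
    return vote_delay + dist_delay < voting_interval // 2


def smallest_valid_interval(vote_delay, dist_delay):
    """Search upwards for the first interval that passes the check."""
    interval = 1
    while not passes_half_interval_check(vote_delay, dist_delay, interval):
        interval += 1
    return interval


if __name__ == "__main__":
    print(passes_half_interval_check(MIN_VOTE_SECONDS, MIN_DIST_SECONDS, 9))
    print(smallest_valid_interval(MIN_VOTE_SECONDS, MIN_DIST_SECONDS))
```

Under these assumptions an interval of 9 is rejected, and the search agrees with (MIN_VOTE_SECONDS + MIN_DIST_SECONDS + 1) * 2.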
The src/test/test-network.sh script allows 18 seconds for chutney to launch and do its tests, which is two consensuses.
Is 10 seconds sufficient for your purposes, rl1987?
(We spoke about this being annoying on irc almost a week ago.)
Also, a relay can take up to 60 seconds to re-publish its descriptor.
I've changed it so it uploads immediately when ORPort or DirPort change, but only in a testing tor network.
A relay with AssumeReachable 0 now makes it into the consensus after around 30-40 seconds, even without using TestingDirAuthVoteExit (from #13161 (moved)). This means that it correctly:
determines that no exits are available in the consensus
continues to bootstrap with internal paths only
successfully self-tests reachability with an internal path
src/test/test-network.sh can still complete basic tests in 30 seconds, even while the machine is under heavy load. These fixes should resolve the original issue that triggered this report.
Trac: Owner: nickm to teor Status: new to assigned
The chutney branch looks reasonable, but one thing I'm not sure about: will merging these changes in chutney plus the changes for Tor 0.2.6 make it so that Tor 0.2.5 and earlier no longer bootstrap? Or will they just bootstrap as slowly as before?