chutney intervals are too short for successful bootstrap, particularly under high CPU load on OS X

added chutney component::core tor/tor owner::teor parent::13718 priority::medium resolution::fixed status::closed tor-auth type::defect labels

Trac:
Keywords: N/A deleted, lorax added

Perhaps shortening the consensus intervals from 5-30 minutes to 20 seconds would help here, too.

This appears to be because MIN_VOTE_INTERVAL is set to 300, while all the chutney scripts I have are set to wait for 18, 60, and 300 seconds. So the authorities only have one chance to build a sufficiently comprehensive and consistent consensus, at around the 4-6 second mark, and that's it.

If they miss it, the network won't function for the first 5 minutes.

This is now directly affecting #13718 (moved), because we need to run two consensuses to test it - one with no exits, and one after the exits have determined their own reachability using internal paths built on the first consensus.

I'd like to do this in under 5 minutes, so I've defined MIN_VOTE_INTERVAL_TESTING 10 (which is greater than (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 as required) and patched tor to use it based on TestingTorNetwork 1, or during direct comparisons to Testing* options.

I'll post a patch as part of the #13718 (moved) process.

Trying to run two consensuses in as short a period as possible is fraught with restrictions.

When trying to use the smallest allowable voting interval for testing, I have found the following restrictions for the new macro MIN_VOTE_INTERVAL_TESTING:

a minimum of (MIN_VOTE_SECONDS + MIN_DIST_SECONDS) * 2 + 1 = 9, based on the check V3AuthVoteDelay + V3AuthDistDelay < V3AuthVotingInterval/2 ("V3AuthVoteDelay plus V3AuthDistDelay must be less than half V3AuthVotingInterval") in options_validate()
a minimum of 16, so that min_sec_before_caching = interval/16 is greater than 0 (the "slop factor in case clocks get desynchronized a little") in update_consensus_networkstatus_fetch_time_impl()
a minimum of 18, the smallest value of at least 16 that passes the (30*60) % TestingV3AuthInitialVotingInterval != 0 check ("[must] divide evenly into 30 minutes") in options_validate()
we may be able to get away with 9, 10, 12, or 15 (which all divide 30 minutes) if we allow min_sec_before_caching's "slop factor" to equal 0, which should be fine if we're running on the same host/clock
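
As a rough check of the divisibility and "slop factor" restrictions above, here is a minimal Python sketch (not tor code). The helper names are made up for illustration, and the lower bound of 9 is simply taken from the first restriction as stated.

```python
# Illustrative sketch of the interval arithmetic; helper names are hypothetical.
THIRTY_MINUTES = 30 * 60  # the voting interval must divide evenly into this


def divides_thirty_minutes(interval):
    """True if the interval divides evenly into 30 minutes."""
    return THIRTY_MINUTES % interval == 0


def has_nonzero_slop(interval):
    """Mirrors min_sec_before_caching = interval/16 using integer division."""
    return interval // 16 > 0


def candidate_intervals(lowest, highest, require_slop):
    """List intervals in [lowest, highest] that satisfy the restrictions."""
    return [
        i for i in range(lowest, highest + 1)
        if divides_thirty_minutes(i) and (has_nonzero_slop(i) or not require_slop)
    ]


with_slop = candidate_intervals(9, 60, require_slop=True)
without_slop = candidate_intervals(9, 60, require_slop=False)
print(with_slop)
print(without_slop)
```

With a non-zero slop factor required, the smallest candidate is 18; allowing a slop factor of 0 adds 9, 10, 12 and 15 to the list.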

So I have set:

#define MIN_VOTE_INTERVAL_TESTING 9

But I have set the vote interval to 18 in the chutney templates to play it a little safer. (Other options are 20, 24, 25, 30, 36, 40, 45, 50, 60, ...)

With this change, a consensus is produced every 18 seconds on my machine in a chutney network.

I have not tested an interval of 9 seconds, but it should work as long as the clocks are strictly synchronised. See #13718 (moved) for further details and an (eventual) branch.

Trac:
Component: Chutney to Tor
Keywords: lorax deleted, tor-auth chutney added

I have consensus intervals down to a minimum of 10 seconds, as the calculation is actually based on V3AuthVoteDelay + V3AuthDistDelay < V3AuthVotingInterval/2, which gives (MIN_VOTE_SECONDS + MIN_DIST_SECONDS + 1) * 2 = 10.

We won't be able to get it any lower without changing MIN_VOTE_SECONDS, MIN_DIST_SECONDS, or the V3AuthVoteDelay + V3AuthDistDelay < V3AuthVotingInterval/2 calculation.
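
One plausible reading of why 9 fails and 10 passes is that the half-interval is computed with C integer division. The Python sketch below illustrates that reading only. It assumes MIN_VOTE_SECONDS and MIN_DIST_SECONDS are 2 each (the thread only implies that their sum is 4), and delays_fit is a hypothetical helper, not a tor function.

```python
# Assumed values: the thread only implies MIN_VOTE_SECONDS + MIN_DIST_SECONDS == 4.
MIN_VOTE_SECONDS = 2
MIN_DIST_SECONDS = 2


def delays_fit(interval, vote_delay=MIN_VOTE_SECONDS, dist_delay=MIN_DIST_SECONDS):
    """Check vote_delay + dist_delay < interval/2 using integer division,
    as C would for int operands (an assumption about the real check)."""
    return vote_delay + dist_delay < interval // 2


# Smallest interval for which the minimum delays still fit.
smallest = next(i for i in range(1, 100) if delays_fit(i))
print(smallest)
```

Under these assumptions, an interval of 9 gives 9 // 2 = 4, which is not strictly greater than the combined delay of 4, so 10 is the first interval that passes.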

The src/test/test-network.sh script allows 18 seconds for chutney to launch and do its tests, which is two consensuses.

Is 10 seconds sufficient for your purposes, rl1987? (We spoke about this being annoying on irc almost a week ago.)

Trac:
Cc: nickm to nickm, rl1987

Also, a relay can wait up to 60 seconds before it re-publishes its descriptor. I've changed it so it uploads immediately when ORPort or DirPort change, but only in a testing tor network.

Fixed in #13718 (moved):

A relay with AssumeReachable 0 now makes it into the consensus after around 30-40 seconds, even without using TestingDirAuthVoteExit (from #13161 (moved)). This means that it correctly:

determines that no exits are available in the consensus
continues to bootstrap with internal paths only
successfully self-tests reachability with an internal path

Composing commits over the next week.

See also #13976 (moved), which would vastly simplify the configuration required to get rapid tor/chutney bootstraps to work.

src/test/test-network.sh can still complete basic tests in 30 seconds, even while the machine is under heavy load. These fixes should resolve the original issue that triggered this report.

Trac:
Owner: nickm to teor
Status: new to assigned

The changes to tor and chutney in #13718 (moved) have fixed this:

Bugs: #13718 (moved), #13814 (moved), maybe #13787 (moved), #13839 (moved), #13924 (moved), #13823 (moved), #13929 (moved), #13963 (moved)
Branch: bug13718-fast-bootstrap
Note: There are 5 branches that start with bug13718, please choose the right one.
Repository: https://github.com/teor2345/tor.git

Bugs: #13823 (moved)
Branch: bug13823-fast-bootstrap
Repository: https://github.com/teor2345/chutney.git

Trac:
Status: assigned to needs_review

The chutney branch looks reasonable, but one thing I'm not sure about: will merging these changes in chutney plus the changes for Tor 0.2.6 make it so that Tor 0.2.5 and earlier no longer bootstrap? Or will they just bootstrap as slowly as before?

tor changes committed as part of the bug13718-consensus-interval merge. chutney changes have not yet been merged.

nickm: Tor 0.2.5 and earlier will bootstrap just as slowly as before. (Some of the torrc changes may speed earlier versions up a little.)

dgoulet has tested the chutney changes along with the draft tor changes in #13718 (moved).

Merged the torspec changes too.

Trac:
Status: needs_review to closed
Resolution: N/A to fixed

closed

mentioned in issue #13839 (moved)

mentioned in issue #13924 (moved)

mentioned in issue #13928 (moved)

mentioned in issue #13929 (moved)

mentioned in issue #13934 (moved)

mentioned in issue #13935 (moved)

mentioned in issue #13963 (moved)

mentioned in issue #13976 (moved)

moved to tpo/core/tor#13823 (closed)

mentioned in issue tpo/core/tor#13963 (closed)
