Distribute only functional and running bridges
Check if the ratio of functional or running bridges is good enough to only distribute those.
Use bridgestrap results to distribute only tested and functional resources. Unless the ratio of functional resources is too low, so rdsys keeps distributing resources after restarts (having most resources untested) or in case of bridgestrap failures.
When bridge authority restarts it might produce bridge descriptors without the running flag, as it hasn't tested the bridges yet. Let's ignore those updates.
Closes: #102 (closed) #96 (closed)
Merge request reports
Activity
added Sponsor 96 label
requested review from @cohosh
added 1 commit
- 3f66e0cc - Ignore bridge descriptors updates if not running bridges
Once this is deployed I'll propose new alerts for prometheus: meskio/prometheus-alerts@68704274
I'm going to reassign this to @shelikhoo, I think I'll limit myself to reviewing snowflake tickets for now.
requested review from @shelikhoo and removed review request for @cohosh
84 92 for state, num := range nums { 85 93 frac := float64(num) / float64(hashring.Len()) 86 94 metrics.TestedResources.With(prometheus.Labels{"type": rName, "status": toStr[state]}).Set(frac) 95 if state == core.StateFunctional { 96 functionalFractionAcc += frac 97 } 87 98 } 88 99 } 100 101 // Distribute only functional resources if the fraction is high enough 102 // The fraction might be low after a restart as many resources will be 103 // untested or if there is an issue with bridgestrap. 104 functionalFraction := functionalFractionAcc / float64(len(rcol.Collection)) 105 rcol.OnlyFunctional = functionalFraction >= MinFunctionalFraction 106 if rcol.OnlyFunctional { Since we are already rejecting resource files that have an insufficient ratio of running bridges. This
OnlyFunctional
will only be false in the edge case, in which, there are a lot of identical running bridges and distinct offline bridges in each resource file. This case is oddly specific so I would like to know if this is intended?I expect two situations where the ratio of running bridges is high but rdsys/bridgestrap functional ratio is low and OnlyFunctional is false:
- just after starting rdsys, as there is no storage of the state of the bridge all bridges are non-functional and need to be tested. But this testing should be fast as most of them are cached in bridgestrap.
- if bridgestrap fails and reports bridges as dysfunctional when they are working fine. This has happen in the past (bridgestrap#7 (closed)), but hopefully is solved now.
So, I'm mostly worried about restarts as I'm not sure how long it takes to get results from bridgestrap. At some point we'll need to implement some resource storage (#109 (closed)) and this mechanism can be removed by then.
bridge authority test if the OR port of the bridge is reachable and marks them as running in the bridge descriptor. rdsys does test if the bridge transport work to connect to tor and internally marks them as functional. rdsys uses both the running and functional to decide which bridges should be distributed, but there is no feedback to the bridge authority about what bridges are functional.
I know is complicated, and duplicated work. I will argue there are historical reasons for it. But bridge authority test for the OR port is producing more problems and what it solves, needing bridges to have open a port that is not actually used and censors could use to locate them (I think there is an issue about this at the tor repo, but I can't find it now).
221 251 //check to see if the bridge has the running flag 222 252 if status.Flags.Running { 223 253 bridges[b.Fingerprint] = b 254 runningBridges++ 224 255 } else { 225 256 log.Printf("Found bridge %s in networkstatus but is not running", b.Fingerprint) 226 257 } 258 numBridges++ 259 } 260 261 runningBridgesFraction := float64(runningBridges) / float64(numBridges) 262 if runningBridgesFraction < MinFunctionalFraction { 263 // Fail if most bridges are marked as non-functional. This happens if bridge authority restarts (#102) 264 // XXX: If bridge authority restarts at the same time than rdsys the first update will not get any 265 // bridges, hopefully this will not happend. Yeah, if I were you I would create either
- a ratchet system, in which, the
NotEnoughRunningError
is only returned if there is a previous read with enough ratio of running bridges, or ... - an introductory forgiveness system, in which, in the first read, the insufficient running bridge issue will be ignored. (This would be easy to implement, as in
func InitKraken
there is a separate initial invocation.)
for an optional peace of mind. This is just a suggestion, I think it is fine to merge as-is.
- a ratchet system, in which, the
Yes, the problem is the existing code will not distribute bridges that are not marked as 'Running'. So even if we don't return the error no bridges (or very few) if both bridge authority and rdsys are restarted at once.
I think I prefer to don't add any more complexity here for now and solve this problem by having a resource storage (#56 (closed)) so we know previous bridges after a restart.
added 6 commits
-
e8e24883...fe57758c - 3 commits from branch
tpo/anti-censorship:main
- baf16ca1 - Move Get resources function into BackendResources
- f47e58d3 - Distribute only functional resources
- 30a7b98d - Ignore bridge descriptors updates if not running bridges
Toggle commit list-
e8e24883...fe57758c - 3 commits from branch