Implement a "lightweight" form of pathbias detection.
We now track, for every guard: the total number of successful circuits we've built through it, along with the total number of "indeterminate" circuits.
Recall that a circuit's status is "indeterminate" if it has failed for a reason that might be the guard's fault, or might not be the guard's fault. For example, if extending to the second hop of the circuit fails, we have no way to know whether the guard deliberately refused to connect there, or whether the second hop is just offline.
But we don't want to forgive all indeterminate circuit failures: if we did, then a malicious guard could simply reject any second hops that it didn't like, thereby filtering the client into a chosen set of circuits.
As a stopgap solution, this patch now makes guards become permanently disabled if the fraction of their circuit failures becomes too high.
See also general-purpose path bias selection (#65 (closed)), and Mike's idea for changing the guard reachability definition (torspec#67). This patch doesn't do either of those.
Closes #185 (closed).
Merge request reports
Activity
requested review from @eta
assigned to @nickm
added 1 commit
- 01612d30 - fixup! Do not blame a guard for failures on non-random circuits.
marked this merge request as draft from nickm/arti@01612d30
- Resolved by eta
- Resolved by eta
- Resolved by eta
- Resolved by eta
- Resolved by eta
496 498 const WARN_THRESHOLD: f64 = 0.5; 497 499 498 500 if ratio > DISABLE_THRESHOLD { 499 let reason = format!("{:.1}% of circuits died under mysterious circumstances, exceeding threshold of {:.1}%", ratio*100.0, (DISABLE_THRESHOLD*100.0)); 500 warn!(guard=?self.id, "Disabling guard: {}", reason); 501 let reason = GuardDisabled::TooManyIndeterminateFailures { 502 history: self.circ_history.clone(), 503 failure_ratio: ratio, 504 threshold_ratio: DISABLE_THRESHOLD, 505 }; 506 warn!(guard=?self.id, "Disabling guard: {:.1}% of circuits died under mysterioius circumstances, exceeding threshold of {:.1}%", ratio*100.0, (DISABLE_THRESHOLD*100.0)); 535 541 } 536 542 } 537 543 544 /// A reason for permanently disabling a guard. 545 #[derive(Clone, Debug, Serialize, Deserialize)] 546 #[serde(tag = "type")] 547 enum GuardDisabled { 952 975 assert!(g.disabled.is_some()); 953 assert_eq!( 954 &g.disabled.unwrap(), 955 "93.3% of circuits died under mysterious circumstances, exceeding threshold of 70.0%" 956 ); 976 977 if let GuardDisabled::TooManyIndeterminateFailures { 978 history: _, 979 failure_ratio, 980 threshold_ratio, 981 } = g.disabled.unwrap() 982 { 983 assert!((failure_ratio - 0.933).abs() < 0.01); 984 assert!((threshold_ratio - 0.7).abs() < 0.01); 985 } else { 986 panic!("Wrong variant.") lgtm modulo minor nits that don't necessitate a further round of review in my opinion :p
Do you generally prefer that I squash
fixup!
comments like this before requesting review again, or only before merging?How I looked at what you'd changed since requesting review was going to the "changes" tab and doing this:
This mechanism seems to work irrespective of whether you squashed or not, because I force-pushed in !102 (closed) and it still seems to work there. So basically, I don't have any preference — do whatever you think is easiest, as long as you actually remember to squash before merging that is :)