Guardfraction code is still around the codebase: guard_get_guardfraction_bandwidth(), should_apply_guardfraction(), routerstatus_parse_guardfraction(), guardfraction_line_apply(), guardfraction_file_parse_guard_line(), dirserv_read_guardfraction_file_from_str(), compute_weighted_bandwidths(). Also AFAIK some dirauths are still running the guardfraction python script.
Unfortunately, guardfraction code is still broken because of legacy/trac#16255 (moved) and possibly other unknown bugs. Hence, guardfraction support is disabled in the dirauths and all that code is currently useless.
My current plan here is to remove the guardfraction code from the Tor codebase, since fixing it and maintaining it as a PITA that I don't want to sign up for and we probably don't need it as much as we thought we did (see comment:36:ticket:16255).
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
Trac: Description: Guardraction code is still around the codebase: guard_get_guardfraction_bandwidth(), should_apply_guardfraction(), routerstatus_parse_guardfraction(), guardfraction_line_apply(), guardfraction_file_parse_guard_line(), dirserv_read_guardfraction_file_from_str(), compute_weighted_bandwidths(). Also AFAIK some dirauths are still running the guardfraction python script.
Unfortunately, guardfraction code is still broken because of legacy/trac#16255 (moved) and possibly other unknown bugs. Hence, guardfraction support is disabled in the dirauths and all that code is currently useless.
My current plan here is to remove the guardfraction code from the Tor codebase, since fixing it and maintaining it as a PITA that I don't want to sign up for and we probably don't need it as much as we thought we did (see ticket:16255:comment:36).
to
Guardraction code is still around the codebase: guard_get_guardfraction_bandwidth(), should_apply_guardfraction(), routerstatus_parse_guardfraction(), guardfraction_line_apply(), guardfraction_file_parse_guard_line(), dirserv_read_guardfraction_file_from_str(), compute_weighted_bandwidths(). Also AFAIK some dirauths are still running the guardfraction python script.
Unfortunately, guardfraction code is still broken because of legacy/trac#16255 (moved) and possibly other unknown bugs. Hence, guardfraction support is disabled in the dirauths and all that code is currently useless.
My current plan here is to remove the guardfraction code from the Tor codebase, since fixing it and maintaining it as a PITA that I don't want to sign up for and we probably don't need it as much as we thought we did (see comment:36:ticket:16255).
OK I did it! I ripped off the code from the codebase. Here are the rewards:
{{{
19 files changed, 11 insertions(+), 1063 deletions(-)
}}}
Please see branch ticket24456 in my repo for this.
Quick structural review:
We don't delete options, we OBSOLETE() them.
We can't just delete code from networkstatus_compute_consensus() and any the functions it calls. Instead, we add a new consensus method, and only run the old code when the consensus method is old enough. For guardfraction, there will be a range of consensus methods that run the guardfraction code.
Votes are different: we can make authorities stop voting guardfraction without needing a new consensus method.
How buggy is guardfraction?
Does it work?
Does it cause consensus failures when it's turned on?
Should we backport the "don't vote guardfraction" part of this patch to 0.3.1 and 0.3.2?
(The minimal patch obsoletes the options, but leaves them in or_options_t. This works because zero means off.)
Tests pass and make test-network-all pass. The diff is pretty simple.
This needs to pass the "mixed" chutney test, which is only run if you have a binary called "tor-stable" in your path.
I usually make this happen by doing "ln -s /usr/bin/tor /usr/local/bin/tor-stable".
Also, the mixed network needs to pass with guardfraction enabled on the stable authorities.
Rationale for the removal is cited in the commit msg.
OK I did it! I ripped off the code from the codebase. Here are the rewards:
{{{
19 files changed, 11 insertions(+), 1063 deletions(-)
}}}
Please see branch ticket24456 in my repo for this.
Quick structural review:
We don't delete options, we OBSOLETE() them.
We can't just delete code from networkstatus_compute_consensus() and any the functions it calls. Instead, we add a new consensus method, and only run the old code when the consensus method is old enough. For guardfraction, there will be a range of consensus methods that run the guardfraction code.
Oh man you are right. I guess we need to do this even tho no dirauths run the guardfraction code right now and they don't plan to start... So I'd need to define a new consensus method MIN_METHOD_FOR_REMOVING_GUARDFRACTION and only run the guardfraction consensus code if it's between MIN_METHOD_FOR_GUARDFRACTION and MIN_METHOD_FOR_REMOVING_GUARDFRACTION.
And then eventually when all dirauths are upgraded to versions >= MIN_METHOD_FOR_REMOVING_GUARDFRACTION we can remove that consensus guardfraction code too. Does this plan make sense?
Votes are different: we can make authorities stop voting guardfraction without needing a new consensus method.
How buggy is guardfraction?
It suffers from legacy/trac#16255 (moved) and perhaps other unknown bugs. It's extremely hard to test this feature on chutney (because chutney's bandwidth weights are not realistic), so we only learned of legacy/trac#16255 (moved) when we deployed it on the real net.
Does it work?
It "works" but it's buggy. Authorities compute bad consensus weights for the relays and
then the bandwidth equations complain.
Does it cause consensus failures when it's turned on?
Yes.
Should we backport the "don't vote guardfraction" part of this patch to 0.3.1 and 0.3.2?
Not sure. Perhaps not. No dirauths have the guardfraction torrc option enabled anyway.
OK I did it! I ripped off the code from the codebase. Here are the rewards:
{{{
19 files changed, 11 insertions(+), 1063 deletions(-)
}}}
Please see branch ticket24456 in my repo for this.
Quick structural review:
We don't delete options, we OBSOLETE() them.
We can't just delete code from networkstatus_compute_consensus() and any the functions it calls. Instead, we add a new consensus method, and only run the old code when the consensus method is old enough. For guardfraction, there will be a range of consensus methods that run the guardfraction code.
Oh man you are right. I guess we need to do this even tho no dirauths run the guardfraction code right now and they don't plan to start... So I'd need to define a new consensus method MIN_METHOD_FOR_REMOVING_GUARDFRACTION and only run the guardfraction consensus code if it's between MIN_METHOD_FOR_GUARDFRACTION and MIN_METHOD_FOR_REMOVING_GUARDFRACTION.
It's >= MIN_METHOD_FOR_GUARDFRACTION and < MIN_METHOD_FOR_REMOVING_GUARDFRACTION.
And then eventually when all dirauths are upgraded to versions >= MIN_METHOD_FOR_REMOVING_GUARDFRACTION we can remove that consensus guardfraction code too. Does this plan make sense?
Technically, we can't remove the guardfraction code until we remove all the consensus methods that could have used it.
At the moment, we support all methods from 13-28, excluding 21.
We can't remove all the buggy guardfraction methods like we removed 21, because the buggy range is 20-28.
(Maybe we could remove 20, because we'll never use it.)
Votes are different: we can make authorities stop voting guardfraction without needing a new consensus method.
How buggy is guardfraction?
It suffers from legacy/trac#16255 (moved) and perhaps other unknown bugs. It's extremely hard to test this feature on chutney (because chutney's bandwidth weights are not realistic), so we only learned of legacy/trac#16255 (moved) when we deployed it on the real net.
Does it work?
It "works" but it's buggy. Authorities compute bad consensus weights for the relays and
then the bandwidth equations complain.
Does it cause consensus failures when it's turned on?
Yes.
Ok, so let's rip out the vote generation code in master as soon as we obsolete the option.
Should we backport the "don't vote guardfraction" part of this patch to 0.3.1 and 0.3.2?
Not sure. Perhaps not. No dirauths have the guardfraction torrc option enabled anyway.
Let's backport making the option obsolete to 0.2.9 and later.
That's a one-line patch.
OK let's recap here because these backward compatibility stuff confuse me!
For 033 master we need a patch that obsoletes the config options (also removes all the config option-relevant code), and also kills the guardfraction voting feature. See my branch bug24456_033 for this.
For 029 and later, we just need a patch that obsoletes the config options. See my branch bug24456_029 for this.
Then later in the future, when most dirauths use tor versions with obsolete config options, we can completely kill the rest of the code (vote/consensus guardfraction parsing, consensus voting code, client code, etc.).
Does that make sense? If it does, I'll add changes file etc to the branches above, and also make further backport branches.
OK let's recap here because these backward compatibility stuff confuse me!
For 033 master we need a patch that obsoletes the config options (also removes all the config option-relevant code), and also kills the guardfraction voting feature. See my branch bug24456_033 for this.
This is ok. Refusing to vote guardfraction is compatible with any consensus method.
For 029 and later, we just need a patch that obsoletes the config options. See my branch bug24456_029 for this.
This is ok. Refusing to vote guardfraction is compatible with any consensus method.
Then later in the future, when most dirauths use tor versions with obsolete config options, we can completely kill the rest of the code (vote/consensus guardfraction parsing, consensus voting code, client code, etc.).
One minor tweak:
Clients can ignore guardfraction in 033 or 034 if we want. Clients do not have to use consensus features.
And here is how we remove code that depends on consensus method 20 (the guardfraction method):
Create a new consensus method that ignores guardfraction votes. This new method can go in 033 or 034.
Change all the code that says ">= MIN METHOD FOR GUARDFRACTION" to say ">= MIN METHOD FOR GUARDFRACTION and < MIN METHOD TO IGNORE GUARDFRACTION" (the new method). This disables guardfraction when the new method or any later method is used. See consensus method 28 for an example of the code and spec that we need to remove a consensus feature.
When we will never revert to a lower consensus method (two releases later, 035 or 036), stop authorities voting for the buggy methods 20-28, and remove a whole bunch of old code, including all the guardfraction code.
At that time, we might want to remove all the methods lower than 20, too. See legacy/trac#24378 (moved).
Does that make sense? If it does, I'll add changes file etc to the branches above, and also make further backport branches.
OK let's recap here because these backward compatibility stuff confuse me!
For 033 master we need a patch that obsoletes the config options (also removes all the config option-relevant code), and also kills the guardfraction voting feature. See my branch bug24456_033 for this.
This is ok. Refusing to vote guardfraction is compatible with any consensus method.
For 029 and later, we just need a patch that obsoletes the config options. See my branch bug24456_029 for this.
This is ok. Refusing to vote guardfraction is compatible with any consensus method.
Then later in the future, when most dirauths use tor versions with obsolete config options, we can completely kill the rest of the code (vote/consensus guardfraction parsing, consensus voting code, client code, etc.).
One minor tweak:
Clients can ignore guardfraction in 033 or 034 if we want. Clients do not have to use consensus features.
And here is how we remove code that depends on consensus method 20 (the guardfraction method):
Create a new consensus method that ignores guardfraction votes. This new method can go in 033 or 034.
Change all the code that says ">= MIN METHOD FOR GUARDFRACTION" to say ">= MIN METHOD FOR GUARDFRACTION and < MIN METHOD TO IGNORE GUARDFRACTION" (the new method). This disables guardfraction when the new method or any later method is used. See consensus method 28 for an example of the code and spec that we need to remove a consensus feature.
When we will never revert to a lower consensus method (two releases later, 035 or 036), stop authorities voting for the buggy methods 20-28, and remove a whole bunch of old code, including all the guardfraction code.
At that time, we might want to remove all the methods lower than 20, too. See legacy/trac#24378 (moved).
Hmm, let's do all this stuff in 034 I think. Also separating client guardfraction functionality from dirauth-guardfraction functionality is not so easy, since they both use the same funcs, but the latter also uses it for consensus making. So perhaps we can roll with the branches I posted above for now, and build on top of them for 034.
Pushed branches bug24456_029, bug24456_030, bug24456_031 and bug24456_032 with the plan from comment:6 .
Also separating client guardfraction functionality from dirauth-guardfraction functionality is not so easy, since they both use the same funcs, but the latter also uses it for consensus making. So perhaps we can roll with the branches I posted above for now, and build on top of them for 034.
Sure!
I suggest we stop clients (and relays) calling those functions in 0.3.4. We can leave the functions as authority-only.
After catching up on the related bugs (legacy/trac#13297 (moved), legacy/trac#16255 (moved)), it seems like the reason this is being killed is because of the lack of unit test support to take in all inputs for a consensus (votes, bw auth files, guardfraction files) and produce a consensus, and then examine that consensus for expected results for both values and client usage. If we had that, it would be much easier to verify that the fix in legacy/trac#16255 (moved) is sufficient, right?
If that is true, then I would rather obsolete the torrc options for now without removing any code, and file a ticket for better consensus testing support. There have been several other path selection tickets where I wished I just had a full consensus to check behavior against, rather than cobbling together a mock or tiny chutney network...
Once we have such a testing mechanism, it will be much easier to bring this code back with torrc options like GuardFractionV2File (since in legacy/trac#16255 (moved) we already discussed changing the format slightly), rather than ripping it out and starting over completely from scratch. We need the guardfraction feature (or something like it) less since we have backed off from infinite guard lifetimes, but I don't think the need is now zero.
We need the guardfraction feature (or something like it) less since we have backed off from infinite guard lifetimes, but I don't think the need is now zero.
I think this is right.
But when considering the guardfraction feature, we should think not just about the code on the Tor side, but also about what feels to the dir auths like yet another pile of abandonware crap that they're going to be asked to run and maintain forever.
So, if we're thinking about the feature's future, let's not underweight the importance of the external portion of the tool too.
We need the guardfraction feature (or something like it) less since we have backed off from infinite guard lifetimes, but I don't think the need is now zero.
I think this is right.
We will never have perfectly accurate bandwidth allocations. So I think we need to decide how much inaccuracy we are willing to tolerate in the bandwidth system. (Currently, we tolerate 30-50% variance between bandwidth authorities. But we don't have any good way of working out the difference between the median measured bandwidth, and the actual bandwidth of a relay.)
Once we know our tolerance, it will help us prioritise fixing guardfraction.
It will also help us evaluate bandwidth authority configurations and implementations.
But when considering the guardfraction feature, we should think not just about the code on the Tor side, but also about what feels to the dir auths like yet another pile of abandonware crap that they're going to be asked to run and maintain forever.
So, if we're thinking about the feature's future, let's not underweight the importance of the external portion of the tool too.
Also, regardless of whether we fixe guardfraction in the future, we need to remove all the old consensus methods affected by legacy/trac#16255 (moved).