circuit_predict_and_launch_new() builds CIRCUIT_PURPOSE_C_GENERAL circuits, even for hidden service circuits. When we implement Proposa247/#9001 (moved), we will need to give all hidden service circuits their specific purposes right from build time, in order to ensure that they are build using Vanguards. We will also need to disable cannibalization for them, since cannibalized GENERAL circuits will not use our vanguards, either.
This means we need to change the prediction recording and circuit building to record, predict, and build each HS circuit type independently.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
One approach here is to indeed define new circuit purposes for HS circuits. My main fear here is that these circuit purposes are used for tons of things in the codebase (from circuit timeouts, to cannibalization, to guard stuff, to stats stuff, to basic circuit functionality state machine), and hence adding new purposes means that we need to find all the right places to consider our new purposes which I imagine is gonna be a PITA and perhaps error-prone...
Another approach would be to use the hs_ident and rend_data elements of the circuit_t to distinguish HS circuits from normal circuits. That's what most of the code is doing to detect HS rend circuits.
One approach here is to indeed define new circuit purposes for HS circuits. My main fear here is that these circuit purposes are used for tons of things in the codebase (from circuit timeouts, to cannibalization, to guard stuff, to stats stuff, to basic circuit functionality state machine), and hence adding new purposes means that we need to find all the right places to consider our new purposes which I imagine is gonna be a PITA and perhaps error-prone...
Yes, it will be error-prone. I wonder how we can make this state machine easier to modify in future?
Another approach would be to use the hs_ident and rend_data elements of the circuit_t to distinguish HS circuits from normal circuits. That's what most of the code is doing to detect HS rend circuits.
No, these fields are not populated until the circuit is used for rendezvous.
So they can't be used to identify preemptive circuits that are built with vanguards.
Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.
How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).
That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.
Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.
How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).
That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.
Sound like a plan?
Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.
What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?
Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.
How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).
That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.
Sound like a plan?
Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.
What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?
The desired behavior is that this should not be allowed. The idea is that these two types of general circuits will have different path construction mechanisms for the first three hops (vanguards/pinned middles vs normal), and should not be mixed.
In terms of how this will be done - I am going to alter circuit_find_to_cannibalize() so that you must also specify a circuit type to cannibalize from in addition to the one you want to create, and you can't switch between these classes of circuits (in the case where there is nothing to cannibalize in this way, that function can find nothing and return NULL, and the calling code in circuit_launch_by_extend_info() will build a fresh circuit instead). Since circuit_find_to_cannibalize() already behaves as if you specified canibalize_from == CIRCUIT_PURPOSE_C_GENERAL, I don't anticipate that the change to draw from CIRCUIT_PURPOSE_HS_GENERAL will be very complicated.
Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.
How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).
That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.
Sound like a plan?
Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.
What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?
The desired behavior is that this should not be allowed. The idea is that these two types of general circuits will have different path construction mechanisms for the first three hops (vanguards/pinned middles vs normal), and should not be mixed.
Interesting and indeed makes sense for the cases where vanguard-ish designs are enabled (e.g. hidden services in the beginning, and maybe even HS clients in the future).
I think we are reaching the point where the cannibalization logic requires a small spec of its own to document this stuff (+ various other intricacies).
Another behavior worth documenting: I wonder how the predictive circuit launching should work after this change. Should clients create predictive general circuits and also create predictive HS circuits? What about HSes?
So consider this early preview review stage, I guess, while we slowly work up the merge chain on all of this.
Also asn, to your point about specs: all of our path selection has an incredible amount of cruft, technical dept, specification deficiency, and complete lack of test coverage. I don't think it's a good idea to make paying back all of that debt a blocker to fixing the guard discovery problem.
In particular, I think it is especially important to get these options merged by the 0.3.3 deadline in January, so that hidden service operators at least have the option of using them while we work on a more complete solution for 0.3.4 in May, and for later tor releases.
...
Also asn, to your point about specs: all of our path selection has an incredible amount of cruft, technical dept, specification deficiency, and complete lack of test coverage. I don't think it's a good idea to make paying back all of that debt a blocker to fixing the guard discovery problem.
In particular, I think it is especially important to get these options merged by the 0.3.3 deadline in January, so that hidden service operators at least have the option of using them while we work on a more complete solution for 0.3.4 in May, and for later tor releases.
I agree that we can't fix all the code structure, missing documentation and missing tests in the short term.
But we can work out a plan for paying back this debt, so it doesn't cause the same issues the next time we want to modify this code. And we can mark this ticket with the "technical-debt" keyword, so we find it when we review our processes.
Initial review performed here on oniongit. This one is a hard and complicated feature to do well and clean. Let's do it!
I'm a bit anxious about the new purpose added here, and how well it's handled by the whole codebase. I'd really like a way to test this new purpose, either unittest or integration test (currently all the integration tests fail make test-network-all because of legacy/trac#13837 (moved) issues). After a few legacy/trac#13837 (moved)/legacy/trac#23101 (moved) fixes get in I'll also give the vanguard project a try.
I agree that we can't fix all the code structure, missing documentation and missing tests in the short term.
But we can work out a plan for paying back this debt, so it doesn't cause the same issues the next time we want to modify this code. And we can mark this ticket with the "technical-debt" keyword, so we find it when we review our processes.
Wrt a plan: An important thing to remember for this code is that it is also the first round of changes for vanguards. We will have an additional chance to clean old crufty stuff up more when we implement the full vanguard algorithm in Tor (which will involve much more refactoring of the node selection code anyhow).
Initial review performed here on oniongit. This one is a hard and complicated feature to do well and clean. Let's do it!
I'm a bit anxious about the new purpose added here, and how well it's handled by the whole codebase. I'd really like a way to test this new purpose, either unittest or integration test (currently all the integration tests fail make test-network-all because of legacy/trac#13837 (moved) issues). After a few legacy/trac#13837 (moved)/legacy/trac#23101 (moved) fixes get in I'll also give the vanguard project a try.
Ok, I rebased this on top of the fixups from mikeperry/bug13837, and then wrote some more fixups for your code review. They are in mikeperry/bug23101-rebased, and each fixup is linked in a comment on your oniongit review.
The main outstanding issue from both bugs is what to do about choose_good_middle_server(). Since we don't like various things about how it is changed in each bug, we should probably get together on IRC and talk about how we want to refactor it to satisfy both sets of concerns, otherwise it will be a git conflict nightmare.
Wrt checks, the vanguard code is basically meant to self-verify all of this. It verifies all of the generated paths are as expected using control port events. If there is no scaffolding in the current network tests to do things like that verify paths are as expected, then I'm leaning towards making a separate ticket for that work for all of our circuit building stuff, so that it does not block this.
OK, thanks for fixups! I reviewed the fixups and did another round of review. I still haven't had time to actually test the branch tho.
In general, this branch was hard to review IMO mainly due to the surrounding code being extremely hairy, the lack of concise high-level comments of what we are trying to achieve, and everything being done in a single commit (adding circ purpose && handling cannibalization && launching preemptive circs). Also the lack of tests is troubling, and the vanguards repo that is supposed to help testing had no README and was very hard to understand, but thankfuly mikeperry helped me over IRC. It took me 3 days of review to get a basic understanding of the subsystem and grasp the code changes that were done, and I still haven't actually tested this branch. I feel that no strong attempt was done to make the feature pleasurable for review and that's why it took me so long.
I think the next steps here are to squash the mikeperry/bug23101-rebased branch, then apply my fixes (bug23101-dev) on top of it, and then start handling the latest review comments. Then we would need a fresh merge request on gitlab because the current one is quite dirty.
In the next days, I'll try to test the branch on the real network to understand more what's going on and provide further review points, or a branch.
You're right, I could have separated the pre-building of HS_GENERAL from the cannibalization of them (similar to how I did in legacy/trac#13837 (moved)). I forgot to do that in the rush to get this done. I'll keep it in mind for next time.
As for the next steps, I took your dev patches and squashed them into mikeperry/bug23101-rebased, and the result is mikeperry/bug23101-squashed, which is rebased onto the freshly squashed+cleaned mikeperry/bug13837-squashed. https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed if you want to comment there some more.
I'm now out until ~Jan 2. Leaving this in needs_revision because I am wary of the issue you found in legacy/trac#13837 (moved).
You're right, I could have separated the pre-building of HS_GENERAL from the cannibalization of them (similar to how I did in legacy/trac#13837 (moved)). I forgot to do that in the rush to get this done. I'll keep it in mind for next time.
As for the next steps, I took your dev patches and squashed them into mikeperry/bug23101-rebased, and the result is mikeperry/bug23101-squashed, which is rebased onto the freshly squashed+cleaned mikeperry/bug13837-squashed. https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed if you want to comment there some more.
I'm now out until ~Jan 2. Leaving this in needs_revision because I am wary of the issue you found in legacy/trac#13837 (moved).
Yo yo, I'm back and took some time to look into this again.
I suggested a potential fix for the legacy/trac#13837 (moved) issue in gitlab, and noted another issue found using the vanguards script.
I also noticed that there is no legacy/trac#23101 (moved) in your bug23101-squashed branch, so I couldn't really test that one. I'll wait till the commit pops up to continue testing!
Yo yo, I'm back and took some time to look into this again.
I suggested a potential fix for the legacy/trac#13837 (moved) issue in gitlab, and noted another issue found using the vanguards script.
I also noticed that there is no legacy/trac#23101 (moved) in your bug23101-squashed branch, so I couldn't really test that one. I'll wait till the commit pops up to continue testing!
Ok, I replied to your remaining outstanding comments on the 23101 commit with fixup commits. Everything is sitting in https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed now. If you like the fixups, I will squash it all down and open a new merge request while we both test with vanguards.py.
Ok, I fixed those remaining issues, rebased onto mikeperry/bug13837-mergeready, squashed, and pushed to https://oniongit.eu/mikeperry/tor/commits/bug23101-mergeready. The full chain of fixups is at mikeperry/bug23101-squashed-docs (note this fixup branch still has the e9c4aa7823d1396fe2f34173d25f6aa602e9c0ef fixup, which we decided to drop due to redundancy+possibe user confusion).
It's never the case that we have preemptive old-style onion service circuits and also preemptive vanguard-style onion service circuits going on at the same time, right?
That is, with the current design (as of this patch), we have two things supported in the code at the same time: a C_GENERAL is_internal circuit, and an HS_VANGUARDS is_internal circuit. And which one we make, and expect to use, is controlled by whether we have one (or both) of the HSLayerXGuards options set.
So a much simpler design could be: if it's C_GENERAL but is_internal is set, then either build it like a 3-hop vanguard circ if you're using vanguards, else build it like a normal 3-hop onion service circ.
HSDir uploads and fetches use a 3-hop circuit (do we want to use vanguards for those?)
I'm under the impression that the patches here try to use vanguard circuits for onion descriptor uploads and fetches.
But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?
A good question. Is there some revised vanguard design proposal at this point? :) It would seem that at least service-side publishing doesn't happen that often, and more importantly doesn't happen in response to adversary action, so it wouldn't benefit as much from the vanguard approach.
...
But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?
A good question. Is there some revised vanguard design proposal at this point? :) It would seem that at least service-side publishing doesn't happen that often, and more importantly doesn't happen in response to adversary action, so it wouldn't benefit as much from the vanguard approach.
Let's not assume this.
If an adversary can cause an onion service to repeatedly crash or run out of RAM, then service uploads happen in response to that adversary action.
Also, in a non-traditional application like OnionFlare, an adversary could deliberately register sites that use adversarial HSDirs.
Or if Ricochet ever used client authentication, a new friend request could result in a new client key and a descriptor upload.
But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?
Another fun example is testing circuits. If you've set some Vanguards, that means you're primarily concerned with onion service (or onion client) performance. Does that mean your internal testing circuits should be using your Vanguards too? Seems like there's a good argument for yes.
Agreed. Using the vanguard circuits for onion descriptor posts / fetches is the clear right move, first because there's little downside and an unknown upside (that is, if we turn out to regret our choice, it's going to be because we didn't use the vanguard circs for them), and second because guard discovery attacks on the onion descriptor posts are already too fast for our comfort.
But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?
Another fun example is testing circuits. If you've set some Vanguards, that means you're primarily concerned with onion service (or onion client) performance. Does that mean your internal testing circuits should be using your Vanguards too? Seems like there's a good argument for yes.
Which testing are you talking about here, and how does that impact performance? Do you mean CIRCUIT_PURPOSE_TESTING, or the CBT stuff? The CBT aspects of this were handled in legacy/trac#23100 (moved). The CIRCUIT_PURPOSE_TESTING is just router reachability, right?
It's never the case that we have preemptive old-style onion service circuits and also preemptive vanguard-style onion service circuits going on at the same time, right?
That is, with the current design (as of this patch), we have two things supported in the code at the same time: a C_GENERAL is_internal circuit, and an HS_VANGUARDS is_internal circuit. And which one we make, and expect to use, is controlled by whether we have one (or both) of the HSLayerXGuards options set.
So a much simpler design could be: if it's C_GENERAL but is_internal is set, then either build it like a 3-hop vanguard circ if you're using vanguards, else build it like a normal 3-hop onion service circ.
This did not strike me as simpler, especially since "is_internal" doesn't always seem to mean "internal for HS activity only", like for router descriptor downloads and who knows what else in the future. I wanted to keep it completely and obviously separate.
Oh, also using is_internal means that there is some possibility of use confusion if a controller is SETCONFing these options. Then there could be some old C_GENERAL is_internal circuits that are not pinned laying around after the SETCONF. We do try to mark all of these dirty, but it is not as easy for the controller to track them and verify correctness if they do not have a unique purpose.
Okay, I'm fine with merging this patch series once Roger is. I still think that having separate client and service vanguards might be a better idea, but if you've done the analysis, I'll believe you.