Predict and build specific HS purpose circuits (rather than GENERAL)

changed milestone to %Tor: 0.3.3.x-final in legacy/trac

One approach here is to indeed define new circuit purposes for HS circuits. My main fear here is that these circuit purposes are used for tons of things in the codebase (from circuit timeouts, to cannibalization, to guard stuff, to stats stuff, to basic circuit functionality state machine), and hence adding new purposes means that we need to find all the right places to consider our new purposes which I imagine is gonna be a PITA and perhaps error-prone...

Another approach would be to use the hs_ident and rend_data elements of the circuit_t to distinguish HS circuits from normal circuits. That's what most of the code is doing to detect HS rend circuits.

Trac:
Keywords: guard-discovery deleted, guard-discovery-prop247-controller added
Milestone: Tor: unspecified to Tor: 0.3.3.x-final

Replying to asn:

One approach here is to indeed define new circuit purposes for HS circuits. My main fear here is that these circuit purposes are used for tons of things in the codebase (from circuit timeouts, to cannibalization, to guard stuff, to stats stuff, to basic circuit functionality state machine), and hence adding new purposes means that we need to find all the right places to consider our new purposes which I imagine is gonna be a PITA and perhaps error-prone...

Yes, it will be error-prone. I wonder how we can make this state machine easier to modify in future?

Another approach would be to use the hs_ident and rend_data elements of the circuit_t to distinguish HS circuits from normal circuits. That's what most of the code is doing to detect HS rend circuits.

No, these fields are not populated until the circuit is used for rendezvous. So they can't be used to identify preemptive circuits that are built with vanguards.

Re-parenting, because I think this prediction should be fixed for legacy/trac#13837 (moved) (and for our experiments using legacy/trac#13837 (moved) w/ https://github.com/mikeperry-tor/vanguards).

Trac:
Parent: legacy/trac#9001 (moved) to legacy/trac#13837 (moved)

Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.

How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).

That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.

Sound like a plan?

Replying to mikeperry:

Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.

How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).

That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.

Sound like a plan?

Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.

What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?

Replying to asn:

Replying to mikeperry:

Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.

How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).

That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.

Sound like a plan?

Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.

What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?

The desired behavior is that this should not be allowed. The idea is that these two types of general circuits will have different path construction mechanisms for the first three hops (vanguards/pinned middles vs normal), and should not be mixed.

In terms of how this will be done - I am going to alter circuit_find_to_cannibalize() so that you must also specify a circuit type to cannibalize from in addition to the one you want to create, and you can't switch between these classes of circuits (in the case where there is nothing to cannibalize in this way, that function can find nothing and return NULL, and the calling code in circuit_launch_by_extend_info() will build a fresh circuit instead). Since circuit_find_to_cannibalize() already behaves as if you specified canibalize_from == CIRCUIT_PURPOSE_C_GENERAL, I don't anticipate that the change to draw from CIRCUIT_PURPOSE_HS_GENERAL will be very complicated.

Trac:
Status: new to assigned
Owner: N/A to mikeperry

Replying to mikeperry:

Replying to asn:

Replying to mikeperry:

Asn - you are right to worry about trying to alter how we deal with existing purposes, especially in a way that changes the HS state machine. However, adding a whole new circuit purpose is quite easy. I have done it several times before (including in my draft patch for parent ticket legacy/trac#13837 (moved)). I know how to do it without issue. Aside from a couple asserts and checks here and there, new purposes get ignored by existing code. That's what circuit purposes are for.

How about this: instead of messing with the existing HS purpose state machine(s), we create a new purpose (say CIRCUIT_PURPOSE_HS_GENERAL), and then in circuit_predict_and_launch_new() we launch new circuits of that type when we predict we need any hidden service circuits. Then we should be able to make just a few changes to circuit_get_open_circ_or_launch(), circuit_launch_by_extend_info(), and circuit_find_to_cannibalize() to use this new circuit type for hidden service circuits (instead of using CIRCUIT_PURPOSE_C_GENERAL for them).

That way circuits with this new purpose will otherwise be ignored unless we are looking to build a new one. And that mechanism for reusing them will be very similar as it is today, just drawing from CIRCUIT_PURPOSE_HS_GENERAL instead of CIRCUIT_PURPOSE_C_GENERAL, and we also don't have to change our current rephist prediction code.

Sound like a plan?

Plausible plan. I'm still a bit concerned but I don't have a better plan tbh.

What would be the interaction between CIRCUIT_PURPOSE_C_GENERAL and CIRCUIT_PURPOSE_HS_GENERAL? e.g. would you be able to cannibalize a C_GENERAL circuit to an HS_GENERAL circuit if we find that we have idle general circuits and we need HS ones (or the opposite)?

The desired behavior is that this should not be allowed. The idea is that these two types of general circuits will have different path construction mechanisms for the first three hops (vanguards/pinned middles vs normal), and should not be mixed.

Interesting and indeed makes sense for the cases where vanguard-ish designs are enabled (e.g. hidden services in the beginning, and maybe even HS clients in the future).

I think we are reaching the point where the cannibalization logic requires a small spec of its own to document this stuff (+ various other intricacies).

Another behavior worth documenting: I wonder how the predictive circuit launching should work after this change. Should clients create predictive general circuits and also create predictive HS circuits? What about HSes?

Ok, I implemented what I described above in https://oniongit.eu/mikeperry/tor/commits/bug23101. It only does anything if legacy/trac#13837 (moved) is enabled, but is written such that when we implement vanguards, it will work the same for them, too.

Because https://github.com/mikeperry-tor/vanguards needs all of this code to work properly, this branch is based on legacy/trac#13837 (moved), which is based on legacy/trac#23114 (moved).

So consider this early preview review stage, I guess, while we slowly work up the merge chain on all of this.

Also asn, to your point about specs: all of our path selection has an incredible amount of cruft, technical dept, specification deficiency, and complete lack of test coverage. I don't think it's a good idea to make paying back all of that debt a blocker to fixing the guard discovery problem.

In particular, I think it is especially important to get these options merged by the 0.3.3 deadline in January, so that hidden service operators at least have the option of using them while we work on a more complete solution for 0.3.4 in May, and for later tor releases.

Trac:
Status: assigned to needs_review

Replying to mikeperry:

... Also asn, to your point about specs: all of our path selection has an incredible amount of cruft, technical dept, specification deficiency, and complete lack of test coverage. I don't think it's a good idea to make paying back all of that debt a blocker to fixing the guard discovery problem.

In particular, I think it is especially important to get these options merged by the 0.3.3 deadline in January, so that hidden service operators at least have the option of using them while we work on a more complete solution for 0.3.4 in May, and for later tor releases.

I agree that we can't fix all the code structure, missing documentation and missing tests in the short term.

But we can work out a plan for paying back this debt, so it doesn't cause the same issues the next time we want to modify this code. And we can mark this ticket with the "technical-debt" keyword, so we find it when we review our processes.

Trac:
Keywords: N/A deleted, review-group-27 added

Trac:
Reviewer: N/A to asn

Initial review performed here on oniongit. This one is a hard and complicated feature to do well and clean. Let's do it!

I'm a bit anxious about the new purpose added here, and how well it's handled by the whole codebase. I'd really like a way to test this new purpose, either unittest or integration test (currently all the integration tests fail make test-network-all because of legacy/trac#13837 (moved) issues). After a few legacy/trac#13837 (moved)/legacy/trac#23101 (moved) fixes get in I'll also give the vanguard project a try.

Trac:
Status: needs_review to needs_revision

Replying to teor:

I agree that we can't fix all the code structure, missing documentation and missing tests in the short term.

But we can work out a plan for paying back this debt, so it doesn't cause the same issues the next time we want to modify this code. And we can mark this ticket with the "technical-debt" keyword, so we find it when we review our processes.

Wrt a plan: An important thing to remember for this code is that it is also the first round of changes for vanguards. We will have an additional chance to clean old crufty stuff up more when we implement the full vanguard algorithm in Tor (which will involve much more refactoring of the node selection code anyhow).

Replying to asn:

Initial review performed here on oniongit. This one is a hard and complicated feature to do well and clean. Let's do it!

I'm a bit anxious about the new purpose added here, and how well it's handled by the whole codebase. I'd really like a way to test this new purpose, either unittest or integration test (currently all the integration tests fail make test-network-all because of legacy/trac#13837 (moved) issues). After a few legacy/trac#13837 (moved)/legacy/trac#23101 (moved) fixes get in I'll also give the vanguard project a try.

Ok, I rebased this on top of the fixups from mikeperry/bug13837, and then wrote some more fixups for your code review. They are in mikeperry/bug23101-rebased, and each fixup is linked in a comment on your oniongit review.

The main outstanding issue from both bugs is what to do about choose_good_middle_server(). Since we don't like various things about how it is changed in each bug, we should probably get together on IRC and talk about how we want to refactor it to satisfy both sets of concerns, otherwise it will be a git conflict nightmare.

Wrt checks, the vanguard code is basically meant to self-verify all of this. It verifies all of the generated paths are as expected using control port events. If there is no scaffolding in the current network tests to do things like that verify paths are as expected, then I'm leaning towards making a separate ticket for that work for all of our circuit building stuff, so that it does not block this.

OK, thanks for fixups! I reviewed the fixups and did another round of review. I still haven't had time to actually test the branch tho.

In general, this branch was hard to review IMO mainly due to the surrounding code being extremely hairy, the lack of concise high-level comments of what we are trying to achieve, and everything being done in a single commit (adding circ purpose && handling cannibalization && launching preemptive circs). Also the lack of tests is troubling, and the vanguards repo that is supposed to help testing had no README and was very hard to understand, but thankfuly mikeperry helped me over IRC. It took me 3 days of review to get a basic understanding of the subsystem and grasp the code changes that were done, and I still haven't actually tested this branch. I feel that no strong attempt was done to make the feature pleasurable for review and that's why it took me so long.

I think the next steps here are to squash the mikeperry/bug23101-rebased branch, then apply my fixes (bug23101-dev) on top of it, and then start handling the latest review comments. Then we would need a fresh merge request on gitlab because the current one is quite dirty.

In the next days, I'll try to test the branch on the real network to understand more what's going on and provide further review points, or a branch.

You're right, I could have separated the pre-building of HS_GENERAL from the cannibalization of them (similar to how I did in legacy/trac#13837 (moved)). I forgot to do that in the rush to get this done. I'll keep it in mind for next time.

As for the next steps, I took your dev patches and squashed them into mikeperry/bug23101-rebased, and the result is mikeperry/bug23101-squashed, which is rebased onto the freshly squashed+cleaned mikeperry/bug13837-squashed. https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed if you want to comment there some more.

I'm now out until ~Jan 2. Leaving this in needs_revision because I am wary of the issue you found in legacy/trac#13837 (moved).

Replying to mikeperry:

You're right, I could have separated the pre-building of HS_GENERAL from the cannibalization of them (similar to how I did in legacy/trac#13837 (moved)). I forgot to do that in the rush to get this done. I'll keep it in mind for next time.

As for the next steps, I took your dev patches and squashed them into mikeperry/bug23101-rebased, and the result is mikeperry/bug23101-squashed, which is rebased onto the freshly squashed+cleaned mikeperry/bug13837-squashed. https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed if you want to comment there some more.

I'm now out until ~Jan 2. Leaving this in needs_revision because I am wary of the issue you found in legacy/trac#13837 (moved).

Yo yo, I'm back and took some time to look into this again.

I suggested a potential fix for the legacy/trac#13837 (moved) issue in gitlab, and noted another issue found using the vanguards script.

I also noticed that there is no legacy/trac#23101 (moved) in your bug23101-squashed branch, so I couldn't really test that one. I'll wait till the commit pops up to continue testing!

Replying to asn:

Yo yo, I'm back and took some time to look into this again.

I suggested a potential fix for the legacy/trac#13837 (moved) issue in gitlab, and noted another issue found using the vanguards script.

I also noticed that there is no legacy/trac#23101 (moved) in your bug23101-squashed branch, so I couldn't really test that one. I'll wait till the commit pops up to continue testing!

Ok, I replied to your remaining outstanding comments on the 23101 commit with fixup commits. Everything is sitting in https://oniongit.eu/mikeperry/tor/commits/bug23101-squashed now. If you like the fixups, I will squash it all down and open a new merge request while we both test with vanguards.py.

OK! Answered the latest comments as well. Seems pretty good, after removing the commit with the superfluous check as discussed on gitlab.

I also pushed a small branch bug23101-squashed-docs with some small docs improvements.

Perhaps we can proceed with merge request and merge_ready after squashing this.

Ok, I fixed those remaining issues, rebased onto mikeperry/bug13837-mergeready, squashed, and pushed to https://oniongit.eu/mikeperry/tor/commits/bug23101-mergeready. The full chain of fixups is at mikeperry/bug23101-squashed-docs (note this fixup branch still has the e9c4aa7823d1396fe2f34173d25f6aa602e9c0ef fixup, which we decided to drop due to redundancy+possibe user confusion).

Trac:
Status: needs_revision to merge_ready

Trac:
Keywords: N/A deleted, review-group-30 added

I've posted some comments on the 3 commits on oniongit. Roger says he'll do so too.

My major thought on this one:

It's never the case that we have preemptive old-style onion service circuits and also preemptive vanguard-style onion service circuits going on at the same time, right?

That is, with the current design (as of this patch), we have two things supported in the code at the same time: a C_GENERAL is_internal circuit, and an HS_VANGUARDS is_internal circuit. And which one we make, and expect to use, is controlled by whether we have one (or both) of the HSLayerXGuards options set.

So a much simpler design could be: if it's C_GENERAL but is_internal is set, then either build it like a 3-hop vanguard circ if you're using vanguards, else build it like a normal 3-hop onion service circ.

circuit_purpose_requires_anonymity() (spelling?) lists multi-hop internal circuit types. Some of them don't need to use vanguards. For example:

bridge uploads and/or downloads use an internal 3-hop circuit
HSDir uploads and fetches use a 3-hop circuit (do we want to use vanguards for those?)

But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?

Replying to teor:

HSDir uploads and fetches use a 3-hop circuit (do we want to use vanguards for those?)

I'm under the impression that the patches here try to use vanguard circuits for onion descriptor uploads and fetches.

But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?

A good question. Is there some revised vanguard design proposal at this point? :) It would seem that at least service-side publishing doesn't happen that often, and more importantly doesn't happen in response to adversary action, so it wouldn't benefit as much from the vanguard approach.

Replying to arma:

Replying to teor:

... But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?

A good question. Is there some revised vanguard design proposal at this point? :) It would seem that at least service-side publishing doesn't happen that often, and more importantly doesn't happen in response to adversary action, so it wouldn't benefit as much from the vanguard approach.

Let's not assume this.

If an adversary can cause an onion service to repeatedly crash or run out of RAM, then service uploads happen in response to that adversary action.

Also, in a non-traditional application like OnionFlare, an adversary could deliberately register sites that use adversarial HSDirs.

Or if Ricochet ever used client authentication, a new friend request could result in a new client key and a descriptor upload.

Replying to teor:

But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?

Another fun example is testing circuits. If you've set some Vanguards, that means you're primarily concerned with onion service (or onion client) performance. Does that mean your internal testing circuits should be using your Vanguards too? Seems like there's a good argument for yes.

Replying to teor:

Let's not assume this.

Agreed. Using the vanguard circuits for onion descriptor posts / fetches is the clear right move, first because there's little downside and an unknown upside (that is, if we turn out to regret our choice, it's going to be because we didn't use the vanguard circs for them), and second because guard discovery attacks on the onion descriptor posts are already too fast for our comfort.

Replying to arma:

Replying to teor:

But maybe it is better to always do vanguards for internal circuits. Do we know if the onion service attacks apply to other circuits?

Another fun example is testing circuits. If you've set some Vanguards, that means you're primarily concerned with onion service (or onion client) performance. Does that mean your internal testing circuits should be using your Vanguards too? Seems like there's a good argument for yes.

Which testing are you talking about here, and how does that impact performance? Do you mean CIRCUIT_PURPOSE_TESTING, or the CBT stuff? The CBT aspects of this were handled in legacy/trac#23100 (moved). The CIRCUIT_PURPOSE_TESTING is just router reachability, right?

All fixups (up to 0a133f3) look good to me!

Replying to arma:

My major thought on this one:

It's never the case that we have preemptive old-style onion service circuits and also preemptive vanguard-style onion service circuits going on at the same time, right?

That is, with the current design (as of this patch), we have two things supported in the code at the same time: a C_GENERAL is_internal circuit, and an HS_VANGUARDS is_internal circuit. And which one we make, and expect to use, is controlled by whether we have one (or both) of the HSLayerXGuards options set.

So a much simpler design could be: if it's C_GENERAL but is_internal is set, then either build it like a 3-hop vanguard circ if you're using vanguards, else build it like a normal 3-hop onion service circ.

This did not strike me as simpler, especially since "is_internal" doesn't always seem to mean "internal for HS activity only", like for router descriptor downloads and who knows what else in the future. I wanted to keep it completely and obviously separate.

Oh, also using is_internal means that there is some possibility of use confusion if a controller is SETCONFing these options. Then there could be some old C_GENERAL is_internal circuits that are not pinned laying around after the SETCONF. We do try to mark all of these dirty, but it is not as easy for the controller to track them and verify correctness if they do not have a unique purpose.

Okay, I'm fine with merging this patch series once Roger is. I still think that having separate client and service vanguards might be a better idea, but if you've done the analysis, I'll believe you.