The Circuit Build Timeout code currently ignores all circuits with a desired_path_len other than 3. This causes it not to count hidden service circuits of longer path lengths.
When we change the path selection for Proposal247/#9001 (moved), we are going to want to count those circuits, because we want the circuit build timeout to prune 20% of Prop247 circuits, too. If we omit those circuits from the CBT calculation, we risk timing out either too few, too many, or even all of them.
The simplest way to fix this is to change circuit_timeout_want_to_count_circ() to decide to count every circuit once it has completed 3 hops, even if it plans to build more.
We may also want to alter the timeout application in circuit_expire_building(), depending on the prop247 implementation we choose. In some versions of the proposal, we can end up creating what are technically 5 hop paths (though these topologies were not very popular in Wilmington).
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items 0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items 0
Link issues together to show that they're related.
Learn more.
Ok, I have an initial patch for this in mikeperry/prop247_torrc-rebased (d730d6337d24f20980469e6bb331b093afec643b). However, along the way I also noticed legacy/trac#23114 (moved), which we should also fix so that the results from the legacy/trac#13837 (moved) experiments make sense.
Thanks for the patch! This seems important! A few comments:
I don't understand how the mods to circuit_timeout_want_to_count_circ() helps us achieve the goal of this ticket. It kinda seems like we are restricting that function even more (doc change implies that too), when we actually want it to be more accepting. I think a nice comment is required for people like me to understand how that function works wrt HS circs.
I also don't understand why the circuit_build_times_decide_to_count_circ() logic was moved from one function to another. I think we need some help here to understand, since I'm not familiar with the CBT system at all.
I think circuit_get_cpath_len() should be moded instead of adding another func circuit_get_cpath_opened_len(). We can add an arg only_count_opened which does that. I think that'd be cleaner.
circuit_build_times_decide_to_count_circ() is undocumented and I'd actually like to know what it does.
Thanks for the patch! This seems important! A few comments:
I don't understand how the mods to circuit_timeout_want_to_count_circ() helps us achieve the goal of this ticket. It kinda seems like we are restricting that function even more (doc change implies that too), when we actually want it to be more accepting. I think a nice comment is required for people like me to understand how that function works wrt HS circs.
Ok, I will add to the comment to explain this. Basically we're not making it more restrictive. We are changing it from considering only exactly three hop circuits to considering any circuit that has currently less than or equal to 3 hops built (and plans to build at least 3 hops, but maybe more).
I also don't understand why the circuit_build_times_decide_to_count_circ() logic was moved from one function to another. I think we need some help here to understand, since I'm not familiar with the CBT system at all.
Actually, what happened here was that a bunch of code that was dangling off of circuit_build_no_more_hops() got its own function that does only one thing (decides to count the circuit). circuit_send_next_onion_skin() now calls that function before we decide if we're done adding hops (so we can calculate timeout info at the point where the circuit reaches 3 hops).
I think circuit_get_cpath_len() should be moded instead of adding another func circuit_get_cpath_opened_len(). We can add an arg only_count_opened which does that. I think that'd be cleaner.
In terms of code readability, code quality, and testability, I really disagree here. Combining these two would make for a really ugly conditional in the loop, which will take much more effort to verify for correctness than two separate functions. Also, it's not like binary size matters for us for the extra copy of the loop...
circuit_build_times_decide_to_count_circ() is undocumented and I'd actually like to know what it does.
Ok, patches for this and legacy/trac#23114 (moved) are in mikeperry/bugs23100+23114-cleanup-squashed (head cf6c5cc04c3f4e8024409f9a02d4285a804b6c4e). I updated the comments and wrote tests. This and the child are ready for review.
Thanks Mike! You have an oniongit.eu account because this ain't small amount of code and would be much better to do it there? If not, we can push it there and start the review but you'll need eventually to get an account so you can push your fixups.
Replied on gitlab. As for testability, there is a separate testing commit that tests the new functionality. Based on your gitlab comments, it sounded like you didn't see it? It tests one of the functions you asked for tests for.
Replied on gitlab. As for testability, there is a separate testing commit that tests the new functionality. Based on your gitlab comments, it sounded like you didn't see it? It tests one of the functions you asked for tests for.
Thanks for the quick replies. I only looked at 42e4b3e18 since I thought that this was the commit for this ticket. As I said in legacy/trac#23114 (moved), I'm not used to a single branch for multiple tickets. I will review the extra commit and also reply to the gitlab comments next week.
Ok. I pushed fixups for all of asn's comments. We also had some discussion about refactoring, so I did that in the commit for legacy/trac#23114 (moved).
The old branch with all of the fixups is still bugs23100+23114-cleanup-squashed. The new one is bugs23100+23114-cleanup-squashed2. Both heads are identical.
I don't like the way this and legacy/trac#23114 (moved) are being developed. I'm trying to review your code and I feel like you are not respecting my efforts. I have asked you three times to decouple this from the legacy/trac#23114 (moved) branch and you have still not done it, and not only that but you are now merging fixes of legacy/trac#23100 (moved) into the legacy/trac#23114 (moved) commits... I think you are underestimating the time it takes me to review your code and you are making it much harder for me to ACK it. It took me more than a full day to work out the details of legacy/trac#23100 (moved) and it's gonna take me more time to do legacy/trac#23114 (moved), so by having those two branches together you are delaying the whole thing. I'd prefer to see legacy/trac#23100 (moved) get merged, before I jump into reviewing legacy/trac#23114 (moved), or maybe I'd even prefer if someone else reviewed legacy/trac#23114 (moved) since it's non-trivial amount of reviewing.
I'm not familiar with the CBT code and reviewing it sure ain't easy, so please try to make it easy for me. Also decoupling tickets and branches from each other is standard procedure and it's how Tor development is being done.
asn - I am trying to honor your many requests for things that were either already done, redundant to other code, and/or difficult to deal with wrt git history.
Since these two tickets touch the very same block of code, I see them as a unit. Maybe I should have simply filed only one ticket... Separating them is very time consuming, especially wrt tests (which did not exist for any of this code before, so I ended up having to write tests that covered a lot of timeout functionality - which was also changing in both of these tickets). If you want me to separate the tests, and refactor separately, and do the copy-code-first thing (even though in my eyes, that is what #213100 mostly was, aside from one extra conditional and a relocation), that will take me another two days of git juggling, test re-writing, and verification, at least.
I spent all day yesterday trying to merge the fixups for legacy/trac#23100 (moved) down without conflicts for legacy/trac#23114 (moved). Other than the refactoring you asked for, in fact all of those fixups are cleanly merged into legacy/trac#23100 (moved). I even provided you two versions of the branch (pre-fixup and post-fixup) to verify this.
Wrt refactoring: because you requested that the initial commit just be a move of code with minimal changes, I merged the refactoring that you asked for into the following ticket, legacy/trac#23114 (moved). I think the refactoring was a good idea. But to avoid making you review a completely different legacy/trac#23100 (moved), I did it in legacy/trac#23114 (moved) instead... It is frustrating to me that you are now upset by this effort.
I think we are both underestimating what we're asking of the other, and communicating poorly on top of it. :/
I can make a separate commit for this, but I think it is way too much for you to also ask for me to separate the tests for this and legacy/trac#23114 (moved) just for git history. And I'm not sure if you are even asking for this. That is still not clear.
If you want to merge this without tests, that's fine. I will make a branch for the legacy/trac#23100 (moved) commit in a second, and I will also make a reference branch with only the fixups it contains. But if you want tests with this code, again, I would rather simply wait for someone to review legacy/trac#23114 (moved). As I said, the tests cover a lot of timeout functionality that is being changed by these tickets. Writing tests for each ticket means changing the tests after each ticket, which is a lot of busy work for no gain in the end.
Ok, I separated bug legacy/trac#23100 (moved) without the tests. mikeperry/bug23100-squashed is just that (and is the same commit hash for legacy/trac#23100 (moved) from bugs23100+23114-cleanup-squashed2). mikeperry/bug23100-presquashed-fixups contains just the fixups in that commit.
I think I also see where you got confused with the rebase. I accidentally labeled the refactoring commits as fixups to legacy/trac#23100 (moved), since you asked for them in that review. I simply removed those commits from mikeperry/bug23100-presquashed-fixups now. They will be part of legacy/trac#23114 (moved).
I will annotate the gitlab review with the fixup commit hashes for your comments.
As I said on IRC, for further background: The reason why these tickets are functionally together (and touch the same block of code) is that both changes are needed for the rate of circuit timeouts for hidden service circuits (and regular circuits) to be at the target 20%. If we only merge this ticket, the resulting tor still will timeout circuits at some other rate. This is why in my mind they are a package deal. Tor won't exhibit the designed behavior without both fixes together.
I filed the tickets separately as I began noticing problems while writing the prop247 simulator. In retrospect, I should have filed a single ticket: "Tor does not time out circuits at the target 20% rate".
There is now an extra fixup commit on top of both mikeperry/bug23100-squashed and mikeperry/bug23100-presquashed-fixups that refactors the function that both of these tickets are changing. This fixup commit does not change any functionality, it only moves code. legacy/trac#23114 (moved) will be rebased on top of this fixup.