Back when the CBT code was first written, circuit build times were around 10 seconds. This meant it was fine to check if the timeout had passed in circuit_expire_building(), which was called once per second.
However, now that the typical timeout is more like 2 seconds or less, we actually let a significantly larger fraction of circuits through by waiting for this once-per-second callback.
The fix is to switch the purpose of circuits as they are being built and/or opened to MEASUREMENT circuits as soon as they pass the timeout, but before use. This will cause us to actually discard the proper fraction of slow paths.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
BTW is it possible to decouple legacy/trac#23114 (moved) from legacy/trac#23100 (moved) so that we review it independently? They are both non-trivial patches so I think treating them as two separate things is beneficial for reviewing purposes.
Hrmm.. I'm guessing that this branch is going to be annoying to review with the refactoring and changes in the same commit as well. I guess I will do the git juggling to break them apart...
Sorry for all the git mess here. This ticket and legacy/trac#23100 (moved) really should have been just one ticket. Changing the same block of code in two tickets was really unwieldy. It means every fixup is automatically a conflict. Not to mention the unit test problem...
Ok, mikeperry/bug23114 has been rebased on top of an extra fixup commit on top of mikeperry/bug23100-squashed that refactors the function that both of these tickets are changing. This fixup commit does not change any functionality, it only moves code.
I hope this makes it easier to review this change. Again, sorry for filing two tickets for the same block of code. I'll never do that again.
OK great mike! I'm quite happy with the current state of the code!
Some minor nitpicks and we are ready to roll:
I find it a bit awkward that we are calculating first_hop_succeeded twice in the same way both in circuit_build_times_mark_circ_as_measurement_only() and in its caller circuit_expire_building(). I wonder what we could do to only do the calculation once. One solution could be to return the value of first_hop_succeeded from circuit_build_times_mark_circ_as_measurement_only() and use it in circuit_expire_building(). Do you think that makes sense or would it be uglier?
Let's add a note in the function doc of circuit_any_opened_circuits() that we cache its result as well.
Can we constify next_circ in circuit_any_opened_circuits()? Can we constify circ in circuit_get_cpath_opened_len()?
After these two minor things are resolved, I think the next steps is to squash everything, make a new merge request, and mark it as merge_ready so that Nick can check it out.
I added the comment and the consts in 7ecc6ebcc2d58fd49ece7a4478a5b0eee0a97c37.
I don't think that bending over backwards to propagate first_hop_succeded is an improvement. The variable is basically an alias for the first cpath being STATE_OPENED. This serves as a clarifying comment rather than than an optimization or abstraction. Trying to propagate the result a long way will be both less clear and more error prone, and doesn't save us anything that the compiler wouldn't optimize anyway.
Nick has previously said that he prefers to squash things himself, so I'm just going to leave mikeperry/bug23114 unsquashed for now until we hear from him. That way he can also decide if he wants the refactoring commit (90b29f1cc56a2402d00373043748eb198815d9a4) to be squashed into Bug legacy/trac#23100 (moved) or kept separate in the final merge.
I added the comment and the consts in 7ecc6ebcc2d58fd49ece7a4478a5b0eee0a97c37.
I don't think that bending over backwards to propagate first_hop_succeded is an improvement. The variable is basically an alias for the first cpath being STATE_OPENED. This serves as a clarifying comment rather than than an optimization or abstraction. Trying to propagate the result a long way will be both less clear and more error prone, and doesn't save us anything that the compiler wouldn't optimize anyway.
I suggested that not for optimization purposes, but because it's non-trivial code duplication, that might bite us in the future if we change one instance and not the other.
Anyhow, it's nothing tragic, so I'll defer to Nick. Marking this and legacy/trac#23100 (moved) as merge_ready.
Okay, I've read the code here, and read over the history. I think it makes sense at this point to put comments on the squashed branch instead.
To capture my comments on the branch, I have made new fixup commits for the issues where I thought it was easier to just "code what I meant" -- please review those commits and let me know if you think any are wrong? They are mostly cosmetic and documentation issues, except for one that fixes a bug in the denominator of close_rate.
Mike, do you have time to squash this? I tried to do it myself, and got so many conflicts that I'm fairly sure I'm doing it wrong.
(if you git rebase -i --autosquash 3c31d99b023f341829ccc94d2e270f38cfbf893a, or use the git-resquash script in the githax repository, it'll be a pure squash with no rebasing)