Here are some initial comments based on reading the patch and the email, based on the assumption that the email is right, and that the patch does what the email says it should do.
Should next_sampled_idx persist across runs or be recalculated when we load guards? Right now it seems to start at 0 every time Tor starts. Maybe we should also re-calculate the sampled_idx values to be a dense array when we save/load the state, so that we aren't leaking more information than we intend to.
The patch will also need:
a short proposal. It can mostly be based on Florentin's email.
a patch to guard-spec.txt.
tests for the changed behavior.
a "changes" file (see doc/HACKING/)
editing so that "make check-spaces" passes.
Updates to the documentation and names of all the functions whose behavior has changed. (For example, sample_reachable_filtered_entry_guards` no longer does that the function's name implies or the documentation says.)
How much of that do the original authors want to do, and how much should we pick up?
Trac: Status: needs_review to needs_revision Priority: Medium to High
woups, yes currently it would start at 0 every time, which is bad. Recalculation based on the last sampled_idx from the state file + 1 should be good.
I would recommend the short proposal to include the required change over the confirmed list (i.e., this list needs sorting by sampled_idx instead of confirmed_idx). So this patch would need to handle that as well.
I am happy to help on writing a small proposal from my email, and I can pick up whatever is left from your bullet list starting from January (I am bit busy for these next two months :/).
Could the proposal includes a kind of "further work" suggestion? Describing the issue that Roger wrote in the email thread.
I've looked at entry_guard_parse_from_state(), and I can't work out where we create a dense set of indexes on load. (But I can see where we do it on save.)
Creating a dense set of indexes on load is important, because qsort() is not stable. Therefore, the first time a legacy state is loaded, the guards will be sorted in arbitrary order. And the order may change, every time the sort is performed.
I also wonder if we need to sort the guard list every time we select a guard. Instead, can we keep the list sorted when we add or delete guards?
Appending guards to the end of the list is just smartlist_add(). We have the smartlist_del_keeporder() to remove guards from the list.
Trac: Milestone: Tor: 0.4.3.x-final to Tor: 0.4.4.x-final Keywords: tor-spec prop271 deleted, tor-spec prop271 prop308 added Priority: High to Medium Status: closed to reopened Summary: Proposal 271 - straightforward improvements to Proposal 308 - choose guards in sampled order Resolution: implemented toN/A
I've looked at entry_guard_parse_from_state(), and I can't work out where we create a dense set of indexes on load. (But I can see where we do it on save.)
Creating a dense set of indexes on load is important, because qsort() is not stable. Therefore, the first time a legacy state is loaded, the guards will be sorted in arbitrary order. And the order may change, every time the sort is performed.
I also wonder if we need to sort the guard list every time we select a guard. Instead, can we keep the list sorted when we add or delete guards?
Appending guards to the end of the list is just smartlist_add(). We have the smartlist_del_keeporder() to remove guards from the list.
Yes, so, indeed I forgot to account for existing clients that would load states without any sampled_idx the first time they would use that patch. I need to workout a "fake" sampling order for those clients, or qsort() would not be stable as you mention (I guess taking the confirmed_idx ordering would be a good enough heuristic).
I am going to look closer to your recommendation on ordering. Thanks for your thoughtful review!
On loading, Tor sets the sampled_idx to the confirmed_idx. That should keep older clients to behave the same (and not reordering primary guards). On the next state save, the sampled_idx should be made dense.
Also, the patch applies now ordering when it seems necessary (a couple of redundant orderings have been removed).
Also, I was concerned by the fact that Tor assumes integrity of the state when loading it. If some application has write access to this file, making the client rotate guards until a chosen one is found shouldn't be too much of a hard task. Is that kind of threat relevant?
On loading, Tor sets the sampled_idx to the confirmed_idx. That should keep older clients to behave the same (and not reordering primary guards). On the next state save, the sampled_idx should be made dense.
Also, the patch applies now ordering when it seems necessary (a couple of redundant orderings have been removed).
Thanks!
Also, I was concerned by the fact that Tor assumes integrity of the state when loading it. If some application has write access to this file, making the client rotate guards until a chosen one is found shouldn't be too much of a hard task. Is that kind of threat relevant?
An attacker who can modify files on the local system could do many worse things. So those attacks are not really part of tor's threat model. To defend against those kinds of attacks, people should use an amnesiac system like TAILS.
File corruption is a risk, though. And tor could detect file corruption earlier with checksums. But that's a different ticket :-)
Hi! Initial notes here. Sorry for the delay in the review.
First thing is, continuous integration isn't passing. It looks like there might be a memory leak in the unit tests? You can test that yourself by building with --enable-fragile-hardening and running the tests.
Also there seems to be a practracker failure:
problem file-size /src/feature/client/entrynodes.h 652
You can suppress this warning by editing the scripts/maint/practracker/exceptions.txt file and increasing the number on that line.
The code itself looks well-written and straightforward. I have some suggestions for perhaps making it more robust; I have left them on the github request.
Trac: Status: needs_review to needs_revision Priority: Medium to High
Things have been fixed and github looks happy now.
Summary: The problem was apparently due to a qsort() behavior difference between my setup and appveyor running on windows (I wasn't experiencing the failure). Yet, the test was doing something weird; i.e., setting confirmed_idx of different guards to the same value, which is not something that seems ok to test. It means unstable behaviors with sorted lists, and we don't like unstable behaviors :)
I've slightly modified the original test since now the confirmed_guards are sorted by sampled_idx, and it makes sure that the confirmed guards are sorted as expected, with valid confirmed_idx (i.e., not the same).
Hello, thanks for all the work here! I actually missed the proposal and the ticket so this caught me by surprise today. This seems really well thought all-in-all and a solid patch, and the test adjustments look good to.
I added a few comments to the PR. It's mainly documentation requests to polish some parts that I didn't comprehend.
Also, can the next revision come with a new PR, which contains the changes in the top commits? This can be done with a rebase. Isolating the changes of this PR was not easy for me because the relevant commits were scattered around the log.
Thanks a lot!
Trac: Status: needs_review to needs_revision Reviewer: nickm to nickm, asn
Hey thanks for the super fast revisions here! I left a comment on the new PR.
Furthermore, I'd be more confidence if we had some more testing here. The test coverage of the branch in terms of LoC is great, but I would also like a test that tests the entire logic of this new feature. In particular, given that the changes are far from trivial, and change various behaviors in the code, I would like a test that tests most of that. In particular, I would be looking for something like: Load some guards to the sampled set (without using a state file), make sure that the sampled set has the sampled idxs correctly set, then create the confirmed set and make sure that's well ordered too, then make the primary guards list and make sure that's well ordered, finally ask for a guard and make sure that's what you would expect. Feel free to reuse code from other unittests of course...
So, before this branch gets merged we would need to do the following two things:
Get that unittest in.
Answer the comment on the new PR.
Get the torspec patch for guard-spec.txt merged.
Squash the PR into more organized commits. Ideally this PR would be squashed down to max 3-4 logical commits. Something like the following commit structure would work, but feel free to improvise:
Commit 1: Use helper functions parse_from_state_set_vals/parse_from_state_handle_time
Hi, how is this going? I'd love to take a version of this in 0.4.4 if we can, to improve our security, but it might make more sense to take it early in 0.4.5 if it's not going to be done by the deadline (Jun 15).
Hi, how is this going? I'd love to take a version of this in 0.4.4 if we can, to improve our security, but it might make more sense to take it early in 0.4.5 if it's not going to be done by the deadline (Jun 15).
I'd love to see it in 0.4.4. I've made the requested unit test last Sunday and replied to asn's comment on the github pull request (https://github.com/torproject/tor/pull/1873). Btw asn, that was a good call, I caught a corner case issue.
So currently, I am missing a patch in guard-spec.txt and I can workout the commit structure as soon as the code is flagged ok. I am sure all of this can be done this week :)
Also, Travis and AppVeyor are complaining but I did not understand why (I found the logs hard to deciphers).
Oh! It sounds like this is ready for a fresh review then. I'll put it back in needs_review.
(Yeah, I agree about testing. It's gotten to the point where when I write a test that passes the first time, I am suspicious that it isn't really testing the code.)
As for the tests, it looks like you're getting this compilation error:
The issue here is that all of the tt_...() macros will run goto done; on failure. The done block of test_entry_guard_correct_cascading_order is calling smartlist_free(old_primary_guards)... but that variable isn't actually declared until half-way through the function. You can fix this error by declaring it earlier.
Things look good right now. Seems like the CI is broken tho. Any chance you could rebase to master (I think that should fix appveyor) and also fix the wide line in Wide:../src/feature/client/entrynodes.c:94 (I think that should fix travis).
Otherwise, tell me and I can try to fix it myself.