Proposal 310 - choose guards in sampled order

changed milestone to %Tor: 0.4.4.x-final in legacy/trac

added 044-should in Legacy / Trac component::core tor/tor in Legacy / Trac milestone::Tor: 0.4.4.x-final in Legacy / Trac priority::high in Legacy / Trac prop271 in Legacy / Trac prop310 in Legacy / Trac resolution::fixed in Legacy / Trac reviewer::asn in Legacy / Trac reviewer::nickm in Legacy / Trac severity::normal in Legacy / Trac status::closed in Legacy / Trac tor-spec in Legacy / Trac type::enhancement in Legacy / Trac labels

Trac:
Type: defect to enhancement
Component: Core Tor to Core Tor/Tor

Trac:
Status: new to needs_review
Milestone: N/A to Tor: 0.4.3.x-final

Trac:
Reviewer: N/A to nickm
Keywords: N/A deleted, tor-spec prop271 added

Trac:
0001-Makes-selection-of-filtered-guards-and-primary-guard.patch

Here are some initial comments based on reading the patch and the email, based on the assumption that the email is right, and that the patch does what the email says it should do.

Should next_sampled_idx persist across runs or be recalculated when we load guards? Right now it seems to start at 0 every time Tor starts. Maybe we should also re-calculate the sampled_idx values to be a dense array when we save/load the state, so that we aren't leaking more information than we intend to.

The patch will also need:

a short proposal. It can mostly be based on Florentin's email.
a patch to guard-spec.txt.
tests for the changed behavior.
a "changes" file (see doc/HACKING/)
editing so that "make check-spaces" passes.
Updates to the documentation and names of all the functions whose behavior has changed. (For example, sample_reachable_filtered_entry_guards` no longer does that the function's name implies or the documentation says.)

How much of that do the original authors want to do, and how much should we pick up?

Trac:
Status: needs_review to needs_revision
Priority: Medium to High

woups, yes currently it would start at 0 every time, which is bad. Recalculation based on the last sampled_idx from the state file + 1 should be good.

I would recommend the short proposal to include the required change over the confirmed list (i.e., this list needs sorting by sampled_idx instead of confirmed_idx). So this patch would need to handle that as well.

I am happy to help on writing a small proposal from my email, and I can pick up whatever is left from your bullet list starting from January (I am bit busy for these next two months :/).

Could the proposal includes a kind of "further work" suggestion? Describing the issue that Roger wrote in the email thread.

Could the proposal includes a kind of "further work" suggestion? Describing the issue that Roger wrote in the email thread.

Yes, that's fine.

Here is an attempt for a short proposal: https://github.com/frochet/prop271_towards_loadbalancing/blob/master/xxx-bandaid-on-guard-selection.txt

Trac:
Status: needs_revision to needs_review

This is now added as proposal 310.

Trac:
Status: needs_review to closed
Resolution: N/A to implemented

https://github.com/torproject/tor/pull/1773

I did a small review on this code.

I've looked at entry_guard_parse_from_state(), and I can't work out where we create a dense set of indexes on load. (But I can see where we do it on save.)

Creating a dense set of indexes on load is important, because qsort() is not stable. Therefore, the first time a legacy state is loaded, the guards will be sorted in arbitrary order. And the order may change, every time the sort is performed.

I also wonder if we need to sort the guard list every time we select a guard. Instead, can we keep the list sorted when we add or delete guards? Appending guards to the end of the list is just smartlist_add(). We have the smartlist_del_keeporder() to remove guards from the list.

Trac:
Milestone: Tor: 0.4.3.x-final to Tor: 0.4.4.x-final
Keywords: tor-spec prop271 deleted, tor-spec prop271 prop308 added
Priority: High to Medium
Status: closed to reopened
Summary: Proposal 271 - straightforward improvements to Proposal 308 - choose guards in sampled order
Resolution: implemented to N/A

Trac:
Status: reopened to needs_revision

Replying to teor:

I did a small review on this code.

I've looked at entry_guard_parse_from_state(), and I can't work out where we create a dense set of indexes on load. (But I can see where we do it on save.)

Creating a dense set of indexes on load is important, because qsort() is not stable. Therefore, the first time a legacy state is loaded, the guards will be sorted in arbitrary order. And the order may change, every time the sort is performed.

I also wonder if we need to sort the guard list every time we select a guard. Instead, can we keep the list sorted when we add or delete guards? Appending guards to the end of the list is just smartlist_add(). We have the smartlist_del_keeporder() to remove guards from the list.

Yes, so, indeed I forgot to account for existing clients that would load states without any sampled_idx the first time they would use that patch. I need to workout a "fake" sampling order for those clients, or qsort() would not be stable as you mention (I guess taking the confirmed_idx ordering would be a good enough heuristic).

I am going to look closer to your recommendation on ordering. Thanks for your thoughtful review!

Hello,

The pull request has been updated.

On loading, Tor sets the sampled_idx to the confirmed_idx. That should keep older clients to behave the same (and not reordering primary guards). On the next state save, the sampled_idx should be made dense.

Also, the patch applies now ordering when it seems necessary (a couple of redundant orderings have been removed).

Also, I was concerned by the fact that Tor assumes integrity of the state when loading it. If some application has write access to this file, making the client rotate guards until a chosen one is found shouldn't be too much of a hard task. Is that kind of threat relevant?

Trac:
Status: needs_revision to needs_review
Summary: Proposal 308 - choose guards in sampled order to Proposal 310 - choose guards in sampled order

Trac:
Keywords: tor-spec prop271 prop308 deleted, tor-spec prop271 prop310 added

Replying to Jaym:

The pull request has been updated.

On loading, Tor sets the sampled_idx to the confirmed_idx. That should keep older clients to behave the same (and not reordering primary guards). On the next state save, the sampled_idx should be made dense.

Also, the patch applies now ordering when it seems necessary (a couple of redundant orderings have been removed).

Thanks!

Also, I was concerned by the fact that Tor assumes integrity of the state when loading it. If some application has write access to this file, making the client rotate guards until a chosen one is found shouldn't be too much of a hard task. Is that kind of threat relevant?

An attacker who can modify files on the local system could do many worse things. So those attacks are not really part of tor's threat model. To defend against those kinds of attacks, people should use an amnesiac system like TAILS.

File corruption is a risk, though. And tor could detect file corruption earlier with checksums. But that's a different ticket :-)

Hi! Initial notes here. Sorry for the delay in the review.

First thing is, continuous integration isn't passing. It looks like there might be a memory leak in the unit tests? You can test that yourself by building with --enable-fragile-hardening and running the tests.

Also there seems to be a practracker failure:

problem file-size /src/feature/client/entrynodes.h 652

You can suppress this warning by editing the scripts/maint/practracker/exceptions.txt file and increasing the number on that line.

The code itself looks well-written and straightforward. I have some suggestions for perhaps making it more robust; I have left them on the github request.

Trac:
Status: needs_review to needs_revision
Priority: Medium to High

Trac:
Status: needs_revision to needs_review

Sorry for the delay: the last week or so has been hugely busy.

The changes look good, but CI still isn't passing. It looks like these tests are failing on appveyor:

entrynodes/confirming_guards: [forking] Skipping 1 testcases.
  FAIL ../src/test/test_entrynodes.c:1513: assert(g1->confirmed_idx OP_EQ 0): 2 vs 0
  [confirming_guards FAILED]
entrynodes/confirming_guards_reasonably_future: [forking] Skipping 1 testcases.
  FAIL ../src/test/test_entrynodes.c:1513: assert(g1->confirmed_idx OP_EQ 0): 2 vs 0
  [confirming_guards_reasonably_future FAILED]
entrynodes/confirming_guards_reasonably_past: [forking] Skipping 1 testcases.
  FAIL ../src/test/test_entrynodes.c:1513: assert(g1->confirmed_idx OP_EQ 0): 2 vs 0
  [confirming_guards_reasonably_past FAILED]

I'm also seeing more practracker issues on Travis:

problem function-size /src/feature/client/entrynodes.c:entry_guard_parse_from_state() 272

For this one we should probably look for some code we can extract into a new function, to simplify this function.

Trac:
Status: needs_review to needs_revision

Hello Nick,

Thanks for the review!

Things have been fixed and github looks happy now.

Summary: The problem was apparently due to a qsort() behavior difference between my setup and appveyor running on windows (I wasn't experiencing the failure). Yet, the test was doing something weird; i.e., setting confirmed_idx of different guards to the same value, which is not something that seems ok to test. It means unstable behaviors with sorted lists, and we don't like unstable behaviors :)

I've slightly modified the original test since now the confirmed_guards are sorted by sampled_idx, and it makes sure that the confirmed guards are sorted as expected, with valid confirmed_idx (i.e., not the same).

Trac:
Status: needs_revision to needs_review

Hello, thanks for all the work here! I actually missed the proposal and the ticket so this caught me by surprise today. This seems really well thought all-in-all and a solid patch, and the test adjustments look good to.

I added a few comments to the PR. It's mainly documentation requests to polish some parts that I didn't comprehend.

Also, can the next revision come with a new PR, which contains the changes in the top commits? This can be done with a rebase. Isolating the changes of this PR was not easy for me because the relevant commits were scattered around the log.

Thanks a lot!

Trac:
Status: needs_review to needs_revision
Reviewer: nickm to nickm, asn

Thanks! rebased on https://github.com/torproject/tor/pull/1873

tried to improve based on asn's comments

Trac:
Status: needs_revision to needs_review

Hey thanks for the super fast revisions here! I left a comment on the new PR.

Furthermore, I'd be more confidence if we had some more testing here. The test coverage of the branch in terms of LoC is great, but I would also like a test that tests the entire logic of this new feature. In particular, given that the changes are far from trivial, and change various behaviors in the code, I would like a test that tests most of that. In particular, I would be looking for something like: Load some guards to the sampled set (without using a state file), make sure that the sampled set has the sampled idxs correctly set, then create the confirmed set and make sure that's well ordered too, then make the primary guards list and make sure that's well ordered, finally ask for a guard and make sure that's what you would expect. Feel free to reuse code from other unittests of course...

So, before this branch gets merged we would need to do the following two things:

Get that unittest in.
Answer the comment on the new PR.
Get the torspec patch for guard-spec.txt merged.
Squash the PR into more organized commits. Ideally this PR would be squashed down to max 3-4 logical commits. Something like the following commit structure would work, but feel free to improvise:
- Commit 1: Use helper functions parse_from_state_set_vals/parse_from_state_handle_time
- Commit 2: Implement prop310"
- Commit 3: Fix existing unittests
- Commit 4: Add unittests

Thanks a lot for your work! :)

Trac:
Status: needs_review to needs_revision

Hi, how is this going? I'd love to take a version of this in 0.4.4 if we can, to improve our security, but it might make more sense to take it early in 0.4.5 if it's not going to be done by the deadline (Jun 15).

Trac:
Keywords: tor-spec prop271 prop310 deleted, tor-spec prop271 prop310 044-should added

Replying to nickm:

Hi, how is this going? I'd love to take a version of this in 0.4.4 if we can, to improve our security, but it might make more sense to take it early in 0.4.5 if it's not going to be done by the deadline (Jun 15).

I'd love to see it in 0.4.4. I've made the requested unit test last Sunday and replied to asn's comment on the github pull request (https://github.com/torproject/tor/pull/1873). Btw asn, that was a good call, I caught a corner case issue.

So currently, I am missing a patch in guard-spec.txt and I can workout the commit structure as soon as the code is flagged ok. I am sure all of this can be done this week :)

Also, Travis and AppVeyor are complaining but I did not understand why (I found the logs hard to deciphers).

Oh! It sounds like this is ready for a fresh review then. I'll put it back in needs_review.

(Yeah, I agree about testing. It's gotten to the point where when I write a test that passes the first time, I am suspicious that it isn't really testing the code.)

As for the tests, it looks like you're getting this compilation error:

x86_64-w64-mingw32-gcc -DHAVE_CONFIG_H -I. -I..  -DSHARE_DATADIR="\"/mingw64/share\"" -DLOCALSTATEDIR="\"/mingw64/var\"" -DBINDIR="\"/mingw64/bin\"" -DTOR_UNIT_TESTS -I../src -I../src/ext -I../src/ext/trunnel -I../src/trunnel -I../src/ext -Isrc/ext -DSHARE_DATADIR="\"/mingw64/share\"" -DLOCALSTATEDIR="\"/mingw64/var\"" -DBINDIR="\"/mingw64/bin\"" -DTOR_UNIT_TESTS  -DHAVE_MODULE_RELAY=1 -DHAVE_MODULE_DIRAUTH=1 -DHAVE_MODULE_DIRCACHE=1 -I/mingw64/include  -IC:/msys64/mingw64/include -L/mingw64/include  -g -O2 -fasynchronous-unwind-tables -Wall -fno-strict-aliasing -Werror  -Waddress -Warray-bounds -Wdate-time -Wdouble-promotion -Wduplicate-decl-specifier -Wduplicated-cond -Wenum-conversion -Wextra -Wfloat-conversion -Wignored-attributes -Wimplicit-fallthrough -Winit-self -Wlogical-op -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-noreturn -Wnormalized=nfkc -Wnull-dereference -Woverlength-strings -Woverride-init -Wshadow -Wshift-count-negative -Wshift-count-overflow -Wshift-negative-value -Wshift-overflow=2 -Wsizeof-array-argument -Wstrict-overflow=1 -Wstring-compare -Wsuggest-attribute=format -Wsuggest-attribute=noreturn -Wswitch-bool -Wsync-nand -Wtrampolines -Wunused-but-set-parameter -Wunused-but-set-variable -Wunused-const-variable=2 -Wunused-local-typedefs -Wvariadic-macros  -W -Wfloat-equal -Wundef -Wpointer-arith -Wstrict-prototypes -Wmissing-prototypes -Wwrite-strings -Wredundant-decls -Wchar-subscripts -Wcomment -Wformat=2 -Wwrite-strings -Wnested-externs -Wbad-function-cast -Wswitch-enum -Waggregate-return -Wpacked -Wunused -Wunused-parameter  -Wold-style-definition -Wmissing-declarations -MT src/test/test-test_geoip.o -MD -MP -MF src/test/.deps/test-test_geoip.Tpo -c -o src/test/test-test_geoip.o `test -f 'src/test/test_geoip.c' || echo '../'`src/test/test_geoip.c
bash.exe : In file included from ../src/lib/container/smartlist.h:18,
At line:2 char:5
+     & $commandPath $args 2>&1
+     ~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (In file include...smartlist.h:18,:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError
 
                 from ../src/core/or/or.h:29,
                 from ../src/test/test_entrynodes.c:14:
../src/test/test_entrynodes.c: In function 'test_entry_guard_correct_cascading_order':
../src/lib/smartlist_core/smartlist_core.h:40:55: error: 'old_primary_guards' may be used uninitialized in this function [-Werror=maybe-uninitialized]
   40 | #define smartlist_free(sl) FREE_AND_NULL(smartlist_t, smartlist_free_, (sl))
      |                                                       ^~~~~~~~~~~~~~~
../src/test/test_entrynodes.c:1825:16: note: 'old_primary_guards' was declared here
 1825 |   smartlist_t *old_primary_guards = smartlist_new();
      |                ^~~~~~~~~~~~~~~~~~
mv -f src/test/.deps/test-test_geoip.Tpo src/test/.deps/test-test_geoip.Po

The issue here is that all of the tt_...() macros will run goto done; on failure. The done block of test_entry_guard_correct_cascading_order is calling smartlist_free(old_primary_guards)... but that variable isn't actually declared until half-way through the function. You can fix this error by declaring it earlier.

Trac:
Status: needs_revision to needs_review

As discussed in IRC we asked for some documentation on the smartlist_sort pattern,.

Trac:
Status: needs_review to needs_revision

Added light doc about the smartlist_sort pattern
Change on guard-spec.txt: https://github.com/torproject/torspec/pull/119

Trac:
Status: needs_revision to needs_review

Things look good right now. Seems like the CI is broken tho. Any chance you could rebase to master (I think that should fix appveyor) and also fix the wide line in Wide:../src/feature/client/entrynodes.c:94 (I think that should fix travis).

Otherwise, tell me and I can try to fix it myself.

Thanks! We are close!

Trac:
Status: needs_review to needs_revision

Done! https://github.com/torproject/tor/pull/1926

rebase on master fixed the appveyor issue, thanks!

Trac:
Status: needs_revision to needs_review

Great. I squashed and made a PR here: https://github.com/torproject/tor/pull/1928

Waiting to finish and will merge.

Trac:
Status: needs_review to merge_ready

Merged!!! Thanks a lot!!!!

Trac:
Status: merge_ready to closed
Resolution: N/A to fixed

(Also merged the spec patch)

closed

moved from legacy/trac#32088 (moved)

added Feature label and removed 1 deleted label

added Specifications label and removed 1 deleted label

Proposal 310 - choose guards in sampled order

Child items 0

Activity