Resolvers would like to know the isolation information of incoming streams so they know which streams need to be isolated from which other streams.
Semantically, this is a little tricky. The underlying rule that Tor implements is that each stream has a tuple of attributes (A_1, A_2... A_n), and a bit field (b_1, b_2... b_n). Two streams S_a and S_b may share the same circuit iff, for every i such that the OR of their b_i values is true, they have the same A_i value.
Note that this is not transitive: Stream S_a may be able to share a circuit with S_b or with S_c, even if S_b cannot share one with S_c.
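The rule above can be sketched in a few lines. This is an illustration of the semantics, not Tor's internal API; the tuples and the function name are made up for the example. It also demonstrates the non-transitivity: S_a may pair with S_b and with S_c, while S_b and S_c may not pair with each other.

```python
def may_share_circuit(attrs_a, bits_a, attrs_b, bits_b):
    """Two streams may share a circuit iff, for every index i where
    either stream's isolation bit b_i is set, their attribute values
    A_i are equal."""
    return all(
        a == b
        for a, b, ba, bb in zip(attrs_a, attrs_b, bits_a, bits_b)
        if ba or bb
    )

# Each stream is (attributes, isolation bits); values are illustrative.
S_a = (("x", "q"), (True, False))   # isolates on attribute 1 only
S_b = (("x", "r"), (True, False))   # isolates on attribute 1 only
S_c = (("x", "q"), (False, True))   # isolates on attribute 2 only

assert may_share_circuit(*S_a, *S_b)      # attribute 1 matches
assert may_share_circuit(*S_a, *S_c)      # attributes 1 and 2 match
assert not may_share_circuit(*S_b, *S_c)  # attribute 2 differs: q vs r
```

Because the relation is not an equivalence relation, it cannot be captured by assigning each stream a single "cookie" without over-isolating, which is the weakness of option (3) below.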
Should we (1) expose these attribute tuples and bitfields and require controllers to manipulate them correctly? That seems obnoxious and error-prone.
Or should we (2) allow controllers to ask questions like "may stream A share a circuit with stream B?" Or "what streams may A share a circuit with?" This might lead to O(n) queries, and it will still be error-prone because of the non-transitivity issue.
Or would it be better to (3) oversimplify the system above and provide each stream a 'cookie' such that any two streams with the same cookie may definitely share the same circuit? But this is problematic, and will overestimate how much isolation we need.
My current best idea is that (4) we should provide an operation of the form "make stream A have the same isolation properties as stream B". And possibly "make circuit C have isolation properties as if it had been used by stream A". So we don't expose isolation information, we just expose a way to manipulate it.
Or maybe there's a further clever way I'm not even thinking about just now.
Indeed, this does seem like a complex problem. For the specific use cases that Namecoin has, my guess is that (4) is probably the simplest and least-error-prone option. However, a concern comes to mind: if a Namecoin lookup is considered to have the same isolation properties as the stream created by the client application that generated the Namecoin lookup, this presumably reveals to someone (who? The exit relay? A Namecoin P2P node? The service being accessed? All of the above?) that the Namecoin lookup and the client application stream are linked.
Is this a problem? If the protocol is HTTP(S), then the accessed service can already tell that it's being looked up using Namecoin by checking the Host header (and if TLS isn't used, then the exit relay can also tell this). If the protocol uses TLS, then the accessed service and the exit relay can tell it's being looked up using Namecoin by checking the SNI header (although SNI could be disabled for some protocols that use TLS, and perhaps a future version of TLS will offer SNI encryption). If it's some other protocol that doesn't have a Host-like header, then I don't see any way that the use of Namecoin is inherently detectable. Since Tor is used for arbitrary TCP traffic, I am hesitant to endorse a solution that unnecessarily reveals to adversaries that Namecoin is in use, particularly since Namecoin usage is rather rare right now and therefore this narrows the anonymity set substantially.
From the point of view of the Namecoin P2P node, looking up a Namecoin name doesn't necessarily reveal everything about how it will be used. For example, it normally doesn't reveal what subdomain is being looked up, nor does it reveal what record types are being looked up (IPv4, IPv6, TorHS, TLSA, SSHFP, etc.). It's not immediately obvious to me how much of this extra data is revealed, and to whom, as a result of the service being accessed over the same circuit, but it makes me uneasy without a clear, convincing argument that it's safe.
So, possible solution: offer an operation of the form "make stream A have the same isolation properties as stream B, except that stream A must also be isolated from stream B".
Is this modification worth it? Would it put much extra load on the Tor network? Are my concerns about revealing extra information to adversaries justified, or am I being overly cautious about something that isn't really a threat?
I ended up going with Nick's option (1), for the following reasons:
It keeps the Tor patch simple, and lets the controller handle the isolation logic; the controller is written in a higher-level language (Python, in my case) and is therefore less prone to memory-safety bugs.
For performance reasons, Namecoin needs to preemptively open connections before the STREAM events are received. So whatever isolation data the controller receives will be used to assign resolution commands to existing Namecoin connections; this limits the applicability of Nick's option (4).
There's already precedent for this in the CIRC event (which sends the stream isolation data fields verbatim), and it seems weird to make the STREAM event have a wildly different design than the CIRC event for the same functionality.
I infer from the keywords on this ticket that you'll also want a spec patch; is that correct? Is there anything else needed for this?
That's right; the spec patch should document the new behavior in control-spec.txt.
Okay, I'll work on that next. I should have a spec patch for you within a few days.
It would also be really good to have tests for the new code here.
This patch is mostly a copy/paste job of the existing code for the CIRC event. So ordinarily I'd base the tests for this patch on the CIRC tests as well, but a brief look at the tests didn't find any tests for that CIRC code. Am I correct in thinking that the CIRC stream isolation fields don't have any tests right now, or did I miss something?
If that's correct, I'm still okay with writing tests if you want, but given that this is my first time hacking on the core Tor code, it's likely that I'll ask for some minor hand-holding on IRC, as the test codebase used by core Tor is not something that I have any familiarity with.
Thanks for the patch! Before I get too deep into the code, I'll review the spec.
It would be good if the programmer who uses this feature doesn't need to know in advance a complete list of all the different things that can be isolated. So for example, if we later add isolation based on source port, or isolation based on destination address or TLD, it would be good if the programmer doesn't need to change their code accordingly.
So what if, instead of having a bunch of different ISOLATE_XYZ flags, we have one item that is a comma-separated list of flags, each matching the name of one of the other fields?
Also, I suggest that we mark all of these fields as optional, since older Tors will not provide them, and not all protocols necessarily have them.
Finally, the list of ClientProtocol values should mention that they can be extended in the future.
Looking at the code, it looks plausible. I'd suggest writing unit tests mainly for the new entry_connection_describe_status_for_controller function. Also the argument to that function should be const. But before you revise the code, I'd suggest doing a new version of the spec patch so we're on the same page about the desired behavior.
So what if, instead of having a bunch of different ISOLATE_XYZ flags, we have one item that is a comma-separated list of flags, each matching the name of one of the other fields?
If I'm understanding you correctly, you're suggesting having an ISO_FIELDS field, such that ISO_FIELDS=FOO,BAR means that the FOO and BAR fields of the STREAM event represent fields that the SOCKS port is configured to isolate on? I think this is a good idea, but there is a caveat. Tor can isolate based on destination address, destination port, and client address, but those don't have a 1:1 correspondence to STREAM fields (they're components of fields that contain both an address and a port). So in order for this to work, I'd need to add 3 additional fields to STREAM events that are arguably redundant. Is that acceptable?
I think that's acceptable if we have to. Alternatively, we could declare that DESTADDR, DESTPORT, and CLIENTADDR are handled specially, but that all other and future isolatable fields will be handled uniformly.
Alternatively, we could declare that DESTADDR, DESTPORT, and CLIENTADDR are handled specially, but that all other and future isolatable fields will be handled uniformly.
This sounds preferable to me, although I think we should specify that those special values might be replaced by correspondingly named fields in the future, i.e. controllers should fall back to the special behavior if those fields don't exist, but should use the named fields if they are present. That way it's easier to transition away from the special behavior in the future if we want to. Also might as well include CLIENTPORT even though Tor doesn't currently isolate based on it, since the client port is included in the same field as CLIENTADDR. This way we don't need to change the control protocol if Tor decides to add client-port-based stream isolation in the future.
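A controller-side sketch of the scheme agreed above, assuming the field names proposed in this thread (ISO_FIELDS, DESTADDR, DESTPORT, CLIENTADDR, CLIENTPORT); the event dict stands in for a parsed STREAM event and is not a real control-port library API:

```python
# ISO_FIELDS entries that do not name STREAM fields directly, but
# components of combined "address:port" fields. Per the discussion,
# controllers prefer a correspondingly named field if one is present,
# and fall back to this special handling otherwise.
SPECIAL = {
    "DESTADDR":   ("Target",     0),
    "DESTPORT":   ("Target",     1),
    "CLIENTADDR": ("SourceAddr", 0),
    "CLIENTPORT": ("SourceAddr", 1),
}

def isolation_key(event):
    """Return a tuple identifying this stream's isolation class: the
    value of every field named in ISO_FIELDS. Streams with different
    keys must not be assigned to the same circuit/connection."""
    key = []
    for name in event.get("ISO_FIELDS", "").split(","):
        if not name:
            continue
        if name in event:
            # A newer Tor provides the named field directly.
            key.append((name, event[name]))
        elif name in SPECIAL:
            # Fall back: naive addr:port split of the combined field.
            field, part = SPECIAL[name]
            key.append((name, event[field].rsplit(":", 1)[part]))
        else:
            # Unknown isolation field from a future Tor: isolate
            # conservatively by making this stream's key unique.
            key.append((name, object()))
    return tuple(key)

event = {
    "ISO_FIELDS": "DESTADDR,DESTPORT,NYM_EPOCH",
    "Target": "example.com:443",
    "SourceAddr": "127.0.0.1:50632",
    "NYM_EPOCH": "3",
}
print(isolation_key(event))
# (('DESTADDR', 'example.com'), ('DESTPORT', '443'), ('NYM_EPOCH', '3'))
```

The fallback-to-named-field order matches the transition plan described above: controllers use the named fields when present and only apply the special behavior when they are absent.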
My tor patch is triggering the function-size build warning for control_event_stream_status. It looks like there's already an exception for this function: the standard threshold for triggering the warning is 100; the status quo for that function (as listed in exceptions.txt) is 118; my patch increases it to 124. Should I just modify the exception to allow 124, or will I need to do some refactoring of that function as a prerequisite to merging?
Alas, I wasn't comfortable refactoring control_event_stream_status, as I'm insufficiently confident in both my C skills and my familiarity with the Tor daemon codebase to really want to touch that code... I'd be worried that I'd introduce a bug. Probably safer to let someone else who has better familiarity with that code do it.
@nickm, feel free to review these, as I think I've made all of the needed changes. Also I tested the code patch with Namecoin's stream isolation code yesterday, and everything appears to work properly (specifically, I confirmed that Tor Browser's New Identity button, New Tor Circuit button, and first-party domain isolation all work correctly with Namecoin when this patch is applied to the Tor daemon, so that at least confirms that the NYM_EPOCH, SOCKS auth, and ISO_FIELDS fields are working with real-world software).
The code changes and tests are looking pretty good. When I merge the spec changes, I'll want to add a note to the spec explaining which version added them, since that's something developers like to see.
CI isn't passing; the unit test seems to be failing with a use-after-free problem:
control/event/format_stream: [forking]
==35695==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000627e90 at pc 0x00010a99e059 bp 0x7ffee6cd4160 sp 0x7ffee6cd3900
WRITE of size 6 at 0x602000627e90 thread T0
    #0 0x10a99e058 in wrap_memset (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x1b058)
    #1 0x7fff7f5fbb0c in memset_s (libsystem_c.dylib:x86_64+0x6db0c)
    #2 0x1096d6c99 in memwipe crypto_util.c:82
    #3 0x1094d695c in socks_request_free_ proto_socks.c:99
    #4 0x1093de8da in connection_free_minimal connection.c:685
    #5 0x10939d3d3 in testcase_run_one tinytest.c:107
    #6 0x10939dca2 in tinytest_main tinytest.c:454
    #7 0x10939b8e7 in main testing_common.c:350
    #8 0x7fff7f5443d4 in start (libdyld.dylib:x86_64+0x163d4)

0x602000627e90 is located 0 bytes inside of 7-byte region [0x602000627e90,0x602000627e97)
freed by thread T0 here:
    #0 0x10a9e1bed in wrap_free (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x5ebed)
    #1 0x1090a3e5e in test_cntev_format_stream test_controller_events.c:666
    #2 0x10939d3d3 in testcase_run_one tinytest.c:107
    #3 0x10939dca2 in tinytest_main tinytest.c:454
    #4 0x10939b8e7 in main testing_common.c:350
    #5 0x7fff7f5443d4 in start (libdyld.dylib:x86_64+0x163d4)

previously allocated by thread T0 here:
    #0 0x10a9dbbbf in wrap_strdup (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x58bbf)
    #1 0x109778e3f in tor_strdup_ malloc.c:165
    #2 0x1090a3316 in test_cntev_format_stream test_controller_events.c:552
    #3 0x10939d3d3 in testcase_run_one tinytest.c:107
    #4 0x10939dca2 in tinytest_main tinytest.c:454
    #5 0x10939b8e7 in main testing_common.c:350
    #6 0x7fff7f5443d4 in start (libdyld.dylib:x86_64+0x163d4)
To me, this looks like you're setting something up in a fake connection object that is getting double-freed.
If you want to test for this bug yourself, you can run configure with "--enable-fragile-hardening", and the resulting binaries will have sanitizers enabled. Using "--enable-fragile-hardening CC=clang" will generally give nicer-looking results.
If you want to test for this bug yourself, you can run configure with "--enable-fragile-hardening", and the resulting binaries will have sanitizers enabled.
Thanks for pointing me to that. AFAICT this flag isn't mentioned in the doc folder of the tor repo, except in the ReleasingTor.md file, which is why I didn't think to test with it enabled. Should I file a ticket about adding that to the docs more prominently?
If you want to test for this bug yourself, you can run configure with "--enable-fragile-hardening", and the resulting binaries will have sanitizers enabled.
Thanks for pointing me to that. AFAICT this flag isn't mentioned in the doc folder of the tor repo, except in the ReleasingTor.md file, which is why I didn't think to test with it enabled. Should I file a ticket about adding that to the docs more prominently?
Sure! Additionally, we should replace references to --enable-expensive-hardening (an old name) with --enable-fragile-hardening, and remove the contrib/clang/sanitize_blacklist.txt file as obsolete.