Have a separate backend for the tor circuits

marked this issue as related to #41600 (closed)

mentioned in merge request !587 (merged)

marked this issue as related to #40938 (closed)

added Roadmap::Future label

added Feature label

added Desktop label

marked this issue as related to #40982 (closed)

mentioned in issue #40982 (closed)

mentioned in issue #40981 (closed)

marked this issue as related to #40981 (closed)

marked this issue as related to #40983 (closed)

There are a couple of UX features that are currently blocked by this:

Providing for a disabled state for the previous circuit after requesting a new circuit (see this design in Figma, for reference).
Animating the circuit display chrome button from grey to the colored gradient when a new circuit has been established.

changed the description

added Circuit Display label

mentioned in issue #41600 (closed)

We could observe the new identity signal and clean the circuits at least when we observe it.

It isn't a solution to the whole problem, but it might help.

I wonder if listening to the circuit status changes (not sure it's possible), and using the CLOSED status could help.

mentioned in issue #41768

changed the description

The separate backend could be helpful also for the current bridge in use case.

I've just remembered I had to implement another event watcher in browser/components/torpreferences/content/connectionPane.js because I could not access the data the circuit display already collected.

We will have to rework it for Arti for sure, which will directly give us the data of the bridge in use, hopefully. At that point, having a unified backend would be a great thing.

thanks for pointing this out. Mentioned the overlap in the issue description.

changed the description

marked this issue as related to #41844 (closed)

assigned to @pierov

removed Roadmap::Future label

added Doing label

Last week I had some downtime because I'm waiting for my 115 MRs to be reviewed. So, I took the occasion to finally rework Torbutton and maybe demolish that commit.

One of the steps I found is removing direct calls to the controller (#41844 (closed)), for two reasons:

I can modernize/refactor the control port authentication without worrying about its consumers (except for a few I can easily control).
Adding a middle layer with some abstraction will be useful for having multiple backends, like Arti (#41843 (closed)).

We didn't have many direct consumers. Basically, only the onion authentication, the bridge section of the control panel, and the circuit display.

So, I took this issue to address it in TorMonitorService, which has a persistent connection to the control port and already listens for several events.

My first attempt won't be the definitive solution. But at least I hope it contributes to getting a broader knowledge of the tor-related patches and making them clearer. So, for now I'm taking the issue, doing some changes, but probably we'll have to leave it open for further work.

The strategy I came up with is to collect some data for circuits and streams.

For circuits, add an entry to a map CircuitID: Node fingerprints on the BUILT event, and remove it on the CLOSED event.
For streams, create a map StreamID: {CircuitID, SOCKS Credentials}, add entries on the SUCCEEDED event (we can also use SENTCONNECT, if we think it's better) and remove them on the CLOSED event.

As a proof of concept, I've created a third map (SOCKS Credential: StreamID), for lookups on the stream map without going through all of it. But it might not be needed, eventually. (Again, it's a duplication of data to do direct lookups. Worth it, or premature optimization?)
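A minimal sketch of the three maps described above, in plain JavaScript. The class and method names are hypothetical and are not the real TorMonitorService API. The event handlers are assumed to receive values already parsed from the control port events.

```javascript
// Hypothetical sketch: names are illustrative, not the real TorMonitorService.
class CircuitStreamTracker {
  constructor() {
    this.circuits = new Map(); // CircuitID -> array of node fingerprints
    this.streams = new Map(); // StreamID -> { circuitId, credentials }
    this.streamsByCredentials = new Map(); // credentials key -> StreamID
  }

  // JSON encoding keeps the key unambiguous whatever the values contain.
  static credKey({ username, password }) {
    return JSON.stringify([username, password]);
  }

  onCircuitEvent(circuitId, status, fingerprints = []) {
    if (status === "BUILT") {
      this.circuits.set(circuitId, fingerprints);
    } else if (status === "CLOSED") {
      this.circuits.delete(circuitId);
    }
  }

  onStreamEvent(streamId, status, circuitId, credentials) {
    if (status === "SUCCEEDED") {
      this.streams.set(streamId, { circuitId, credentials });
      this.streamsByCredentials.set(
        CircuitStreamTracker.credKey(credentials),
        streamId
      );
    } else if (status === "CLOSED") {
      const stream = this.streams.get(streamId);
      if (!stream) {
        return;
      }
      this.streams.delete(streamId);
      const key = CircuitStreamTracker.credKey(stream.credentials);
      // Only drop the reverse entry if it still points at this stream.
      if (this.streamsByCredentials.get(key) === streamId) {
        this.streamsByCredentials.delete(key);
      }
    }
  }

  // Direct lookup: credentials -> stream -> circuit -> node fingerprints.
  nodesForCredentials(credentials) {
    const key = CircuitStreamTracker.credKey(credentials);
    const stream = this.streams.get(this.streamsByCredentials.get(key));
    if (!stream) {
      return null;
    }
    return this.circuits.get(stream.circuitId) ?? null;
  }
}
```

The third map only saves a linear scan of the streams map, so whether it is worth keeping depends on how many streams are open at once.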

We would also want it to track information about what bridge is in use, as part of this. Right now gConnectionPane in "about:preferences" has to take a similar but separate approach to gTorCircuitPanel to determine whether a bridge is in use, and which one.

gConnectionPane currently only collects circuit data while "about:preferences" is loaded. So it will not know which bridge is in use until it receives a circuit event.

This seems to be the easiest part, but it has some caveats.

We constantly get data about the circuits that are built. Theoretically, we can just look at the first node and it should be our bridge.

However, when we set bridge lines, Tor checks them, to decide which one to use, and it opens a lot of circuits to do so.

We have several possibilities:

1. Update the current first node only after seeing stream events, with the side effect of (possibly) delaying the update in the settings (this is the current strategy).
   - With a persistent backend this is unlikely to happen; it is probably more likely when you're bootstrapping and looking at the bridge cards at the same time.
   - Least complex solution: we keep only the ID of the currently connected bridge.
2. Display the bridge that appears in the highest number of circuits as the currently connected one.
   - Empirically, I've seen that bridges that are only tested and never used appear in only one circuit.
   - We already collect the data with my proposal, but checking all the circuits every time a new one is added seems silly. We could rather add a map NodeID: Number of circuits for each bridge, and maybe a string with the highest key.
   - As a proof of concept, I've tested this, and it seems to work. I'm not sure the higher complexity is worth it.
3. Do a mix of the two: rely on circuit events only until we see a stream event.
   - The most complex option. We should do it only if the delay of 1 is unbearable and 2 produces results we're not satisfied with (but why use it for the initial stage, at that point?).
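The second option (a map NodeID: Number of circuits) could look roughly like the sketch below. The names are hypothetical. Instead of caching "a string with the highest key", the sketch scans the counts on demand, on the assumption that the number of configured bridges is small.

```javascript
// Hypothetical sketch of option 2: count open circuits per first hop.
class BridgeUsageCounter {
  constructor() {
    this.firstHopByCircuit = new Map(); // CircuitID -> first-hop fingerprint
    this.circuitsPerNode = new Map(); // fingerprint -> number of open circuits
  }

  onCircuitBuilt(circuitId, fingerprints) {
    if (!fingerprints.length || this.firstHopByCircuit.has(circuitId)) {
      return;
    }
    const first = fingerprints[0];
    this.firstHopByCircuit.set(circuitId, first);
    this.circuitsPerNode.set(first, (this.circuitsPerNode.get(first) ?? 0) + 1);
  }

  onCircuitClosed(circuitId) {
    const first = this.firstHopByCircuit.get(circuitId);
    if (first === undefined) {
      return;
    }
    this.firstHopByCircuit.delete(circuitId);
    const remaining = this.circuitsPerNode.get(first) - 1;
    if (remaining > 0) {
      this.circuitsPerNode.set(first, remaining);
    } else {
      this.circuitsPerNode.delete(first);
    }
  }

  // Fingerprint of the first hop seen in the most open circuits, or null.
  get mostUsed() {
    let best = null;
    let bestCount = 0;
    for (const [node, count] of this.circuitsPerNode) {
      if (count > bestCount) {
        best = node;
        bestCount = count;
      }
    }
    return best;
  }
}
```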

It's worth mentioning that tor keeps a secondary bridge, and we can detect it by looking at the circuits. Should we ever display it? Probably something to discuss with UX + AC (and maybe even net team).

We gather circuit node data but we never free it, or know when it is safe to free it. This is because the tor process itself does not store the circuit data after it has already expired, but in the browser we still need that information to tell the user what circuit was used to show the current page.

This problem (aka #40982 (closed)) is a hard one. Streams are closed quickly, and we need at least one with a certain pair of credentials to get the related circuit.

We could completely skip the streams, and only update a map Credentials: CircuitID when we see streams. Then, when a circuit is closed, we can iterate and remove all the unneeded circuits.

Circuits generally live longer than streams, but a user could be interested in seeing the circuit long after it has been closed, for any reason.

I've seen the circuit display has a WeakMap<MozBrowser, BrowserCircuitData> to the last used circuit. Maybe we could change it to WeakMap<MozBrowser, Map<Document URI, BrowserCircuitData>>, instead, to show the (stale) data that has been used to actually load the page?

Populating/updating this map should be very quick: either we have the data already, or we cannot update it. No communication on the control port should be involved at this stage.
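A rough sketch of the nested map idea, with plain objects standing in for MozBrowser elements and a string for the document URI. The function names are hypothetical. Entries disappear when the browser object is garbage collected, which is the lifetime property discussed here.

```javascript
// browser object -> Map<document URI string, circuit data>
const circuitsByBrowser = new WeakMap();

// Record the circuit that was used to load `uri` in `browser`.
function recordCircuit(browser, uri, circuitData) {
  let perUri = circuitsByBrowser.get(browser);
  if (!perUri) {
    perUri = new Map();
    circuitsByBrowser.set(browser, perUri);
  }
  perUri.set(uri, circuitData);
}

// Return the (possibly stale) circuit data for the page, or null.
function lookupCircuit(browser, uri) {
  return circuitsByBrowser.get(browser)?.get(uri) ?? null;
}
```

The inner map grows with every distinct URI a browser visits, so a real implementation would probably need some bound or eviction policy.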

Then, the lifetime of each browser will also determine when the data will be cleaned. Do we risk accumulating too much data?

Or, can we add custom data to History only for the privileged side?

gTorCircuitPanel is not shared between windows. As a result, a web page split across two windows can be missing information if the other window hasn't picked up the circuit event. E.g. if you open a web page in a new tab and open the same page in a new window, then the new window will be missing the circuit display.

The WeakMap risks not solving this problem, if it runs in chrome JS. History data, on the other hand, is usually also kept when moving a browser. I will have to investigate more.

We might be able to address this by integrating the circuit node information with the firefox request process by storing it within some relevant object, so it only sticks around for as long as it is in use. I don't know much about the firefox process, but maybe something like the page's loadInfo?

Does it live long enough? Or is it deleted after the loading has ended?

A good thing about this object is that we can access it directly from the domain isolator. We register a filter function with ProtocolProxyService.registerChannelFilter. This function takes the channel (which includes a loadInfo) and the original proxy data that we change.

Anyway, I think this (changes at the Firefox internal level) is more involved than what I'd like to do now, so I'd leave it for a second revision.

On the other hand, changing the domain isolator without changing Firefox's interface is doable also for this revision.

We don't have a clean way to determine whether the current page is establishing the initial circuit, or requesting a new circuit, or will never get a circuit (like a data: uri). We do some guess-work to get around this, but it also means we cannot implement designs like

!587 (comment 2888415)

when you request a new circuit for a website, if it is slow enough to reload then there is a period of time where the old circuit is lost, but information about the new circuit is unknown. So we have an empty circuit:

I'm a little concerned that users may be alarmed, or the empty state could flash. Could we try a more explicit working state instead?

The old code was kind of hacked on top, rather than integrated. Basically, it just harvested all the circuit data it could find, and we try and map the currently shown <xul:browser> to one of these circuits. And if none is found, the page has no circuit, but there is no information to discern whether it will have one, whether it used to have one, or whether a circuit is not applicable for the page (e.g. a "data:" URI like data:text/plain,hello).

So, right now I could only guess when we are requesting a new circuit for the currently shown page. But there could be certain conditions that break this.

!587 (comment 2892297).

Actually, would it be possible to restrict the animation to when the circuit is initially established, so it doesn’t re-animate between pages of the same site?

That would also provide some nice feedback that the circuit is reloading after requesting a new circuit.

if we had an integrated backend, it could track the exact state.

Side note

I've tried to have a look at what events I get from the control port.

I couldn't get a circuit to be created after I started a request.

From what I understand, tor continuously creates spare circuits, so that it is ready to create a stream on one of them. I should tweak the torrc to try to make it less responsive and/or try harder with creating circuits. At that point, a log parser might be useful, too.

I think we could rework the matter in this way: «can we get the circuit information in a more direct way?». I believe we will always pass through the SOCKS credential. So, the correct question becomes: «is there a way to get the SOCKS credentials directly?».

It is very easy for the "New circuit for this site" part of the problem.

I refactored the domain isolator a few months ago, but I held it back from 12.5 for 13.0 (it's been merged on the various 102.x-13.0 branches). At the moment (my patches on 13.0), when the user wants a new circuit, the circuit display calls TorDomainIsolator.newCircuitForBrowser, and this function doesn't return anything. We could make it return the new set of credentials.

For the other cases, we need to find a way to make the domain isolator communicate with the circuit display. One way is channel.loadInfo.innerWindowID. It'd be nice if we could inject the SOCKS credential to the window, by knowing its ID, but I'm not sure it's possible. Another way would be an observer. Or the domain isolator could also keep a map windowID: SOCKS credential, and the circuit display could poll it at the right moment (when can we empty that map? After a certain window ID has been polled?). Maybe that could solve the catch-all case, which right now doesn't work in some cases (e.g., a local file containing a remote resource, such as an HTML file loading an image or a script).
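The polling variant (a map windowID: SOCKS credential that is emptied once polled) could be sketched as follows. This is only an illustration in plain JavaScript. The class and method names are hypothetical, and "delete on first poll" is just one possible answer to the question of when the map can be emptied.

```javascript
// Hypothetical mailbox: the domain isolator writes, the circuit display reads.
class PendingCredentials {
  constructor() {
    this._byWindow = new Map(); // innerWindowID -> { username, password }
  }

  // Called by the isolator when it assigns credentials to a channel.
  record(innerWindowId, credentials) {
    this._byWindow.set(innerWindowId, credentials);
  }

  // Called by the circuit display: return the entry and forget it.
  take(innerWindowId) {
    const credentials = this._byWindow.get(innerWindowId) ?? null;
    this._byWindow.delete(innerWindowId);
    return credentials;
  }

  get size() {
    return this._byWindow.size;
  }
}
```

A window that is never polled would leave its entry behind, so this approach would still need a fallback cleanup.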

Good news: I've noticed that probing circuits are usually only one hop long, so they are easy to filter out.

So, looking at the first node of a circuit actually works, without even waiting for a new stream.
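As a tiny sketch of that filter, assuming (as observed above, not guaranteed) that probing circuits have a single hop. The function name is hypothetical.

```javascript
// Return the first-hop fingerprint of a circuit, ignoring one-hop circuits,
// which are assumed to be bridge probes rather than circuits used for traffic.
function bridgeFromCircuit(fingerprints) {
  if (!Array.isArray(fingerprints) || fingerprints.length < 2) {
    return null;
  }
  return fingerprints[0];
}
```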

Bad news: I think speaking of the connected bridge is wrong.

I've tried to load a few sites, to get circuits generated, and here's what I found out:

TorMonitorService._circuits
Map(18) { 12 → (3) […], 13 → (3) […], 14 → (3) […], 15 → (3) […], 16 → (3) […], 18 → (3) […], 19 → (3) […], 20 → (3) […], 22 → (3) […], 23 → (3) […], … }

size: 18

<entries>

0: 12 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "B578F226720D1196E4D75388CBFB7DDC863A4F82", "0011254CC8444369B20EF11156B8990438221A54" ]
1: 13 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "7551C1446DBA7BCF8395389A125445E71952D467", "FC728F329C92D67A435EFBA1D34B5933DAA60F62" ]
2: 14 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "4D65D8653311CAB5501E81F7914E9BD1B51F6EDF", "A3AFBDEE30238E44899C9F8B7666D12B09C8EE32" ]
3: 15 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "9715C81BA8C5B0C698882035F75C67D6D643DBE3", "66476BFA0F95111B009077EF5A70B86B6FF5C72E" ]
4: 16 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "4D65D8653311CAB5501E81F7914E9BD1B51F6EDF", "9715C81BA8C5B0C698882035F75C67D6D643DBE3" ]
5: 18 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "4055CDDFF7B3F9E6A50447609A3014753A82EB26", "E347622E1228CB1490817B9E78DE2107CC17E1B4" ]
6: 19 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "AFEAF3A9E0DB1D837BE8FF1983BA0C65A3E71D73", "BD50D26C4F6D4A33769C1E18AD0CFB1306415227" ]
7: 20 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "B5C35473EF851462D36AA93B3F71AF4FD502ACCC", "BBE1DBF6009B6267AFB4DEF789F62FD9D8A940A4" ]
8: 22 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "F7A052D4EEA2F4BC942DFB054AF2DC54A2A37E5D", "A903E420F915A67FAA679C7E6B70B140AE2A303E" ]
9: 23 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "FC6F665E3C0637976DFF2E128E2DA2684E6633AA", "A61B56A50D13DC29DBB23C6A033FA3F0C57420CF" ]
10: 26 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "728F97D5BCB131698814D8C713C2220C6E7267DE", "D5DE257E30E5CE44A187D308298C8FCDC1B013B9" ]
11: 27 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "9311F380700D55F2BC955176664343247C36025C", "6A6A34B55DF1B0A1D97376721E7669A26ACD447C" ]
12: 46 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "3B953203AF332D8FE1452E1CE7CB50A3B5297DB2", "6B61EFE3AEDEB3351FD3C910443D95556316E01C" ]
13: 60 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "8929D361CA7E924F4A3B115E6F37CD341CBCC0BE", "ABDDC1461F11280854474DDE523C98629C0F95E4" ]
14: 58 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "699653E5DF094B56775275F8A4F4BFA7C5BD5E2F", "AD08584AC6A2A421DAEDF227DA6F0DE53DFE40B6" ]
15: 65 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "7FC53C8FEB30944C9D0EF0D124B26F7052112FC5", "8E6EDA78D8E3ABA88D877C3E37D6D4F0938C7B9F" ]
16: 64 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "24A818D9F1E09F1845FC1589DE5AAF15C8E0867E", "9C61FC0A01401EDF71C4048665E53968E81351FC" ]
17: 63 → Array(3) [ "C5B7CD6946FF10C5B3E89691A7D3F2C122D2117C", "B578F226720D1196E4D75388CBFB7DDC863A4F82", "AC14D7773BFA1D25E4CFB94648F0BC893DD19E37" ]
18: 67 → Array(3) [ "86AC7B8D430DAC4117E9F42C9EAED18133863AAF", "5C8B811887778DCF705F3D39F19E40A21889451F", "57C081A676BAF4E08BF644BBFF6CC295126E44F5" ]

Out of 18 circuits, 11 were created with a bridge, and 7 with another one. It means about 2/3 and 1/3.

The trend continues like this, and the current bridge switches continuously.

I've seen the circuit display has a WeakMap<MozBrowser, BrowserCircuitData> to the last used circuit. Maybe we could change it to WeakMap<MozBrowser, Map<Document URI, BrowserCircuitData>>, instead, to show the (stale) data that has been used to actually load the page?

Is the idea to store the URI to search for the circuit instead of using credentials? I think the main issue with this is that when you open the web page in another tab, it will not share the same MozBrowser so you won't be able to look up the circuit.

Populating/updating this map should be very quick: either we have the data already, or we cannot update it. No communication on the control port should be involved at this stage.

Then, the lifetime of each browser will also determine when the data will be cleaned. Do we risk accumulating too much data?

Or, can we add custom data to History only for the privileged side?

As part of nsISHEntry? I think this has the same issue with opening the same page in another tab: it won't share the history to look up.

We might be able to address this by integrating the circuit node information with the firefox request process by storing it within some relevant object, so it only sticks around for as long as it is in use. I don't know much about the firefox process, but maybe something like the page's loadInfo?

Does it live long enough? Or is it deleted after the loading has ended?

I'm not sure, I just chose any of the internal structures that looks like it might stick around as long as the page is loaded and cached. I'm not sure whether loadInfo is the right place to look, but overall we would ideally have:

When a page is loaded (with network activity) we want to be able to determine its circuit.
The circuit remains in memory and accessible for as long as the page is cached, but not longer.
We know definitively when a page is being loaded for the first time or reloaded with a new circuit, so that we know to ignore the cached circuit and display some loading state in the UI.

I was already adding a comment while you commented, too.

When a page is loaded (with network activity) we want to be able to determine its circuit.

I think some integration is actually possible.

The Domain Isolator can even access the circuit display (but for some reason it breaks it, i.e., the button isn't shown anymore, I will have to investigate more).

As a proof of concept, I've tried this:

        const channel = aChannel.QueryInterface(Ci.nsIIdentChannel);
        const { username, password } = this.getSocksProxyCredentials(
          firstPartyDomain,
          userContextId,
          true
        );
        const circuitDisplay =
          channel.loadInfo.browsingContext?.topChromeWindow?.gTorCircuitPanel;
        if (circuitDisplay) {
          circuitDisplay.setProxyCredentials(
            channel.channelId,
            channel.loadInfo.innerWindowID,
            username,
            password
          );
        }

The circuit display then receives the data.

The first request has channel.loadInfo.innerWindowID === 0, which is a little bit annoying. The rest of the requests have a real innerWindowID, which can be used to detect that a local file (or something else) includes some remote elements and needs to get the circuit display shown.

I still haven't figured out how I can get the channelId, but we always have one. Being able to use it when we don't have a window id would be great.

Then, I've tried to create a topic on TorMonitorService that notifies observers when a stream goes to SUCCEEDED. In this way we can push both the credentials and the circuit (the node fingerprints, for now) to the circuit display. We don't even have to store streams and credentials anymore, if the observer also manages the circuit cache.
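The push model could be sketched like this, with a plain callback registry standing in for whatever observer mechanism the browser would actually use. The topic name and all class and method names are hypothetical.

```javascript
// Hypothetical topic name, not an existing notification.
const STREAM_SUCCEEDED_TOPIC = "tor-stream-succeeded";

class StreamNotifier {
  constructor() {
    this.circuits = new Map(); // CircuitID -> node fingerprints
    this.observers = new Set(); // callbacks (topic, data) => void
  }

  addObserver(observer) {
    this.observers.add(observer);
  }

  removeObserver(observer) {
    this.observers.delete(observer);
  }

  onCircuitBuilt(circuitId, fingerprints) {
    this.circuits.set(circuitId, fingerprints);
  }

  // On a SUCCEEDED stream, push credentials and nodes to every observer.
  // Nothing about the stream itself is stored here.
  onStreamSucceeded(circuitId, credentials) {
    const nodes = this.circuits.get(circuitId);
    if (!nodes) {
      return; // unknown circuit: nothing useful to push
    }
    for (const observer of this.observers) {
      observer(STREAM_SUCCEEDED_TOPIC, { credentials, nodes });
    }
  }
}
```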

I'm still trying to figure out the various pieces, so I haven't thought where we should collect the node data. As long as it doesn't use tor-control-port.js directly I could be happy for now, and come back to a better implementation later.

We know definitively when a page is being loaded for the first time or reloaded with a new circuit, so that we know to ignore the cached circuit and display some loading state in the UI.

We can key the cache on the credentials, and show the animation whenever we have a miss. Do you think this could work?
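A minimal sketch of that proposal: a cache keyed on the SOCKS credentials, where a miss is reported as a "loading" state that the frontend could animate. The names and the state strings are hypothetical.

```javascript
// Hypothetical cache keyed on SOCKS credentials.
class CircuitCache {
  constructor() {
    this.byCredentials = new Map(); // credentials key -> node fingerprints
  }

  static key(username, password) {
    return JSON.stringify([username, password]);
  }

  store(username, password, nodes) {
    this.byCredentials.set(CircuitCache.key(username, password), nodes);
  }

  // A miss means we have no circuit yet for these credentials.
  lookup(username, password) {
    const nodes = this.byCredentials.get(CircuitCache.key(username, password));
    return nodes
      ? { state: "ready", nodes }
      : { state: "loading", nodes: [] };
  }
}
```

Note that a miss alone cannot tell an initial load from a new circuit request, since both simply have no entry yet.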

The circuit remains in memory and accessible for as long as the page is cached, but not longer.

For that I don't have a solution, yet.

I think the main issue with this is that when you open the web page in another tab, it will not share the same MozBrowser so you won't be able to look up the circuit.

I've tested earlier, and from what I could tell, MozBrowser is moved between different windows. Of course, we'd need to store the WeakMap on the "process scope", not on the "chrome scope".

In other cases, if you open a link in a new tab or something like this, you also do a network request, and at that point, we'll have fresh data from the controller, unless the page is shown from (memory) cache.

I still haven't figured out how I can get the channelId, but we always have one. Being able to use it when we don't have a window id would be great.

Well, the channel also contains the URI (and originalURI), I forgot about that in my previous comment.

If it matches the requested URI and the user context ID, it should be safe to assume we can use these SOCKS credentials to match the circuit for that tab, even though we didn't get an innerWindowID.

@pierov

We know definitively when a page is being loaded for the first time or reloaded with a new circuit, so that we know to ignore the cached circuit and display some loading state in the UI.

We can key the cache on the credentials, and show the animation whenever we have a miss. Do you think this could work?

I think we want something direct to be able to improve on what we do now. For example, we want to know the difference between waiting for an initial load and a new circuit request.

If domain-isolator handles all of this, then it would also know when a circuit is being re-requested, so it could just flag the stored circuit as such.

It seems that moving all of this circuit data and SOCKS credentials out of gTorCircuitDisplay and into domain-isolator might be a good first step. And gTorCircuitDisplay is able to request circuit data from domain-isolator by sending in the browser and register itself to be notified of any new circuit states.
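One way to picture this split is sketched below: the isolator side owns the per-browser state, flags re-requests explicitly, and notifies registered listeners. Everything here is hypothetical (names, state strings, method signatures), and plain objects stand in for browsers.

```javascript
// Hypothetical store owned by the domain isolator.
class IsolatorCircuitStore {
  constructor() {
    this.states = new WeakMap(); // browser -> { status, nodes }
    this.listeners = new Set(); // callbacks (browser, state) => void
  }

  // Register for state changes; returns a function that unregisters.
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  _update(browser, state) {
    this.states.set(browser, state);
    for (const listener of this.listeners) {
      listener(browser, state);
    }
  }

  // Called when credentials are handed out for a browser. The isolator knows
  // whether this is a first load or an explicit "new circuit" request.
  circuitRequested(browser, isNewCircuitRequest) {
    const previous = this.states.get(browser);
    this._update(browser, {
      status: isNewCircuitRequest ? "renewing" : "initial",
      // Keep the old nodes so the UI could show them in a disabled state.
      nodes: previous?.nodes ?? [],
    });
  }

  circuitEstablished(browser, nodes) {
    this._update(browser, { status: "ready", nodes });
  }

  getState(browser) {
    return this.states.get(browser) ?? { status: "none", nodes: [] };
  }
}
```

With explicit "initial" and "renewing" states, the display no longer has to guess between the two cases from a cache miss.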

It seems that moving all of this circuit data and SOCKS credentials out of gTorCircuitDisplay and into domain-isolator might be a good first step. And gTorCircuitDisplay is able to request circuit data from domain-isolator by sending in the browser and register itself to be notified of any new circuit states.

That could definitely work.

I will try to do something.

If domain-isolator handles all of this, then it would also know when a circuit is being re-requested, so it could just flag the stored circuit as such.

Yes. #40982 (closed) maybe isn't clear about that, but we have maps to clear also there (the association FPD: random nonce and User context ID: nonce).

Deleting an entry when the related circuit has been closed sounds like an idea.

(Moreover, the user context ID is taken into account, but there isn't any way to refresh it, but that's another story).

For example, we want to know the difference between waiting for an initial load and a new circuit request.

I'm sorry, but the difference isn't clear to me.

From the backend point of view they seem the same to me: you don't find a nonce, so you generate one.

From what I understand, you're referring to this case:

https://gitlab.torproject.org/tpo/applications/tor-browser/uploads/d303982123e3faa9d08078e648b4d595/circuit-display-new-circuit-light.png

Since it's an animation, I wonder if we need to handle that in the frontend, instead.

I'll see what I can come up with, or I'll ask you more information.

mentioned in issue #41851 (closed)

marked this issue as related to #41851 (closed)

@donuts there's a UX question here which might change what we want in the back end. Consider this set up:

Open "example.org/page1" in one tab and "example.org/page2" in another tab. Both are loaded over circuit A.
Reload "example.org/page1" with a new circuit: circuit B.
Switch to the "example.org/page2" tab.

In this tab, do we want to show circuit A or circuit B? Note that:

The top document for /page2 was not reloaded, so the visible page was loaded with circuit A.
Any ongoing or future connections for /page2 will use circuit B. E.g. if the page wants to do some network requests. If the page is static, then circuit B will only be used in this tab when moving to another page or reloading.

Right now, we show circuit B. Is this generally what you expect? To show the active circuit rather than the historic circuit?

To show the active circuit

Notice also that a circuit has its own lifespan.

So, if you load a page, go do something else for a certain amount of time (let's say one hour), then come back, the old circuit won't exist anymore.

So, we should speak of last known rather than active.

@pierov yes, there is that extra detail. For the example, I'm assuming we're not waiting so the circuit is still active.

I think it only takes about 10 minutes for the circuit to expire, and for busy websites that like to do a lot of networking (tested with amazon dot com) you can actually see the circuit display switch whilst seemingly doing nothing on the page :)

But that is a similar case: do we show the circuit that loaded the top document, or the circuit that was last used within the page?

If we decide we can say that a circuit has expired and not display anything, caching becomes very easy: just delete all the entries when we get the circuit close signal.
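Under that policy the cache cleanup is short. A minimal sketch, with hypothetical names, where each cache entry remembers which circuit it came from:

```javascript
// Hypothetical cache whose entries are dropped when their circuit closes.
class ExpiringCircuitCache {
  constructor() {
    this.entries = new Map(); // credentials key -> { circuitId, nodes }
  }

  remember(credentialsKey, circuitId, nodes) {
    this.entries.set(credentialsKey, { circuitId, nodes });
  }

  // Deleting from a Map while iterating over it is safe in JavaScript.
  onCircuitClosed(circuitId) {
    for (const [key, entry] of this.entries) {
      if (entry.circuitId === circuitId) {
        this.entries.delete(key);
      }
    }
  }

  nodesFor(credentialsKey) {
    return this.entries.get(credentialsKey)?.nodes ?? null;
  }
}
```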

@henry @donuts I have a stupid proof of concept!!

animation

When a browser is detected, but we don't have nodes, yet, instead of hiding the panel, I set opacity to 0.4.

Of course, we'll need a better animation, but hey, we know it's feasible!

It works also to show the catch-all circuit.

It's a bit clumsy, because the heading says "Circuit for " without a domain (should we introduce the catch-all circuit to users?).

Also, I've tried to load a remote resource with a timeout:

Code

<html>
<body>
<script>
setTimeout(() => {
	const s = document.createElement("script");
	s.setAttribute("src", "https://code.jquery.com/jquery-3.7.0.js");
	document.body.append(s);
}, 10000);
</script>
</body>
</html>

It works. The circuit display icon isn't visible initially, then it appears as soon as the script is loaded.

But this is also clumsy, because if you reload the document, it will have the icon already.

Anyway, we can fix the details after merging the backend.

marked this issue as related to #41008

marked this issue as related to #40998 (closed)

marked this issue as related to #40984 (closed)

marked this issue as related to #40946 (closed)

marked this issue as related to #40884 (closed)

As suggested by Henry, the new backend should take into consideration the case of users who disable circuit isolation.

mentioned in merge request !699 (merged)

removed Doing label

added Next label

unassigned @pierov

removed Next label

added Backlog label

removed Backlog label

added 13.5 stable Roadmap::Future labels

mentioned in issue #40984 (closed)

mentioned in issue #41933

removed 13.5 stable label

added 14.5 stable label

added 14.0 stable label

removed 14.0 stable label
