Tor Browser needs from Arti

We've been told on IRC that the best place to do requests for the RPC layer is here, so here we are 😄.

Lower level stuff

Tor Browser handles different kinds of users.

I'd expect the majority of users to just use the process started by the browser. (In this case, the browser needs to be able to configure the implementation).

But we have a notable exception: Tails. In this case, we still connect to the control port, but only for stuff like the circuit display. I'm not sure how this will be handled. One idea is that Tails might have only one Arti (Arti Server?) instance, and somehow the browser will be given some credentials to access a restricted set of commands.

For all the other cases, I think running Arti in the browser's parent/main process would work for us. (Firefox is multi-process, but we run the JS that interacts with little-t-tor in the main process; I think an in-process Arti would also run in the parent process, but I'm not sure.) Firefox already has a bunch of async Rust and it offers facilities to spawn more async stuff (we saw something with ahf in Sweden, maybe it was this).

But it isn't a big deal if we eventually have to start another process; we already do it for little-t-tor. The most annoying part is how to connect to Arti's interface. For the control port, we use a fixed port number (and some users are unhappy about this, because it prevents running several Tor Browser instances at the same time). On Linux we support Unix sockets, and we pass the socket path on the command line (but a TCP control port is the default on Linux, too).
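To make the connection options concrete, here is a minimal sketch of how the browser side could describe the endpoint and turn it into little-t-tor command-line arguments. The `ControlEndpoint` type and `controlPortArgs` function are hypothetical names for illustration, not Tor Browser code. The `auto` case relies on tor's `ControlPort auto` plus `ControlPortWriteToFile`, which would avoid the fixed port; whether Arti will offer an equivalent is an open question.

```typescript
// Hypothetical description of how the browser reaches the daemon.
type ControlEndpoint =
  | { kind: "tcp"; host: string; port: number } // fixed port (current default)
  | { kind: "tcp-auto"; portFile: string } // daemon picks a free port and writes it to a file
  | { kind: "unix"; path: string }; // Unix domain socket

// Build little-t-tor command-line arguments for the chosen endpoint.
function controlPortArgs(endpoint: ControlEndpoint): string[] {
  switch (endpoint.kind) {
    case "tcp":
      return ["--ControlPort", `${endpoint.host}:${endpoint.port}`];
    case "tcp-auto":
      return [
        "--ControlPort",
        "auto",
        "--ControlPortWriteToFile",
        endpoint.portFile,
      ];
    case "unix":
      return ["--ControlPort", `unix:${endpoint.path}`];
  }
}
```

With something like the `tcp-auto` or `unix` variants, two browser instances would not fight over the same port number.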

In my opinion, the most important thing for us is to be able to use a single interface.

Eventually we'll consume everything in JS.

I guess an official JS implementation would use long-polling (with fetch?) or WebSocket. fetch is available in privileged JS. WebSockets work, more or less, but in a not very elegant way.

Otherwise, we have access to TCP sockets, but it's a Firefox-only API. I don't know if we can easily adapt to a Node.js-like API.

I also guess an official JS implementation would target Node.js, be on npm and possibly have some other JS dependencies. We're not really used to pulling in stuff from npm (esp. if it has dependencies), but we can work that out.

Otherwise, Firefox has well-established mechanisms to pass stuff between JS and Rust that allow you to avoid FFI. (Spawning an async task and consuming its result seems to be easy enough. But in the worst-case scenario, we could probably use Firefox's event broadcasting mechanisms.) Even passing the JSON strings (in the past you talked about a JSON-based format) down to JS without doing any parsing on the Rust side would probably work for us.
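As an illustration of the "just hand us the JSON strings" option, here is a minimal sketch of a transport-agnostic client on the JS side. Everything about the wire format is an assumption made for the example (one JSON object per message, responses matched to requests by a numeric `id`, `result`/`error` fields), and `RpcClient` and the method name in the usage note are hypothetical. The real Arti RPC format may look quite different.

```typescript
// Hypothetical wire format: one JSON object per message, matched by `id`.
interface RpcRequest {
  id: number;
  method: string;
  params: unknown;
}
interface RpcResponse {
  id: number;
  result?: unknown;
  error?: { message: string };
}

type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void };

class RpcClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (raw: string) => void;

  // `send` is whatever transport we end up with (socket, WebSocket, Rust glue).
  constructor(send: (raw: string) => void) {
    this.send = send;
  }

  // Serialize a request and remember how to settle it when the reply arrives.
  request(method: string, params: unknown = {}): Promise<unknown> {
    const id = this.nextId++;
    const req: RpcRequest = { id, method, params };
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send(JSON.stringify(req));
    });
  }

  // Call this with every JSON string received from the other side.
  handleMessage(raw: string): void {
    const msg = JSON.parse(raw) as RpcResponse;
    const entry = this.pending.get(msg.id);
    if (!entry) return; // reply for an unknown id: ignore it
    this.pending.delete(msg.id);
    if (msg.error) entry.reject(new Error(msg.error.message));
    else entry.resolve(msg.result);
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

The Rust side would only need to forward strings in both directions: `client.request("some_method")` produces a string for `send`, and each incoming string goes to `client.handleMessage(...)`.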

@ma1 knows Firefox internals much better than me and I'm sure he will have a lot of additional information and opinions 🙂.

Commands we use

Tor Browser's current interactions with the control port live in a couple of files: TorProvider.sys.mjs and TorControlPort.sys.mjs (these links are from the current alpha version, but will be outdated in a few weeks, because we change branch every time there's a new Firefox release).

We've done our best to abstract away from the control port and to write a few functions based on our needs from the browser's point of view, but it's likely that some functionality is still influenced by how the control port works. So, when we start integrating Arti, we might update our API, or adapt to Arti's to make the integration easier.

At the moment, the calls we have are:

a function to write settings
- network enabled/disabled: we use it to start/stop the bootstrap
- bridges: enabled/disabled and bridge lines
- proxy: address (socks4/socks5/https) + authentication (socks5/https)
- ports for fascist firewall
a function to flush settings
- this might be a piece of legacy from when we wrote to torrc; we don't need the underlying layer to save settings, since we save them in Firefox preferences
connect/stopBootstrap
- actually implemented with SETCONF DisableNetwork
newnym
- IIRC, even though we use SOCKS isolation, we still do newnym for something related to onion services
a function to get the configured bridges
- it seems not to be used directly, even though we still query tor about the bridges in its configuration when we create the data for the circuit display
a function to get the configured pluggable transports
- it's used when initializing the domain fronting for censorship circumvention. But we could do something else and skip this.
a function to get the current bootstrap status
- this is likely something we will have to fix a little bit on our end, because we just relay all the information tor sends us to the upper layers (the connection page's state machine) without checking it (we look for anything of the form key=value and return a generic object with any key we find)
a function to get the information about a node, given its id
- IP addresses (v4 and v6 if available; in case of PTs such as Snowflake, we return the address from the bridge line rather than the actual IP)
- region code (for regular relays)
- bridge type (vanilla/PT)
- this might fail for C-tor, if the user specified a bridge line without a fingerprint 😒
onion authentication
- add key (optionally, save it in some key storage!)
- remove key
- list known keys
logs
- we listen to the various ERR/WARN/NOTICE, but a direct way of getting logs would be appreciated 🙂
a function to get the current bridge
- little-t-tor doesn't actually have this feature at the moment, so we look at every new circuit, and check the first node
ownership stuff
- the browser takes ownership of the tor daemon by default, but my preference would be to run it in-process
we listen to some events
- circuit, stream: we proactively populate some data structures with all the streams and the circuits we see for the circuit display. We basically need to be able to associate the SOCKS credentials with the circuits. In some cases (Tails with onion-grater) we have some misses in our structures, so we also query the circuits with GETINFO. In general, we would like to have a more direct way of getting the data for the circuit display and for the current bridge.
- status_client: we use it to update the information in the bootstrap page. The handler shares most of its code with the function to query for the bootstrap status.
- notice/warn/err: we collect these events to show them to the user in about:preferences. Having a function to get all the logs would also work for us for this purpose, but in addition to that, when we see a warn or err we show a "Show logs" button in the connection page.
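On the bootstrap status point above (relaying every key=value pair unchecked), here is a minimal sketch of what a validated version could look like on our end. The `BootstrapStatus` shape and the function names are hypothetical. The field names (`PROGRESS`, `TAG`, `SUMMARY`, `WARNING`) are what I understand little-t-tor to send in its BOOTSTRAP status lines, so treat them as an assumption to check against the control spec.

```typescript
// Hypothetical typed result instead of a generic key/value object.
interface BootstrapStatus {
  progress: number; // integer in 0..100
  tag: string;
  summary: string;
  warning: string | null; // only present on problem reports
}

// Extract KEY=value and KEY="quoted value" pairs from a status line.
function parseKeyValues(line: string): Map<string, string> {
  const out = new Map<string, string>();
  const re = /([A-Z_]+)=("(?:[^"\\]|\\.)*"|\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    const key = m[1] ?? "";
    let value = m[2] ?? "";
    if (value.charAt(0) === '"') {
      // Drop the surrounding quotes and undo backslash escapes.
      value = value.slice(1, -1).replace(/\\(.)/g, "$1");
    }
    out.set(key, value);
  }
  return out;
}

// Return a checked status, or null if the line is not a usable BOOTSTRAP report.
function parseBootstrapStatus(line: string): BootstrapStatus | null {
  if (!/\bBOOTSTRAP\b/.test(line)) return null;
  const kv = parseKeyValues(line);
  const progress = Number(kv.get("PROGRESS"));
  const tag = kv.get("TAG");
  const summary = kv.get("SUMMARY");
  if (!Number.isInteger(progress) || progress < 0 || progress > 100) return null;
  if (tag === undefined || summary === undefined) return null;
  return { progress, tag, summary, warning: kv.get("WARNING") ?? null };
}
```

The connection page's state machine would then only ever see a known set of fields. If Arti exposed the bootstrap status as structured data, the parsing half would go away and only the validation would remain.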

/cc @richard