For many years it has been expected that people who use Tor to surf the web need to hide the fact that they are using Tor, and that the best way to achieve this is to attempt to mimic some other browser - typically Firefox-on-Windows or similar.
There is no argument that other protections - such as protecting screen sizes and so forth - continue to be useful in protecting the user against fingerprinting and tracking, so those and other protections should certainly continue.
However, there can be no reasonable argument against the observation that:
the world treats any inbound TCP connection from a Tor exit node as "traffic from Tor"
there exist any number of "IP reputation" systems which track (with varying amounts of lag) the state of the Tor exit node cloud, and essentially assign Tor a "geography" and offer the user the opportunity to "block" it, etc
there exist some number of sites which would like to do something "nice" for Tor users (there are people who would like to do something "nasty", too, but those people are already served by upstream geoblock providers), and those sites are typically hampered by having no way to answer "How many of our users are legitimate people coming from a Tor Browser?"
So: people who use Tor to access a website are:
1/ immediately "outed" as coming from Tor, by virtue of their apparent IP address
2/ treated as a faceless group, worthy of (usually: trivial) blocking, by flicking a "Block Bad Sites" IP-reputation switch
3/ hard to identify at "Layer-7" (ie: web logs) as being legitimate users
...and...
4/ any trolls amongst the Tor userbase who harass people are investigated, determined to be "from Planet Tor", and thereby drag the reputation of Tor down even further.
Two personal observations are useful here:
a) When I was building the Facebook onion, we had to run an experiment which was essentially "Are the people who access Facebook over Tor generally bad people?" - because common sense suggested that we check this. We checked a sample of users who accessed Facebook over Tor, and found that overwhelmingly (> 95%) they were just normal users doing normal things. This was a realisation which, combined with the scale of Tor usage, led to the creation of the Facebook onion.
b) Conversations with [people at Cloudflare] who noted to me that TorBrowser's attempts to "hide" meant that, in site-protection "flows" (think: captchas), it was hard to identify TorBrowser users economically (think: the latency cost of a geo-check hit) so as to give them special consideration and care.
Therefore, I propose that TorBrowser be amended to include "OnionCapable" - or "TorCapable" or some other well-advertised, similar string - in the "User-Agent" header of the browser.
== Why the User-Agent?
Because if a special, magic header were created, it would probably be dropped en route by upstream middleboxes.
Also: site owners already know how to act on User-Agent strings, whereas new headers would be considered "deep magic".
There is the possibility to go to a commercial CDN and ask them to serve certain content or return certain headers ("Alt-Svc: foofoofoofoofoof.onion" perhaps?) on the basis of "User-Agent", but asking them to detect and act upon the presence of "X-Tor-Browser-Special-Header:1" would be onerous and unlikely to succeed.
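To make that concrete: below is a minimal sketch (Python, purely illustrative) of the kind of User-Agent rule an origin or CDN could apply when deciding whether to attach an Alt-Svc header; the "OnionCapable" marker and the onion hostname are placeholders from this discussion, not an agreed standard.

```python
import re

# Hypothetical marker and placeholder onion address from the discussion above.
ONION_MARKER = re.compile(r"\bOnionCapable\b")
ONION_ALT_SVC = 'h2="foofoofoofoofoof.onion:443"; ma=86400'

def extra_headers(user_agent: str) -> dict:
    """Headers to add for clients that advertise onion capability; else nothing."""
    if user_agent and ONION_MARKER.search(user_agent):
        return {"Alt-Svc": ONION_ALT_SVC}
    return {}

# e.g. extra_headers("Mozilla/5.0 ... Firefox/68.0 OnionCapable")
#      -> {'Alt-Svc': 'h2="foofoofoofoofoof.onion:443"; ma=86400'}
```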
== But what about Logs?
Let's think about that threat model:
We want more people to use Tor because it's a better network.
We want sites to know that their users are using Tor
Tor usage can be inferred anyway, from IP addresses and time cross-checked against relay logs
So what's the concrete problem here? Perhaps that some Stasi will, in some case, subpoena the logs of a service provider and then use them to prove that [person] was using Tor at the time? That's a pretty small risk compared to concretely enabling better Tor for everyone.
That's some interesting client-side work, and I am sure that it would benefit from telling the server "I can do Tor, please be nice to me", but otherwise it's essentially disjoint.
== But many many site owners will detect this new header and block Tor access!
If they are inclined then they already do, and it's probably better to enable them to get it out of the way more easily, so that their behaviour and attitudes can be called out.
If a site is so trivially tricked that "OnionCapable" detection is its only protection, then this also screams for a TorBrowser extension which tests whether that header is being detected and causing blocking, which would help enumerate hostile sites for public awareness.
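As a rough illustration, the core of such an extension's check could be as simple as the comparison below; this is plain Python rather than a real browser extension, and the "OnionCapable" marker is the hypothetical string proposed above.

```python
# Sketch: fetch the same URL with and without a hypothetical "OnionCapable"
# marker in the User-Agent and compare status codes, to spot sites that
# block purely on that marker.
import urllib.error
import urllib.request

BASE_UA = "Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0"

def status_for(url, user_agent):
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

def marker_causes_block(url):
    plain = status_for(url, BASE_UA)
    marked = status_for(url, BASE_UA + " OnionCapable")
    return plain < 400 <= marked  # OK without the marker, rejected with it

if __name__ == "__main__":
    print(marker_causes_block("https://example.com/"))
```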
= tl;dr
Sites, if they care, can already determine whether someone is accessing them from Tor.
If such determination were made easier, more sites could be nicer to Tor users.
Tor would become even more normal.
It's time for TorBrowser to come out of the closet.
ps: take care with that 95% number, though; at scale more than 99% of people are just normal users doing normal things, so it can lead to reportage of the "TOR USERS ARE 5x WORSE THAN NORMAL" selective quoting.
The point is: there's a lot of legitimate Tor usage which is invisible to site administrators, and their perception of Tor is skewed by investigations of "bad people" who arrive over Tor, leading to negative confirmation-bias.
Here's the hack which Privacy International use for redirection, setting up a cronjob to reliably fetch a list of exit nodes every 6 hours, and populating a "geo" database for NGINX to query.
This may sound trivial to the average Tor/Privacy engineer, but is significant hassle and cognitive load for owners of smaller sites; a simple regexp check against User-Agent would be far, far cheaper.
As you say, [the above hack is] not even a great solution; some days the Tor exit node list is down or the request times out, and if every site did what PI is doing - even though we are trying to be non-abusive (hence the 6 hours) - the exit node list would probably be overwhelmed.
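For concreteness, a rough sketch of that kind of cron job might look like the following; the exit-list URL and the output path are assumptions, and a production version would need the error handling described above (list unreachable, timeouts).

```python
import urllib.request

EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"  # assumed source
GEO_FILE = "/etc/nginx/conf.d/tor_exits.conf"                   # assumed output path

def fetch_exit_ips():
    """Download the published list of Tor exit IPs, one per line."""
    with urllib.request.urlopen(EXIT_LIST_URL, timeout=30) as resp:
        return [line.strip() for line in resp.read().decode().splitlines() if line.strip()]

def write_geo_map(ips, path=GEO_FILE):
    """Render the IPs as an NGINX 'geo' map: $is_tor_exit is 1 for exit nodes."""
    with open(path, "w") as f:
        f.write("geo $is_tor_exit {\n    default 0;\n")
        for ip in ips:
            f.write(f"    {ip} 1;\n")
        f.write("}\n")

if __name__ == "__main__":
    write_geo_map(fetch_exit_ips())

# By contrast, the User-Agent alternative mentioned above reduces to a one-line
# check, e.g. re.search(r"Tor/\d", user_agent) for a hypothetical marker.
```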
Note for anyone saying that (eg) "to preserve privacy, Facebook should simply offer Alt-Svc: (or Onion-Location:) headers on all its requests", some basic math:
strlen("Alt-Svc: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion\r\n") = 73Facebook does more than 1 trillion requests per day.73 trillion bytes in terabits => 584 terabits
Are they willing to sacrifice the environmental and energy and monetary costs of causing 584 terabits of global daily traffic, on Facebook alone, for the potential sake of one or two million users (ie: about 0.1% of the Facebook userbase)?
Frankly, I can't see that being justifiable in any way, when instead TorBrowser could just identify itself and let Facebook [and everyone else - BBC? NYT? Other social networks?] selectively offer Alt-Svc instead.
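For anyone who wants to check that figure, the arithmetic under the stated assumptions (73 bytes of extra response header, roughly one trillion responses per day) works out as follows:

```python
# Back-of-the-envelope check of the "584 terabits" figure above.
header_bytes = len("Alt-Svc: " + "x" * 56 + ".onion\r\n")       # 73 bytes per response
responses_per_day = 1_000_000_000_000                           # ~1 trillion
extra_terabits_per_day = header_bytes * responses_per_day * 8 / 1e12
print(header_bytes, extra_terabits_per_day)                     # 73, 584.0
```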
I had started a ticket along these lines too, so here is a motivational sentence that we might find useful here:
"""
A growing number of websites offer onion service versions of their site, and a growing number of those are either offering Alt-Svc headers to Tor users (Facebook and Cloudflare), or auto-redirecting Tor users (Privacy International), or detecting Tor users and changing their page content (archive.is)
"""
All three of these categories of sites overlap in that they are trying to figure out whether people are using Tor, to serve them content differently. And false positives in their detection mechanisms are harmful to our / their users.
(Originally I argued against auto redirection to onion services, on the theory that users should have the choice about what properties they get from their transport protocols. But, when sites auto redirect http to https, because they know better than their users what is good for them, I'm not sad. So why should I be uncomfortable when sites choose to upgrade their users from https to https+.onion? But, we don't actually need to answer this question here, because whether Tor Browser users should signal their capabilities is orthogonal to what we recommend sites should do with this information, UX wise.)
One other angle to consider here is that we should prepare for a world where other browsers want to advertise onion capabilities too: there's Brave in the near term, and maybe Firefox and others coming too. So, (a) it would be cool to figure out how to blend with these other browsers, so we get safety in numbers in terms of normalizing Tor usage, rather than partitioning each browser population, and (b) we might want to think about a "versioning" scheme so that browsers are advertising their onion handling capabilities, not just a broad "I can do all the onions forever", so when we discover a bug in one of the browsers we can recover. I don't have any good intuition about how to do this versioning, but wanted to raise the idea early in case somebody else does.
Oh, and a last note: I don't think Tor Browser even says it's trying to blend with other browsers at this point. The goal of Tor Browser is to make all the Tor Browser users have the same fingerprint (as each other). So changing it for all of them would be fine on that front.
Something to consider is the websites that complain when they see a User-Agent from a browser they don't know, claiming the browser is not compatible. But adding a new word to the ones we currently have in the User-Agent is probably fine in this respect.
I am not convinced that we should send this information in every request. What does it mean if a request to fetch an embedded image contains that information? What is supposed to happen in that case? The idea in this ticket is that once a site is requested, the site owner has a reliable indication of whether the user is using Tor, and can act upon that, for instance by sending an Onion-Location header back. There is no need to send that information later on for every sub-resource: it does not help in making the decision on whether to point to some onion or not. Rather, it just contributes unnecessary load to the Tor network (yes, it's just some bytes, but they add up).
Whether changing the UA for that is a good approach or not, I am not sure yet.
I've thought about this a lot in the past few days, and this is my considered response.
Before I get into it, I am going to standardise my terminology in order to make things a bit clearer and more consistent:
"client" means TorBrowser in the hands of a user
"site" means a large and sprawling (typically: multiple-onion) site, such as Facebook, the BBC, NYTimes, or any of several others which exist.
"estate" means a distinct "chunk" of a site, probably with one or more separate onion addresses; for instance "bbc.com" and "bbc.co.uk" are separate estates within the "BBC site", not least because of content licensing concerns. In the cleartext internet, Facebook, Messenger, Instagram and WhatsApp are all separate "estates" of "[the] Facebook [Site]"
"CDN" means a (usually: third-party) estate which serves part of a site, e.g. Fastly
"server" means one specific machine within a site.
My particular interest is in sites which often comprise one or more estates - typically: CDNs; the BBC site in particular comprises at least 4 distinct BBC estates (.com, .co.uk, BBCi, and something-i've-forgotten) plus two separate third-party global CDN estates.
I believe that the BBC is by far the most complex onion site on the planet; Facebook by comparison has only onionified its core site + its CDN; partly for the reasons that I outline below.
With this glossary established, I'll begin; and I will number my paragraphs in case you wish to reference them:
[1] First: I am glad that we agree that consumption of bandwidth does have a cost associated with it. The questions we must determine are:
who should bear that cost?
how often?
under what circumstances?
and how great a cost is it?
[2] I believe that we've established above that it's neither economical nor environmentally sound for a large site to issue "Onion-Location" headers with every response, merely in the hope that 0.05% of users might make use of them
Scary Maths : (1 million Tor users / 2.7 billion FB users ) * 100 => 0.037%
[3] I also believe that the commentary from Privacy International, and elsewhere, describes both the imperfections and tragedy-of-the-commons issues relevant to treating Tor exit nodes as a geography to be "tracked".
[4] As such: it's neither economic nor robust for a server in a site to somehow "know" that a request has arrived from an onion-capable browser, without employing considerable effort; and when attempting to grow Onion adoption it's much harder to pitch fiddly, specialist, complex solutions which require specialist realtime databases (etc) to function, with the ability to deliver that information across several estates and possibly into third parties.
[5] One could posit an architecture where "when the client tries to log in" (nb: the BBC do not operate login over their onion site, and the NYT do not offer POST functionality at all) the server then offers an Onion-Location header, thereby reducing the cost?
[6] Yes, one could do that, but then you're in the realm of custom engineering to support an onion site, which (again) is a barrier to adoption.
[7] Onion sites are simply HTTP/HTTPS via an alternative network stack with a different domain name; setting them up shouldn't require custom re-engineering of a site's login page, nor any more effort than would establishment of "www.bbc.co.nz" or some other top-level domain.
[8] In other words: such a proposition would be a kludge, and (again) it might be a kludge to support as few as 0.05% of users, so would dissuade site adoption of onions.
[9] Ergo: if the server cannot "know" that the request comes from an onion-capable client, then the server needs to be "provoked" into action. The client must proactively "tell" the server that the client is onion-capable, and then that capability should be expressed to the entire site, across all estates, to support the best experience.
[10] Can that onion-capability be expressed to the backend, via out-of-band means - eg: setting a flag in a backend Redis instance? Possibly, but practically "no", especially where third-party CDNs in other estates are involved.
[11] Therefore: the onion capability will need to be encoded in the session; most likely in a cookie. Something like:
Cookie: onion=1
or probably more reasonably:
Cookie: onion_capable=yes
...but let's go with the first one because it's shorter; you'll be adding at least 7 more bytes to every request to the site, probably much more, and for each site the capability will have to be custom-engineered into the CMS or webserver.
[12] Also, because cookies, it's going to be complicated and fiddly engineering to express this capability to third-party CDN onion sites. Tor is well aware of the problems with cookies sharing data across sites, so I don't think this requires much explanation on my part.
[13] So would it not be simpler, instead, to simply add those 7-or-more bytes to the User-Agent, and have Onion-Capability expressed to everyone?
[14] The Tor user-agent is currently:
Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
[15] I aver that it would be fine to add 6 bytes - "Tor/1" + space:
Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0 Tor/1
...which would be cheaper than the minimum-7-byte Cookie solution for talking to the onion-capable sites; this data would also be sent to non-onion sites, and make it easy for those sites to address onion-client users equitably, as outlined in my first post.
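A quick check of those byte counts, using the exact strings discussed in this thread (and ignoring the additional "Cookie: " framing that a previously cookie-less request would incur):

```python
print(len(" Tor/1"))    # 6 extra bytes per request in the User-Agent
print(len("onion=1"))   # 7 extra bytes per request as a cookie value
```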
[16] Aside from anything else, adding "Tor/1" (or whatever) to the User-Agent would also assist greatly with the matter of third-party CDNs, and also with Alt-Svc.
[17] There seems to be an assumption that Onion-Location headers are necessary in order to provide an optional flow away from a site, to the corresponding onion address; and that Onion-Location headers would be issued only in "special circumstances" because otherwise they would be being sent with every response, leading (again) to the 584-terabits-per-day problem, described in comment:6.
[18] Ergo: implementing "Onion-Location" properly requires the site to:
a) know when a client is onion-capable, and
b) track that it has issued the "Onion-Location" header, and not do it again.
[19] This is achievable but suffers from the same cross-estate state-sharing challenges that are described above. In short: Onion-Location is weird, fiddly, and not robust.
[20] If I am gazing into a crystal ball to speculate [NOTE: THIS IS SPECULATION] about the future, it could include:
We put "Tor/1" into the User-Agent
When a session cookie is dropped for the first time on a client connecting to bbc.com, the server also issues:
Alt-Svc: someverylongbbcversion3address.onion
...which mechanism (no session cookie? have these!) has the benefit of not requiring state to be maintained or propagated around the site (see the sketch after this list).
even if the BBC does not adopt the above, CDN providers like Fastly could certainly adopt such; and also this header would benefit Cloudflare's extant "Opportunistic Onions"
also: CDNs are typically better geared-up to deal with User-Agent, than they are with X-Onion-Capable: 1, so integration should be simpler; the upstream should be able to pass onion-specific headers through the CDN to onion-capable clients by means of the Vary: header.
equally, anyone connecting to www.bbc.com with "Tor/1" might simply be given a Location: https://www.bbcnewsv2vjtpsuy.onion/ and redirected there, because why not when one has been given notice of onion capability?
speculation: also, in preparation for the deprecation of v2 addressing, perhaps in future anyone who accesses "facebookcorewwwi" might be given a "Location:" of www.facebook.com because of Facebook's newly-robust and "User-Agent"-powered Alt-Svc capabilities.
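A minimal sketch of the stateless "no session cookie? have these!" idea from [20]; the cookie name, the "Tor/1" marker, and the onion hostname are placeholders, and this is illustrative Python/WSGI rather than how any real site is configured.

```python
from http.cookies import SimpleCookie
from wsgiref.simple_server import make_server

ONION = "someverylongbbcversion3address.onion"  # placeholder from the list above

def app(environ, start_response):
    ua = environ.get("HTTP_USER_AGENT", "")
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    headers = [("Content-Type", "text/plain")]
    if "session" not in cookies:
        # First contact: drop the session cookie ...
        headers.append(("Set-Cookie", "session=placeholder; Path=/; HttpOnly"))
        if "Tor/1" in ua:
            # ... and, for onion-capable clients, advertise the onion in the
            # same response, so no extra server-side state is needed.
            headers.append(("Alt-Svc", f'h2="{ONION}:443"; ma=86400'))
    start_response("200 OK", headers)
    return [b"hello\n"]

if __name__ == "__main__":
    make_server("127.0.0.1", 8081, app).serve_forever()
```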
[21] In conclusion, I have to ask: what are User-Agents for, if not for this? Why is that word Gecko there, other than to express standard rendering capabilities to servers? Why not express communications capabilities in the same manner, or why invent a new means?
[22] You say: "There is no need to send that information later on for every sub-resource"; I propose that "need" is not the question; the question is what innovation could be unlocked by making that information available to every resource?
Putting Tor/1 or similar into the User-Agent will enable site owners to get clients to the proper onion, to the onion they want to serve the user through, via the means they are best equipped to serve the client - direct connection, alt-svc - with a minimum of cost, technical debt, state-maintenance or fuss.
Yeah, I get the idea of signalling onion capabilities to a server, say, by using a UA. I am not opposed to that. What I am not convinced of yet is why this needs to get included in every request, e.g. for fetching an image on a website. Why should the user agent do that if the server it talked to did not react to the onion capability information in the very first request? Should it hope that, while all the other resources are loaded over the "regular" Internet, that particular image is loaded over .onion? It seems to me the user agent could send this header with the first request and, if the server does not want to react to that, that's cool, but then the header could get omitted as it seems the content provider does not want to act on that .onion offer.
It seems to me the user agent could send this header with the first request and, if the server does not want to react to that, that's cool, but then the header could get omitted as it seems the content provider does not want to act on that .onion offer.
Thought experiment: Why do we not do that with the entire User-Agent?
edit: I gather with HTTP/2, header-resending will be greatly reduced, anyway; no?
Aside: I think you answered your own question when you said:
the user agent could send this header with the first request
...because that word "first" means state-maintenance; as with the "link onion-enabling stuff with the first dropping of a session cookie" model, there are three issues:
not all sites want to use cookies / track their users; what then, re: state-tracking?
by linking the onion-enablement to "first drop", there is no opportunity to revisit or "nudge" the user towards onion
and thirdly
by eliding the six bytes of Tor/1 from the header, you make it somewhat harder for people to implement stuff like:
**You are not Onion-Capable! We recommend that you run TorBrowser for access to this site!**
...which is potentially desirable for brand awareness :-)
It seems to me the user agent could send this header with the first request and, if the server does not want to react to that, that's cool, but then the header could get omitted as it seems the content provider does not want to act on that .onion offer.
Thought experiment: Why do we not do that with the entire User-Agent?
There are a number of reasons. Let me just name two (neither of which applies to adding the onion capability bit):
The user agent has been around for a long time on the web, and services check it for a variety of reasons. Dropping it suddenly after sending it in the first request would very likely break setups, in the sense that whole sites become non-functional.
The user agent is sometimes used to decide which content is shown to the user. If you sent a desktop string in the first request but then dropped it, and the server decided to suddenly deliver content designed for mobile, you would probably be quite unhappy about it.
Aside: I think you answered your own question when you said:
the user agent could send this header with the first request
...because that word "first" means state-maintenance;
I am not sure whether "state-maintenance" is the proper term here, given that it comes with some connotations that probably do not apply. The browser knows whether a request is a first request when surfing to foo.com (or, if that's too hand-wavy for you, it should not be hard to get the browser into a state where it knows). And "first" does not necessarily mean "first" in the whole browsing session, but rather something like "the first request to foo.com after entering 'foo.com' into the URL bar and hitting RETURN". That means if you revisit foo.com later on, the first request to it will contain the onion capability bits again. The server knows whether it's a first request or not by seeing the onion capability bit or not (because the server admin knows that Tor Browser sends that bit on a first request, provided the user has not declined to visit that site over .onion (see below)) and can act accordingly, e.g. by adding Alt-Svc headers, or a Location header pointing to the .onion, or by adding an Onion-Location header, etc.
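To make that policy concrete, here is a minimal sketch of the decision logic only; this is not real browser code, and the "Tor/1" marker, the function name, and both inputs are assumptions for illustration.

```python
# Sketch: attach the onion-capability marker only to the first, top-level
# request to a site, and omit it if the user has declined .onion for that site.
def user_agent_for_request(base_ua: str,
                           is_first_top_level_request: bool,
                           user_declined_onion_for_site: bool) -> str:
    """Return the User-Agent to send for this request."""
    if is_first_top_level_request and not user_declined_onion_for_site:
        return base_ua + " Tor/1"   # advertise capability once per visit
    return base_ua                  # sub-resource fetches stay unmarked

# Example:
# >>> user_agent_for_request("Mozilla/5.0 ... Firefox/68.0", True, False)
# 'Mozilla/5.0 ... Firefox/68.0 Tor/1'
```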
as with the "link onion-enabling stuff with the first dropping of a session cookie" model, there are > three issues:
not all sites want to use cookies / track their users; what then, re: state-tracking?
There is no need for that, as I said above. If for some reason you really need to track whether a user on a site is in a .onion context, you should easily be able to use the Referer header. But that should not be necessary.
by linking the onion-enablement to "first drop", there is no opportunity to revisit or "nudge" the user towards onion
Sure there is, see above: anytime the user visits foo.com there is a chance to revisit the decision as the onion capability bit is sent again in the first request.
and thirdly
by eliding the six bytes of Tor/1 from the header, you make it somewhat harder for people to implement stuff like:
**You are not Onion-Capable! We recommend that you run TorBrowser for access to this site!**
...which is potentially desirable for brand awareness :-)
That actually depends. Because if the server sent back an Onion-Location header and the user declined to visit the onion and does not want to in the future (for whatever reason), I expect the server to comply with that wish and not throw a "You are not Onion-Capable" in the user's face every time they visit that website. So, this use-case works perfectly well with the idea of not sending the onion capability bit in every request with the user agent: one could easily omit that bit entirely if the user has decided not to load that website (or any website, for that matter) over .onion.
So, I think using the onion capability bit for that idea is not necessarily a thing we should encourage.