In this ticket we are going to document instances where websites, especially the ones in the top 500 in terms of traffic, are deliberately blocking Tor users or making it difficult for them to access the site. Please feel free to add to this ticket with some proof and/or a short description of any such incident. More information on this can be found on our wiki.
General advice that we give folks when running into this issue can be found in our Support Portal.
I don't think framing this as censorship is helpful. Actually, I think
it will make it much harder to convince site owners to allow access
via Tor if we put it that way. Moreover, I doubt it is censorship to
begin with. Does setting up a web site or web service entitle anyone
to use it by any means, without me having a say, or else folks start
yelling "Censorship!!!" at me?
As an additional point: We are happy with Tor exit nodes allowing access
to only particular services (we recommend the reduced exit policy) which
means they don't allow access to a big number of services right away.
How does that fare with the idea of us claiming web sites are censoring
Tor users because they don't like our users for reason X? It seems to me
things don't really add up.
And we should not blame web site owners for trying to defend against
jerks using Tor for activity that hurts them. Quite to the contrary, I
think we should start with seeing their point and then think how we can
help improve the situation from there.
Hi, @gk!
Thanks a lot for your insights and feedback on this and now that I come to think of it, these are very wise and valid points to consider!
To give some context, @gus and I opened this ticket because we have been getting reports of some popular sites like Youtube and Reddit making it difficult for Tor users to connect. And since these are widely popular websites, this is a usability pain point for Tor users.
And we should not blame web site owners for trying to defend against jerks using Tor for activity that hurts them. Quite to the contrary, I think we should start with seeing their point and then think how we can help improve the situation from there
I believe we can surely shape this kind of research/feedback so that it emphasizes how we can improve the situation on both ends. I will bring this up with @gus next week and update the ticket accordingly.
Hi @gk, it's not about blaming or yelling "censorship!!!", but if we want to carry on a campaign to unblock Tor, we need to learn which websites in the Alexa Top 500 are blocking Tor users. What we do next is another discussion. I believe this topic is important for TB user retention and user experience.
It is, I agree. But I feel having the problem mentioned in our support
portal as a censorship incident via
https://support.torproject.org/censorship/censorship-2/ does not help
us. As I said, quite to the contrary: "censorship" is a negatively
connoted term, in particular in our context. I would argue that web
site owners blocking Tor users for whatever reason would strongly dispute
that they are censoring users. They would see it as prejudice that does
not take their stance seriously, in particular since, as I mentioned earlier, we are fine
that exit relays are blocking entire ranges of services by allowing
traffic only to certain ports. We don't talk about that in terms of
censorship (rightly so) and there is no reason to start doing so when we
talk to site-owners. I think that is both wrong and is actively harming
our efforts here. (I can even back that up with personal experiences
with Cloudflare etc. in the past :)
I can file a ticket if you think we want to have one dealing with that
but, either way, I really think we should redo that support entry and
get "censorship" out of our vocabulary here.
Oh, now I see your point. It's about Support portal topics (tpo/web/support#99). The article is good, but it was wrongly categorized in the "Censorship" section. We could move it to Misc or another section.
Yeah. In hindsight I should just have filed a new ticket and brought my
points up there. But the link to the support article in this issue
jumped out at me, so I felt the urge to reply here. :) Sorry for that.
i. Website: Reddit
ii. Ticket: tpo/applications/tor-browser#40300 (closed)
iii. Description: Several users on #tor IRC and the frontdesk have reported that reddit.com serves "503 Service Temporarily Unavailable" when accessed via Tor. The issue persists on both Tor Browser and Tor on Brave. The old Reddit interfaces, i.e. https://old.reddit.com/ and https://np.reddit.com/, work fine.
iv. Screenshots: 1. Reddit on Tor Browser
2. Reddit on Tor with Brave
i. Website: Youtube
ii. Description: As many users on the frontdesk have reported, Youtube generally lands users on a page (see Screenshots) with a CAPTCHA to solve. The pain point is that the CAPTCHA usually takes a very long time to solve or, more often than not, times out. Changing circuits does not help either.
iii. Screenshots: 1. Youtube on Tor Browser
2. Youtube on Tor with Brave
It might be worth systematically testing websites in the Alexa top 100, perhaps filling in a table like this:
| Website | Alexa Rank | Site Works? | Problem | Cause | Date Last Tested |
|---|---|---|---|---|---|
| SITE_A | 1 | No | Time-consuming CAPTCHA | The site uses Cloudflare | 2020-2-12 |
| SITE_B | 2 | Intermittently | Time-consuming CAPTCHA | The site is a Google property | 2020-2-12 |
| SITE_C | 3 | No | 503 error | Explicit blacklisting of exit nodes? | 2020-2-12 |
Having a taxonomy like this would help identify the biggest opportunities for improvement. For example, this technology might improve the UX on Cloudflare sites.
Unfortunately, I checked using the CAPTCHA dashboard, and it doesn't seem like very many of the Alexa top sites are included in the CAPTCHA monitoring. I checked the sites with Alexa ranks 1-10, and 40-50, and none were included in the CAPTCHA monitoring data. This is likely because many of the very largest sites (e.g., youtube.com) don't use Cloudflare.
This points to a weakness of looking at just the Alexa top sites, since those huge sites are probably very different from the sites in the long tail (e.g., local news sites). However, the Alexa top sites still seem like a good place to start.
Hello everyone, while fiddling with Reddit I found something unusual: the URL https://reddit.com on its own shows a 503 temporary error, as reported by @championquizzerhere, but right after logging in with our user details[1] the site works perfectly. I'm attaching screenshots of my findings below:
---Brave: Version 1.21.76 Chromium: 89.0.4389.86 (Official Build) (64-bit)---
Before signing in:
After signing in:
---Tor: 10.0.12 (based on Mozilla Firefox 78.8.0esr) (64-bit)---
Before signing in:
After signing in:
[1] How can we log in? The login link (https://www.reddit.com/login) and the register link (https://www.reddit.com/register/) do work, unlike the SSL link (https://ssl.reddit.com/).
Hope this helps people trying to use Reddit over Tor!
Well, to ask for account creation just to see some content is a bit too much, I think.
Instagram has been doing this for a while (it redirects me to the login page when using Tor Browser); it used to be possible to see pics. I do not want to create an account for it.
Yeah, as a user I don't like it much either, but if they're blocking tor to mitigate abuse, instead requiring sign-in to mitigate abuse could be a reasonable compromise.
As discussed on IRC earlier, we decided to re-evaluate the websites mentioned on the "List of services blocking Tor" wiki page and check how they now behave with Tor. These websites are not necessarily in the Alexa Top 500.
Methodology:
All websites have been checked on Tor Browser at 'Standard' Security Setting.
For inaccessible websites, I tried to connect by changing the Tor circuit exactly three times. A website is marked as working if it loaded even once. A "works" mark doesn't necessarily mean that the website is fully usable over Tor, just that the landing page works. In most cases, I was unable to check logging in, creating an account, etc.
Legend: = website works over Tor. = couldn't reach the website
blank = it's complicated. See 'Remarks' or the wiki.
Notes: If you have experience with any of the following services, please let me know and I will update the table accordingly. Thanks!
Some advice we give to our users when they run into issues with Banking Websites: https://support.torproject.org/tbb/tbb-30/
"We are sorry an error has occurred, please try again later. We are currently working to fix the problem and should have it resolved shortly." -- the site works perfectly on a regular browser though.
"Uh oh. As an anonymous user, your access to this network has been blocked. Requests from anonymous users using TOR (The Onion Router) IP addresses are routed through various networks and the identity of the original user cannot be traced, which is a security risk. If you are not using a TOR IP address and continue to receive this message, please get in touch with your local Standard Chartered contact centre." Status code 200. A direct link to the block page seems to be https://www.sc.com/global/error/AnonymousProxy. The block seems to use the same infrastructure as country-level blocking: editing the URL of an included image leads to the page http://countryblock.standardchartered.com.edgesuite.net/SanctionedCountries/blocked_ctry/: "Uh oh. Standard Chartered Online Banking cannot be accessed from Cuba, N. Korea, Iran, Syria and Crimea/Sevastopol. If you are not in one of these countries and are getting this message, please contact your local Standard Chartered contact center."
The website is no more. "The Beatles Rarity is officially ended. Thank you for all who participated. It was fun but now it's time to time to move on". The landing page works now though.
Hi @epcnt19!
I'm glad to see that you are interested in the GSoC project, I will be mentoring that project. Did you join Tor's IRC channel? If you didn't, please do, we can discuss details there as well. You can follow this tutorial: https://support.torproject.org/get-in-touch/#why-i-cant-join-tor-channels If you prefer, you can email or keep writing under this issue as well.
I will quickly summarise here so that other people interested in the project can see it as well. In a nutshell, there is already a system called CAPTCHA Monitor for keeping track of various websites (mostly CDN fronted ones) and checking whether they return CAPTCHAs. This project is somewhat limited and broken, but it has many working parts, such as modules for fetching websites through Tor Browser, Firefox, Chromium, etc (and their versions). You can also specify which exit node to use while fetching the websites in this system. The system displays the results here: https://dashboard.captcha.wtf/ However, as I said, it is limited and kinda broken. You can find a summary about its status here: https://gitlab.torproject.org/woswos/CAPTCHA-Monitor/-/wikis/GSoC-2020 I would highly suggest reading this summary.
Currently, it only does a string search to understand if there is a CAPTCHA or not in a fetched website. So, with the "Alexa Top Sites Captcha and Tor Block Monitoring" project, the target is building upon the CAPTCHA Monitor and expanding it in many ways. Thus, you don't really need to worry about the "Writing a web client to periodically fetch" and "using the pre-existing architecture, run a second client that does not fetch this webpage via Tor" parts of the proposal. They can be done with the existing code. There are actually a lot of small but important details about fetching websites the right way in code. For example, CDNs and most other websites can distinguish a real user from a script based on many factors. The existing code tries really hard to pretend to be a real user so that the results reflect reality. So, there is no need to spend a lot of time trying to build the fetching system from scratch and figuring out these details again.
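As a rough illustration of the string-search style of detection mentioned above (this is not the actual CAPTCHA Monitor code; the function name and the marker strings are made up for this sketch):

```python
from typing import Iterable

# Hypothetical marker strings, chosen only for illustration.
DEFAULT_MARKERS = ("captcha", "verify you are human", "unusual traffic")


def looks_like_captcha(html: str, markers: Iterable[str] = DEFAULT_MARKERS) -> bool:
    """Return True if any marker string occurs in the page, ignoring case."""
    lowered = html.lower()
    return any(marker.lower() in lowered for marker in markers)
```

For example, `looks_like_captcha("<h1>Please solve this CAPTCHA</h1>")` returns `True`, while a page without any of the markers returns `False`.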
The first obvious way to improve the CAPTCHA Monitor is the CAPTCHA/blocking detection methodology. The problem with string search is that you need to know beforehand what to search for on each individual website, and that is very limiting. Also, websites, and the way they block Tor users, keep changing all the time. There are already a few ideas that seem to be working, such as parsing the DOM trees of the websites and figuring out the changes. For example, we can fetch a website using regular Firefox and Tor Browser and compare the DOM trees to see if there is any difference in terms of the content.
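A minimal sketch of what such a DOM comparison could look like, assuming the two pages have already been fetched as HTML strings. It only compares tag structure (not text), uses Python's built-in `html.parser`, and all names here (`TagPathCollector`, `tag_paths`, `dom_similarity`) are hypothetical, not part of CAPTCHA Monitor:

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Counts how often each tag path (e.g. 'html/body/div') occurs."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = Counter()

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def dom_similarity(html_a, html_b):
    """Multiset Jaccard similarity of tag paths: 1.0 means identical structure."""
    a, b = tag_paths(html_a), tag_paths(html_b)
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 1.0
```

A low score between the regular-browser fetch and the Tor fetch would suggest that Tor users are served a structurally different page, for example a bare CAPTCHA form. Where to put the threshold is an open question that would need real data.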
Again, the method I described above might produce a lot of false positives since websites might contain dynamic content. Another improvement to this method is creating a "consensus" of the target webpages and using the consensus to do these comparisons. You can read the following papers to learn more (let me know if you cannot access the papers somehow):
Another option is using the status codes of the webpages to make decisions. Sometimes, websites just return error codes directly without any content. In that case, the detection is very easy. That said, websites sometimes still return 200 but show a CAPTCHA or use other means to block users. In such cases, the DOM tree comparison method (the one I mentioned above) is quite handy.
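The status-code idea could be sketched roughly as follows, assuming we have the status code from a fetch over Tor and from a control fetch without Tor. The function name and the result labels are hypothetical:

```python
def classify_by_status(tor_status: int, control_status: int) -> str:
    """Make a first-pass decision from HTTP status codes alone."""
    if tor_status >= 400 and control_status < 400:
        # Error over Tor only: the easy case described above.
        return "likely_blocked"
    if tor_status >= 400:
        # The control fetch fails too, so the site itself may be broken.
        return "site_error"
    # Success (or redirect) over Tor can still hide a CAPTCHA page,
    # so the page content has to be inspected as well.
    return "needs_content_check"
```

The last branch is where a content-based method such as the DOM comparison would take over.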
Yet another option is analyzing the headers the websites return. Again, CAPTCHA Monitor can record and return the headers while fetching websites.
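A small sketch of header analysis, assuming the recorded headers are available as a plain dictionary. The specific header checks are my assumptions about what could be useful (Cloudflare's `Server`, `CF-RAY` and `cf-mitigated` headers, and the standard `Retry-After` header); they are not taken from CAPTCHA Monitor, and `header_hints` is a hypothetical name:

```python
def header_hints(headers: dict) -> list:
    """Return human-readable hints derived from response headers."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    hints = []
    if normalized.get("server") == "cloudflare" or "cf-ray" in normalized:
        hints.append("served via Cloudflare")
    if normalized.get("cf-mitigated") == "challenge":
        hints.append("Cloudflare challenge page")
    if "retry-after" in normalized:
        hints.append("rate limited or temporarily unavailable")
    return hints
```

Hints like these would not prove a block on their own, but they could help fill in a "Cause" column when combined with the other signals.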
The other obvious way to improve CAPTCHA Monitor is the analysis part. Currently, the analysis part is almost non-existent. Without a quick way to understand or summarize it, the raw data doesn't mean much on its own. So, this part is very important. I really liked your idea of checking if the tested exit relay's IP is on the blacklist!
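The blacklist check could look roughly like this, assuming the blacklist is available as a list of IP addresses and/or CIDR ranges in text form. The function name and the input format are assumptions; the addresses in the usage note are documentation-only ranges:

```python
import ipaddress


def is_listed(exit_ip: str, blocklist_entries) -> bool:
    """Return True if exit_ip matches any address or CIDR range in the list."""
    address = ipaddress.ip_address(exit_ip)
    for entry in blocklist_entries:
        # strict=False also accepts entries with host bits set, e.g. "192.0.2.1/24".
        network = ipaddress.ip_network(entry, strict=False)
        if address.version == network.version and address in network:
            return True
    return False
```

For example, `is_listed("192.0.2.10", ["192.0.2.0/24"])` returns `True`. Joining this result with the per-relay fetch results would show whether failures correlate with listed exits.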
@woswos Hi, thank you for the thorough explanation.
I now understand the project and have created an IRC account.
First of all, I will set up CAPTCHA Monitor.