Understanding and Using Tor - An Introduction for the Lay(wo)man
How does Tor make me anonymous?
Tor makes you anonymous by routing your traffic through a series of computers before it arrives at its destination. These computers are part of the Tor network and pass your request among themselves in such a way that the computer at the end of the series doesn't know anything about the computer at the start of the series. This means that the website you are sending traffic to doesn't know who you are. This 'series of computers' is known as a 'circuit' through the Tor network. Think of it as a cable running from your computer to the website you are viewing. The website can see the cable emerging from the Tor network but can't see where it entered the Tor network.
So who chooses the computers on the circuit?
The computers on the circuit are chosen randomly by the Tor application sitting on your computer.
How long should the circuit be, how many computers should be on it?
The circuit usually consists of 3 computers on the Tor network: the entry guard, the relay node, and the exit node.
Does each computer on the circuit know who you are?
No. Only the entry guard knows who you are. You only connect directly (over the 'real' internet) to the entry guard.
So how does my Tor client build the circuit by only talking to the entry guard?
To begin with, your Tor client creates a connection with the entry guard and tells it that it wants to create a circuit. It then tells the entry guard who the next computer in the circuit should be, i.e. who should be the relay node. As instructed, the entry guard creates a connection with the relay node and from now on passes along whatever you send it. When your Tor client starts sending stuff to the relay node, it looks like junk to the entry guard. The entry guard can't read it and doesn't need to; it just needs to pass it on.
So the entry guard is just passing stuff blindly on to the relay node?
Yes. From the entry guard's point of view it's just big blobs of ones and zeroes with an instruction 'pass on to relay node' on the front of each one. At this point, the blobs your Tor client is sending to the relay node are in fact an instruction to create a new connection with the exit node your Tor client has chosen. The relay node does as it is told, and now your Tor client can send stuff to the exit node. As with the previous step, the relay node doesn't know what these blobs are - it can't read them - it just needs to send them on. In this case, the stuff your Tor client is sending to the exit node is the stuff you want to send to the website you're connecting to. Voila - you're communicating anonymously with the website.
I'm still digesting that. Have you got a patronizing analogy?
Of course. Think of a parcel with three layers of wrapping. The entry guard unwraps the first layer and finds that the layer underneath has the relay node's address on it, so it passes it on to the relay node. The relay node unwraps its layer and finds that the layer of wrapping underneath has the exit node's address on it, so passes the parcel on to the exit node. The exit node unwraps its layer and finds the goodies inside - i.e. the name of the website you want to connect to and the message you want to transmit to it. This layering is why it's called 'onion routing'.
So what's stopping the entry guard from unwrapping all of the layers himself? Why can't the entry guard read what you're sending to the relay node or the exit node?
Because each layer can only be unwrapped by the node whose name is on the wrapping. This is enforced by encrypting the wrapper in such a way that only the addressee can decrypt it. The same is true all along the circuit: the relay node can't read what you're sending to the exit node because it can only be decrypted with a key that the exit node alone knows. The same goes for the entry guard and what you're sending to the relay node.
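If you like to see things in code, here is a minimal sketch of the parcel-with-three-wrappers idea. It is only an illustration: the `toy_cipher` function, the node keys and the name `website.example` are all made up for this example, and the cipher is a toy stand-in for the real encryption Tor uses.

```python
import hashlib
import json

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a keystream derived from key. A toy, NOT real encryption.
    Applying it twice with the same key gives back the original data."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(key: bytes, next_hop: str, inner: bytes) -> bytes:
    """Add one layer of wrapping that only the holder of key can open."""
    layer = json.dumps({"next": next_hop, "inner": inner.hex()}).encode()
    return toy_cipher(key, layer)

def unwrap(key: bytes, blob: bytes):
    """Remove one layer: returns the next address and the still-wrapped rest."""
    layer = json.loads(toy_cipher(key, blob))
    return layer["next"], bytes.fromhex(layer["inner"])

# Hypothetical keys, one per node (assumed for illustration).
keys = {"guard": b"guard-key", "relay": b"relay-key", "exit": b"exit-key"}

message = b"hello website"
# The client wraps from the inside out: exit layer first, guard layer last.
onion = wrap(keys["exit"], "website.example", message)
onion = wrap(keys["relay"], "exit", onion)
onion = wrap(keys["guard"], "relay", onion)

# Each node removes exactly one layer and learns only the next address.
blob = onion
for node in ("guard", "relay", "exit"):
    next_hop, blob = unwrap(keys[node], blob)
    print(f"{node} unwraps its layer and passes the rest on to {next_hop}")
```

Note that the client wraps the layers in reverse order, so the first layer to come off is the entry guard's.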
How does that work?
The key used to set up the encryption with each node is a public key owned and generated by that node. This public key, which is distributed to all and sundry, has a counterpart called a private key which the node keeps for itself. If you encrypt anything with the public key, it can only be decrypted by the corresponding private key. That is why it is possible to send something along a circuit that only one of the nodes on the circuit can decrypt. This kind of encryption is referred to as asymmetric encryption or public/private key encryption; RSA is one well-known example.
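For the curious, the public/private key idea can be sketched with 'textbook' RSA and absurdly small numbers. This is an illustration only: the primes, the exponent and the function names are chosen for this example, and real keys are vastly larger and used with extra safeguards.

```python
# Textbook RSA with tiny primes. Illustration only, never use numbers this small.
p, q = 61, 53                  # two secret primes (assumed for the example)
n = p * q                      # 3233, shared by both keys
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (needs Python 3.8+)

public_key = (e, n)            # handed out to all and sundry
private_key = (d, n)           # kept by the node

def encrypt(number: int, key) -> int:
    exponent, modulus = key
    return pow(number, exponent, modulus)

def decrypt(number: int, key) -> int:
    exponent, modulus = key
    return pow(number, exponent, modulus)

secret = 65
scrambled = encrypt(secret, public_key)      # anyone can do this
recovered = decrypt(scrambled, private_key)  # only the key owner can do this
print(secret, scrambled, recovered)
```

Anyone holding the public key can scramble a message, but only the holder of the private key can recover it.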
OK let's go back to who knows who on the circuit. The relay node knows who the entry guard is, right?
Yes.
And the exit node knows who the relay node is, right?
Yes.
So does the entry guard know who the exit node is (and vice versa)?
No. Because it couldn't read the instruction you sent to the relay node telling it who the exit node should be. The exit node and the entry node on a circuit cannot tell who is on the other end of the circuit. The computer at one end of the cable does not know who, what or where the computer at the other end of the cable is.
Right. So who knows who I am, the person at the start of the cable, the Tor client?
Only the entry guard. The relay node only knows that it is talking to another server, possibly an entry guard. It doesn't know who is talking to the entry guard on the other side.
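A tiny sketch may help keep track of who knows who. It simply assumes that each party on the path knows only the parties it talks to directly; the labels and the `direct_contacts` helper are made up for this example.

```python
# The path your traffic takes, in order. Labels are illustrative.
path = ["you", "entry guard", "relay node", "exit node", "website"]

def direct_contacts(path):
    """Map each party to the set of parties it talks to directly."""
    contacts = {}
    for i, name in enumerate(path):
        before = path[i - 1:i] if i > 0 else []
        after = path[i + 1:i + 2]
        contacts[name] = set(before + after)
    return contacts

contacts = direct_contacts(path)
for name in path[1:]:
    knows_you = "you" in contacts[name]
    print(f"{name}: talks to {sorted(contacts[name])}, knows you: {knows_you}")
```

Only the entry guard has "you" among its direct contacts. (Your own Tor client, of course, knows every node because it chose them.)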
How can that be? Couldn't the entry guard just tell the relay node who the Tor client is?
If they are both specially-modified malicious Tor nodes collaborating with each other, yes.
So what's to prevent them?
Two things make this hard to do: (i) they would have to agree on a way of exchanging this information without breaking their ability to interact with other 'normal' Tor servers, and (ii) since the Tor client chooses the members of the circuit randomly, there would have to be enough collaborating nodes out there for two malicious nodes to stand a reasonable chance of being selected together often enough to be effective.
Doesn't seem beyond the bounds of possibility.
It's not, and it would be a problem if the circuit contained just two nodes that were collaborating with each other, since both would know who the Tor client is and the second node would know where the traffic is going. This is why three-hop circuits (i.e. entry guard, relay node and exit node) rather than two-hop circuits (i.e. entry guard and exit node) are standard on the Tor network. Three-hop circuits help guard against this attack because even if the entry guard and the relay node know who the Tor client is, they still don't know where the traffic is going, unless they also own the exit node.
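To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes every node on a circuit is picked independently and uniformly at random, which is a simplification made for this example and not how node selection is described above; the function names are also made up.

```python
def chance_all_malicious(fraction_malicious: float, hops: int) -> float:
    """Chance that every hop of one circuit belongs to the attacker,
    assuming independent, uniform picks (a simplifying assumption)."""
    return fraction_malicious ** hops

def chance_over_many_circuits(fraction_malicious: float, hops: int,
                              circuits: int) -> float:
    """Chance that at least one of several circuits is fully malicious."""
    per_circuit = chance_all_malicious(fraction_malicious, hops)
    return 1 - (1 - per_circuit) ** circuits

# Suppose the attacker runs 10% of all nodes (an invented figure).
for hops in (2, 3):
    one = chance_all_malicious(0.10, hops)
    many = chance_over_many_circuits(0.10, hops, circuits=100)
    print(f"{hops} hops: one circuit {one:.4f}, any of 100 circuits {many:.4f}")
```

Under these assumptions, adding the third hop cuts the per-circuit risk by a factor of ten, but the risk still adds up if you build many circuits.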
So what it boils down to is: if the same person owns all the nodes on your circuit you're screwed?
Yes. And if they owned all the nodes on your circuit they wouldn't even need a specially-modified version of Tor to figure out who you are.
Oh, how come?
Picture the TCP connections that make up your circuit:
    client:a <---> tornode1:9001
    tornode1:b <---> tornode2:9001
    tornode2:c <---> tornode3:9001
    tornode3:d <---> host:e
where a, b, c, and d are randomly chosen TCP ports and e is the TCP port used by host for contacting a service (such as 443 for HTTPS). If all of the Tor nodes were paying attention, then:
    tornode1 knows that its connections involving client:a and tornode1:b are part of the same circuit
    tornode2 knows that its connections involving tornode1:b and tornode2:c are part of the same circuit
    tornode3 knows that its connections involving tornode2:c and host:e are part of the same circuit
Knowing all of these facts, these nodes could deduce that client:a and host:e are actually communicating with one another. This is not a "timing attack" and does not rely on observing any packets actually transmitted across the fully-established circuit.
Malicious nodes that log this kind of information could also collaborate after the fact to correlate it, without recording large quantities of timing information. They just need TCP port pairs and accurate times when TCP connections were established.
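Here is a rough sketch of how such logs could be stitched together after the fact. The log format, the port numbers and the `trace` helper are invented for this example; it only shows that matching port pairs is enough to join the dots.

```python
# What each colluding node might log per circuit (format is hypothetical):
# who connected in, which local port it used to connect out, and to whom.
logs = {
    "tornode1": [{"in_from": "client:50001", "out_from": "tornode1:41000",
                  "out_to": "tornode2:9001"}],
    "tornode2": [{"in_from": "tornode1:41000", "out_from": "tornode2:42000",
                  "out_to": "tornode3:9001"}],
    "tornode3": [{"in_from": "tornode2:42000", "out_from": "tornode3:43000",
                  "out_to": "host:443"}],
}

def trace(logs, start):
    """Follow matching port pairs from start to the final destination."""
    by_inbound = {entry["in_from"]: entry
                  for entries in logs.values() for entry in entries}
    current, destination, seen = start, None, set()
    while current in by_inbound and current not in seen:
        seen.add(current)
        entry = by_inbound[current]
        destination = entry["out_to"]   # where this hop sent the traffic
        current = entry["out_from"]     # the next hop sees this as its inbound
    return destination

print("client:50001 was talking to", trace(logs, "client:50001"))
```

No packet contents and no fine-grained timing are needed, just the pooled connection records.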
OK, I'm a bit worried now. Reassure me. What reduces the chances of my Tor client choosing a circuit that contains computers all owned by the same person or organisation?
Tor does a number of things to help prevent this sort of thing taking place and reduce the impact when it does:
i. New circuits are created at frequent intervals.
ii. Computers within the same IP ranges are avoided on the same circuit.
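The second rule can be sketched in a few lines. This is only an illustration: treating 'same IP range' as 'same /16 block' is an assumption made here, the addresses are invented, and `pick_circuit` is a hypothetical helper, not Tor's actual selection code.

```python
import ipaddress
import random

def same_range(ip_a: str, ip_b: str, prefix: int = 16) -> bool:
    """True if both addresses fall in the same /prefix block.
    The default of 16 is an assumption for this sketch."""
    block = ipaddress.ip_network(f"{ip_a}/{prefix}", strict=False)
    return ipaddress.ip_address(ip_b) in block

def pick_circuit(relays, length=3, rng=random):
    """Pick relays at random, skipping any in a range already used."""
    candidates = list(relays)
    rng.shuffle(candidates)
    chosen = []
    for ip in candidates:
        if all(not same_range(ip, other) for other in chosen):
            chosen.append(ip)
        if len(chosen) == length:
            return chosen
    raise ValueError("not enough relays in distinct IP ranges")

# Invented relay addresses; the first two share a /16 block.
relays = ["10.1.0.5", "10.1.9.9", "10.2.0.1", "192.0.2.8"]
print(pick_circuit(relays))
```

Whichever circuit comes out, it never contains both of the first two relays.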
The risk you're worried about actually has a catchy name: a 'Sybil Attack'. Wikipedia defines it as "one in which an attacker subverts the reputation system of a peer-to-peer network by creating a large number of pseudonymous entities, using them to gain a disproportionately large influence."
OK so once you've set up a circuit (you can spare me the 'cable' metaphor from now on) you start sending your traffic along it?
Yes.
And what's to prevent all the computers on the circuit from reading your traffic and figuring out what website you're browsing?
The same thing as before. Your Tor client sends your traffic along the circuit encrypted with a key that only the exit node knows. To the entry guard and relay node it's just indecipherable junk, and that includes the ultimate destination of the traffic, e.g. http://www.google.com.
But the exit node can read my traffic?
If you are browsing a site like http://www.google.com, yes.
That doesn't seem very secure.
It's not. If you can be identified by your traffic alone, then your exit node can identify you. Remember that the exit node is the point at which your traffic enters the 'real' internet. From the website's point of view, it's as though someone at the exit node's computer is browsing them.
So I can't browse securely with Tor?
Of course you can. If you are browsing https://www.google.com rather than http://www.google.com, then the only thing the exit node will know is that it is sending your traffic to https://www.google.com, it won't be able to read the content of your traffic at all.
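As a rough illustration of the difference, the sketch below lists which parts of a web address an exit node could see in each case. It is a simplification: the `visible_to_exit_node` helper is made up for this example, and even with https an exit node can still observe things like how much traffic flows and when.

```python
from urllib.parse import urlsplit

def visible_to_exit_node(url: str) -> dict:
    """Simplified view of what the exit node can read for a given URL."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Only the destination is visible; the request itself is encrypted.
        return {"host": parts.hostname, "port": parts.port or 443}
    # Plain http: destination, page, search terms and content are readable.
    return {
        "host": parts.hostname,
        "port": parts.port or 80,
        "path": parts.path or "/",
        "query": parts.query,
        "content": "readable",
    }

print(visible_to_exit_node("http://www.google.com/search?q=my+secret"))
print(visible_to_exit_node("https://www.google.com/search?q=my+secret"))
```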
So the exit node will always know where my traffic is going?
Yes.
But it won't know that it's my traffic?
Correct.
So if my traffic contains information that could identify me, I should always use a https:// website rather than a http:// one?
Precisely. The https:// prefix on a website's name makes your browser connect to the website using SSL (Secure Sockets Layer), nowadays known as TLS (Transport Layer Security). This is a form of encryption that ensures only the sender and receiver know the content of the traffic being passed between them.
Is this only available for websites?
No. Encrypted equivalents also exist for sending and receiving mail (pop3s and smtps), for remote terminal sessions (ssh in place of telnet) and so on.
So as long as I use one of these secure protocols, I'm fine?
"Well, up to a point Lord Copper..."
If you connect to a 'secure' website using the https:// prefix, you may receive a pop-up warning. The actual content of the warning will vary depending on the browser. It is generally something along the lines of 'The certificate used by this site may not be valid &c.'. This warning can be displayed for a number of different reasons, but the most important to be aware of, especially when using Tor, is that someone between you and the website could be intercepting your secure traffic in such a way that they can actually read it.
Why 'especially when using Tor'?
Do you remember we said that the exit node is the only one that can read your traffic? If you are browsing https://google.com then, under normal circumstances, the exit node will only be able to read junk. But when you first connect to https://google.com an exit node could pretend to be https://google.com (and some have done so) and send back a phony certificate for that site to your browser. Your browser will warn you that the certificate may not be valid, but if you click 'Accept' and proceed to view the site with the phony certificate then the exit node will be able to view your traffic. It can do this because by clicking 'Accept' in this scenario you have set up a secure session between yourself and the exit node, as though the exit node itself was https://google.com. The exit node will accept the traffic you send to google.com, read it, and then pass it on to the real google.com. This is known as a 'Man in the Middle' attack (MITM for short).
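The same advice applies to programs as well as people: never 'click Accept' on their behalf. As a small sketch, Python's standard `ssl` module checks certificates and host names by default; the `is_strict` helper below is made up for this example and simply confirms that those checks have not been switched off.

```python
import ssl

def is_strict(context: ssl.SSLContext) -> bool:
    """True if the context refuses invalid or mismatched certificates."""
    return (context.verify_mode == ssl.CERT_REQUIRED
            and context.check_hostname)

# The default context verifies the certificate chain and the host name.
safe = ssl.create_default_context()
print("default context is strict:", is_strict(safe))

# The programmatic equivalent of clicking 'Accept' on every warning.
# Shown only to illustrate what NOT to do.
careless = ssl.create_default_context()
careless.check_hostname = False
careless.verify_mode = ssl.CERT_NONE
print("careless context is strict:", is_strict(careless))
```

A connection made with the careless context would happily talk to an exit node presenting a phony certificate.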
But couldn't this happen on the 'normal' internet?
Yes it could, and does. Anyone with a computer along the route between your computer and the website you are connecting to could do this.
Is it easy to do on the 'normal' internet?
On the 'normal' internet, not that easy. It would generally have to be an insider at your ISP or any other ISP or service provider along the route your traffic takes. Or someone who has gained illegal access to an ISP's resources.
Is it easy to do on Tor?
It is as easy as setting up a Tor exit node and running a packet sniffer on your local network connection.
So it happens?
Yes.
So do I have to be more careful about the security of my traffic than normal when using Tor?
No, you should always be more careful about the security of your traffic.
But once I'm careful with my traffic while using Tor, I will always be completely anonymous, right?
It depends who you want to be anonymous from. Tor may not protect you from a snooper who has worldwide reach. For example, an organisation that could somehow watch the traffic of every node on the Tor network.
If someone can watch the traffic of every Tor server out there, they can time traffic as it enters and leaves all of the servers and start figuring out where a particular piece of traffic entered and left the Tor network (without breaking any encryption). This means they have figured out where a particular circuit starts and ends in the Tor network. So by extension they know who was using the circuit and where the traffic on the circuit went.
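A toy version of this timing trick might look like the sketch below. The timestamps are invented and the matching rule (compare the gaps between packets) is a deliberately crude assumption for illustration; real traffic analysis is far more involved.

```python
def gaps(timestamps):
    """Time between consecutive packets in a flow."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

def best_match(entry_flow, exit_flows):
    """Name of the exit flow whose packet rhythm is closest to the entry flow."""
    target = gaps(entry_flow)

    def distance(name):
        candidate = gaps(exit_flows[name])
        if len(candidate) != len(target):
            return float("inf")
        return sum(abs(a - b) for a, b in zip(target, candidate))

    return min(exit_flows, key=distance)

# Invented observations: when packets were seen, in seconds.
entering_from_you = [0.00, 0.10, 0.45, 0.50]
leaving_the_network = {
    "to site A": [0.30, 0.41, 0.75, 0.81],   # same rhythm, shifted by delay
    "to site B": [0.20, 0.50, 0.60, 0.95],   # a different rhythm
}
print("your traffic most likely went", best_match(entering_from_you, leaving_the_network))
```

Nothing is decrypted here; the rhythm of the traffic alone gives the game away.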
And can anyone actually do that?
It's possible that one or more security institutions in the "Five Eyes" surveillance states (United States, UK, Canada, New Zealand, and Australia) can get quite close to doing it. It's hard to tell the extent of their capabilities, but some of the surveillance revelations in 2013 indicated that while they are actively working to attack Tor they are not able to deanonymize users en masse. However, Tor's design does not claim to be able to prevent a sufficiently-capable global adversary from deanonymizing users so you should not assume it is able to (especially if you might be individually targeted).
So if I can't be sure of my anonymity from the likes of the NSA and GCHQ, who can I be sure of anonymity from?
Anyone who does not have access to your computer. Though 'access to your computer' has a broader definition than you might think.
You mean someone who has hacked in to my computer?
Hackers, yes. But you might be surprised to learn that even websites can find subtle ways of accessing your computer, even in a minimal way, that can allow them to find out who you are.
Like figuring out what Operating System I'm using?
That doesn't sound like a big deal.
Now that sounds like a big deal. What should I do?
The scorched-earth approach is to disable the following in your browser:
Cookies? What are they again?
They are small files on your computer that some websites create to keep track of your visits. If you visit a site non-anonymously, then re-visit it anonymously with the same cookie you are not very, um, anonymous.
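To see why, here is a minimal sketch of a website that remembers visitors by cookie. The `Website` class and the addresses are invented for this example; real sites are more elaborate, but the principle is the same.

```python
import uuid

class Website:
    """A toy site that files every visit under the visitor's cookie."""

    def __init__(self):
        self.visits = {}   # cookie -> list of addresses seen with it

    def visit(self, ip_address, cookie=None):
        if cookie is None:
            cookie = uuid.uuid4().hex   # first visit: hand out a new cookie
        self.visits.setdefault(cookie, []).append(ip_address)
        return cookie

site = Website()
home_address = "203.0.113.7"       # invented: your real address
exit_address = "198.51.100.23"     # invented: a Tor exit node

cookie = site.visit(home_address)          # ordinary, non-anonymous visit
site.visit(exit_address, cookie=cookie)    # 'anonymous' visit, same cookie

print(site.visits[cookie])   # both addresses filed under one visitor
```

Because the cookie is the same on both visits, the site can tie the visit through Tor back to your real address.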
I'm not sure I know how to disable all this stuff every time I want to browse anonymously.