In today’s article, we will talk about Tor and how it can provide people with better anonymity on the web. To that end, we will see what it is used for, how it works, and what hidden services are. Finally, we will talk about the known attacks on this service and how to use it correctly. This article does not aim to give every detail about how the network, and particularly the attacks on it, work, but to give a good overview.
What is Tor
Tor, which stands for “The Onion Router”, is a decentralized network composed of multiple servers called nodes (or relays). Its goal is to anonymize TCP exchanges. It can be used by someone wishing to consult websites that are censored in their country or to access a part of the deep web (other services such as I2P give access to different parts of it). The size of the deep web is tough to quantify, but some papers have made estimates: in 2001, it was estimated to be 400 to 550 times bigger than the surface web, that is, the part of the World Wide Web indexed by search engines. The deep web contains a lot of resources, but the ones we hear the most about are those linked to illegal activities such as drug dealing, child abuse, and so on. Even if you are not familiar with the subject, you have probably heard about the famous marketplace Silk Road. This website was hosted on a hidden service; we will talk more about that later.
How does it Work?
Basic Usage
In standard usage of the web, if Alice wishes to access a website, she communicates with it directly. While she does that, she leaves traces of her visit in multiple places. First, her ISP will know she visited this particular website; then, the website owner will know that the IP a.b.c.d loaded some pages (along with some more information, but that is not the point here).
With Tor, this could have been prevented. Let’s get back to the example of Alice wishing to connect to Bob’s website, but this time, she will use Tor. First, her Tor client will contact a directory server that holds the list of nodes, to learn which nodes it can connect to (as we can see in the following picture).
Then, a random path from Alice to the website will be generated. Alice’s Tor browser will contact the first node, which will contact the second node, which will contact the third node (AKA the exit node), which will communicate with the website (as shown in the illustration below).
As the pictures show, with Tor, the traffic is encrypted between Alice and the nodes and between the nodes themselves. The only traffic that may be unencrypted is between the exit node and Bob’s website (it is in clear text if the site does not use HTTPS). The good thing is that no node knows the full path taken by the data. Each node only knows the machines it is directly communicating with, the previous hop and the next one, and it cannot be certain whether those machines are clients or other nodes. Finally, the only node that can see what data is transiting through the network is the exit node, but it has no idea who this data belongs to.
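To make the "each node only knows its neighbours" idea concrete, here is a minimal sketch in Python. It is not how the Tor client really selects relays (the real selection is weighted and more involved, and a uniform random choice is an assumption made here for brevity). The relay names and the functions `build_circuit` and `node_views` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def build_circuit(relays: List[str], rng: random.Random) -> List[str]:
    """Pick three distinct relays: entry, middle, exit.

    Uniform sampling is a simplification for illustration only.
    """
    return rng.sample(relays, 3)


def node_views(client: str, circuit: List[str], destination: str) -> Dict[str, Tuple[str, str]]:
    """Map each relay to the only two machines it talks to: (previous hop, next hop)."""
    hops = [client] + circuit + [destination]
    return {hops[i]: (hops[i - 1], hops[i + 1]) for i in range(1, len(hops) - 1)}


# Hypothetical relay names standing in for the public relay list.
relays = ["relay-A", "relay-B", "relay-C", "relay-D", "relay-E"]
circuit = build_circuit(relays, random.Random(7))
views = node_views("alice", circuit, "bob-website")

for relay in circuit:
    previous_hop, next_hop = views[relay]
    print(f"{relay}: receives from {previous_hop}, forwards to {next_hop}")
```

In this model, the entry node sees Alice but not the website, and the exit node sees the website but not Alice; no single relay holds both ends of the path.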
Before being transmitted, the packets are encrypted in layers: first with the key of the exit node, then with the key of the node before it, and so on (see the illustration below). In practice, these are session keys that Alice negotiates with each node when the path is built, using each node’s public key for the handshake. So, for the packet to be usable by the exit node, it must have been decrypted by all the previous nodes. This implies that only the exit node can read the data, and only if it has gone through the defined path. Unfortunately, it also means that the exit node can sniff the traffic; we will talk about that a bit later.
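The layering can be sketched in a few lines of Python. This is a toy model only: the cipher below is a keystream derived from SHA-256 and XORed with the data, which is an assumption chosen to keep the example dependency-free and is neither secure nor what Tor actually uses. The function names (`keystream`, `xor_layer`, `wrap`, `route`) are hypothetical.

```python
import hashlib
import os
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. For illustration only, NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(payload: bytes, path_keys: List[bytes]) -> bytes:
    """Client side: add the exit node's layer first and the entry node's layer last."""
    for key in reversed(path_keys):
        payload = xor_layer(key, payload)
    return payload


def route(onion: bytes, path_keys: List[bytes]) -> bytes:
    """Network side: each node on the path removes its own layer in turn."""
    for key in path_keys:
        onion = xor_layer(key, onion)
    return onion


# One session key per node: entry, middle, exit (random stand-ins).
keys = [os.urandom(16) for _ in range(3)]
message = b"GET /index.html"
onion = wrap(message, keys)

print(route(onion, keys) == message)      # all three layers removed
print(route(onion, keys[:2]) == message)  # exit layer still in place
```

The second check shows why the data is only readable at the end of the path: as long as one node’s layer has not been removed, the payload is still scrambled.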
Tor also offers a feature to get around blocking of Tor itself. If a country, an organization, or anyone else decides to block the network, they need to block the nodes. That would be easy, since the addresses of the nodes are public. Still, they cannot fully block Tor thanks to bridge relays. Bridges are ordinary nodes, but they are not listed in the public relay list. Since there is no automatic way to enumerate them, they are hard to block.
Hidden Services
Finally, Tor lets you use what are called hidden services (the Tor Project now calls them onion services). A hidden service is a service that is accessible only through the Tor network. Its address ends with “.onion”, and its objective is to keep the location of the service hidden.
Let’s say that Bob wishes to host a hidden service. To do this, his service selects some random nodes on the network, which will be called “introduction points”, and builds a path to each of them (as in the previous part). When this is done, a hidden service descriptor is created. It contains Bob’s public key and the addresses of the introduction points. Bob signs this descriptor with his private key and uploads the whole thing to a distributed database that is used somewhat like a DNS server. The service’s address, 16 characters followed by .onion, is derived from its public key. (This is the original “version 2” format, which has since been retired; current “version 3” addresses are 56 characters long and are also derived from the public key.)
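As a rough illustration of how an address can be derived from a public key, here is a sketch of both formats as I understand them from the Tor specifications: the legacy 16-character form (base32 of the first 80 bits of the SHA-1 hash of the encoded RSA key) and the current 56-character form (base32 of the Ed25519 key, a 2-byte checksum, and a version byte). The keys below are random bytes used as stand-ins, not real keys, and the function names `onion_v2` and `onion_v3` are hypothetical.

```python
import base64
import hashlib
import os


def onion_v2(der_public_key: bytes) -> str:
    """Legacy format: base32 of the first 10 bytes (80 bits) of SHA-1(key)."""
    digest = hashlib.sha1(der_public_key).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"


def onion_v3(ed25519_public_key: bytes) -> str:
    """Current format: base32(public key + 2-byte checksum + version byte)."""
    version = b"\x03"
    checksum = hashlib.sha3_256(
        b".onion checksum" + ed25519_public_key + version
    ).digest()[:2]
    label = base64.b32encode(ed25519_public_key + checksum + version)
    return label.decode("ascii").lower() + ".onion"


# Random stand-ins: a real service would use its actual encoded public keys.
fake_rsa_key = os.urandom(140)
fake_ed25519_key = os.urandom(32)

print(onion_v2(fake_rsa_key))      # 16 characters + ".onion"
print(onion_v3(fake_ed25519_key))  # 56 characters + ".onion"
```

Because the address is computed from the key itself, a client that knows the address can check that the descriptor it fetched really belongs to that service.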
Now, let’s say that Alice decides to connect to Bob’s service. First, she gets the hidden service descriptor from the database. Then, she connects through a random path to a random node in the network, which will be the meeting point (also called the rendezvous point), and gives it a one-time secret. She then builds an introduction message, encrypted with the public key of the hidden service, which contains the meeting point’s address and the one-time secret. This message is sent to an introduction point, which forwards it to the hidden service. When the hidden service receives the message, it connects to the meeting point and presents the one-time secret. Alice is then notified that everything is in order, and Alice and Bob’s service can now communicate through the meeting point.
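The role of the one-time secret at the meeting point can be sketched as follows. This is a simplified model, not the real protocol: there is no encryption, the circuits are plain strings, and the `RendezvousPoint` class, its method names, and the 20-byte secret length are all assumptions made for illustration.

```python
import hmac
import os
from typing import Dict, Optional, Tuple


class RendezvousPoint:
    """Toy meeting point: joins two circuits that present the same one-time secret."""

    def __init__(self) -> None:
        # Maps a one-time secret to the client circuit waiting on it.
        self._waiting: Dict[bytes, str] = {}

    def establish(self, secret: bytes, client_circuit: str) -> None:
        """Called by the client (Alice) when she picks this meeting point."""
        self._waiting[secret] = client_circuit

    def join(self, secret: bytes, service_circuit: str) -> Optional[Tuple[str, str]]:
        """Called by the hidden service; succeeds only with the matching secret."""
        for known in list(self._waiting):
            if hmac.compare_digest(known, secret):
                # The secret is single-use: remove it once the circuits are joined.
                return (self._waiting.pop(known), service_circuit)
        return None


meeting_point = RendezvousPoint()
one_time_secret = os.urandom(20)
meeting_point.establish(one_time_secret, "circuit-from-alice")

# Alice sends the meeting point's address and the secret to the service
# through an introduction point; here the secret is simply passed along.
print(meeting_point.join(os.urandom(20), "circuit-from-stranger"))
print(meeting_point.join(one_time_secret, "circuit-from-bob-service"))
```

The meeting point only relays traffic between the two joined circuits; it learns neither Alice’s address nor the location of Bob’s service.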
More detailed information and diagrams can be found on the Tor website.
The Known Attacks
Tor is good. However, it is not magic. A number of things can cause problems on the network, and there is no guarantee of absolute anonymity, even when using it.
The first problem, as I said before, is the exit nodes. Since they are the only ones able to see unencrypted traffic, they can mount a man-in-the-middle attack. For example, an exit node can sniff the packets passing through it to collect login information, or modify the returned page to include malicious code that infects the client’s computer and, for instance, reveals its real location.
There are also other attacks that are much harder to carry out. They are mostly based on analyzing the traffic of multiple nodes controlled by the same person or organization. Several governmental agencies, such as the NSA or GCHQ, are known for running Tor nodes. If certain conditions are met, it is theoretically possible to associate the traffic of an exit node with a user. One way to do that is to control both the entry node and the exit node used by someone. By analyzing the data going into the entry node, it is possible to predict how much data to expect at the exit node and when it should arrive. A real IP address can then be linked to what its owner is doing with Tor. Some papers describe these attacks in detail; here are two of them: “A Practical Congestion Attack on Tor Using Long Paths” and “On the Effectiveness of Traffic Analysis Against Anonymity Networks Using Flow Records”.
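The intuition behind this kind of correlation can be shown with a small sketch. It is not the method of either paper: it simply compares the number of bytes seen per time window at an entry node with candidate flows seen at an exit node, using a plain Pearson correlation. All the numbers are made up for illustration, and the function names `pearson` and `best_match` are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)


def best_match(entry_flow: List[float], exit_flows: Dict[str, List[float]]) -> str:
    """Return the name of the exit flow whose volume pattern best follows the entry flow."""
    return max(exit_flows, key=lambda name: pearson(entry_flow, exit_flows[name]))


# Synthetic byte counts per time window (invented values, not measurements).
entry_flow = [120, 0, 980, 40, 0, 1500, 300, 0]
exit_flows = {
    "flow_a": [110, 0, 1010, 35, 0, 1480, 310, 0],       # same bursts, slight noise
    "flow_b": [500, 520, 480, 510, 490, 505, 495, 500],  # unrelated steady traffic
}

print(best_match(entry_flow, exit_flows))
```

An observer who sees both ends does not need to break any encryption: matching the shape of the traffic in time and volume is enough to link the two flows, which is why controlling both the entry and the exit node is so powerful.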
Good Practices
From the previous parts of the article, we can now deduce a number of good practices that should improve our anonymity with Tor (even though there is no guarantee of being 100% anonymous).
- Keep the Tor browser up to date. Governmental agencies are known for using Firefox exploits to infect users’ computers.
- Don’t enable plugins such as Flash. Flash, for example, is responsible for a good proportion of all attacks on web users.
- Separate your contextual identities. If you wish to stay anonymous while using Tor, logging in to Facebook through it ten seconds later is not a good idea.
- Avoid opening files downloaded through Tor. These files can reference resources on the web, which your computer could then fetch with your real IP address. They could also contain malware.
- Using a VPN could be an extra security measure.
- Launch Tor using some security-oriented distribution such as Tails.
- Avoid connecting to plain HTTP websites through Tor; prefer HTTPS, since the exit node can read unencrypted traffic.
- DO NOT use Tor for peer-to-peer file sharing or to download things illegally. It is not designed for that, it is not efficient, and it will just slow down legitimate users.
- Tor is not magic. Do not do illegal things on it; you can’t be 100% sure of being absolutely safe.