Investigate user spikes in selected countries

We have been observing user spikes in selected countries that increase our daily user count by about 2M:

https://metrics.torproject.org/userstats-relay-country.html?start=2024-10-28&end=2024-11-26&country=all&events=off

[Graph: userstats-relay-country, all countries, 2024-10-28 to 2024-11-26, events off]

The countries involved were at least:

At the same time, we noticed an increased number of directory requests: https://metrics.torproject.org/dirbytes.html

Our current client/user counts are based on directory requests, combined with an estimate of how long an average client stays online during the day. The details of our estimation algorithm are documented at: https://metrics.torproject.org/reproducible-metrics.html#users. The gist is that we take the number of directory requests and divide it by 10, and consider the result to be the number of users. A client that is connected 24/7 makes about 15 requests per day, but not all clients are connected 24/7, so we picked 10 as the number for the average client. Another way of looking at it is that we assume each request represents a client that stays online for one tenth of a day, i.e. 2 hours and 24 minutes.
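A minimal sketch of the simplified "divide by 10" rule described above. It only captures the gist; the linked reproducible-metrics page describes additional steps that this sketch deliberately omits. The names `REQUESTS_PER_AVERAGE_CLIENT`, `estimate_users` and `online_time_per_request` are hypothetical, chosen for illustration.

```python
from datetime import timedelta

# Divisor from the text: an "average" client is assumed to make
# 10 directory requests per day (a client online 24/7 makes about 15).
REQUESTS_PER_AVERAGE_CLIENT = 10


def estimate_users(dir_requests: int) -> float:
    """Simplified user estimate: directory requests divided by 10."""
    return dir_requests / REQUESTS_PER_AVERAGE_CLIENT


def online_time_per_request() -> timedelta:
    """Each request stands for a client online for one tenth of a day."""
    return timedelta(days=1) / REQUESTS_PER_AVERAGE_CLIENT


print(estimate_users(20_000_000))  # 2000000.0
print(online_time_per_request())   # 2:24:00
```

Read in reverse, the same rule means every extra user shown in the graph corresponds to 10 extra directory requests.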

During those same days, we also identified an entity scraping social media sites using Hetzner (which would map to the countries identified in our graphs).

Given how we estimate users, this means they would have generated about 20M additional directory requests per day during the incident (2M × 10). While this is possible, it is hard to say whether a single entity could generate this many requests, and why it would do so.
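To put that volume in perspective, here is some back-of-the-envelope arithmetic using only the figures from this issue (about 2M extra users per day, the divisor of 10, and roughly 15 requests per day for a client online 24/7). It says nothing about who actually sent the requests; the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
REQUESTS_PER_AVERAGE_CLIENT = 10    # divisor used by the user estimate
REQUESTS_PER_ALWAYS_ON_CLIENT = 15  # approx. requests/day of a 24/7 client


def implied_daily_requests(extra_users: int) -> int:
    """Undo the estimate: extra users times 10 requests each."""
    return extra_users * REQUESTS_PER_AVERAGE_CLIENT


def requests_per_second(daily_requests: int) -> float:
    """Sustained rate needed to produce that many requests in one day."""
    return daily_requests / SECONDS_PER_DAY


def always_on_clients_needed(daily_requests: int) -> float:
    """How many clients online 24/7 would produce that many requests."""
    return daily_requests / REQUESTS_PER_ALWAYS_ON_CLIENT


daily = implied_daily_requests(2_000_000)
print(daily)                                   # 20000000
print(round(requests_per_second(daily)))       # 231
print(round(always_on_clients_needed(daily)))  # 1333333
```

In other words, a single entity would need the equivalent of well over a million always-on Tor clients, or a sustained rate of a few hundred directory requests per second, to account for the spike on its own.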

A few other points to consider:

  • This is hardly the only entity doing scraping. If scraping explained the spikes, we would expect to see them on a more regular basis, or at least to count more users overall.
  • It is possible that social media sites monitor connections from Tor nodes and would start throttling or blocking HTTP requests from exits after a while if they saw so many clients connecting at once.

cc: @gus @gk

Edited by Hiro