Closing #9317 (moved) as a duplicate of this one, putting the information from that ticket on this one, and setting as 'needs revision' because I haven't tested or looked at this branch in a while.
While writing bridgedb's logger, I made a context manager for storing a state dictionary. It is, so far, rather loosely defined, but it would allow us to gather free statistics on bridgedb. Essentially, you would use it like so:
{{{
from bridgedb import log as logging
logging.callWithContext(myfoocontext, {'addBridgeAssignment': foobridge})
}}}
It is also safely threadable, so it would be possible to use this to retrieve debugging information from threads, for instance for #5232 (moved).
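A minimal sketch of how such a thread-safe context could work (hypothetical code assuming `threading.local`; this is not BridgeDB's actual implementation):

```python
import threading

# Each thread gets its own context dictionary, so state recorded while
# logging never leaks across threads.
_state = threading.local()

def callWithContext(ctx, func, *args, **kwargs):
    """Run func with ctx merged into this thread's context, then restore
    the previous context afterwards."""
    if not hasattr(_state, "context"):
        _state.context = {}
    saved = dict(_state.context)
    _state.context.update(ctx)
    try:
        return func(*args, **kwargs)
    finally:
        _state.context = saved

def currentContext():
    """Return this thread's current context dictionary."""
    return getattr(_state, "context", {})
```

Because the context is thread-local, a statistics hook reading `currentContext()` inside one request handler cannot see state from another thread's request.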
The nice thing about this is that it is easily called from the logger (and will still handle log levels and all the other added features from #9199 (moved)). The bad thing is that if it is not written very clearly, it could be difficult for other/new people reading the code to understand, especially if they are not familiar with Twisted.
Part of this was also discussed between myself and Karsten on tor-assistants@…, earlier this month, in the "BridgeDB data for metrics" thread.
Arma commented on !#4771 that we should also be tracking the "successfulness" of each distributor:
I would define success of a distribution strategy as a function of how many people are using the bridges that are given out by that strategy.
That means if a strategy never gives bridges to anybody, it would score low. And if it gives out a lot of bridges but they never get used because they got blocked, it would also score low.
If we wanted to get fancier, we would then have a per-country success value. And then we could compare distribution strategies for a given country.
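As a rough illustration, the success score described above could be sketched like this (hypothetical function, not an implemented metric):

```python
def distributor_success(handed_out, observed_in_use):
    """Score a distribution strategy as the fraction of the bridges it
    handed out that clients were later observed actually using.  A
    strategy that hands out nothing scores 0, and so does one whose
    bridges never get used (e.g., because they were all blocked)."""
    if not handed_out:
        return 0.0
    return len(handed_out & observed_in_use) / len(handed_out)
```

A per-country variant would simply restrict both sets to bridges handed out to, and observed in use from, that country.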
The intuition comes from Damon's Proximax paper from long ago.
sysrqb and I discussed this topic in Mexico City. IIRC, we said that sysrqb would send me 24 hours of logs (they can easily be non-recent and heavily obfuscated, and sent via encrypted email), and I would use those logs to suggest a possible statistics format on tor-dev@. sysrqb, want to send me those logs, so I can move things forward as time permits?
Here's a preliminary list of statistics that we may want, and why we want them. Needless to say, we need to figure out how to collect these statistics safely.
Approximate number of successful requests per distribution mechanism, per country, per bridge type.
This shows us the demand for bridges over time, and how much use BridgeDB sees.
It also teaches us what distribution mechanisms are the most useful (or at least popular).
Approximate number of denied requests per distribution mechanism, per country, per bridge type.
This may show us if people are interacting with BridgeDB unsuccessfully, despite good intentions.
It may also show us if somebody is trying to game the system.
Unfortunately, it's difficult to tell apart well-intentioned misuse from ill-intentioned misuse.
Approximate number of email requests per provider, per bridge type.
This would help us decide what email providers we should pay attention to.
This would also teach us what providers we could safely retire. For example, over at #28496 (moved), we are thinking about removing Yahoo. What fraction of requests would be affected by this?
Approximate number of HTTPS requests coming from proxies.
This may be an indicator of people trying to game the system.
Maybe the number of bridges per transport in BridgeDB (see #14453 (moved)).
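A sketch of how the counters above might be kept and binned before publication (names hypothetical; rounding counts up to a multiple of 10 is one possible safety measure, not a settled design):

```python
from collections import Counter

# Hypothetical counter keyed by (distribution mechanism, country code,
# bridge type, outcome), covering the statistics listed above.
requests = Counter()

def record(mechanism, country, bridge_type, success):
    """Count one request under its mechanism/country/type/outcome key."""
    outcome = "success" if success else "denied"
    requests[(mechanism, country, bridge_type, outcome)] += 1

def bin_count(n, bin_size=10):
    """Round a nonzero count up to the next multiple of bin_size before
    publishing, so that small counts reveal less about individual users."""
    if n == 0:
        return 0
    return ((n + bin_size - 1) // bin_size) * bin_size
```

Binning only obscures small counts; as discussed further down in this ticket, it is hard to quantify exactly how much protection it provides.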
I briefly discussed this with dgoulet and sysrqb. dgoulet suggested that we may want to export these statistics to our prometheus instance. The idea is to run an exporter on the BridgeDB host. This exporter would only expose the latest BridgeDB stats.
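To make the exporter idea concrete, here is a sketch of the Prometheus text exposition format such an exporter would serve (metric and label names are made up; a real exporter would more likely use the `prometheus_client` library than format this by hand):

```python
def render_metrics(counts):
    """Render {(mechanism, country, bridge_type): n} as Prometheus
    text exposition format, which an HTTP endpoint would serve to the
    Prometheus scraper."""
    lines = [
        "# HELP bridgedb_requests_total Bridge requests seen by BridgeDB.",
        "# TYPE bridgedb_requests_total counter",
    ]
    for (mechanism, country, bridge_type), n in sorted(counts.items()):
        lines.append(
            'bridgedb_requests_total{mechanism="%s",country="%s",type="%s"} %d'
            % (mechanism, country, bridge_type, n)
        )
    return "\n".join(lines) + "\n"
```

Exposing only the latest aggregated counts, as dgoulet suggested, means the exporter never needs to store per-request data.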
If it's possible, I would like to have a guess at what fraction of bridge requesters are bots. Proxy-distribution papers usually assume that an adversary controls some fraction of the users; it would be great to know what that fraction is in this case. For example, Mahdian2010a: "n users, k of whom [are] adversaries"; Wang2013a: "Let f denote the fraction of malicious users among all potential bridge users.... We expect a typical value of f between 1% and 5%...."
Here are some possible ways to identify bots:
IP address clustering--for example if BridgeDB considers all addresses in a /24 the same, find the most commonly occurring /20
auto-generated email addresses following a pattern
to start, you could make a histogram of the lengths of email addresses, and see if it's concentrated at a single point. or count the frequency of short prefixes and suffixes of email address local-parts, and see if there are any that appear overwhelmingly more often than others.
an anachronistic HTTP User-Agent (for example, Chrome from 2 years ago, when most real Chrome users auto-update)
inconsistent HTTP headers, for example Chrome or Firefox without Accept-Encoding: gzip
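The first two heuristics could be sketched along these lines (hypothetical helpers, standard library only):

```python
import ipaddress
from collections import Counter

def top_subnet(addrs, prefix=20):
    """Return the most common /prefix network among requester addresses
    and its count.  If BridgeDB treats a /24 as one requester, a single
    /20 dominating the requests may indicate one operator cycling
    through nearby addresses."""
    nets = Counter(
        ipaddress.ip_network("%s/%d" % (a, prefix), strict=False)
        for a in addrs
    )
    return nets.most_common(1)[0]

def length_histogram(local_parts):
    """Histogram of email local-part lengths; auto-generated addresses
    often concentrate at a single length."""
    return Counter(len(lp) for lp in local_parts)
```

Neither heuristic is conclusive on its own; they are starting points for spotting concentrations that human review can then examine.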
With some sort of bot-classification heuristic in hand, it would be good to analyze the statistics you already mentioned (e.g., fraction allowed/denied) separately for bot and non-bot requests.
I would like to see a graph that shows how long it takes for a single bridge to be given to n different requesters. When BridgeDB starts distributing a bridge, how long does it take before 5 people know about it? Before 50 people know about it?
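Given request logs, the underlying numbers for such a graph could be computed like this (hypothetical log format: time-sorted `(timestamp, bridge_id, requester_id)` tuples):

```python
def time_to_nth_requester(events, n):
    """events: time-sorted (timestamp, bridge_id, requester_id) tuples.
    Return {bridge_id: seconds from the bridge's first hand-out until
    n distinct requesters have received it}; bridges that never reach
    n requesters are omitted."""
    first_seen = {}   # bridge -> timestamp of first hand-out
    requesters = {}   # bridge -> set of distinct requesters so far
    out = {}
    for ts, bridge, who in events:
        first_seen.setdefault(bridge, ts)
        seen = requesters.setdefault(bridge, set())
        seen.add(who)
        if len(seen) == n and bridge not in out:
            out[bridge] = ts - first_seen[bridge]
    return out
```

Plotting these durations for n = 5 and n = 50 would answer the question directly.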
> Approximate number of HTTPS requests coming from proxies.
> This may be an indicator of people trying to game the system.
On this point specifically, I would want to know what fraction of requests have an X-Forwarded-For or Via header, and how many entries it contains. I mention this because not only can these headers indicate the use of a proxy, but a client may also spoof them. And I seem to remember that BridgeDB may process X-Forwarded-For incorrectly, e.g., reading the entries in the wrong order when there are multiple of them.
For this analysis, you will have to be aware that requests via Moat always have at least one X-Forwarded-For (I believe), because Moat is implemented using an Apache ProxyPass reverse proxy and Apache adds that header.
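A sketch of reading the header with the conventional ordering (the leftmost entry is the original client, and each proxy appends the address it saw; since clients can spoof the whole header, the result is a hint at best):

```python
def client_from_xff(xff_header):
    """Parse an X-Forwarded-For header of the form
    'client, proxy1, proxy2'.  Returns (original_client, entry_count);
    (None, 0) if the header is empty.  The value is client-controlled
    and must not be trusted."""
    entries = [e.strip() for e in xff_header.split(",") if e.strip()]
    if not entries:
        return None, 0
    return entries[0], len(entries)
```

For the Moat case, a request arriving through the Apache reverse proxy would show at least one entry even when no client-side proxy is involved.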
We just heard back from Tor's Research Safety Board. You can find the response below. The reviewer writes that our proposal wouldn't be an issue in a one-off setting but could be problematic in the long run. I think a reasonable way forward would be to implement the proposal, run it in a one-off setting for, say, a week, and then evaluate if we should change data collection. In the long run, we should also transition to PrivCount as the reviewer mentions.
{{{
Tor Research Safety Board Paper #20 Reviews and Comments
===========================================================================
Paper #20: Collecting BridgeDB usage statistics

Review #20A
===========================================================================
* Updated: 11 Jun 2019 6:02:53pm EDT

Overall merit
-------------
4. Accept

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
The document proposes collecting a new set of usage statistics through data
available from the operation of BridgeDB. The statistics would be useful for
better prioritizing development tasks, to improve reaction time to bridge
enumeration attacks and blockages, to reduce failure rates, and to help
promote censorship circumvention research.

Comments for author
-------------------
If this was a short-term study, I would say go for it, no questions asked.
The benefits are clear and I agree that they outweigh the risks.

However, I think it was implied (although not explicitly stated) that the
new statistics would be regularly collected and published on an ongoing
basis. I think there are more risks associated with such an ongoing
collection as opposed to a one-off or short-term study, so we should
carefully consider the trade-offs between the cost/effort of safer
collection methods and the privacy benefits of such methods.

The most concerning statistics to me are the per-country statistics and the
per-service (gmail, yahoo, etc.) statistics. I think it is clear from
Sections 3 and 4 that you understand the risks associated with collecting
these statistics: a single user from an unpopular country could be
identified because the 1-10 bucket suddenly changed from a 0 count to a 1
count. This issue might also exist if unpopular email service providers are
selected. This issue is already present in Tor's per-country user
statistics, and I believe there is a plan to transition away from these
statistics because of the safety concerns.

The bucketing proposal (round to the nearest 10) does provide some
uncertainty, but it's hard to reason about what protection it is providing.

In an ideal world, we would collect these statistics with a
privacy-preserving statistics collection tool. In fact, I think most if not
all of these could be collected with PrivCount (assuming it was extended to
support the new event types).

One useful thing about PrivCount is secure aggregation, meaning that if you
have multiple data collectors, you can securely count a total across all of
them without leaking individual inputs. In this case, it seems like there
is only one BridgeDB data source, so we would not benefit from PrivCount's
secure aggregation.

The other useful thing that PrivCount provides is differential privacy.
This is where you could get most of the benefit. Rather than rounding to 10
and not knowing how much privacy that provides, you instead start by
defining how much privacy each statistic should achieve based on your
operational environment (these are called action bounds), and then
PrivCount will add noise to the statistics in a way that will guarantee
differential privacy under those constraints. If these constraints add too
much noise for the resulting statistics to be useful, then you have to
consider if the measurement is too privacy-invasive for the given actions
you are trying to protect and therefore you possibly shouldn't collect
them.

Tor has PrivCount on the roadmap (I believe), so one option could be to
implement the non-PrivCount version now and eventually transition the
statistics to PrivCount. Another option would be to set up a PrivCount
instance using the open source tool rather than waiting for the
PrivCount-in-Tor version to be ready. In fact, if the data is collected at
BridgeDB, then I'm not sure that having PrivCount in Tor would help anyway
(unless the BridgeDB host runs Tor).

There has been some work to use PrivCount for measurement and also to
explain the process of defining action bounds. I think the most relevant is
the IMC paper:

- https://torusage-imc2018.github.io
}}}
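The reviewer's contrast between rounding and differential privacy can be illustrated with the standard Laplace mechanism (parameters here are made up for illustration; PrivCount's actual noise calibration is more involved):

```python
import random

def dp_count(true_count, sensitivity=1.0, epsilon=0.3):
    """Publish a count with Laplace noise of scale sensitivity/epsilon,
    the basic differential-privacy mechanism.  Smaller epsilon means
    more privacy and more noise; unlike rounding to the nearest 10,
    the protection provided is quantifiable."""
    scale = sensitivity / epsilon
    # The difference of two exponential variates is Laplace-distributed.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise
```

If the noise needed to meet the chosen epsilon makes the statistic useless, that is itself a signal that the measurement may be too invasive to collect, which is exactly the reviewer's point.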
I sent an email with a summary of the first batch of metrics to tor-dev@. Rick Huebner suggested adding metrics on BridgeDB's hash ring allocations. This would be a useful, albeit not critical, addition to our existing metrics.
Trac:
 Parent: #19332 (moved) to #31268 (moved)
 Description: BridgeDB should export statistics on its usage. Stuff like distributor usage, number of clients served, etc.
Ticket #19332 (moved) tracks our follow-up task: the inclusion of BridgeDB's metrics into CollecTor.
There were some changes that I couldn't track how they relate to the metrics feature, perhaps they snuck in from some other changes being made to bridgedb? Otherwise it looks really good!
There might be an unchecked failure case here with the moat reporting for when we don't have any bridge lines to return. It's not really a failure of the system though so much as a lack of bridges so I'm not sure how we'd want to count that.
> There were some changes that I couldn't track how they relate to the metrics feature, perhaps they snuck in from some other changes being made to bridgedb? Otherwise it looks really good!
Right, I split the branch into three commits, to make it less confusing. 0d5ed52e fixes the broken download of Tor exit relays, 85a69d1b updates a comment, and 5cde59d9 implements the metrics feature.
> There might be an unchecked failure case here with the moat reporting for when we don't have any bridge lines to return. It's not really a failure of the system though so much as a lack of bridges so I'm not sure how we'd want to count that.
The current metrics implementation is user-centric, meaning that requests are classified as "success" if the user did everything right and "fail" if the user did something wrong (e.g., used an email account other than Gmail or Riseup, or failed to solve the CAPTCHA).
We probably also want BridgeDB-centric metrics such as "number of bridges per ring" and "number of requests that were answered with 0 bridges". I suggest that we discuss these in a separate ticket, ok?