Closing #9317 (moved) as a duplicate of this one, putting the information from that ticket on this one, and setting as 'needs revision' because I haven't tested or looked at this branch in a while.
While writing bridgedb's logger, I made a context manager for storing a state dictionary. It is, so far, rather loosely defined, but it would allow us to gather free statistics on bridgedb. Essentially, you would use it like so:
{{{
from bridgedb import log as logging
logging.callWithContext(myfoocontext, {'addBridgeAssignment': foobridge})
}}}
It is also safely threadable, so it would be possible to use this to retrieve debugging information from threads, for instance for #5232 (moved).
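A minimal sketch of how such a thread-safe context could work (hypothetical code assuming `threading.local`; this is not BridgeDB's actual implementation):

```python
import threading

# Each thread gets its own context dictionary, so state recorded while
# logging never leaks across threads.
_state = threading.local()

def callWithContext(ctx, func, *args, **kwargs):
    """Run func with ctx merged into this thread's context, then restore
    the previous context afterwards."""
    if not hasattr(_state, "context"):
        _state.context = {}
    saved = dict(_state.context)
    _state.context.update(ctx)
    try:
        return func(*args, **kwargs)
    finally:
        _state.context = saved

def currentContext():
    """Return this thread's current context dictionary."""
    return getattr(_state, "context", {})
```

Because the context is thread-local, a statistics hook reading `currentContext()` inside one request handler cannot see state from another thread's request.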
The nice thing about this is that it is easily called from the logger (and will still handle log levels and all the other added features from #9199 (moved)). The bad thing is that if it is not written very clearly, it could be difficult for other/new people reading the code to understand, especially if they are not familiar with Twisted.
Part of this was also discussed between myself and Karsten on tor-assistants@…, earlier this month, in the "BridgeDB data for metrics" thread.
Arma commented on !#4771 that we should also be tracking the "successfulness" of each distributor:
I would define success of a distribution strategy as a function of how many people are using the bridges that are given out by that strategy.
That means if a strategy never gives bridges to anybody, it would score low. And if it gives out a lot of bridges but they never get used because they got blocked, it would also score low.
If we wanted to get fancier, we would then have a per-country success value. And then we could compare distribution strategies for a given country.
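As a rough illustration, the success score described above could be sketched like this (hypothetical function, not an implemented metric):

```python
def distributor_success(handed_out, observed_in_use):
    """Score a distribution strategy as the fraction of the bridges it
    handed out that clients were later observed actually using.  A
    strategy that hands out nothing scores 0, and so does one whose
    bridges never get used (e.g., because they were all blocked)."""
    if not handed_out:
        return 0.0
    return len(handed_out & observed_in_use) / len(handed_out)
```

A per-country variant would simply restrict both sets to bridges handed out to, and observed in use from, that country.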
The intuition comes from Damon's Proximax paper from long ago.
sysrqb and I discussed this topic in Mexico City. IIRC, we said that sysrqb would send me 24 hours of logs (they can easily be non-recent and heavily obfuscated, and sent via encrypted email), and I would use those logs to suggest a possible statistics format on tor-dev@. sysrqb, want to send me those logs, so I can move things forward as time permits?
Here's a preliminary list of statistics that we may want, and why we want them. Needless to say, we need to figure out how to collect these statistics safely.
Approximate number of successful requests per distribution mechanism, per country, per bridge type.
This shows us the demand for bridges over time, and how much use BridgeDB sees.
It also teaches us what distribution mechanisms are the most useful (or at least popular).
Approximate number of denied requests per distribution mechanism, per country, per bridge type.
This may show us if people are interacting with BridgeDB unsuccessfully, despite good intentions.
It may also show us if somebody is trying to game the system.
Unfortunately, it's difficult to tell apart well-intentioned misuse from ill-intentioned misuse.
Approximate number of email requests per provider, per bridge type.
This would help us decide what email providers we should pay attention to.
This would also teach us what providers we could safely retire. For example, over at #28496 (moved), we are thinking about removing Yahoo. What fraction of requests would be affected by this?
Approximate number of HTTPS requests coming from proxies.
This may be an indicator of people trying to game the system.
Maybe the number of bridges per transport in BridgeDB (see #14453 (moved)).
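A sketch of how the counters above might be kept and binned before publication (names hypothetical; rounding counts up to a multiple of 10 is one possible safety measure, not a settled design):

```python
from collections import Counter

# Hypothetical counter keyed by (distribution mechanism, country code,
# bridge type, outcome), covering the statistics listed above.
requests = Counter()

def record(mechanism, country, bridge_type, success):
    """Count one request under its mechanism/country/type/outcome key."""
    outcome = "success" if success else "denied"
    requests[(mechanism, country, bridge_type, outcome)] += 1

def bin_count(n, bin_size=10):
    """Round a nonzero count up to the next multiple of bin_size before
    publishing, so that small counts reveal less about individual users."""
    if n == 0:
        return 0
    return ((n + bin_size - 1) // bin_size) * bin_size
```

Binning only obscures small counts; as discussed further down in this ticket, it is hard to quantify exactly how much protection it provides.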
I briefly discussed this with dgoulet and sysrqb. dgoulet suggested that we may want to export these statistics to our prometheus instance. The idea is to run an exporter on the BridgeDB host. This exporter would only expose the latest BridgeDB stats.
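To make the exporter idea concrete, here is a sketch of the Prometheus text exposition format such an exporter would serve (metric and label names are made up; a real exporter would more likely use the `prometheus_client` library than format this by hand):

```python
def render_metrics(counts):
    """Render {(mechanism, country, bridge_type): n} as Prometheus
    text exposition format, which an HTTP endpoint would serve to the
    Prometheus scraper."""
    lines = [
        "# HELP bridgedb_requests_total Bridge requests seen by BridgeDB.",
        "# TYPE bridgedb_requests_total counter",
    ]
    for (mechanism, country, bridge_type), n in sorted(counts.items()):
        lines.append(
            'bridgedb_requests_total{mechanism="%s",country="%s",type="%s"} %d'
            % (mechanism, country, bridge_type, n)
        )
    return "\n".join(lines) + "\n"
```

Exposing only the latest aggregated counts, as dgoulet suggested, means the exporter never needs to store per-request data.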
If it's possible, I would like to have a guess at what fraction of bridge requesters are bots. Proxy-distribution papers usually assume that an adversary controls some fraction of the users; it would be great to know what that fraction is in this case. For example, Mahdian2010a: "n users, k of whom [are] adversaries"; Wang2013a: "Let f denote the fraction of malicious users among all potential bridge users.... We expect a typical value of f between 1% and 5%...."
Here are some possible ways to identify bots:
IP address clustering--for example if BridgeDB considers all addresses in a /24 the same, find the most commonly occurring /20
auto-generated email addresses following a pattern
to start, you could make a histogram of the lengths of email addresses, and see if it's concentrated at a single point. or count the frequency of short prefixes and suffixes of email address local-parts, and see if there are any that appear overwhelmingly more often than others.
an anachronistic HTTP User-Agent (for example, Chrome from 2 years ago, when most real Chrome users auto-update)
inconsistent HTTP headers, for example Chrome or Firefox without Accept-Encoding: gzip
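The first two heuristics could be sketched along these lines (hypothetical helpers, standard library only):

```python
import ipaddress
from collections import Counter

def top_subnet(addrs, prefix=20):
    """Return the most common /prefix network among requester addresses
    and its count.  If BridgeDB treats a /24 as one requester, a single
    /20 dominating the requests may indicate one operator cycling
    through nearby addresses."""
    nets = Counter(
        ipaddress.ip_network("%s/%d" % (a, prefix), strict=False)
        for a in addrs
    )
    return nets.most_common(1)[0]

def length_histogram(local_parts):
    """Histogram of email local-part lengths; auto-generated addresses
    often concentrate at a single length."""
    return Counter(len(lp) for lp in local_parts)
```

Neither heuristic is conclusive on its own; they are starting points for spotting concentrations that human review can then examine.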
With some sort of bot-classification heuristic in hand, it would be good to analyze the statistics you already mentioned (e.g., fraction allowed/denied) separately for bot and non-bot requests.
I would like to see a graph that shows how long it takes for a single bridge to be given to n different requesters. When BridgeDB starts distributing a bridge, how long does it take before 5 people know about it? Before 50 people know about it?
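Given request logs, the underlying numbers for such a graph could be computed like this (hypothetical log format: time-sorted `(timestamp, bridge_id, requester_id)` tuples):

```python
def time_to_nth_requester(events, n):
    """events: time-sorted (timestamp, bridge_id, requester_id) tuples.
    Return {bridge_id: seconds from the bridge's first hand-out until
    n distinct requesters have received it}; bridges that never reach
    n requesters are omitted."""
    first_seen = {}   # bridge -> timestamp of first hand-out
    requesters = {}   # bridge -> set of distinct requesters so far
    out = {}
    for ts, bridge, who in events:
        first_seen.setdefault(bridge, ts)
        seen = requesters.setdefault(bridge, set())
        seen.add(who)
        if len(seen) == n and bridge not in out:
            out[bridge] = ts - first_seen[bridge]
    return out
```

Plotting these durations for n = 5 and n = 50 would answer the question directly.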
> Approximate number of HTTPS requests coming from proxies.
> This may be an indicator of people trying to game the system.
On this point specifically, I would want to know what fraction of requests have an X-Forwarded-For or Via header, and how many entries it contains. I mention this because not only can these headers indicate the use of a proxy, but a client may also spoof them. And I seem to remember that BridgeDB may process X-Forwarded-For incorrectly, e.g., reading the entries in the wrong order when there are multiple of them.
For this analysis, you will have to be aware that requests via Moat always have at least one X-Forwarded-For (I believe), because Moat is implemented using an Apache ProxyPass reverse proxy and Apache adds that header.
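A sketch of reading the header with the conventional ordering (the leftmost entry is the original client, and each proxy appends the address it saw; since clients can spoof the whole header, the result is a hint at best):

```python
def client_from_xff(xff_header):
    """Parse an X-Forwarded-For header of the form
    'client, proxy1, proxy2'.  Returns (original_client, entry_count);
    (None, 0) if the header is empty.  The value is client-controlled
    and must not be trusted."""
    entries = [e.strip() for e in xff_header.split(",") if e.strip()]
    if not entries:
        return None, 0
    return entries[0], len(entries)
```

For the Moat case, a request arriving through the Apache reverse proxy would show at least one entry even when no client-side proxy is involved.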
We just heard back from Tor's Research Safety Board. You can find the response below. The reviewer writes that our proposal wouldn't be an issue in a one-off setting but could be problematic in the long run. I think a reasonable way forward would be to implement the proposal, run it in a one-off setting for, say, a week, and then evaluate if we should change data collection. In the long run, we should also transition to PrivCount as the reviewer mentions.
{{{
Tor Research Safety Board Paper #20 Reviews and Comments
===========================================================================
Paper #20: Collecting BridgeDB usage statistics

Review #20A
===========================================================================
* Updated: 11 Jun 2019 6:02:53pm EDT

Overall merit
-------------
4. Accept

Reviewer expertise
------------------
3. Knowledgeable

Paper summary
-------------
The document proposes collecting a new set of usage statistics through data
available from the operation of BridgeDB. The statistics would be useful for
better prioritizing development tasks, to improve reaction time to bridge
enumeration attacks and blockages, to reduce failure rates, and to help
promote censorship circumvention research.

Comments for author
-------------------
If this was a short-term study, I would say go for it, no questions asked.
The benefits are clear and I agree that they outweigh the risks.

However, I think it was implied (although not explicitly stated) that the
new statistics would be regularly collected and published on an ongoing
basis. I think there are more risks associated with such an ongoing
collection as opposed to a one-off or short-term study, so we should
carefully consider the trade-offs between the cost/effort of safer
collection methods and the privacy benefits of such methods.

The most concerning statistics to me are the per-country statistics and the
per-service (gmail, yahoo, etc.) statistics. I think it is clear from
Sections 3 and 4 that you understand the risks associated with collecting
these statistics: a single user from an unpopular country could be
identified because the 1-10 bucket suddenly changed from a 0 count to a 1
count. This issue might also exist if unpopular email service providers are
selected. This issue is already present in Tor's per-country user
statistics, and I believe there is a plan to transition away from these
statistics because of the safety concerns.

The bucketing proposal (round to the nearest 10) does provide some
uncertainty, but it's hard to reason about what protection it is providing.

In an ideal world, we would collect these statistics with a
privacy-preserving statistics collection tool. In fact, I think most if not
all of these could be collected with PrivCount (assuming it was extended to
support the new event types).

One useful thing about PrivCount is secure aggregation, meaning that if you
have multiple data collectors, you can securely count a total across all of
them without leaking individual inputs. In this case, it seems like there
is only one BridgeDB data source, so we would not benefit from PrivCount's
secure aggregation.

The other useful thing that PrivCount provides is differential privacy.
This is where you could get most of the benefit. Rather than rounding to 10
and not knowing how much privacy that provides, you instead start by
defining how much privacy each statistic should achieve based on your
operational environment (these are called action bounds), and then
PrivCount will add noise to the statistics in a way that will guarantee
differential privacy under those constraints. If these constraints add too
much noise for the resulting statistics to be useful, then you have to
consider if the measurement is too privacy-invasive for the given actions
you are trying to protect and therefore you possibly shouldn't collect
them.

Tor has PrivCount on the roadmap (I believe), so one option could be to
implement the non-PrivCount version now and eventually transition the
statistics to PrivCount. Another option would be to set up a PrivCount
instance using the open source tool rather than waiting for the
PrivCount-in-Tor version to be ready. In fact, if the data is collected at
BridgeDB, then I'm not sure that having PrivCount in Tor would help anyway
(unless the BridgeDB host runs Tor).

There has been some work to use PrivCount for measurement and also to
explain the process of defining action bounds. I think the most relevant is
the IMC paper:

- https://torusage-imc2018.github.io
}}}
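The reviewer's contrast between rounding and differential privacy can be illustrated with the standard Laplace mechanism (parameters here are made up for illustration; PrivCount's actual noise calibration is more involved):

```python
import random

def dp_count(true_count, sensitivity=1.0, epsilon=0.3):
    """Publish a count with Laplace noise of scale sensitivity/epsilon,
    the basic differential-privacy mechanism.  Smaller epsilon means
    more privacy and more noise; unlike rounding to the nearest 10,
    the protection provided is quantifiable."""
    scale = sensitivity / epsilon
    # The difference of two exponential variates is Laplace-distributed.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise
```

If the noise needed to meet the chosen epsilon makes the statistic useless, that is itself a signal that the measurement may be too invasive to collect, which is exactly the reviewer's point.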
I sent an email with a summary of the first batch of metrics to tor-dev@. Rick Huebner suggested adding metrics on BridgeDB's hash ring allocations. This would be a useful, albeit not critical, addition to our existing metrics.
Trac:
 Parent: #19332 (moved) to #31268 (moved)
 Description: BridgeDB should export statistics on its usage. Stuff like distributor usage, number of clients served, etc.
Ticket #19332 (moved) tracks our follow-up task: the inclusion of BridgeDB's metrics into CollecTor.
There were some changes that I couldn't track how they relate to the metrics feature, perhaps they snuck in from some other changes being made to bridgedb? Otherwise it looks really good!
There might be an unchecked failure case here with the moat reporting for when we don't have any bridge lines to return. It's not really a failure of the system though so much as a lack of bridges so I'm not sure how we'd want to count that.
> There were some changes that I couldn't track how they relate to the metrics feature, perhaps they snuck in from some other changes being made to bridgedb? Otherwise it looks really good!
Right, I split the branch into three commits, to make it less confusing. 0d5ed52e fixes the broken download of Tor exit relays, 85a69d1b updates a comment, and 5cde59d9 implements the metrics feature.
> There might be an unchecked failure case here with the moat reporting for when we don't have any bridge lines to return. It's not really a failure of the system though so much as a lack of bridges so I'm not sure how we'd want to count that.
The current metrics implementation is user-centric, meaning that requests are classified as "success" if the user did everything right and "fail" if the user did something wrong (e.g., used an email account other than Gmail or Riseup, or failed to solve the CAPTCHA).
We probably also want BridgeDB-centric metrics such as "number of bridges per ring" and "number of requests that were answered with 0 bridges". I suggest that we discuss these in a separate ticket, ok?