Graphs for multiple relays that have the same fingerprint
This is a feature request for an unusual relay configuration.
The Snowflake bridge is getting more and more traffic, which caused the tor process to become a bottleneck. We removed the bottleneck by running 4 instances of tor on the host, all having the same identity keys, with a load balancer in front of them. For details, see the bridge installation guide, the tor-relays thread, and tpo/anti-censorship/pluggable-transports/snowflake#40095 (closed). The short summary is that where formerly there was one instance of tor:
nickname | hashed fingerprint |
---|---|
flakey | 5481936581E23D2D178105D44DB6915AB06BFB7F |
There are now 4 instances, all independently uploading descriptors:
nickname | hashed fingerprint |
---|---|
flakey1 | 5481936581E23D2D178105D44DB6915AB06BFB7F |
flakey2 | 5481936581E23D2D178105D44DB6915AB06BFB7F |
flakey3 | 5481936581E23D2D178105D44DB6915AB06BFB7F |
flakey4 | 5481936581E23D2D178105D44DB6915AB06BFB7F |
The problem is that multiple descriptors for the same fingerprint currently result in inaccurate metrics graphs, including Relay Search and Bridge users by transport. What we think is happening is that the analysis programs are, in effect, choosing one instance per day as a representative for the fingerprint. Since there are 4 instances, the numbers are 1/4 as large as they should be. Also, during the load balancing upgrade, we were running a separate staging server, so there were actually 8 instances running at a time (4 used, 4 unused), which causes the graph to go to zero on days when one of the unused instances is chosen as the representative. As Roger says: "the metrics world will think it is a single bridge that keeps changing its mind about its past bandwidth use and other stats."
Take a look at the Relay Search graphs. The anomalies start on 2022-01-25, which is this comment.
bandwidth | clients |
---|---|
If the graphs showed all instances together, rather than one instance at a time, they would look like this:
bandwidth | clients |
---|---|
I made the graphs by processing bridge-extra-info descriptors from CollecTor in a multi-instance-aware way (R and Python source code), following the Reproducible Metrics guidelines for Consumed bandwidth and Bridge users. The difference is that I use fingerprint+nickname as a bridge identifier, not just fingerprint. A later descriptor with the same fingerprint but a different nickname adds to the day's total, instead of replacing it. (I am not sure, but I think here in BandwidthStatus.java is where existing values are replaced in Onionoo.)
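To illustrate the difference between the two identifier choices, here is a minimal Python sketch. It is not the R and Python code mentioned above and not Onionoo's logic; the record layout `(date, fingerprint, nickname, bytes)`, the byte counts, and the function names `totals_last_wins` and `totals_multi_instance` are all made up for illustration. Real bridge-extra-info descriptors have more fields and finer-grained intervals.

```python
from collections import defaultdict

FP = "5481936581E23D2D178105D44DB6915AB06BFB7F"

# Hypothetical, simplified records: (date, fingerprint, nickname, bytes).
# The byte counts are invented for the example.
records = [
    ("2022-02-01", FP, "flakey1", 100),
    ("2022-02-01", FP, "flakey2", 110),
    ("2022-02-01", FP, "flakey3", 90),
    ("2022-02-01", FP, "flakey4", 100),
]

def totals_last_wins(records):
    """Identify a bridge by fingerprint only: a later record replaces an earlier one."""
    totals = {}
    for date, fp, _nick, n in records:
        totals[(date, fp)] = n
    return totals

def totals_multi_instance(records):
    """Identify an instance by fingerprint+nickname, then sum instances per fingerprint."""
    per_instance = {}
    for date, fp, nick, n in records:
        # A later record for the *same* instance still replaces the earlier one.
        per_instance[(date, fp, nick)] = n
    totals = defaultdict(int)
    for (date, fp, _nick), n in per_instance.items():
        totals[(date, fp)] += n
    return dict(totals)

print(totals_last_wins(records)[("2022-02-01", FP)])
print(totals_multi_instance(records)[("2022-02-01", FP)])
```

With these made-up numbers, the fingerprint-only version reports only the last instance's value for the day, while the multi-instance version reports the sum over all four instances.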
So I'm wondering if one of two things is possible:
- Consider all the nicknames for a given fingerprint as being the same relay. Relay Search would show the sum of all their contributions.
- Consider different nicknames for a given fingerprint to be different relays. Relay Search would have a separate page for every instance.
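The two options amount to a choice of grouping key. Here is a rough sketch, assuming a hypothetical `relay_pages` helper and a `merge_nicknames` flag that do not exist in Relay Search or Onionoo:

```python
FP = "5481936581E23D2D178105D44DB6915AB06BFB7F"

# (fingerprint, nickname) pairs as they would appear in the uploaded descriptors.
descriptors = [(FP, "flakey1"), (FP, "flakey2"), (FP, "flakey3"), (FP, "flakey4")]

def relay_pages(descriptors, merge_nicknames):
    """Group descriptors into the entities that would each get one page.

    merge_nicknames=True  -> first option: one page per fingerprint.
    merge_nicknames=False -> second option: one page per fingerprint+nickname.
    """
    pages = {}
    for fp, nick in descriptors:
        key = fp if merge_nicknames else (fp, nick)
        pages.setdefault(key, []).append(nick)
    return pages

print(len(relay_pages(descriptors, merge_nicknames=True)))   # pages under option 1
print(len(relay_pages(descriptors, merge_nicknames=False)))  # pages under option 2
```

Under the first option the four instances collapse into a single entry whose graphs would show their sum. Under the second option each instance keeps its own entry and its own graphs.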