For the exit sponsoring plan, it would be good to make a periodic (e.g. daily) list of 100mbit+ exits. Specifically, these would be relays where:
Their bandwidth rate is at least 12500 KB/s.
Their advertised bandwidth is above some cutoff, say 5000 KB/s.
Their exit policy allows at least 80, 443, 554, and 1755.
If we sorted this list by consensus weight, showing percentage of exit weights they are and providing links to atlas or some other individual page for each, we'd be doing even better.
Then we could imagine doing a graph over time of the number of relays in this list. Ideally we will be able to see it go up. :)
(I'm not quite sure what the advertised bandwidth cutoff should be. Ideally it would be quite high, except there might come a time where we have such excess capacity that some relays don't see enormous spikes. If they have the capacity, it's a feature that they don't see the spikes. So maybe we should pick a low cutoff to be flexible for the future.)
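The criteria above can be sketched in a few lines of Python. This is only an illustration, not the script discussed later in this ticket: the relay records are plain dicts, and the field names (`bandwidth_rate`, `advertised_bandwidth`, `allowed_ports`, `consensus_weight`, `exit_probability`, `nickname`) are assumptions. Bandwidths are assumed to be in bytes per second, and the advertised bandwidth cutoff is a parameter so it can be lowered later.

```python
# Cutoffs from the description, converted to bytes/s (assumed unit).
RATE_MIN = 12500 * 1024
ADVERTISED_MIN = 5000 * 1024
REQUIRED_PORTS = {80, 443, 554, 1755}


def fast_exit_report(relays, advertised_min=ADVERTISED_MIN):
    """Return (nickname, consensus weight, exit weight %) tuples for relays
    meeting all three criteria, sorted by descending consensus weight.

    The cutoffs are treated as inclusive here, which is an assumption.
    """
    selected = [
        r for r in relays
        if r["bandwidth_rate"] >= RATE_MIN
        and r["advertised_bandwidth"] >= advertised_min
        and REQUIRED_PORTS <= set(r["allowed_ports"])
    ]
    selected.sort(key=lambda r: r["consensus_weight"], reverse=True)
    # exit_probability is assumed to be a fraction between 0 and 1.
    return [
        (r["nickname"], r["consensus_weight"], 100.0 * r["exit_probability"])
        for r in selected
    ]
```

Counting the entries of the returned list once a day would give the data points for the graph over time.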
For the exit sponsoring plan, it would be good to make a periodic (e.g. daily) list of 100mbit+ exits. Specifically, these would be relays where:
Their bandwidth rate is at least 12500 KB/s.
Their advertised bandwidth is above some cutoff, say 5000 KB/s.
Their exit policy allows at least 80, 443, 554, and 1755.
If we sorted this list by consensus weight, showing percentage of exit weights they are and providing links to atlas or some other individual page for each, we'd be doing even better.
By "periodic", do you mean updating that list periodically, or keeping lists of all past periods and letting users navigate between them? I have an idea for building the former, but not for the latter. Assuming you mean a single list of current 100mbit+ exits in the following.
How about we extend Atlas itself for this? It's a better fit than the metrics website, because it was specifically designed for displaying individual relay information vs. aggregated statistics.
The way the extended Atlas would work is that you select a "stored query" of some sort instead of searching by nickname/fingerprint/address. Atlas would make sure that the three criteria you mentioned are included in the filter, and it would sort by consensus weight and show percentage of exit weights. And of course, if you click on a relay, you'd get the already known details page for that relay. In the near future, that details page might contain a graph with exit weight percentage over time.
The only downside of that plan is that it may take weeks until everything's implemented, and you probably want to have it really soon. That problem shouldn't prevent us from extending Atlas, because it's the better design IMO. But I can easily hack up a Python script that outputs the requested information. We could even run it in a daily cronjob and send its results to the tor-relays mailing list. We can shut it down once Atlas is ready.
Then we could imagine doing a graph over time of the number of relays in this list. Ideally we will be able to see it go up. :)
This is something for the metrics website. I'll make a sample graph later today or tomorrow and attach it here.
(I'm not quite sure what the advertised bandwidth cutoff should be. Ideally it would be quite high, except there might come a time where we have such excess capacity that some relays don't see enormous spikes. If they have the capacity, it's a feature that they don't see the spikes. So maybe we should pick a low cutoff to be flexible for the future.)
The suggested solution above would allow you to tweak the advertised bandwidth parameter in Atlas. Changing that parameter for the metrics website graph will be somewhat harder, but not impossible.
By "periodic", do you mean updating that list periodically, or keeping lists of all past periods and letting users navigate between them? I have an idea for building the former, but not for the latter. Assuming you mean a single list of current 100mbit+ exits in the following.
Right, just the latest list is fine (for now ;)
Okay. :)
But I can easily hack up a Python script that outputs the requested information.
Yes please.
Here we go. I extended the #6329 (moved) script to output what we're interested in here:
On second thought, do we really want to send this output to a mailing list? You're so much more flexible using the script yourself, e.g., by aggregating by AS (add the -A flag) or by country (add the -C flag). Also, that enables you to play around with advertised bandwidth thresholds more easily (look for mentions of fast_exits_only in the script).
When asked about the previous list and its lack of any of the usual top exit nodes (CCC ones, for example) Karsten told me that it was because of the weird requirements for ports 554 and 1755 (on top of the more usual 80 and 443).
We have torservers' ones (lumumba, wau, chomsky, sofia, gorz, rainbowwarrior, politkovskaja, politkovskaja2), that's nearly 40% of the above list.
Add '--almost-fast-exit-relay' -
Display only exits which have a bw rate between 80 Mbit/s and
95 Mbit/s, advertised bw between 2000kb and 5000kb and allow
exiting to 80, 443 ports.
Add 'no more than 2 per /24' clause -
Only 2 relays per /24 blocks are used. Figure out the network addr for
a relay and store it in a dict(network_data) with a 'network'->'relay'
mapping. Check if 2 relays are present in network_data for a
particular network, if yes, then find the relay with the smallest exit
probability and remove it from the list of all relays and add the current
relay to the list of all relays.
I made a mistake in the commit msg, the mapping is actually 'network' -> ['relay', 'relay'].
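For readers following along, here is a minimal sketch of the "no more than 2 per /24" clause with the corrected 'network' -> ['relay', 'relay'] mapping. It is not the code from the patch: it groups relays first and then keeps the two with the highest exit probability per network, rather than evicting relays on the fly as the commit message describes. The dict keys `network` and `exit_probability` are assumptions.

```python
from collections import defaultdict

MAX_PER_NETWORK = 2


def limit_per_network(relays, max_per_network=MAX_PER_NETWORK):
    """Keep at most max_per_network relays per /24, preferring the relays
    with the highest exit probability."""
    network_data = defaultdict(list)  # 'network' -> ['relay', 'relay', ...]
    for relay in relays:
        network_data[relay["network"]].append(relay)

    kept = []
    for members in network_data.values():
        # Highest exit probability first, then cut off after the limit.
        members.sort(key=lambda r: r["exit_probability"], reverse=True)
        kept.extend(members[:max_per_network])
    return kept
```

Grouping first means each relay is visited once and each network is sorted once, which avoids rescanning the full relay list whenever a network is already full.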
Running it with --almost-fast-exit-relay takes almost 9 secs. I wonder if I can optimize this even more. Thoughts?
I just attached six graphs as discussed with arma on #tor-dev this morning. Some meta stats: generating these graphs took 2 hours CPU time and 7:15 hours developer time.
gsathya, thanks for the patch above! I'll review and merge it tomorrow morning. (If you feel you want to tweak the commit message before it gets merged, please do.)
gsathya, thanks for the patch above! I'll review and merge it tomorrow morning.
Thanks!
(If you feel you want to tweak the commit message before it gets merged, please do.)
Done.
gsathya, I looked at your patch. Here are some comments:
Your last commit, 787e3d4d, should be a "fixup" of 8e08f01. The two commits should get squashed before merging. (Feel free to rebase your branch before my next review if you want.)
When you call stats.get_relays() you don't pass options.almost_fast_exits_only, which means that you get all relays, not just the almost fast exits.
The --almost-fast-exit-relay option prints out relays almost failing both bandwidth requirements. It should instead print out relays failing at least one requirement. Similarly, the port check should exclude relays allowing all four ports, because those already meet the port requirement instead of almost meeting them.
The -f flag should become the short form of the --family option which is contained in another pending patch. Can you use -w instead (almost x in the alphabet..)?
You're looping over all or_addresses to implement the "no more than 2 relays per /24" requirement. This is problematic once relays can have two or more IPv4 addresses, because then you'll add relays twice or more often to the selection. Fortunately, relays won't have more than one IPv4 address very soon. Can you add a loud warning if a relay has more than one IPv4 address in that list?
Related to or_addresses and your ip.split(':') command, you're treating IPv6 addresses wrongly. IPv6 addresses for relays will be added very soon, so we should handle them correctly, i.e., ignore them entirely for the /24 requirement.
Just guessing, but I wonder if the script is slow, because you're using socket.inet_aton to extract the /24 of an address. Why do the hard math there instead of simply cutting off at the last dot?
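The three or_addresses points above (warn on multiple IPv4 addresses, ignore IPv6, cut off at the last dot) could look roughly like this. It is a sketch under the assumption that or_addresses entries are strings of the form "1.2.3.4:9001" for IPv4 and "[2001:db8::1]:9001" for IPv6; the function name `ipv4_network` is made up for this example.

```python
import warnings


def ipv4_network(or_addresses):
    """Return the /24 of a relay as its first three octets (e.g. '1.2.3'),
    or None if the relay has no IPv4 address."""
    # IPv6 entries are assumed to be wrapped in brackets; skip them entirely.
    ipv4 = [addr for addr in or_addresses if not addr.startswith("[")]
    if not ipv4:
        return None
    if len(ipv4) > 1:
        warnings.warn("relay has more than one IPv4 address: %r" % (ipv4,))
    # Strip the port, then cut the address off at the last dot.
    host = ipv4[0].rsplit(":", 1)[0]
    return host.rsplit(".", 1)[0]
```

Only the first IPv4 address is used for the /24 check, so a relay is never added to the selection twice.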
Another idea for the performance problems might be that my ports-checking code is quite inefficient. When I implemented something similar in Java yesterday for the graphs, my VM was very unhappy with me. In particular, reject 1-65535 is the worst case for creating a temporary list containing 65535 ints. Want to look into rewriting that part more efficiently? Otherwise, I'd do it.
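One way to avoid the temporary list is to compare each required port against the range bounds directly. The sketch below assumes the exit policy summary is a dict with either an "accept" or a "reject" key holding strings such as "80" or "1-65535"; both that layout and the function names are assumptions, not the existing ports-checking code.

```python
def port_in_ranges(port, ranges):
    """Return True if port falls into any 'N' or 'N-M' entry of ranges."""
    for entry in ranges:
        low, _, high = entry.partition("-")
        # A single port like '80' has no upper bound, so reuse the lower one.
        if int(low) <= port <= int(high or low):
            return True
    return False


def allows_ports(summary, ports):
    """Return True if the policy summary permits exiting to all given ports."""
    if "accept" in summary:
        return all(port_in_ranges(p, summary["accept"]) for p in ports)
    if "reject" in summary:
        return not any(port_in_ranges(p, summary["reject"]) for p in ports)
    return False  # no usable summary: treat as not exiting
```

With this approach, reject 1-65535 costs two integer comparisons per required port instead of building 65535 ints.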
While you're working on this patch, can you change the bandwidth-rate requirement for --fast-exits-only from 12500 * 1024 to 95 * 125 * 1024, too? Maybe make that a separate commit.
I just attached six graphs as discussed with arma on #tor-dev this morning. Some meta stats: generating these graphs took 2 hours CPU time and 7:15 hours developer time.
fast-exits-2months and almost-fast-exits-2months are great! Can we get them up on a metrics page somewhere, auto-updated several times a day?
The --almost-fast-exit-relay option prints out relays almost failing both bandwidth requirements. It should instead print out relays failing at least one requirement. Similarly, the port check should exclude relays allowing all four ports, because those already meet the port requirement instead of almost meeting them.
Maybe this comment was badly phrased. Let me try again:
We defined three requirements---two bandwidth requirements and one port requirement---for relays to be considered "fast exits". We also relaxed all three requirements and defined relays meeting those requirements, but not the above requirements, as being "almost fast exits".
Your current code has two problems:
It only considers a relay as almost fast if it meets the relaxed bandwidth requirements but fails the original bandwidth requirements. In numbers, 80 < rate < 95 and 2000 < advertised < 5000. If a relay has a rate of 90 and advertises 6000, you wouldn't list it.
Once you rewrite the bandwidth requirements, you'll also want to rewrite the port requirement to check if all four ports are permitted. If a relay has rate >= 95 and advertised >= 5000, and supports all four ports (of which you only check two), you'd call it an almost fast exit, though it's in fact a fast exit.
Replying to karsten:
Your current code has two problems:
It only considers a relay as almost fast if it meets the relaxed bandwidth requirements but fails the original bandwidth requirements. In numbers, 80 < rate < 95 and 2000 < advertised < 5000. If a relay has a rate of 90 and advertises 6000, you wouldn't list it.
Oh. I thought an almost-fast-exit relay should satisfy both -
80 < rate < 95 and
2000< advertised < 5000.
Sorry. So, it should be "80 < rate < 95 or 2000 < advertised < 5000"? Is this correct?
if not ((80 * 125 * 1024 <= relay.get('bandwidth_rate', -1) <= 95 * 125 * 1024) or (2000 * 1024 <= relay.get('advertised_bandwidth', -1) <= 5000 * 1024)): continue
Once you rewrite the bandwidth requirements, you'll also want to rewrite the port requirement to check if all four ports are permitted. If a relay has rate >= 95 and advertised >= 5000, and supports all four ports (of which you only check two), you'd call it an almost fast exit, though it's in fact a fast exit.
So, it should be "80 < rate < 95 or 2000 < advertised < 5000"? Is this correct?
Not quite. That would accept rate = 90 and advertised = 1000, which it shouldn't. It's rather "(rate >= 80 and advertised >= 2000 and ports 80, 443) and not (rate >= 95 and advertised >= 5000 and ports 80, 443, 554, 1755)".
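Written out as code, that condition might look like the sketch below. The unit conversion follows the 95 * 125 * 1024 convention mentioned earlier for the bandwidth rate and the * 1024 convention for the advertised bandwidth; the helper `meets` and the `allowed_ports` key are made up for this example.

```python
MBIT = 125 * 1024  # bytes/s per Mbit/s, as in 95 * 125 * 1024
KB = 1024          # bytes/s per KB/s

FAST_PORTS = {80, 443, 554, 1755}
ALMOST_FAST_PORTS = {80, 443}


def meets(relay, rate_mbit, advertised_kb, ports):
    """Return True if the relay satisfies one set of three requirements."""
    return (relay.get("bandwidth_rate", -1) >= rate_mbit * MBIT
            and relay.get("advertised_bandwidth", -1) >= advertised_kb * KB
            and ports <= set(relay.get("allowed_ports", ())))


def is_fast_exit(relay):
    return meets(relay, 95, 5000, FAST_PORTS)


def is_almost_fast_exit(relay):
    # Meets the relaxed requirements, but not all of the original ones.
    return meets(relay, 80, 2000, ALMOST_FAST_PORTS) and not is_fast_exit(relay)
```

A relay with rate = 90 and advertised = 6000 is listed as almost fast, while rate = 90 and advertised = 1000 is rejected, matching the two examples from this discussion.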
Replying to gsathya:
Not quite. That would accept rate = 90 and advertised = 1000, which it shouldn't. It's rather "(rate >= 80 and advertised >= 2000 and ports 80, 443) and not (rate >= 95 and advertised >= 5000 and ports 80, 443, 554, 1755)".
Gotcha. Thanks! I'll merge this with delber's changes, and send you a patch.
Thanks! This is really nice for keeping track of progress. The funders love it too.
Can you put up the most recent list of relays for each case too? That way we'll have an easy list we can give out, and also it will be easier for me to know which ones currently count so I can distinguish the 'almost counting' from the 'counting'.
Sathya is working on extending the #6329 (moved) Python script to print out these relays. You'd have to run that Python script locally though, instead of looking at a website.
Other than that, I think we should extend Atlas, not the metrics website, to list relays meeting or almost meeting the fast exit requirements. See my comment above.
If we really wanted these lists to show up on a website we could turn the #6329 (moved) script into a website. But once we have extended Atlas, we'd throw away that code. I'd rather want to avoid that.
What we could also do is set up a mailing list for the output of the #6329 (moved) script. We could then add a link to that mailing list archive to the metrics website.
Sathya made some good progress with the website version of Compass which will list fast exits and almost fast exits. But we were unsure how to exactly define "fast" and "almost fast". I hear Roger had some ideas there, too. We should really agree on a common understanding soon. How about this:
1. The set of fast exits has
   a. 95+ Mbit/s bandwidth rate,
   b. 5000+ KB/s advertised bandwidth,
   c. accepts ports 80/443/554/1755, and
   d. at most 2 relays per /24 network.
2. The set of almost fast exits has
   a. 80+ Mbit/s bandwidth rate,
   b. 2000+ KB/s advertised bandwidth,
   c. accepts ports 80/443,
   d. does not have (95+ Mbit/s bandwidth rate and 5000+ KB/s advertised bandwidth and allows ports 80/443/554/1755), and
   e. has as many relays per /24 network as there are.
3. The set of fast exits without network restriction has
   a. 95+ Mbit/s bandwidth rate,
   b. 5000+ KB/s advertised bandwidth,
   c. accepts ports 80/443/554/1755, and
   d. has as many relays per /24 network as there are.
The idea is that 1 is what the sponsor cares about, 2 contains the set of exits that we might turn into fast exits, and 3 gives an overview of network diversity.
If we agree on these requirements, I'll have to extend and re-run the tool producing the graph data, because right now the 80+ Mbit/s line has the "at most 2 per /24" requirement, too. That means that the line will slightly go up.
If this is the actual algorithm we'll use, 2.d. should be "is not in group 1". As you've specified it now, if there are 3 qualifying fast exits in a given /24, two of them make it into group 1 and zero of them make it into group 2. I think two of them should be in group 1 and the remaining one should be in group 2.
Ah, the idea is that the third relay could be moved to a different /24 and then count as fast exit, too? Indeed, then it makes sense to only remove group 1 relays from the group 2 results.
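To make the agreed semantics concrete, here is a small sketch of the three groups with 2.d. read as "is not in group 1". It assumes each relay dict already carries two booleans, `meets_fast` and `meets_relaxed`, for the strict and relaxed bandwidth/port requirements, plus `network`, `fingerprint` and `exit_probability`. All of these names are hypothetical, and picking the two group 1 relays per /24 by exit probability is an assumption borrowed from the earlier patch.

```python
def classify(relays, max_per_network=2):
    """Return (group1, group2, group3) as lists of relay dicts."""
    # Group 3: fast exits without any network restriction.
    group3 = [r for r in relays if r["meets_fast"]]

    # Group 1: fast exits, at most max_per_network per /24.
    group1, per_network = [], {}
    for r in sorted(group3, key=lambda r: r["exit_probability"], reverse=True):
        count = per_network.get(r["network"], 0)
        if count < max_per_network:
            group1.append(r)
            per_network[r["network"]] = count + 1

    # Group 2: relaxed requirements, minus everything already in group 1.
    in_group1 = {r["fingerprint"] for r in group1}
    group2 = [r for r in relays
              if r["meets_relaxed"] and r["fingerprint"] not in in_group1]
    return group1, group2, group3
```

With three qualifying fast exits in one /24, two land in group 1 and the third lands in group 2, which is the behavior discussed above.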
Other than that, sounds great!
Great!
gsathya, I think the easiest fix is to combine FastExitFilter and SameNetworkFilter by adding same_network as new parameter to FastExitFilter. Otherwise, I don't see how we could implement the new case 2. The cases would then be something along this:
Compass now has lists of fast and almost fast exits. And we have graphs. AFAICS that concludes this ticket. Closing. Please re-open if I missed something.
Trac: Resolution: N/A to implemented Status: needs_revision to closed
In Compass, the third option is listed as "Fast exits relays any network." It's not clear to me whether this means "Fast exits (plural), PLUS relays any network" or "Fast exit relay". Having seen the code in git and the discussion above, I'm even less clear now on what that third option does. Maybe it's just the extra 's' appended to 'exits', but could that third descriptor be clarified?
Trac: Priority: normal to trivial Status: closed to reopened Cc: karsten to karsten, cypherpunks Resolution: implemented to N/A Component: Metrics Website to Compass