Investigate bottlenecks for Congestion Control (CC)

This is the parent ticket for the specific investigations we want to run to figure out why CC is not performing as expected. The problem shows up in at least two different ways (the third one listed below might be an indicator as well, but that is not clear yet):

  1. The gap between advertised and consumed bandwidth has not really shrunk since CC was deployed:

bandwidth-2022-06-04-2024-09-02

  2. The bandwidth measurements of relays are still significantly location-dependent:

location-dependent-sbws-measurements_2024-09-02

The first two relays in the image above are located in North America (AS19437 SECURED SERVERS LLC and AS55286 B2 Net Solutions Inc., respectively), and bandwidth authorities located there measure them significantly higher than European ones do. For Najdorf, which is hosted at Hetzner, the exact opposite holds. In both cases the differences matter, because the relays end up with just about half as much weight from bandwidth authorities at "less ideal" locations (see: #13, onbasca#111 and sbws#40130 (closed) for more context).
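To make the "about half as much weight" comparison concrete, here is a minimal sketch that expresses each bandwidth authority's measurement of a relay as a fraction of the highest one. The function name, the location labels and the numbers are all hypothetical placeholders, not values taken from the image or from sbws output.

```python
def relative_measurements(measurements):
    """Express each authority's measurement as a fraction of the highest one.

    `measurements` maps a (hypothetical) authority location label to the
    bandwidth that authority measured for a single relay, all in one unit.
    """
    best = max(measurements.values())
    if best <= 0:
        raise ValueError("need at least one positive measurement")
    return {location: bw / best for location, bw in measurements.items()}


# Placeholder numbers for illustration only, NOT real measurements.
example_relay = {"north_america": 40000, "europe": 20000}
relative = relative_measurements(example_relay)
```

With these placeholder inputs the European entry comes out at 0.5 of the North American one, which is the kind of gap described above.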

  3. There is very often a big difference between stream average and onionperf throughput for relays:

stream_avg_vs_throughtput_avg_2024-09-02

The ratio of stream average to throughput average is shown in the last column of the screenshot above. It is easily larger than 10 (I excluded a bunch of outliers) and affects more than a handful of relays, which points to an underlying, relay-independent problem. FWIW, op-de8a seems to be by far the best-case scenario: for op-us8a and op-hk8a the ratio for the "top 20" relays is even higher, easily by a factor of 2 or 3.
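For anyone who wants to reproduce the ratio column, a minimal sketch follows. It assumes each row carries a relay name, a stream average and an onionperf throughput average in the same unit. The row layout, the function names and the sample values are hypothetical and are not taken from the screenshot.

```python
def stream_to_throughput_ratios(rows):
    """Return {relay: stream_avg / throughput_avg} for each usable row.

    Rows with a non-positive throughput average are skipped, since the
    ratio is undefined for them.
    """
    return {
        relay: stream_avg / throughput_avg
        for relay, stream_avg, throughput_avg in rows
        if throughput_avg > 0
    }


def share_above(ratios, threshold=10.0):
    """Fraction of relays whose ratio exceeds `threshold`."""
    if not ratios:
        return 0.0
    return sum(1 for r in ratios.values() if r > threshold) / len(ratios)


# Placeholder rows for illustration only, NOT real measurements.
rows = [
    ("relayA", 50.0, 4.0),
    ("relayB", 30.0, 2.0),
    ("relayC", 12.0, 6.0),
    ("relayD", 8.0, 0.0),  # skipped: no usable throughput average
]
ratios = stream_to_throughput_ratios(rows)
fraction_over_10 = share_above(ratios)
```

Running the same calculation separately per onionperf instance (op-de8a, op-us8a, op-hk8a) would make the per-location difference mentioned above directly comparable.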

@juga @mikeperry @hiro

Edited Nov 25, 2024 by Roger Dingledine