By the way, thanks for engaging with me on this idea! It definitely helps me focus my research efforts on things that are more likely to be useful to Tor.

Interesting point regarding surveillance value. I see how it might be valuable to conveniently surveil some guard and exit traffic. I could also see how the adversary might prefer instead just to see more guard traffic (or more exit traffic), which would be enabled by a guard-only/exit-only design.

For Conflux, I think you may be arguing that guard/middle throughput is doubled because instead of a client sending C->G1->M1->E, it sends C->G1/G2->M1/M2->E. But I am arguing that a better perspective is in the limit as the number of clients increases. In that case it's more like you went from two clients doing C1->G1->M1->E and C2->G2->M2->E to those clients doing C1->G1/G2->M1/M2->E and C2->G1/G2->M1/M2->E. That is, what happened was some client-one (C1) traffic that had gone through G1->M1 was rerouted to G2->M2, and some client-two (C2) traffic that had gone through G2->M2 was rerouted to G1->M1. The changes cancel. I could see how Conflux might increase throughput because it more effectively identifies underloaded relays, but that's a different argument than saying that the average/expected traffic in the guard position (or middle) increases (or will if the exits aren't a bottleneck).
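The cancellation argument can be sketched numerically. Here is a toy illustration (made-up client demands and an even 50/50 split, both hypothetical) of why per-relay guard load is unchanged when every client splits:

```python
# Toy illustration (made-up numbers) of the argument that splitting each
# client's traffic across two guard/middle segments leaves the total
# per-relay load unchanged once every client does it.

# Two clients, each sending 10 units of traffic.
demand = {"C1": 10, "C2": 10}

# Before Conflux: C1 -> G1 -> M1 -> E and C2 -> G2 -> M2 -> E.
before = {"G1": demand["C1"], "G2": demand["C2"]}

# With Conflux: each client splits its traffic evenly over G1 and G2,
# so each guard appears on twice as many circuits but carries half the
# traffic of each.
after = {
    "G1": demand["C1"] / 2 + demand["C2"] / 2,
    "G2": demand["C1"] / 2 + demand["C2"] / 2,
}

print(before)  # {'G1': 10, 'G2': 10}
print(after)   # {'G1': 10.0, 'G2': 10.0}
```

The rerouted C1 traffic on G2 is exactly offset by the rerouted C2 traffic leaving it, so both the totals and the per-guard loads match.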

It's very helpful to understand the network-team roadmap, thanks! Maybe a fine outcome here would be a new proposal or a modification of Proposal 265 for the next version of Tor's positional weights, whenever that may occur.

I am interested in those tickets regarding guard+exit usage. Regarding security, it may make it a bit harder to perform useful surveillance, but I'm not sure by how much. Any single relay is obviously not used in both positions in a single circuit, and so correlation attacks already require surveillance of multiple relays. Regarding performance, you may find that once you've taken into account the extra traffic guards handle (e.g. T_g) or if you raise the Guard flag requirement, guard traffic may become relatively scarce and could benefit from the additional capacity available in D relays.

I still don't understand why Conflux would cause exit capacity to be more scarce, even just relatively. Conflux, AIUI, doesn't send duplicate cells, it just splits the stream of Tor cells between the two guard+middle segments. So it won't reduce the load on any given guard (or middle), because although the relay only receives a fraction of traffic on a given circuit, it appears on twice as many circuits (actually, relays that tend to have high RTT will have less load, and low-RTT relays will have more load).

I agree that there is currently no evidence indicating that the current positional-weight scheme is reducing Tor's performance. It's more that the current weight equations have some pathologies that (1) could start causing actual performance problems in a possible future network, and (2) could be exploited by an adversary. I would argue for changing them right now to solve those problems, but if you already have an update planned for the near future (e.g. to better incorporate overhead), then I think I may be able to help you make that plan even better.

T_g and T_m are the multiples of exit-position traffic that are seen in the guard and middle positions, respectively. Put another way, for every byte that relays forward in the exit position, they forward T_g bytes in the guard position (across all circuits) and T_m bytes in the middle position. Based on the sources of non-exit-circuit traffic outlined at the top of this issue, which show that the extra traffic is only forwarded in the guard and middle positions, we have T_g >= 1 and T_m >= 1. Therefore, T_g*x + T_m*x + x is the total traffic being forwarded by relays in the network (across all positions, for all purposes).

Can I ask what other reasons you have to eliminate relays serving in both the guard and exit positions? Also, if doing so is necessary, I think that maybe we could do better than simply treating D relays the same as E relays. One issue with that is that it doesn't take into account a G relay that attracts a lot of client traffic and then opens its exit policy enough to become a D relay. This could lead to poor performance for the clients of that relay, which would continue to use it.

I don't understand why Conflux would change the load balancing. It uses two guards and middles per "circuit", but I don't see how that would affect the global amounts of traffic in each position.

Yes, I agree that the Guard flag cutoff could be used to adjust the capacity in the guard position. However, dynamically adjusting it seems dangerous to me because clients choose guards and use them for several months. It would be possible for relays to only transiently become guards, attracting a small number of clients (perhaps only one), which would cause the guard identity (if learned by the adversary) to serve both as a pseudonym and as a marker of when the client last chose a guard (partitioning the anonymity set in both cases).

I think I understand now how Proposal 265 incorporates overhead. The design seems pretty good to me (although with the load balancing issue I brought up earlier in the non-overhead case)! It also seems easily compatible with a fuller load balancing that allows the "D" relays (i.e. those with both Guard and Exit flags) to be used in the guard and exit positions. Basically, we can consider the overhead to require differing capacity at the different positions in order for the effective throughput to be equal. Then we can generalize all the position-weight equations to accommodate the differing capacity requirements.

In slightly more detail, let's convert the overhead values O_g and O_m into traffic "multipliers" T_g=1/(1-O_g) and T_m=1/(1-O_m). Throughput reaches its maximum value when the traffic through the exit position, call it x, satisfies the following: T_g*x + T_m*x + x = G+M+E+D. That is equivalent to x=(G+M+E+D)/(T_g+T_m+1). A necessary condition for full balancing is that M<=T_m*x, because M relays can only be used in the middle position. Another necessary condition is that the D relays can bring each of G and E up to the required capacity for the guard and exit positions, respectively, at max throughput: G+D >= T_g*x and E+D >= x. It is not hard to show that these two conditions are in fact sufficient for full load balancing. Basically, D can be used to cover the larger of the two gaps to full capacity at the guard and exit positions; the remainder of D must then by definition cover the smaller gap, because M is still below capacity; and everything left over covers any gap at the middle position for similar reasons.

It is therefore convenient to break down the weight cases by those two necessary and sufficient conditions. Case 1 (full balance not possible): M > T_m*x. Case 2 (full balance not possible): (G+D < T_g*x or E+D < x) and (M <= T_m*x). Case 3 (full balance possible): M <= T_m*x, G+D >= T_g*x, and E+D >= x.
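As a sanity check, the case breakdown can be computed directly. The sketch below uses illustrative (made-up) relay-class totals and overheads, not real consensus values:

```python
def classify(G, M, E, D, O_g, O_m):
    """Classify the network into the three cases above. G, M, E, D are the
    total bandwidths of the four relay classes; O_g and O_m are the
    fractional overheads in the guard and middle positions."""
    T_g = 1 / (1 - O_g)  # guard-position traffic multiplier
    T_m = 1 / (1 - O_m)  # middle-position traffic multiplier
    x = (G + M + E + D) / (T_g + T_m + 1)  # exit traffic at max throughput
    if M > T_m * x:
        return 1  # full balance impossible: excess middle-only bandwidth
    if G + D < T_g * x or E + D < x:
        return 2  # full balance impossible: guard or exit side is short
    return 3      # full balance possible

# Illustrative (made-up) bandwidth totals:
print(classify(G=100, M=90, E=60, D=50, O_g=0.1, O_m=0.05))  # 3
```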

I'd like to break down these cases further, in order to show that we can incorporate D in both guard and exit positions while using the overhead concept in a way that is (1) comprehensible and provably correct, (2) satisfies the throughput maximization goal first and then secondary goals we select, and (3) has no continuity issues.

Can you suggest how I should best communicate this idea? Should I just add more comments to this thread?

Regarding guard scarcity, I do believe that you were using three guards 2009–2013 (see https://blog.torproject.org/improving-tors-anonymity-changing-guard-parameters/). However, I don't see why the 3 guards->1 guard change would affect guard scarcity. But agreed that Tor has changed the requirements for the Guard flag several times, and they have had big effects on the size and composition of the guard relay population. It doesn't seem impossible to me that you may make a change in the future (or would wish to make a change) that would result in many fewer available guards.

Regarding the overhead, I was only considering the non-overhead weight balancing. Of course, because the overhead equations are a generalization, they suffer from the same problem I described. They should be similarly fixable. I myself would want to understand/critically analyze their definition before proposing any fix. Also, I don't see why similar overhead equations couldn't be developed that allow for using D relays in guard and exit positions. The cases to consider would go up some, but (assuming cases are defined "properly"), my experience is that each case by itself is relatively simple to understand and analyze. For example, there is a way to define load-balanced weights using just 6 cases (or 11 if you count G/E-symmetric cases separately). If the main goal of treating D like E was to reduce complexity, I do think that weights allowing D in the guard position can be derived, implemented, and communicated in a straightforward and comprehensible way.

Thanks for explaining how updates to the positional weights would propagate. Could such a change be rolled into a larger consensus update that would contain other (presumably even more important) things? Or do such updates happen infrequently, with no update currently planned?

@mikeperry, I hadn't seen that proposal, thanks! Simplifying the use of relays by treating Guard+Exit relays as part of the E class does make the analysis easier. I did recently observe that in the past Tor has benefited from using Guard+Exit relays in the guard position. An analysis of the consensuses 2009–2021 showed that the G (i.e. Guard-only) relays were commonly scarce 2009–2013.

Also, I think it may still make sense to do a little case analysis even without the D class. Proposal 265 moves any G or E bandwidth in excess of T/3, where T=G+M+E, to the middle position (moving none via clipping if less than T/3 is available). This has the possibly undesirable effect of moving the bandwidth away from the second-most constrained position, which leaves less room for bandwidth measurement error or traffic levels deviating from expected amounts. For example, with G=105, M=110, E=85, it would result in 100 total bandwidth in the guard position (all from G), 115 total in the middle position (5 from G and 110 from M), and 85 total in the exit position (all from E). The weights without the D class could instead be handled as follows:

- If (G >= T/3) and (E >= T/3): Use the equations in 265.
- Else if (G >= T/3) and (E < T/3):
  - If (G >= M): Equalize guard and middle with Wgg=((G+M)/2)/G and Wmg=(G-(G+M)/2)/G, and put all E in exit with Wee=1 and Wme=0.
  - Else (i.e. G < M): Put all G in guard with Wgg=1 and Wmg=0, and put all E in exit with Wee=1 and Wme=0.
- Else if (G < T/3) and (E >= T/3): Do the case symmetric to the one above, swapping G/guard/Wxg with E/exit/Wxe.
- Else (i.e. (G < T/3) and (E < T/3)): Put all G in guard with Wgg=1 and Wmg=0, and all E in exit with Wee=1 and Wme=0.
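The case analysis above can be written out as a short sketch. The first branch assumes Proposal 265's balanced case simply clips G and E to T/3 and moves the excess to the middle position, as described earlier:

```python
def weights_no_D(G, M, E):
    """Sketch of the case analysis above. Returns (Wgg, Wmg, Wee, Wme).

    Assumes the (G >= T/3 and E >= T/3) case of Proposal 265 clips the
    guard and exit bandwidth to T/3, moving the excess to the middle."""
    T = G + M + E
    if G >= T / 3 and E >= T / 3:
        return (T / 3) / G, (G - T / 3) / G, (T / 3) / E, (E - T / 3) / E
    if G >= T / 3 and E < T / 3:
        if G >= M:
            # Equalize guard and middle; keep all E in the exit position.
            return ((G + M) / 2) / G, (G - (G + M) / 2) / G, 1.0, 0.0
        return 1.0, 0.0, 1.0, 0.0
    if G < T / 3 and E >= T / 3:
        # Symmetric to the case above, swapping the guard and exit sides.
        if E >= M:
            return 1.0, 0.0, ((E + M) / 2) / E, (E - (E + M) / 2) / E
        return 1.0, 0.0, 1.0, 0.0
    return 1.0, 0.0, 1.0, 0.0

# The example from above: G=105, M=110, E=85, so T/3 = 100, E is scarce,
# and G < M, so all of G stays in the guard position.
print(weights_no_D(105, 110, 85))  # (1.0, 0.0, 1.0, 0.0)
```

With these inputs, 105 units stay in the guard position instead of being clipped to 100, leaving more slack at the second-most constrained position.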

Finally, improving the weight equations now should be an easy drop-in replacement. I'd like to make a pull request changing networkstatus_compute_bw_weights_v10() in src/feature/dirauth/dirvote.c, affecting lines 1116–1298, as well as to test_dir_networkstatus_compute_bw_weights_v10() in src/test/test_dir.c, affecting lines 3139–3347.

Yes, I recently looked into the equations defining the positional weights (Wgg, Wmg, etc.), and I think they could be derived in a more principled way that yields some improvements. My suggestions don't address any of the issues listed in this ticket, but they would

- Resolve a failure case (that you already know about https://github.com/torproject/tor/blob/f6600377b495c276cbaf423d9c8907abd4cb3c82/src/test/test_dir.c#L3218)
- Maximize throughput for another case in which the weights are valid but yield suboptimal throughput
- Make the weights continuous, so that small changes in the network state yield small changes in the weights. Currently, for example, a single relay changing its bandwidth by just 1 can cause the D relays (i.e. those with Guard and Exit flags) to jump back and forth between not being used as guards at all and being used a large amount (e.g. 33% of the total D bandwidth).
- Achieve secondary goals like equalizing the "sensitive" traffic (i.e. guard and exit traffic) that G/E/D classes see.

The writeup is in progress, but I would be happy to engage with you now if you are interested.

A random wait would help but wouldn't solve the problem. The adversary could simply wait until the 99th percentile time from your delay distribution after each additional byte and see after which byte the connection gets dropped (if any).

There is an active attack to identify obfs4. Each data packet begins with an unauthenticated 2-byte length field. That length field is encrypted, but it can be modified in a predictable way because the encryption is XORing. The adversary can therefore test whether there is an encrypted two-byte length field by flipping some bits, which adds a known amount X to the received length if the adversary has seen the entire message and thus knows its true length. The adversary then follows that up with X arbitrary bytes, and the adversary concludes the connection is running obfs4 if the server immediately closes the connection after the Xth added byte is sent (because at that point it believes it has the whole message and attempts unsuccessfully to authenticate it). Having an obfuscated 2-byte length field at the beginning of data packets seems unique to obfs4, at least among all the encrypted protocols that I am familiar with.
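The malleability being exploited can be sketched as follows. This is a toy model, not obfs4's actual framing: a random 2-byte keystream stands in for the cipher, and the point is only the XOR structure:

```python
import os

# Toy sketch of the length-field malleability: because the 2-byte length
# is encrypted by XOR with a keystream, an adversary who knows the true
# length can turn it into true_len + X for a chosen X, without any keys.

keystream = os.urandom(2)  # stands in for the cipher's keystream bytes
true_len = 512             # length known from observing the whole message
X = 7                      # amount the adversary adds to the length

# What the client sends: length XORed with the keystream.
enc = bytes(a ^ b for a, b in zip(true_len.to_bytes(2, "big"), keystream))

# Adversary XORs in the difference between old and new plaintext lengths.
mask = (true_len ^ (true_len + X)).to_bytes(2, "big")
tampered = bytes(a ^ b for a, b in zip(enc, mask))

# The server decrypts and now sees true_len + X, so it waits for X extra
# bytes before (unsuccessfully) authenticating and closing the connection.
dec = int.from_bytes(bytes(a ^ b for a, b in zip(tampered, keystream)), "big")
print(dec)  # 519
```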

Unfortunately, there is no obviously great solution to this problem. An obvious "fix" is to add a 16-byte MAC tag on (and immediately after) the 2-byte length field. This solution does still allow the adversary to test for such structure in the packet by flipping some bits in the length field and observing if the server closes the connection as soon as the subsequent 16 bytes are received. An authenticated 2-byte field beginning a data packet seems fairly unique as well.

Another option is to get rid of the length field and instead use a sentinel field marking the end of a variable-length message. This strategy is actually used by obfs4 for the handshake packets. Of course, the sentinel byte would have to look random, achievable easily enough by adding a counter into the HMAC. The adversary could, however, test for this strategy as well by replacing all data with random bytes to see at what point the server closes the connection, which would presumably happen when some fixed maximum frame length is reached. Perhaps that maximum frame length could be somewhat randomly chosen per-server, based on the bridge nodeID and/or public key.

Here is a suggestion to modify path selection based on trust scores: during weight computation, add a constraint that trusted relays (for now, assume binary trust) contribute some minimum fraction of selection probability in a given position. Alternatively, we could maximize the trusted relays' selection probability subject to no loss in overall network throughput. We could even allow for an automated tradeoff between the criteria of throughput and trusted-relay probability. I recommend setting up the weight computation as an optimization problem of a certain efficiently-solvable type (e.g. linear, convex) with a given objective function and certain hard constraints. We show how to do this as a linear program in the CLAPS system (Fig. 1, https://ohmygodel.com/publications/claps-ccs2020.pdf). It seems straightforward to swap in other constraints and objective functions for your current goals.
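As a deliberately simplified sketch of the minimum-fraction constraint (binary trust, one position, closed-form rescaling rather than the linear program CLAPS actually solves; the function name and tau parameter are hypothetical):

```python
def trust_adjusted_weights(bw, trusted, tau):
    """Selection probabilities proportional to bandwidth, except that
    trusted relays are guaranteed at least a tau fraction of the total
    selection probability. bw: bandwidths; trusted: booleans; tau in [0,1]."""
    total = sum(bw)
    t_bw = sum(b for b, t in zip(bw, trusted) if t)
    if t_bw / total >= tau:
        # Proportional weights already satisfy the trust constraint.
        return [b / total for b in bw]
    # Scale trusted relays up to exactly tau, untrusted down to 1 - tau,
    # keeping selection proportional to bandwidth within each group.
    return [tau * b / t_bw if t else (1 - tau) * b / (total - t_bw)
            for b, t in zip(bw, trusted)]

# Trusted relays hold 30% of bandwidth but are guaranteed 50% of selection:
w = trust_adjusted_weights([4, 3, 2, 1], [False, False, True, True], 0.5)
print(sum(w[2:]))
```

A real formulation would also bound each relay's load relative to its capacity, which is exactly the kind of hard constraint the LP handles cleanly.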

If you are interested in hearing more, please let me know! Florentin Rochet and I are interested in helping with this, and it doesn't seem like a major engineering effort as it just affects the way that weights are computed by the Directory Authorities.

Roger, I agree that this duplicates #11327 (moved), which is more general in that it includes the Fast flag.

This seems like it could affect any other flag based on bandwidth, which includes at least the Fast flag.

**Trac**:

**Cc**: *N/A* **to** amj703


Just want to say that I don't disagree with anything in your most recent comment!

> Why not throw out the outliers, then add the remaining, then do the adjustment?

> The way we're determining whether a reported value was an outlier or not is by extrapolating all reported values to network totals and discarding the lowest 25% and highest 25% of extrapolated values. But extrapolating values requires us to make these adjustments first, or we'd extrapolate to the wrong network totals.

It sounds to me like it doesn't make a difference if we throw out the outliers before adjustment or after adjustment. The adjustment preserves the relative ordering of the values, and so the bottom (and top) 25% of data points will remain the same. So here's a suggestion: (1) extrapolate by dividing the raw measurement by the relay's weight; (2) remove the top and bottom 25% of extrapolated values; (3) add the remaining raw values; (4) adjust that result by rounding towards the closest bin edge, subtracting num_relay*bin_size/2, and replacing a negative value with 0; and then (5) extrapolate the result by dividing by the total weight of the included relays.
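Those five steps can be sketched as follows (hypothetical function and variable names; "closest bin edge" is interpreted as the nearest multiple of `bin_size`):

```python
def extrapolate_trimmed(raw, weights, bin_size):
    """Sketch of the suggested order of operations: trim outliers on
    extrapolated values, then sum raw values, then adjust, then extrapolate.
    raw[i] is relay i's reported (binned, noisy) value; weights[i] is the
    relay's fraction of the network."""
    n = len(raw)
    k = n // 4  # number of lowest and highest extrapolated values to drop
    # (1) extrapolate each relay's value to a network total
    order = sorted(range(n), key=lambda i: raw[i] / weights[i])
    # (2) drop the bottom and top 25% of extrapolated values
    keep = order[k:n - k]
    # (3) sum the remaining *raw* values
    s = sum(raw[i] for i in keep)
    # (4) round toward the closest bin edge, subtract half a bin per
    #     included relay, and clamp negative results to zero
    s = round(s / bin_size) * bin_size
    s = max(0, s - len(keep) * bin_size // 2)
    # (5) extrapolate by the total weight of the relays that were kept
    return s / sum(weights[i] for i in keep)

print(extrapolate_trimmed([1000, 2000, 3000, 4000], [0.25] * 4, 1000))
```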

> Here's another idea, though: what if we change the way we're removing noise by only subtracting `bin_size / 2` to undo the binning step as good as we can and leave the Laplace noise alone? Basically, we'd only account for the fact that relays always round up to the next multiple of `bin_size`, but we wouldn't do anything about the positive or negative noise.

This isn't really the best guess about the value at the relays before noise is added. While not performing the noise-removal step makes it easier to explain how the metrics numbers are produced (and to implement them), I think it makes it harder to understand what they mean. This is a judgement call that I can see both sides of, though!

> Wait, we're talking about two different things:
>
> - Relays internally round up to the next multiple of `bin_size`.
> - metrics-web contains that `removeNoise()` method that this ticket is all about.

Ah, right, thanks for clarifying that.

> We could sum up relay values first and then adjust the result. However, we'd lose the ability to discard outliers, which we're doing extensively with onion service statistics. After all, we're throwing out 2 times 25% of reported values which we'd then include again.

Why not throw out the outliers, then add the remaining, then do the adjustment?

> Hang on. Relays always round up to the next multiple of `bin_size`. So, everything in `(-bin_size, 0]` will be reported as `0` and *not* as `-bin_size`.
>
> > I don’t think the “right side” rounding is happening with current use of the floor function, if it ever was. Maybe I’m wrong, but as I understand it Math.floorDiv((reportedNumber + binSize / 2) will round -0.75*binSize to -binSize.
>
> This part is correct. (The full "formula" is `Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize`.)

These statements appear inconsistent. Is everything in (-bin_size, 0] rounded to 0, or is only [-bin_size/2,0] rounded to zero with [-bin_size, -bin_size/2) rounded to -bin_size? I think it's the latter, because Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize with reportedNumber=-0.75*binSize should evaluate to Math.floorDiv((-0.25*binSize), binSize) * binSize = -1 * binSize = -binSize. That appears consistent with how you've described Math.floorDiv and how the docs describe it at https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html: "Returns the largest (closest to positive infinity) int value that is less than or equal to the algebraic quotient".
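Python's `//` has the same floor-toward-negative-infinity semantics as Java's `Math.floorDiv`, so the formula can be checked directly (a sketch mirroring the Java expression, with a hypothetical function name):

```python
def remove_binning(reported, bin_size):
    # Mirrors Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize.
    # Python's // floors toward negative infinity, like Java's floorDiv.
    return (reported + bin_size // 2) // bin_size * bin_size

bin_size = 1000
print(remove_binning(-750, bin_size))  # -1000: -0.75*bin_size rounds down
print(remove_binning(-250, bin_size))  # 0: values in [-bin_size/2, 0] -> 0
print(remove_binning(750, bin_size))   # 1000
```

So values in [-bin_size, -bin_size/2) do map to -bin_size, consistent with "the latter" reading above.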

> Hmm, I believe that disallowing negative values by rounding them up to zero would bias the result unnecessarily. For example, imagine an extreme case where we picked a large `bin_size` value and an even larger `delta_f` value: most relays would observe values between 0 and `bin_size`, and they would add very large positive or negative noise values. In such a case we need both negative and positive values to come up with a result close to 0.

Sure, that makes sense as a reason to try and produce unbiased estimates for each relay’s value. It seems to me that there is another way to do this, which would involve adding the relay values first and then adjusting the result, given that the noise has already been summed and thus has zero bias (aka an expectation of zero). But the current method seems to be working just fine!

> Hmm, I'm not so sure about this one, either. Remember that we implemented the code at relays to report the *right side* of a bin, not the *center*. Take another example where a relay observes exactly 0 events over two days: on day 1 it adds a small positive noise value and reports `bin_size` as number, and on day 2 it adds a small negative noise value and reports 0 as number. If we subtract `bin_size / 2` from those reported values, we'll end up with 0 on average, which would be correct. But if we didn't, we'd end up with `bin_size / 2` as average, which seems wrong.

As another example, consider that on day 1 a positive noise value in (bin_size/2, bin_size] is added, which results in bin_size being reported, and on day 2 a negative noise value in [-bin_size, -bin_size/2) is added, which results in -bin_size being reported. Then subtracting bin_size/2 from those reported values ends up with -bin_size/2 as the average, which seems wrong. I don’t think the “right side” rounding is happening with the current use of the floor function, if it ever was. Maybe I’m wrong, but as I understand it, Math.floorDiv((reportedNumber + binSize / 2), binSize) * binSize will round -0.75*binSize to -binSize.

> Does this make sense? If so, I think I'd prefer to leave the general formula to remove noise unchanged and only focus on fixing the implementation bug related to integer truncation.

It definitely makes sense to just focus on the immediate issue discovered. I was just thinking through how this was supposed to work again and thought I would let you know how it appeared to me.

Also, PrivCount shouldn't be affected by this because it doesn't use bins. The bins help ensure that, even if over time the noise can be averaged out due to a stable (or otherwise derivable) count, the count is still protected to within the accuracy of the bins. The PrivCount implementation was designed as a research tool that isn't producing these measurement values consistently over time. Tor should consider adding this binning enhancement to its own implementation of PrivCount.