(Moved this ticket over to Directory Authorities; please feel free to close or act as appropriate.)
We like to claim that if a minority of dirauths is dishonest, the worst they can do is manipulate the voting process so that no consensus emerges; they cannot produce a consensus that is (even partially) dictated by the bad actors. Unfortunately, this isn't the case for opt-in features. If a majority of the dirauths that opt in to a feature, such as bad exit voting, bandwidth measurements, or voting for a specific parameter, want to influence those values in the consensus, they don't need a majority of all dirauths to do so. This might not matter much for less important features like Naming, but since badexit and bandwidth weights directly influence path selection on the client, the authorities that opt in to those features have considerably more power over the consensus than those that do not.
Are there too many bandwidth scanners and servers in the same area (western Europe / eastern North America)?
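To make the arithmetic concrete, here is a minimal sketch. It assumes, as the paragraph above describes, that an opt-in feature is decided by a simple majority of the authorities that vote on it. The function name and the numbers (9 dirauths, 5 opted in) are hypothetical and only illustrate the gap between the two thresholds.

```python
def votes_needed(total_dirauths: int, opted_in: int) -> dict:
    """Compare a majority of all dirauths with a majority of the opt-in subset."""
    return {
        "majority_of_all": total_dirauths // 2 + 1,
        "majority_of_opted_in": opted_in // 2 + 1,
    }


# Hypothetical numbers: 9 dirauths in total, 5 of which opt in to a feature.
example = votes_needed(total_dirauths=9, opted_in=5)
print(example)  # {'majority_of_all': 5, 'majority_of_opted_in': 3}
```

Under these assumptions, 3 colluding authorities would be enough to steer the opt-in value, although 3 of 9 is a minority of the dirauths overall.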
Are the bandwidth scanners and servers too close together? (If they are in the same data center, then any 2 relays that are in the same data center will get unrealistic results.)
There could also be specific bugs in sbws (legacy/trac#33350) or torflow (unmaintained).
We need:
We also need a more inclusive and transparent discussion and voting process that in some way includes community input.
See also ticket legacy/trac#19271.
Currently, it seems that Tor clients don't use DA nodes when constructing circuits because every DA's bandwidth is set to 20, which is a very low value. To move unneeded client traffic off DA nodes completely, we could have a torrc option on the client side (which disables construction of such circuits) and on the DA side (which blocks client requests to use the DA as a node in their circuits). Additionally, use of DA nodes in client circuits could be governed by consensus values published by the DAs.
In general, when testing a Tor network with a small number of nodes, enabling this option may be undesirable, but in the main Tor network it may help DA nodes decrease their load. Also, this explicit solution may be cleaner than the current probabilistic approach, which relies on a small bandwidth value set for DA nodes.
This task was originally discussed in [comment] with teor.
Trac:
Username: wagon
We need to find a way to determine vote divergence so we can find out who is and who is not voting on things that everyone else is agreeing on.
DocTor is the perfect framework for monitoring everything, so we need to somehow link those things (or parse each DirAuth vote to find divergences? I don't know if Reject lines are published in the votes. I know Invalid and BadExit are there, but I don't think Reject is; perhaps that is because publishing it would leak that information?)
If we have a convenient way to see which DAs are not voting to reject relays, then we can use that to notify and apply appropriate social pressure to get them to block things.
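As a rough illustration of the divergence check, the sketch below assumes the votes have already been parsed into a mapping from authority nickname to the set of relay fingerprints that authority flags (for example with BadExit). The function name, the data layout, the threshold, and the sample data are all hypothetical; this is not DocTor code and it does no vote parsing itself.

```python
from collections import Counter


def find_divergent(votes: dict[str, set[str]], threshold: float = 0.5) -> dict[str, set[str]]:
    """For each authority, list relays that most authorities flag but it does not."""
    total = len(votes)
    counts = Counter(fp for flagged in votes.values() for fp in flagged)
    # Relays flagged by more than `threshold` of all voting authorities.
    agreed = {fp for fp, n in counts.items() if n / total > threshold}
    return {
        auth: agreed - flagged
        for auth, flagged in votes.items()
        if agreed - flagged
    }


# Hypothetical parsed votes: authority nickname -> flagged relay fingerprints.
sample_votes = {
    "auth1": {"AAAA", "BBBB"},
    "auth2": {"AAAA", "BBBB"},
    "auth3": {"AAAA"},
    "auth4": {"AAAA", "BBBB"},
}
print(find_divergent(sample_votes))  # {'auth3': {'BBBB'}}
```

A report like this would only cover flags that actually appear in the votes, so it cannot say anything about Reject if that is not published.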
Damien, do you have some thoughts about how we could do this with DocTor?
Moving this to health team. I'll let them assess if they want this on their roadmap or not.
Right now the consensus is voting NumNTorsPerTAP=100, i.e. relays will handle one tap handshake for every 100 ntor handshakes they handle. We put this feature into place during the 2013 botnet overload (legacy/trac#9574).
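To show what the ratio means for a relay with a backlog of both handshake types, here is a simplified simulation. It is a sketch under stated assumptions, not the actual tor scheduling code: the function name and the `capacity` model (a fixed number of handshakes the relay can answer) are invented for illustration.

```python
def simulate(ntor_pending: int, tap_pending: int, capacity: int, ntors_per_tap: int = 100) -> dict:
    """Serve up to `capacity` handshakes, allowing one TAP per `ntors_per_tap` NTors."""
    served = {"ntor": 0, "tap": 0}
    recent_ntors = 0  # NTors served since the last TAP
    for _ in range(capacity):
        if ntor_pending == 0 and tap_pending == 0:
            break
        # Serve a TAP only if one is waiting and either no NTor is waiting
        # or enough NTors have been served since the previous TAP.
        serve_tap = tap_pending > 0 and (ntor_pending == 0 or recent_ntors >= ntors_per_tap)
        if serve_tap:
            tap_pending -= 1
            served["tap"] += 1
            recent_ntors = 0
        else:
            ntor_pending -= 1
            served["ntor"] += 1
            recent_ntors += 1
    return served


# An overloaded relay: more pending work than capacity.
print(simulate(1000, 1000, capacity=505, ntors_per_tap=100))  # {'ntor': 500, 'tap': 5}
print(simulate(1000, 1000, capacity=505, ntors_per_tap=500))  # {'ntor': 504, 'tap': 1}
```

In this model the ratio only matters when both queues are non-empty; a relay that can keep up with everything answers all handshakes regardless of the setting.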
TAP handshakes are used by obsolete clients (we don't know how many of these remain, but I think it might be quite few), and for v2 onion service clients reaching intro points, and for v2 onion services reaching rendezvous points.
With the recent overload that has to do with v2 onion services, the TAP frequency has gone up, e.g.
Jan 30 11:46:23.580 [notice] Circuit handshake stats since last time: 1350439/1350439 TAP, 68743431/68743431 NTor.
Jan 30 17:46:23.592 [notice] Circuit handshake stats since last time: 1183340/1183340 TAP, 71590118/71590118 NTor.
Jan 30 23:50:19.525 [notice] Circuit handshake stats since last time: 1069004/1069004 TAP, 72357977/72357977 NTor.
It's still low compared to the NTor frequency, but 1M TAP handshakes per 6 hours is about 46 per second to my relay.
(Also note that these log messages don't include stats from client connections, because we wanted to leave those out to be cautious about client privacy.)
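The per-second rate and the NTor:TAP ratio can be recomputed from the log lines above with a short script. This is a sketch with two assumptions: each `X/Y` pair is read as handled/requested handshakes, and each line covers roughly 6 hours (the timestamps are about that far apart). The function and constant names are hypothetical.

```python
import re

LOG_LINES = [
    "Jan 30 11:46:23.580 [notice] Circuit handshake stats since last time: 1350439/1350439 TAP, 68743431/68743431 NTor.",
    "Jan 30 17:46:23.592 [notice] Circuit handshake stats since last time: 1183340/1183340 TAP, 71590118/71590118 NTor.",
    "Jan 30 23:50:19.525 [notice] Circuit handshake stats since last time: 1069004/1069004 TAP, 72357977/72357977 NTor.",
]

PATTERN = re.compile(r"(\d+)/(\d+) TAP, (\d+)/(\d+) NTor")


def handshake_stats(line: str, interval_seconds: int = 6 * 3600) -> dict:
    """Extract TAP/NTor counts from one log line and derive rate and ratio."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a circuit handshake stats line")
    _tap_handled, tap_requested, _ntor_handled, ntor_requested = map(int, match.groups())
    return {
        "tap_per_second": tap_requested / interval_seconds,
        "ntor_per_tap": ntor_requested / tap_requested,
    }


for line in LOG_LINES:
    stats = handshake_stats(line)
    print(f"{stats['tap_per_second']:.1f} TAP/s, 1:{stats['ntor_per_tap']:.0f} TAP:NTor")
```

For these three lines the TAP rate comes out at roughly 50 to 63 per second, and the TAP:NTor ratio at roughly 1:51 to 1:68.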
The key realization here is that we can squeeze down v2 onion service usage, by squeezing down the prioritization for TAP handshakes.
Now, on my relay above, I'm able to handle all of both kinds, so changing the ratio will just change which cells get answered first -- and given that ntor cells are so much cheaper to answer than tap cells, there could be a moderate win there.
But for relays that can't handle the load, if they're similarly getting 1:70 ratios, we could potentially have a much bigger impact by cranking up the balance. If we got to the point where most of the ntors are handled and some of the taps are left unhandled, that seems like a fine balance.
So: good idea, bad idea? And if good idea, what's a good new number? 500? 1000?
v2 is dead.