zimmer, one of our prolific relay operators, has recently started getting these complaints on his relays:
http status 400 ("Router descriptor was too large.") response from dirserver 204.13.164.118:80. Please correct
We talked to him, and it isn't a 'huge exit policy' issue; it is a 'huge family' issue.
In the glorious future, we will have implemented proposal 321 and we won't have this size explosion from family entries. But we don't live in that future yet, so we need some workaround in the short term.
and in the modern world clients don't fetch descriptors anymore (though relays do).
We would be wise to check the microdescriptor calculation code to make sure there isn't some other size cap over there that we could accidentally violate if we let descriptors become bigger.
I have told zimmer that the short term fix is that, while he surely loves all his children equally, he should pick the 360 or so that he loves the most (and take the rest down until we get a plan figured out).
Yeah I think we have to rely on prop321 for arti-relay here. As for C-tor, maybe a cap to 360 is good. It would also prevent an operator amassing way too many relays ;) ?
Sep 02 14:49:45.000 [notice] Somebody attempted to publish a router descriptor 'Quetzalcoatl' (source: 194.233.84.228) with size 20387. Either this is an attack, or the MAX_DESCRIPTOR_UPLOAD_SIZE (20000) constant is too low.
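As a rough illustration of why a large family list runs into the cap in the log line above, here is a minimal sketch of the size arithmetic. It assumes every family member is serialized as `$` plus a 40-character hex fingerprint, separated by a single space. The function names and the 4800-byte figure for "everything else in the descriptor" are hypothetical, and the exact comparison the directory authority makes against MAX_DESCRIPTOR_UPLOAD_SIZE is not verified here.

```python
import math

MAX_DESCRIPTOR_UPLOAD_SIZE = 20000   # value quoted in the log message
FINGERPRINT_HEX_LEN = 40             # 20-byte fingerprint written as hex
# Assumed per-member cost: separating space + "$" + hex fingerprint.
ENTRY_BYTES = 1 + 1 + FINGERPRINT_HEX_LEN
# Assumed fixed cost of the line: the keyword "family" plus a trailing newline.
LINE_OVERHEAD = len("family") + 1


def family_line_bytes(members: int) -> int:
    """Approximate size in bytes of a family line with `members` entries."""
    if members <= 0:
        return 0
    return LINE_OVERHEAD + members * ENTRY_BYTES


def max_family_members(other_bytes: int) -> int:
    """Largest family that fits if the rest of the descriptor takes `other_bytes`."""
    budget = MAX_DESCRIPTOR_UPLOAD_SIZE - other_bytes - LINE_OVERHEAD
    return max(0, budget // ENTRY_BYTES)


def entries_to_drop(descriptor_size: int) -> int:
    """How many family entries must go to get back to the cap or below."""
    excess = descriptor_size - MAX_DESCRIPTOR_UPLOAD_SIZE
    return max(0, math.ceil(excess / ENTRY_BYTES))


if __name__ == "__main__":
    print(family_line_bytes(360))      # bytes used by a 360-member family line
    print(max_family_members(4800))    # 4800 is a made-up size for the other fields
    print(entries_to_drop(20387))      # size reported in the log line
```

Under these assumptions, a 360-member family already takes roughly three quarters of the 20000-byte budget, which is consistent with the "360 or so" figure mentioned earlier in the thread.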
Yeah I think we have to rely on prop321 for arti-relay here.
That, and I think the problem is exacerbated by our recent move to allow 8 relays per IP address, which should not be a problem anymore in the arti-relay world.
As for C-tor, maybe a cap to 360 is good. It would also prevent an operator amassing way too many relays ;) ?
Yeah. But on the other hand it feels wrong to prevent a good and known exit operator from adding more relays given that this would reduce the weight of (potentially) malicious ones.
I wonder if we could limp along by proposing to just have the exits in two families. Given that it's all exits, the chances of more than one of @Zimmer's relays ending up in the same path should be non-existent. And impersonation should be detectable (and therefore manageable) quickly.
It's not an ideal solution and it is adding additional load on the relay operator's side. But maybe that's good enough to limp along for now (and we could close that ticket in that case)?
As a slightly less insecure solution, split them into 2+ families so that when multiple Tors run on one IP, they are divided evenly into families (e.g. a family of every Tor that is the first occupant of its IP, a family for seconds and so on). This is complex to manage, but ensures nobody uses two relays in a circuit and provides verifiable linkage.
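The per-IP split suggested above could be sketched roughly as follows. This is only an illustration of the grouping rule (first occupant of each IP in one family, second occupant in another, and so on). The function name, the input shape of `(ip, fingerprint)` pairs, and the choice to order relays on an IP by sorting their fingerprints are all assumptions made for the example, not anything Tor provides.

```python
from collections import defaultdict


def split_into_families(relays):
    """Group relays so that family k holds the k-th relay on each IP.

    `relays` is an iterable of (ip, fingerprint) pairs. Returns a list of
    families, each a list of fingerprints. No two relays sharing an IP
    end up in the same family.
    """
    by_ip = defaultdict(list)
    for ip, fingerprint in relays:
        by_ip[ip].append(fingerprint)

    families = defaultdict(list)
    for ip in sorted(by_ip):
        # Sorting gives each relay a stable slot number on its IP.
        for slot, fingerprint in enumerate(sorted(by_ip[ip])):
            families[slot].append(fingerprint)

    return [families[slot] for slot in sorted(families)]


if __name__ == "__main__":
    # Documentation-range addresses and placeholder fingerprints.
    example = [
        ("192.0.2.1", "A"), ("192.0.2.1", "B"),
        ("192.0.2.2", "C"), ("192.0.2.2", "D"),
        ("192.0.2.3", "E"),
    ]
    for number, members in enumerate(split_into_families(example), start=1):
        print(number, members)
```

With up to 8 relays per IP, this yields at most 8 families, and each family has at most one member per IP address.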
And please don't firewall; the network is supposed to be a clique, and this topology is already not in good health.
Agree with Georg's suggestion.
Why hinder good operators who want to follow the rules?
Why are 360 relays a good operator and 361 relays a bad operator?
For good operators, two families seems like a reasonable workaround suggestion? If so, is there a recommended way to implement it? Does the recommendation differ for exit vs guard/middle relays?
I'm constraining myself to avoid the limit right now at ~350 relays, all new and ramping up traffic. Ideally, most will be exits, but I'm starting with guard/middle until all legal risk is sorted out.
Before knowing of this limit, I was projecting my upper bound (by cost) to be ~500 relays to maximize 40 Gbps (4 servers x 10 Gbps each) across a few IP address ranges.
Based on data from other large scale relay operators:
Least efficient: ~512 relays (1 relay per CPU thread) for 4 servers x 10 Gbps each (128 threads/relays per 10 Gbps server).
Most efficient: ~320 relays (1 relay per CPU thread) for 4 servers x 10 Gbps each (80 threads/relays per 10 Gbps server).
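To make the comparison with the family limit explicit, the two estimates above can be checked with a few lines of arithmetic. This is a sketch only: the cap of 360 is the approximate figure mentioned earlier in the thread, and the function names are made up for the example.

```python
SERVERS = 4        # 4 servers x 10 Gbps each, as in the estimate above
FAMILY_CAP = 360   # approximate family size that still fits, per this thread


def total_relays(threads_per_server: int, servers: int = SERVERS) -> int:
    """One relay per CPU thread on every server."""
    return threads_per_server * servers


def families_needed(relays: int, cap: int = FAMILY_CAP) -> int:
    """Minimum number of families if each may hold at most `cap` relays."""
    return -(-relays // cap)  # ceiling division


if __name__ == "__main__":
    for label, threads in (("least efficient", 128), ("most efficient", 80)):
        relays = total_relays(threads)
        print(label, relays, families_needed(relays))
```

Under these assumptions, the 80-thread case fits in a single family, while the 128-thread case would need two.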