zimmer family hitting max descriptor size

Cc'ing @gk and @dgoulet as this is a network-health-related topic too.

One possible medium-term fix would be to simply raise the max descriptor size. Right now it's 20k:

core/or/or.h:#define MAX_DESCRIPTOR_UPLOAD_SIZE 20000

and in the modern world clients don't fetch descriptors anymore (though relays do).

We would be wise to check the microdescriptor calculation code to make sure there isn't some other size cap over there that we could accidentally violate if we let descriptors become bigger.

Ideas for other fixes?

I have told zimmer that the short-term fix is that, while he surely loves all his children equally, he should pick the 360 or so that he loves the most (and take the rest down until we get a plan figured out).
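For a rough sense of why the cap bites at around 360 relays, here is a back-of-the-envelope sketch. It assumes each family member is listed as `$` plus a 40-character hex fingerprint, separated by a single space, and that a descriptor of exactly MAX_DESCRIPTOR_UPLOAD_SIZE bytes is still accepted. The `other_bytes` figure (everything in the descriptor besides the family line) is a hypothetical placeholder, not a measured value.

```python
MAX_DESCRIPTOR_UPLOAD_SIZE = 20000  # from core/or/or.h, as quoted above

# Assumption: each entry is " $" + 40 hex characters = 42 bytes.
BYTES_PER_FAMILY_ENTRY = 1 + 1 + 40
# Assumption: the line starts with the keyword "family" and ends with a newline.
FAMILY_LINE_OVERHEAD = len("family") + 1


def family_line_bytes(n_members):
    """Estimated size in bytes of a family line listing n_members relays."""
    return FAMILY_LINE_OVERHEAD + n_members * BYTES_PER_FAMILY_ENTRY


def max_family_members(other_bytes):
    """Largest family that still fits under the cap.

    other_bytes is the (hypothetical) size of the rest of the descriptor:
    keys, signatures, exit policy, and so on.
    """
    budget = MAX_DESCRIPTOR_UPLOAD_SIZE - other_bytes - FAMILY_LINE_OVERHEAD
    return max(budget // BYTES_PER_FAMILY_ENTRY, 0)


if __name__ == "__main__":
    print(family_line_bytes(360))
    print(max_family_members(4800))
```

Under these assumptions the family line alone takes roughly three quarters of the 20k budget at 360 members, so relays with longer exit policies would hit the cap sooner.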


Cc'ing @Zimmer here too now that he has a gitlab account.

added For Network Health Team Network health labels

Yeah I think we have to rely on prop321 for arti-relay here. As for C-tor, maybe a cap to 360 is good. It would also prevent an operator amassing way too many relays ;) ?

added Roadmap::Future label

FWIW - I see this message on longclaw:

Sep 02 14:49:45.000 [notice] Somebody attempted to publish a router descriptor 'Quetzalcoatl' (source: 194.233.84.228) with size 20387. Either this is an attack, or the MAX_DESCRIPTOR_UPLOAD_SIZE (20000) constant is too low.

> Yeah I think we have to rely on prop321 for arti-relay here.

That, and I think the problem is exacerbated by our recent move to allow 8 relays per IP address, which should not be a problem anymore in the arti-relay world.

> As for C-tor, maybe a cap to 360 is good. It would also prevent an operator amassing way too many relays ;) ?

Yeah. But on the other hand it feels wrong to prevent a good and known exit operator from adding more relays given that this would reduce the weight of (potentially) malicious ones.

I wonder if we could limp along by proposing to just have the exits in two families. Given that it's all exits the chances of more than one of @Zimmer's relays ending up in the same path should be non-existent. And impersonation should be detectable (and therefore manageable) quickly.

It's not an ideal solution and it is adding additional load on the relay operator's side. But maybe that's good enough to limp along for now (and we could close that ticket in that case)?

I wonder what @gus thinks here...

What about having 2 families and firewall rules at each relay to block "the other" family?

> I wonder if we could limp along by proposing to just have the exits in two families.

Yes, it's fine.

As a slightly less insecure solution, split them into 2+ families so that when multiple Tors run on one IP, they are divided evenly into families (e.g. a family of every Tor that is the first occupant of its IP, a family for seconds and so on). This is complex to manage, but ensures nobody uses two relays in a circuit and provides verifiable linkage.

And please don't firewall: the network is supposed to be a clique, and that topology is already not in good health.
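A minimal sketch of the per-IP split suggested above, assuming the operator has a list of (fingerprint, IP) pairs for their relays. The function name and the sample data are hypothetical; this only computes the grouping and does not write any torrc configuration.

```python
from collections import defaultdict


def split_by_ip_slot(relays):
    """Group relays into families by their position on each IP.

    relays: iterable of (fingerprint, ip) pairs.
    Returns {slot: [fingerprints]}, where slot 0 holds the first relay
    on every IP, slot 1 the second relay on every IP, and so on.
    """
    seen_per_ip = defaultdict(int)   # how many relays were already placed per IP
    families = defaultdict(list)
    # Sort so that the assignment is stable across runs.
    for fingerprint, ip in sorted(relays):
        slot = seen_per_ip[ip]
        families[slot].append(fingerprint)
        seen_per_ip[ip] += 1
    return dict(families)


if __name__ == "__main__":
    # Hypothetical relays on documentation-range addresses.
    example = [
        ("AAAA", "192.0.2.1"),
        ("BBBB", "192.0.2.1"),
        ("CCCC", "192.0.2.2"),
    ]
    for slot, members in split_by_ip_slot(example).items():
        print(slot, members)
```

With up to 8 relays per IP this yields up to 8 families, each roughly one eighth of the total, and any two relays sharing an IP always land in different families.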

Agree with Georg's suggestion. Why hinder good operators who want to follow the rules?

Why are 360 relays a good operator and 361 relays a bad operator?

For good operators, two families seems like a reasonable workaround suggestion? If so, is there a recommended way to implement it? Does the recommendation differ for exit vs guard/middle relays?

I'm constraining myself to avoid the limit right now at ~350 relays, all new and ramping up traffic. Ideally, most will be exits, but I'm starting with guard/middle until all legal risk is sorted out.

Before knowing of this limit, I was projecting my upper bound (by cost) to be ~500 relays to maximize 40 Gbps (4 servers x 10 Gbps each) across a few IP address ranges. Based on data from other large scale relay operators: Least efficient, ~512 relays (1 relay per CPU thread) for 4 servers x 10 Gbps each (128 threads/relays per 1 x 10 Gbps server). Most efficient, ~320 relays (1 relay per CPU thread) for 4 servers x 10 Gbps each (80 threads/relays per 1 x 10 Gbps server).
