Allow limiting total consensus exit fraction by family
Right now we have a family of exit relays that makes up 25% of the exit capacity. The operator has asked us to limit his family to 15% of the exit capacity. This is an important capability for the network to have, and we should also prioritize a feature request from our largest exit relay operator.
We could ask him to turn off 40% of his relays (which should bring 25% down to 15%), but that is a terrible long-term answer: first because it discards that capacity from the network (thus impacting performance), and second because our bwauths might well respond by bumping up the weights on the remaining relays -- so the result would be even fewer relays that are even more overloaded.
To me it seems that the right answer is to do this "global cap" at the directory authority layer: either each dir auth by itself downweights those relays, or we make a new consensus method where all the dir auths use the same algorithm to downweight those relays.
(We could instead do it inside the bwauths, but that seems harder to get right, because we want to downweight the relays that are currently running, based on which other ones are currently running, and the bwauth doesn't have this info.)
Doing it at each dir auth has the advantage that it's easier to deploy and easier to iterate on the design, but we would need to rely on the emergent effect of "each vote is lower, therefore the consensus weights will be lower". Doing it in a new consensus method takes more steps to deploy, but more reliably arrives at the results we want. Based on this comparison, the new consensus method seems like the winner. We could even vote on a consensus parameter for the cap percentage, in case we want to slide the cap up or down later without needing another new consensus method.
I actually don't know the efficient algorithm for taking a set of relay weights and ending up with no families being more than 15% of the total. The greedy algorithm of "take largest family, if under 15% then stop, else reduce to 15% and goto start" could potentially run for many iterations because the act of reducing the biggest family to 15% could bump up some other family over 15%. Maybe there is somebody who has taken/taught algorithms more recently than I who knows the answer. :)
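One candidate that avoids the many-iterations problem is a water-filling / threshold approach: instead of repeatedly shaving the largest family, cap the k+1 largest families at a single common value T, chosen so that T works out to exactly 15% of the *resulting* total. Here's a sketch in Python (the function name and data model are hypothetical, not Tor code), assuming we already have one aggregate weight per family:

```python
def cap_family_fractions(weights, cap=0.15):
    """Reduce family weights so no family exceeds `cap` of the new total.

    Water-filling: cap the k+1 largest families at a common threshold T,
    where T is chosen so that T == cap * (new total). Runs in O(n log n)
    rather than iterating until a fixed point.
    """
    total = sum(weights)
    if not weights or max(weights) <= cap * total:
        return list(weights)  # nothing exceeds the cap already
    w = sorted(weights, reverse=True)
    rest = total  # sum of the weights that stay uncapped
    for k in range(len(w)):
        rest -= w[k]  # now trying: cap the k+1 largest families
        denom = 1.0 - cap * (k + 1)
        if denom <= 0:
            # k+1 families each at `cap` of the total would exceed 100%
            raise ValueError("infeasible: too few families for this cap")
        # Solve T = cap * ((k+1)*T + rest) for T:
        T = cap * rest / denom
        # Valid iff every capped family really was >= T,
        # and the next (uncapped) family is already <= T.
        if w[k] >= T and (k + 1 == len(w) or w[k + 1] <= T):
            return [min(x, T) for x in weights]
    return list(weights)
```

The validity check is what stops the "reducing one family bumps another over the cap" cascade: we keep enlarging the capped set until all uncapped families are already under the threshold, so a single pass over the sorted list suffices.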
Extra complexity: Mike's "relay position weights" (Wgd, Wee, etc.) are based on how much overall capacity there is for each position, and if we reduce total exit capacity, we could mess up that step. Does that mean we should do our downweighting before we compute the relay position weights? My first guess is yes.
And, why stop at just exit weights? Maybe we should do the same algorithm for e.g. fraction of total guard capacity. But since some relays are both Exits and Guards, we'd want to do this in some proper order -- and it could get ugly, e.g. if we reduce exit weights so nobody is bigger than 15% of exit, but then we reduce guard weights so nobody is bigger than 15% of guards, but that step causes some family to be more than 15% of exit again.
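To make the back-and-forth concern concrete, here's a toy sketch (again hypothetical names and data model, and a much simpler weight model than the real consensus computation) that alternately enforces the cap per position until both hold or it gives up. In this toy model the repeated scaling does settle, but that's an observation about the sketch, not a proof about the real bandwidth-weight computation:

```python
from collections import defaultdict

def position_fractions(relays, flag):
    """Per-family fraction of total capacity for one position (e.g. exit)."""
    totals = defaultdict(float)
    for r in relays:
        if r[flag]:
            totals[r["family"]] += r["weight"]
    total = sum(totals.values())
    return {fam: w / total for fam, w in totals.items()} if total else {}

def cap_positions(relays, cap=0.15, rounds=50):
    """Alternately scale down any family exceeding `cap` of exit or guard
    capacity. Scaling a dual-flag (Exit+Guard) relay for one position also
    shrinks the other position's total -- which is exactly the interaction
    that can re-violate a cap we already enforced, hence the outer loop.
    Returns True if both caps hold within `rounds`, else False."""
    for _ in range(rounds):
        violated = False
        for flag in ("is_exit", "is_guard"):
            for fam, frac in position_fractions(relays, flag).items():
                if frac > cap + 1e-12:
                    violated = True
                    scale = cap / frac  # shrink the family toward the cap
                    for r in relays:
                        if r["family"] == fam and r[flag]:
                            r["weight"] *= scale
        if not violated:
            return True
    return False
```

Note that a single scaling by cap/frac doesn't land exactly on the cap (the total shrinks too), so even one position needs several passes here; the closed-form threshold trick above sidesteps that for the single-position case, but I don't see an obvious closed form once Exit and Guard constraints are coupled.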
This sounds like a clear "needs-spec" to do the design I describe above. But if anybody can think of some shorter-term hacks that would accomplish some of these goals too, that would be fantastic.