With each new dirauth we add into the kool kids klub, it becomes less likely we'll be able to contact at least half of them within 25 hours in the event of something like the crash in the #2664 description.
What's the simplest thing we can get done on the 0.2.4.x timescale to improve this situation? Can we just bump the consensus and descriptor freshness limits?
In terms of what limit to bump to: I think we need to be able to survive at least a 3-day weekend. If someone were to bring down the dirauths on xmas, Thanksgiving, or NYE, we need to not lose the Tor network because a patch couldn't be written in time.
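A back-of-the-envelope sketch of the weekend arithmetic, assuming a client accepts a consensus until its valid-until time plus a fixed tolerance, and assuming a 3-hour validity period. `consensus_usable` and `survives_outage` are hypothetical helpers for illustration, not Tor functions or defines:

```python
from datetime import datetime, timedelta

# Assumption: a consensus is valid for 3 hours after its valid-after time.
ASSUMED_VALIDITY = timedelta(hours=3)


def consensus_usable(valid_until: datetime, now: datetime, tolerance: timedelta) -> bool:
    """True if 'now' is no later than valid-until plus the freshness tolerance."""
    return now <= valid_until + tolerance


def survives_outage(outage: timedelta, tolerance: timedelta) -> bool:
    """Assume the dirauths produced one last consensus right before going down.

    Clients keep building circuits through the outage only if that consensus
    is still usable at the moment the outage ends.
    """
    valid_after = datetime(2012, 12, 24)  # arbitrary illustrative start time
    valid_until = valid_after + ASSUMED_VALIDITY
    return consensus_usable(valid_until, valid_after + outage, tolerance)


if __name__ == "__main__":
    weekend = timedelta(days=3)
    for days in (1, 3, 5):
        ok = survives_outage(weekend, timedelta(days=days))
        print(f"tolerance {days}d survives a 3-day outage: {ok}")
```

Under these assumptions a 24-hour tolerance fails the 3-day case, while 3 or 5 days passes it, with a few hours of slack coming from the validity period.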
Trac:
Milestone: Tor: unspecified to Tor: 0.2.4.x-final
Keywords: N/A deleted, proposal-needed, dirauth-dos-resistance added
I'll see if I can write something up for the Oct 10th deadline for this. Nick's main concern is that we make sure to preserve clients' attempts to fetch a fresh consensus from the dir mirrors/dirauths before using the old one to build circuits.
Trac:
Keywords: N/A deleted, MikePerry201210d added
I pushed a proposal draft to my torspec.git remote mikeperry/tolerate-old-consensus.
The proposal is pretty simple, opting for the "just raise the freshness limit" route. I just did a bit of code review to round up all the defines involved in consensus and descriptor freshness and the functions that use them.
Initial thoughts:
- s/Implementation Nodes/Implementation Notes/
- It's good we're not trying to do this back in the era of normal descriptors. We throw those out after 24 hours, and we've had some concern in the past that it would be harder to move to an "after 24 hours, but not if they're still referenced in a consensus" model.
- While thinking about this I pondered trying to draw a distinction between "when I asked for a consensus they gave me this old one" and "I haven't been able to fetch a consensus for the past two days, but I still have this old one". The hope was that the former situation is scary ("under attack") but the latter is less scary ("undirected network problems"). But since clients fetch dir stuff via begin_dir these days, I don't think that distinction makes sense -- we can compare the time the relay says it is with the time on the consensus. But if they're much different, what do we do? "Log the possible attack and use it" is not so good.
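For concreteness, the comparison mentioned above might look like the sketch below, assuming we have the relay's idea of the current time (for example the timestamp it sends in its NETINFO cell) and the consensus's valid-after time. The threshold is a made-up placeholder, and the helpers are hypothetical. This only detects the mismatch; it deliberately leaves open the question of what to do about it.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the discussion does not settle on one.
SUSPICIOUS_GAP = timedelta(hours=6)


def consensus_lag(relay_now: datetime, consensus_valid_after: datetime) -> timedelta:
    """How far the served consensus lags behind the relay's reported clock."""
    return relay_now - consensus_valid_after


def looks_stale(relay_now: datetime, consensus_valid_after: datetime,
                threshold: timedelta = SUSPICIOUS_GAP) -> bool:
    """Flag a consensus that is much older than the serving relay's clock."""
    return consensus_lag(relay_now, consensus_valid_after) > threshold
```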
I'd like to see a discussion of the risk from using a 4-day-old consensus. Right now an adversary can give you his choice of 18 or so consensus documents, and you'll try a couple of times to get something better while using the one you've got. Now he can give you his favorite out of something like 120 consensuses. How much variance is there among them, and what characteristics make some of them more vulnerable to attack than others?
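To make the scaling explicit: assuming the authorities publish one consensus per hour, the number of distinct consensuses an adversary can pick from is roughly the acceptance window divided by the publication interval. The helper below is a hypothetical illustration; the exact count for today's window depends on how it's measured, but the point is that it grows linearly with the window, and a 5-day window gives the 120 estimated above.

```python
from datetime import timedelta

# Assumption: one consensus is published per hour.
ASSUMED_INTERVAL = timedelta(hours=1)


def adversary_choices(acceptance_window: timedelta,
                      interval: timedelta = ASSUMED_INTERVAL) -> int:
    """Number of consensuses published within the window a client will accept."""
    # timedelta // timedelta is integer floor division.
    return acceptance_window // interval


if __name__ == "__main__":
    for days in (1, 3, 5):
        print(days, adversary_choices(timedelta(days=days)))
```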
We should also make sure clients are asking with the "only give me a consensus if the one you have is newer than this time" option, to save bandwidth all around. (Alas, that's another leak about old client state -- "I'm the client that got its last consensus 36 days ago".)
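A sketch of what such a conditional request could look like, assuming the dir-spec consensus URL and a standard HTTP `If-Modified-Since` header carrying the valid-after time of the consensus the client already has. `build_consensus_request` and the host name are hypothetical, not Tor's code; a server with nothing newer could then answer 304 Not Modified instead of resending the document.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_consensus_request(host: str, have_valid_after: datetime) -> str:
    """Build an HTTP/1.0 GET for the current consensus that asks the server
    to skip the body if nothing newer than our copy exists.

    have_valid_after must be timezone-aware.
    """
    if have_valid_after.tzinfo is None:
        raise ValueError("have_valid_after must be timezone-aware")
    # usegmt=True requires tzinfo == timezone.utc, hence the astimezone().
    ims = format_datetime(have_valid_after.astimezone(timezone.utc), usegmt=True)
    return (
        "GET /tor/status-vote/current/consensus HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        f"If-Modified-Since: {ims}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    have = datetime(2012, 10, 10, 12, 0, tzinfo=timezone.utc)
    print(build_consensus_request("dirserver.example", have))
```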
Overall, I like the idea of bumping up the disaster timeframe. 5 days seems as good as any other choice. Since some of the logic we're touching is finicky, I think it'll be smartest to do some testing, e.g. trigger the conditions in a test network and see what actually happens.
Hrmm. I tried to ponder these imponderables, but I failed to both do that and get my other proposals done on time. Can we just set the limit at 3 days and call this 'small-feature' (or make a different ticket for that and call that one 'small-feature')?
Otherwise, once we start talking about checking that clients have the most current/correct consensus, we probably want something that does multipath consensus hash verification. That seems like a /real/ proposal, as it would solve both this and other, perhaps more interesting attacks (such as https://lists.torproject.org/pipermail/tor-dev/2012-October/004063.html). One simple idea: ask k of the fallback mirrors from #572 for their current consensus hash, and make sure they all agree. They should all be authenticated by identity keys listed in the source code. Seems like this is a separate ticket for sure, though.
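A rough sketch of the "ask k fallback mirrors" idea, assuming a hypothetical `query_hash` callable that returns a mirror's current consensus digest (no such query exists yet; that's the new mechanism being floated). SHA-256 is an arbitrary choice here, and authenticating each mirror by its identity key is left out:

```python
import hashlib
import random
from typing import Callable, Sequence


def consensus_digest(document: bytes) -> str:
    """Digest of a consensus document (SHA-256 chosen for illustration)."""
    return hashlib.sha256(document).hexdigest()


def mirrors_agree(mirrors: Sequence[str], k: int,
                  query_hash: Callable[[str], str],
                  local_digest: str) -> bool:
    """Ask k randomly chosen mirrors for their consensus digest and accept
    our copy only if every answer matches it."""
    if not 0 < k <= len(mirrors):
        raise ValueError("k must be between 1 and the number of mirrors")
    chosen = random.sample(list(mirrors), k)
    return all(query_hash(m) == local_digest for m in chosen)
```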
Replying to mikeperry:
Hrmm. I tried to ponder these imponderables, but I failed to both do that and get my other proposals done on time. Can we just set the limit at 3 days and call this 'small-feature' (or make a different ticket for that and call that one 'small-feature')?
I'm pretty leery of calling stuff "small" when our reason for doing so is that if we called it "big" we couldn't merge it on the timeframe we want. That's as many as four tens^W^W^W^W^W motivated reasoning, and that's terrible.
That said, this does feel simple to me. If we get the proposal done soon and the code merged before the big feature deadline, we can try it out. (I don't want to push it to the end of the feature merge schedule, since this kind of thing is prone to having unexpected consequences that could require more fixing.)
Ok, which of the two would you prefer? If we're just changing the constant to 3-5 days, I think that proposal is "done" (modulo choosing the freshness duration: I picked 5 days, but 3 is also better than 24 hours).
If we're talking about creating mechanisms to verify consensus material is not targeted and is actually as current as it possibly can be, then we'd need a different (and substantially more complicated) proposal probably involving #572 in combination with some kind of query for the latest consensus creation time and ideally also some kind of "What's your latest consensus's hash" query.
I would like to write that second proposal, because I think it's a neat idea and helps address some other, more serious route capture attacks involving dirauth key compromise, but I also probably can't get it done this week, nor will it be as straightforward as just changing these defines to be a bit more relaxed.
I think the "change it to 3 days" proposal isn't so bad; how about you turn your proposal into a minimal version of that and send it to tor-dev.
The second one seems significantly more complex; it's interesting to consider it for 0.2.5, but it doesn't feel feasible for 0.2.4 right now.
Ok, I created #7126 for the second one.
Just brainstorming here, but I wonder if some kind of metric on how quickly the Tor network changes would help us decide if 3 days is a better interval than 5 days. By "how quickly the Tor network changes", I mean that if you take a consensus X from 3 days ago and a consensus Y from today, what's the percentage of routers in Y that are also in X (based on identity key)?
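The overlap metric as described, sketched with routers represented by their identity key fingerprints (the inputs here are hypothetical sets; parsing real consensuses is out of scope):

```python
from typing import AbstractSet


def overlap_fraction(old_ids: AbstractSet[str], new_ids: AbstractSet[str]) -> float:
    """Fraction of routers in the newer consensus Y that also appear in the older X."""
    if not new_ids:
        raise ValueError("newer consensus lists no routers")
    return len(new_ids & old_ids) / len(new_ids)
```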
Such a metric could be a set of probability distributions that describe how likely it is for the Tor network to change by a specific amount in X days. So, for example, the probability distributions would tell us stuff like "Based on previous data, the Tor network has 40% chance to change by 20%, in five days." or "The Tor network has 80% chance to change by less than 5%, in one day." or "The Tor network has 40% chance to change by 35%, in two months".
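One way to turn that into the proposed probabilities, assuming a hypothetical list of daily snapshots (one set of identity keys per day): measure the change (1 minus the overlap) for every pair of snapshots a fixed number of days apart, then report the empirical fraction of pairs whose change met a threshold. All function names below are illustrative.

```python
from typing import AbstractSet, List, Sequence


def change_fraction(old_ids: AbstractSet[str], new_ids: AbstractSet[str]) -> float:
    """1 minus the share of routers in the newer consensus also present in the older one."""
    if not new_ids:
        raise ValueError("newer consensus lists no routers")
    return 1.0 - len(new_ids & old_ids) / len(new_ids)


def changes_for_gap(daily: Sequence[AbstractSet[str]], gap_days: int) -> List[float]:
    """Change fraction for every pair of snapshots exactly gap_days apart."""
    return [change_fraction(daily[i], daily[i + gap_days])
            for i in range(len(daily) - gap_days)]


def prob_change_at_least(changes: Sequence[float], threshold: float) -> float:
    """Empirical probability that the network changed by at least threshold."""
    if not changes:
        raise ValueError("no samples")
    return sum(1 for c in changes if c >= threshold) / len(changes)
```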
Hrmm. Not sure if we should make implementing the proposal and any related metrics a new ticket or not here. I'm probably going to personally ignore this until whatever the next deadline is, though. Let's just say 'small-feature' for now so we don't forget about it entirely.
Trac:
Keywords: dirauth-dos-resistance proposal-needed MikePerry201210d tor-client deleted, dirauth-dos-resistance proposal-needed tor-client small-feature added