be more lenient about changed descriptors
We have a series of bugs where a relay publishes a descriptor within 12 hours of its previous one, but the authorities drop it because it's not different "enough" from the previous one and was published too soon after it.
The original goal of this idea was to a) reduce the number of new descriptors authorities accept (and thus have to store) and b) reduce the total number of descriptors that clients and mirrors fetch. It's a defense against bugs where relays publish a new descriptor every minute.
Now that we're putting out one consensus per hour, the total damage that can be caused via (b) is better bounded.
There are broader-scale design changes that would help here, and we've had a trac entry open for years about how relays should recognize that they're not in the consensus, or recognize when their publish failed, and republish sooner.
In the meantime, I think we should change some of the parameters to make the problem less painful.
The first is:
/** Any changes in a router descriptor's publication time larger than this are
 * automatically non-cosmetic. */
#define ROUTER_MAX_COSMETIC_TIME_DIFFERENCE (12*60*60)
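Let's change that to 1 or 2 hours. Any descriptor published more than that long after the previous one would then automatically count as a real change, which will reduce the number of times we encounter this problem.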
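To make the effect of the smaller value concrete, here is a minimal sketch of only the time-based part of the check. The helper name and the two macro names are hypothetical, invented for illustration; the 2-hour value is one of the two options suggested above.

```c
#include <stdlib.h>
#include <time.h>

/* Illustrative names: the current 12-hour value and a proposed 2-hour one. */
#define COSMETIC_DIFF_CURRENT  (12*60*60)
#define COSMETIC_DIFF_PROPOSED (2*60*60)

/* Hypothetical helper (not a real Tor function): return 1 if the gap between
 * two publication times is small enough that the change could still be
 * treated as cosmetic, or 0 if the gap alone makes it non-cosmetic. */
static int
time_gap_may_be_cosmetic(time_t old_published, time_t new_published,
                         long max_cosmetic_diff)
{
  long gap = labs((long)(new_published - old_published));
  return gap <= max_cosmetic_diff;
}
```

Under these assumptions, a descriptor published 3 hours after the previous one could still be judged cosmetic with the 12-hour value, but is automatically non-cosmetic with the 2-hour value.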
The second proposed parameter change is:
/** How old can a router get before we (as a server) will no longer
 * consider it live? In seconds. */
#define ROUTER_MAX_AGE_TO_PUBLISH (60*60*20)
I'd like to move that to 23 or 24 hours.
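As a rough illustration of what moving this limit changes, the sketch below compares a descriptor's age against the current and a proposed maximum. The helper and macro names are hypothetical, and the real liveness decision may involve more than this single comparison.

```c
#include <time.h>

/* Illustrative names: the current 20-hour limit and a proposed 24-hour one. */
#define MAX_AGE_CURRENT  (60*60*20)
#define MAX_AGE_PROPOSED (60*60*24)

/* Hypothetical helper (not a real Tor function): return 1 if a descriptor
 * published at published_on is older than max_age seconds as of now. */
static int
descriptor_too_old(time_t now, time_t published_on, long max_age)
{
  return (long)(now - published_on) > max_age;
}
```

With these values, a 22-hour-old descriptor is too old under the 20-hour limit but still acceptable under the 24-hour one.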
Ideally it should be in the range of 48 hours or longer: if a relay is still getting the Running flag assigned to it, let's keep using it, and if it's not, there's no harm in voting about it. But I worry that clients will fetch a 36-hour-old descriptor, drop it because it's old, and get into a cycle. (I think we made it so they wouldn't get into such a cycle, but what do I know.)