Rethink publisher retry loop
The publisher currently retries any failed upload using its `PublisherBackoffSchedule`. The retry mechanism keeps retrying the failed upload until it succeeds, or until the timeout specified by `PublisherBackoffSchedule::timeout()` elapses.
However, if the descriptor can't be uploaded after `PublisherBackoffSchedule::timeout()` time units of trying, the upload is declared a failure (`UploadStatus::Failure`) and never retried again. We should improve this: the publisher shouldn't sit idle while there are known, retriable failed uploads. Instead, we should have some sort of (almost) infinite retry loop, somewhat like the one sketched out in !1823 (closed) (but we shouldn't have two retry loops).
The upload result reporting makes this a bit complicated: the "dirtiness" of each HsDir needs to be updated after each upload, but we can't acquire the lock required for that in the function where the upload happens. This ticket will involve rethinking how we report and update the HsDir statuses.