prop224: Service descriptor uploads race condition
The service gets a new consensus, and a microdescriptor fetch happens right after. The HS subsystem is notified that new directory information arrived, so it considers re-uploading its descriptors:
Sep 07 23:43:01.000 [info] A consensus needs 5 good signatures from recognized authorities for us to accept it. This one has 8 (dannenberg tor26 longclaw maatuska moria1 dizum gabelmoo Faravahar).
[...]
Sep 07 23:43:02.000 [info] hs_service_dir_info_changed(): New dirinfo arrived: consider reuploading descriptor
Sep 07 23:43:02.000 [info] launch_descriptor_downloads(): Launching 3 requests for 148 microdescs, 50 at a time
... and an upload is scheduled for now() to all 6 HSDirs. So far so good. Then the microdescriptors arrive, and a second upload is triggered because the service's hashring changed. Remember that we need a relay's microdescriptor (md) before we can consider that relay for the hashring:
Sep 07 23:43:02.000 [info] handle_response_fetch_microdesc(): Received answer to microdescriptor request (status 200, body size 19779) from server '184.108.40.206:9001'
Sep 07 23:43:02.000 [info] hs_service_dir_info_changed(): New dirinfo arrived: consider reuploading descriptor
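To illustrate why md arrival alone can change where the descriptor goes, here is a deliberately simplified, hypothetical model of the hashring: relays are placed on a ring by hashing their name, and the service picks the next n eligible relays after its own position. The real prop224 index computation (SRV, time period, replicas, HSDir flag) is omitted; the only rule kept is that relays without an md are not eligible. All names here (`ring_position`, `responsible_hsdirs`, the relay labels) are illustrative assumptions, not Tor's code.

```python
import hashlib


def ring_position(data: bytes) -> int:
    # Map arbitrary bytes to a point on the ring (SHA3-256 read as a big integer).
    return int.from_bytes(hashlib.sha3_256(data).digest(), "big")


def responsible_hsdirs(relays, have_md, service_key: bytes, n: int):
    # Only relays whose microdescriptor we already have are eligible.
    eligible = sorted((ring_position(r.encode()), r) for r in relays if r in have_md)
    if not eligible:
        return []
    start = ring_position(service_key)
    # First eligible relay at or after the service's position; wrap to 0 if none.
    idx = next((i for i, (pos, _) in enumerate(eligible) if pos >= start), 0)
    count = min(n, len(eligible))
    return [eligible[(idx + k) % len(eligible)][1] for k in range(count)]


relays = [f"relay{i}" for i in range(10)]
# Before the md fetch completes we only know half of the relays' mds.
partial = responsible_hsdirs(relays, set(relays[:5]), b"service-key", 3)
# After the mds arrive, every relay is eligible.
full = responsible_hsdirs(relays, set(relays), b"service-key", 3)
print(partial, full)
```

The two lists can differ even though the consensus did not change, which is the kind of hashring change that triggers the second upload in the logs above.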
Within a two-second window, two uploads were launched with two different revision counters, say 1 and 2. To spare you more text and logs: descriptor 2 was stored on the HSDir before the upload of descriptor 1 finished. Since 2 > 1, the HSDir rejects descriptor 1 when it finally arrives and responds with a 400 "malformed descriptor" error like so:
Sep 07 23:43:05.000 [warn] Uploading hidden service descriptor: http status 400 ("Invalid HS descriptor. Rejected.") response from dirserver '<RELAY>'. Malformed hidden service descriptor?
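The race can be reproduced with a tiny, hypothetical model of the HSDir cache, assuming the HSDir only stores a descriptor whose revision counter is strictly higher than the one it already holds for that key. `HSDirCache` and the `"blinded-key"` label are made up for illustration; the status codes just mirror the HTTP 200/400 seen in the log.

```python
from dataclasses import dataclass, field


@dataclass
class HSDirCache:
    # Hypothetical HSDir-side cache: key -> highest revision counter stored.
    store: dict = field(default_factory=dict)

    def upload(self, key: str, revision: int) -> int:
        # Return an HTTP-like status: 200 if stored, 400 if rejected.
        current = self.store.get(key)
        if current is not None and revision <= current:
            # Older or equal revision counter than what we hold: reject.
            return 400
        self.store[key] = revision
        return 200


hsdir = HSDirCache()
# The service launched rev 1 then rev 2, but they reached the HSDir as 2 then 1.
arrived = [2, 1]
statuses = {rev: hsdir.upload("blinded-key", rev) for rev in arrived}
print(statuses)     # {2: 200, 1: 400}
print(hsdir.store)  # {'blinded-key': 2}
```

The HSDir ends up holding revision 2 either way; the only visible effect of the reordering is the rejected upload of revision 1.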
The consequence is benign: the HSDir ends up with the correct (newest) descriptor, and clients are able to reach the service. However, we end up with this annoying warning in the logs, which we can easily prevent, and the redundant upload also adds load on the network.
The fix is to cancel all pending uploads for that specific descriptor right before launching a new one, because the new one will always have a higher revision counter.
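The proposed fix can be sketched as a service-side tracker that, before launching uploads for a descriptor, marks every in-flight upload for that same descriptor as cancelled. `UploadManager`, `PendingUpload` and the `"current"`/`"next"` descriptor labels are hypothetical names for illustration, not Tor's actual data structures; a real implementation would close the corresponding directory connections instead of setting a flag.

```python
from dataclasses import dataclass, field


@dataclass
class PendingUpload:
    hsdir: str
    revision: int
    cancelled: bool = False


@dataclass
class UploadManager:
    # Hypothetical service-side tracker: descriptor id -> in-flight uploads.
    pending: dict = field(default_factory=dict)

    def launch(self, desc_id: str, revision: int, hsdirs: list) -> list:
        # Cancel every in-flight upload for *this* descriptor first. The new
        # upload always carries a higher revision counter, so the old ones
        # are redundant at best and rejected by the HSDir at worst.
        for upload in self.pending.get(desc_id, []):
            upload.cancelled = True
        uploads = [PendingUpload(h, revision) for h in hsdirs]
        self.pending[desc_id] = uploads
        return uploads


mgr = UploadManager()
first = mgr.launch("current", 1, ["A", "B"])
other = mgr.launch("next", 7, ["C"])
second = mgr.launch("current", 2, ["A", "B"])
print([u.cancelled for u in first])   # [True, True]
print([u.cancelled for u in second])  # [False, False]
print([u.cancelled for u in other])   # [False]
```

Scoping the cancellation per descriptor matters: re-uploading one descriptor must not cancel in-flight uploads of the service's other descriptor.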
An alternative would be to schedule a new descriptor upload only once all requested microdescriptors have arrived, but that would probably introduce a reachability issue: a client could query the correct HSDir and fail to find the service, because the service is still waiting for all mds before uploading its descriptor to the new hashring...