Onionoo stalled while downloading descriptors
Yesterday evening, Onionoo's back-end stalled indefinitely while downloading descriptors from CollecTor. Each subsequent back-end run found the stalled run's lock file and terminated immediately. After six hours, the front-end considered its data to be stale and replied to all requests with 500 Internal Server Error. This was the reason for #12565 (moved).
From the back-end log:
[java] Sun Jul 06 22:15:02 UTC 2014: Downloading descriptors.
[...]
[java] Sun Jul 06 23:15:02 UTC 2014: Initializing.
[java] Could not acquire lock. Is Onionoo already running? Terminating (00:00.000 minutes).
Three ideas to fix this problem:
- The terminating runs should have sent error messages to the operator, so that the problem would have been detected much earlier.
- Six hours may be too short for the front-end to consider its data stale. Maybe 24 hours is more realistic. After all, 24-hour-old data are not wrong, they are just not as fresh as users would expect. Maybe clients could display results and add a little warning that the data are not as fresh as usual.
- I have no idea why downloading descriptors stalled in the first place. My current idea is to add more log statements to track down what went wrong.
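
To make the first two ideas more concrete, here is a minimal sketch in Java. It is not taken from the Onionoo code base: the class name LockAgeCheck, its methods, and the 45-minute limit on a single run are all assumptions made for illustration. It only shows the two comparisons involved: deciding whether a lock is old enough that the operator should be told, and deciding whether the front-end should treat its data as stale.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper, not part of Onionoo; names and limits are assumptions. */
public class LockAgeCheck {

    /** Possible states of the back-end lock as seen by a newly started run. */
    enum LockState { FREE, HELD, STALE }

    /**
     * Classifies the lock. A lock older than maxRun most likely belongs to a
     * stalled run, so the caller should notify the operator instead of
     * terminating silently.
     */
    static LockState classify(Instant lockWritten, Instant now, Duration maxRun) {
        if (lockWritten == null) {
            return LockState.FREE; // no lock file found
        }
        Duration age = Duration.between(lockWritten, now);
        return age.compareTo(maxRun) > 0 ? LockState.STALE : LockState.HELD;
    }

    /** Returns true if the last successful update is older than maxAge. */
    static boolean isDataStale(Instant lastUpdate, Instant now, Duration maxAge) {
        return Duration.between(lastUpdate, now).compareTo(maxAge) > 0;
    }

    public static void main(String[] args) {
        // Timestamps of the stalled run and of the next run, from the log above.
        Instant lockWritten = Instant.parse("2014-07-06T22:15:02Z");
        Instant nextRun = Instant.parse("2014-07-06T23:15:02Z");

        // Assumed limit: a healthy run finishes well within 45 minutes.
        LockState state = classify(lockWritten, nextRun, Duration.ofMinutes(45));
        if (state == LockState.STALE) {
            // Writing to stderr stands in for whatever channel reaches the operator.
            System.err.println("Lock is older than expected. Did the previous run stall?");
        }

        // Illustrative values: last good update 21:15:02, request seven hours later.
        Instant lastUpdate = Instant.parse("2014-07-06T21:15:02Z");
        Instant request = Instant.parse("2014-07-07T04:15:02Z");
        System.out.println("Stale with 6 hour limit: "
            + isDataStale(lastUpdate, request, Duration.ofHours(6)));
        System.out.println("Stale with 24 hour limit: "
            + isDataStale(lastUpdate, request, Duration.ofHours(24)));
    }
}
```

With a check like this, the 23:15:02 run would have reported a problem right away instead of only terminating. The same staleness test could also be used to serve data with a freshness warning between the two limits, rather than returning an error.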