CollecTor's relaydescs module freezes while downloading from directory authorities

This morning (2017-06-14, around 07:00), I noticed that the latest consensus retrieved by CollecTor had a valid-after time of 2017-06-13 17:00.

The last log lines from the relaydescs module were:

2017-06-13 17:05:00,001 INFO o.t.c.c.CollecTorMain:66 Starting relaydescs module of CollecTor.
2017-06-13 17:05:26,184 INFO o.t.c.r.CachedRelayDescriptorReader:255 Finished importing relay descriptors from local Tor data directories:
cached-consensus: 2017-06-13 17:00:00
cached-descriptors: parsed 0, skipped 24560 server descriptors
cached-descriptors.new: parsed 608, skipped 8585 server descriptors
cached-extrainfo: parsed 0, skipped 24543 extra-info descriptors
cached-extrainfo.new: parsed 607, skipped 8239 extra-info descriptors
v3-status-votes: parsed 8, skipped 0 votes

All other modules continued as usual.

Here's a stack trace obtained using jcmd:

"CollecTor-Scheduled-Thread-8" daemon prio=10 tid=0x00007fedd8006800 nid=0x6411 runnable [0x00007fee023fd000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:153)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x000000078fd3b3d8> (a java.io.BufferedInputStream)
        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:707)
        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:650)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1371)
        - locked <0x000000078fd3b418> (a sun.net.www.protocol.http.HttpURLConnection)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
        at org.torproject.collector.relaydescs.RelayDescriptorDownloader.downloadResourceFromAuthority(RelayDescriptorDownloader.java:869)
        at org.torproject.collector.relaydescs.RelayDescriptorDownloader.downloadDescriptors(RelayDescriptorDownloader.java:817)
        at org.torproject.collector.relaydescs.ArchiveWriter.startProcessing(ArchiveWriter.java:176)
        at org.torproject.collector.cron.CollecTorMain.run(CollecTorMain.java:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:473)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I stopped and restarted CollecTor and am now working on filling the gap of relay descriptors published in these ~16 hours by syncing from the backup instance.

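The stack trace shows the thread blocked in a socket read inside HttpURLConnection.getResponseCode(), called from RelayDescriptorDownloader.downloadResourceFromAuthority(), so I guess the fix is to set a timeout on that connection. It's just curious that we didn't run into this case before. We didn't change anything there recently, did we?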
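
A minimal sketch of what such a timeout could look like, assuming the download goes through a plain HttpURLConnection as the stack trace suggests. The class name TimeoutSketch, the helper openWithTimeouts, and the 60-second values are hypothetical and not taken from the CollecTor code base; only setConnectTimeout and setReadTimeout are standard java.net.URLConnection methods.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class TimeoutSketch {

  /* Hypothetical limits for illustration; not values used by CollecTor. */
  static final int CONNECT_TIMEOUT_MILLIS = 60 * 1000;
  static final int READ_TIMEOUT_MILLIS = 60 * 1000;

  /**
   * Prepares a GET request with explicit timeouts. Both timeouts default
   * to 0, which means "wait forever", so a directory authority that
   * accepts the connection but never answers would otherwise block the
   * calling thread indefinitely.
   */
  static HttpURLConnection openWithTimeouts(URL url, int connectMillis,
      int readMillis) throws IOException {
    /* openConnection() only creates the object; no network I/O yet. */
    HttpURLConnection huc = (HttpURLConnection) url.openConnection();
    huc.setConnectTimeout(connectMillis);
    huc.setReadTimeout(readMillis);
    huc.setRequestMethod("GET");
    return huc;
  }
}
```

With timeouts set like this, a stalled read in getResponseCode() should end in a java.net.SocketTimeoutException (a subclass of IOException) rather than hanging, so the caller could log the failure and move on to the next authority.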
