Configure timeout for metrics-lib clients, e.g., those using DescriptorIndexCollector
Tonight the relaydescs module of the main CollecTor instance froze when attempting to fetch the remote index.json from the backup CollecTor instance. Here's a stack trace I got from jcmd:
"CollecTor-Scheduled-Thread-10" #38 daemon prio=5 os_prio=0 tid=0x00007fb7ac009800 nid=0xea7 runnable [0x00007fb7e5893000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983)
- locked <0x000000008061f060> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385)
- locked <0x000000008061f188> (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
- locked <0x000000008061f1f8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
- locked <0x000000008061f1f8> (a sun.net.www.protocol.https.DelegateHttpsURLConnection)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
- locked <0x000000008061f2e0> (a sun.net.www.protocol.https.HttpsURLConnectionImpl)
at java.net.URL.openStream(URL.java:1045)
at org.torproject.descriptor.index.IndexNode.fetchIndex(IndexNode.java:101)
at org.torproject.descriptor.index.DescriptorIndexCollector.collectDescriptors(DescriptorIndexCollector.java:74)
at org.torproject.collector.sync.SyncManager.collectFromOtherInstances(SyncManager.java:59)
at org.torproject.collector.sync.SyncManager.merge(SyncManager.java:43)
at org.torproject.collector.cron.CollecTorMain.run(CollecTorMain.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
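I didn't look at the code in detail, but the trace shows the thread blocked in a socket read during the TLS handshake, reached via URL.openStream() in IndexNode.fetchIndex. I could imagine that a connect and read timeout would have helped in this case. Maybe there are other possible fixes, though.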
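To make the timeout idea concrete, here is a minimal sketch of what a fetch with timeouts could look like. It is not the metrics-lib code: the class name TimeoutFetch, the method openWithTimeouts and the millisecond values are made up for illustration. It only relies on the standard URLConnection.setConnectTimeout and setReadTimeout calls, and I have not verified how IndexNode.fetchIndex would best integrate it. My understanding is that the read timeout also bounds reads during the TLS handshake, but that should be confirmed by testing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

/** Hypothetical helper for illustration only, not part of metrics-lib. */
final class TimeoutFetch {

  private TimeoutFetch() {
    // Static helper, no instances.
  }

  /**
   * Opens a stream to the given URL, similar to URL.openStream(), but with
   * explicit connect and read timeouts. A value of 0 means "no timeout",
   * which is what a plain URL.openStream() call ends up using by default.
   *
   * @throws java.net.SocketTimeoutException if connecting or reading blocks
   *     longer than the configured number of milliseconds
   */
  static InputStream openWithTimeouts(URL url, int connectMillis,
      int readMillis) throws IOException {
    URLConnection connection = url.openConnection();
    // Upper bound for establishing the TCP connection.
    connection.setConnectTimeout(connectMillis);
    // Upper bound for each blocking read on the underlying socket.
    connection.setReadTimeout(readMillis);
    return connection.getInputStream();
  }
}
```

A caller would then use something like TimeoutFetch.openWithTimeouts(indexUrl, 60000, 60000) in place of indexUrl.openStream() and handle the resulting SocketTimeoutException, for example by skipping that remote instance until the next scheduled run. The actual values would probably need to be configurable by the metrics-lib client.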