Skip to content

Out of memory when loading in multiple years of relay descriptors

The interesting thing here is that this happens after the descriptors have been imported and written into the out/ and recent/ directories. This implies that we're not that far off being able to handle huge imports of relay descriptors.

I believe this is happening in the ReferenceChecker but I haven't confirmed this.

2018-09-14 17:26:42,757 INFO o.t.m.c.c.Scheduler:150 New Thread created: CollecTor-Scheduled-Thread-1
2018-09-14 17:26:42,758 INFO o.t.m.c.c.CollecTorMain:66 Starting relaydescs module of CollecTor.
2018-09-14 17:26:42,762 INFO o.t.d.DescriptorSourceFactory:150 Serving implementation org.torproject.descriptor.impl.DescriptorParserImpl for descriptor.parser.                                                                              
2018-09-14 18:32:21,639 INFO o.t.m.c.r.ArchiveReader:273 Finished importing relay descriptors from local directory:
Parsed 5253040, ignored 0 files.
2018-09-14 18:32:21,759 INFO o.t.m.c.r.ArchiveWriter:484 Finished writing relay descriptors to disk.
While importing relay descriptors from local directory, we stored 0 consensus(es), 0 microdesc consensus(es), 0 vote(s), 0 certificate(s), 5253040 server descriptor(s), 0 extra-info descriptor(s), and 0 microdescriptor(s) to disk.        
Statistics on the completeness of written relay descriptors:
2018-09-14 18:32:21,760 WARN o.t.m.c.r.ArchiveWriter:535 The last known relay server descriptor was published at 2018-09-13 19:32:29, which is more than 5:30 hours in the past.                                                              
2018-09-14 18:32:21,760 WARN o.t.m.c.r.ArchiveWriter:542 The last known relay extra-info descriptor was published at 2018-09-13 19:32:29, which is more than 5:30 hours in the past.                                                          
2018-09-14 18:32:21,761 WARN o.t.m.c.r.ArchiveWriter:549 The last known relay microdescriptor was contained in a microdesc consensus that was valid after 2018-09-13 14:00:00, which is more than 5:30 hours in the past.                     
2018-09-14 18:32:22,528 INFO o.t.d.DescriptorSourceFactory:150 Serving implementation org.torproject.descriptor.impl.DescriptorReaderImpl for descriptor.reader.                                                                              
2018-09-14 18:32:22,558 ERROR o.t.d.i.DescriptorReaderImpl:165 Bug: uncaught exception or error while reading descriptors: Required array size too large                                                                                      
java.lang.OutOfMemoryError: Required array size too large
        at java.nio.file.Files.readAllBytes(Files.java:3156)
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readDescriptorFile(DescriptorReaderImpl.java:333)                                                                                                     
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readDescriptorFiles(DescriptorReaderImpl.java:255)                                                                                                    
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:161)                                                                                                                    
        at java.lang.Thread.run(Thread.java:748)
2018-09-14 18:32:22,559 ERROR o.t.m.c.c.CollecTorMain:71 The relaydescs module failed: Saving history is only permitted after reading descriptors.                                                                                            
java.lang.IllegalStateException: Saving history is only permitted after reading descriptors.
        at org.torproject.descriptor.impl.DescriptorReaderImpl.saveHistoryFile(DescriptorReaderImpl.java:116)
        at org.torproject.metrics.collector.relaydescs.ReferenceChecker.readNewDescriptors(ReferenceChecker.java:201)
        at org.torproject.metrics.collector.relaydescs.ReferenceChecker.check(ReferenceChecker.java:89)
        at org.torproject.metrics.collector.relaydescs.ArchiveWriter.startProcessing(ArchiveWriter.java:194)
        at org.torproject.metrics.collector.cron.CollecTorMain.run(CollecTorMain.java:67)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)                                                                                                              
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)                                                                                                                     
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
2018-09-14 18:32:22,561 INFO o.t.m.c.c.ShutdownHook:23 Shutdown in progress ...
2018-09-14 18:32:22,561 INFO o.t.m.c.c.Scheduler:127 Waiting at most 10 minutes for termination of running tasks ...
2018-09-14 18:32:22,562 INFO o.t.m.c.c.Scheduler:132 Shutdown of all scheduled tasks completed successfully.
2018-09-14 18:32:22,562 INFO o.t.m.c.c.ShutdownHook:25 Shutdown finished. Exiting.
-rw-r--r--  1 collector  collector   9.4G Sep 14 19:32 2018-09-14-17-26-42-server-descriptors