[![CollecTor logo](https://collector.torproject.org/images/collector-logo.png)](https://collector.torproject.org) [![CollecTor wordmark](https://collector.torproject.org/images/collector-wordmark.png)](https://collector.torproject.org)
This is a living and changing document to accompany the current project for improving CollecTor.
== Areas of Work
During the course of this project, the following sections will gradually turn into descriptions and documentation. Currently, they are a mixture of well-defined improvements, sketches, wishes, and open questions.
=== Analyze Descriptor Completeness
The analysis will be based on log files and the downloaded files and will address the following questions:
==== How many descriptors are missing?
- Details about missing referenced descriptors can be found here: Analysis Part 1
- Details about missing consensus and votes: Analysis Part 2
- Analysis of missing referenced descriptors on the current development CollecTor mirror: Analysis of pure download mirror
==== How could this loss be avoided?
- actively monitor resources like available storage space (discussion in ticket #18865 (moved)).
- verify and improve runtime statistics in order to have a clearer picture (discussion in ticket #19169 (moved)).
- Extra-info descriptors dropped because of parsing problems are counted as missing. This should be avoided, see ticket #19170 (moved).
Continue the analysis once the sync process is deployed.
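The storage monitoring item above could be approached along the following lines. This is only a minimal sketch, not CollecTor code: the class name `StorageSpaceCheck`, the method `isLow`, and the 10% threshold are assumptions chosen for illustration. It relies on the standard `java.nio.file.FileStore` API.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of CollecTor): reports low storage space. */
final class StorageSpaceCheck {

  /** Returns true if the usable share of the store is below minFreeFraction. */
  static boolean isLow(long usableBytes, long totalBytes, double minFreeFraction) {
    if (totalBytes <= 0) {
      return true; // an empty or unknown store is worth reporting, too
    }
    return (double) usableBytes / totalBytes < minFreeFraction;
  }

  /** Checks the file store that holds the given directory. */
  static boolean isLow(Path dir, double minFreeFraction) throws IOException {
    FileStore store = Files.getFileStore(dir);
    return isLow(store.getUsableSpace(), store.getTotalSpace(), minFreeFraction);
  }

  public static void main(String[] args) throws IOException {
    // The directory and the 10% threshold are illustrative assumptions.
    Path dir = Paths.get(args.length > 0 ? args[0] : ".");
    if (isLow(dir, 0.10)) {
      System.err.println("WARNING: less than 10% free space left for " + dir);
    } else {
      System.out.println("Storage OK for " + dir);
    }
  }
}
```

Such a check could run before each download so that a full disk is logged as the cause instead of showing up later as missing descriptors.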
=== Provide Guide Documents
These guides should be based on the previous work in Onionoo and metrics-lib. In detail:
- Contributor's Guide: create as detailed in #18733 (moved) and place the new guide in a central location, which still needs to be identified. One option is a full document in the central location and a short document in CollecTor that refers to it (detailed discussion in #18730 (moved)).
- Release Process (defined in #18732 (moved))
- Installation Guide for Operators (adapt the existing document), ticket #18734 (moved)
=== Implement the Release Process
(according to the guide above)
== Design Changes
This section describes improvements that ought to make CollecTor more maintainable, testable, and more efficient.
- Run CollecTor with an internal scheduler instead of using external scheduling (e.g. crontab), #19018 (moved)
- Add a shutdown hook to provide a controlled way of stopping. Discussion #19016 (moved).
- Some parts of CollecTor's data processing are provided by bash scripts run via crontab. These should be integrated into the Java application.
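The first two items above could be combined roughly as follows. This is a sketch under assumptions, not the design agreed on in the tickets: the class `InternalScheduler`, the hourly period, and the ten-minute shutdown timeout are made up for illustration. It uses the standard `ScheduledExecutorService` and `Runtime.addShutdownHook`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: internal scheduling plus a controlled shutdown. */
final class InternalScheduler {
  private final ScheduledExecutorService executor =
      Executors.newScheduledThreadPool(1);

  /** Runs the module periodically, replacing an external crontab entry. */
  void schedule(Runnable module, long initialDelay, long period, TimeUnit unit) {
    // Note: if the module throws, further runs of it are suppressed.
    executor.scheduleAtFixedRate(module, initialDelay, period, unit);
  }

  /** Stops future runs and waits for a run in progress to finish. */
  boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
    executor.shutdown();
    return executor.awaitTermination(timeout, unit);
  }

  public static void main(String[] args) {
    InternalScheduler scheduler = new InternalScheduler();
    // Placeholder task; a real module would download and store descriptors.
    scheduler.schedule(
        () -> System.out.println("collecting descriptors ..."),
        0, 1, TimeUnit.HOURS);
    // The hook runs on SIGTERM or Ctrl-C and lets a running module finish.
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      try {
        scheduler.shutdown(10, TimeUnit.MINUTES);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }));
  }
}
```

Waiting in the hook for the running module means a stop request does not leave half-written files behind, as long as the module finishes within the timeout.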
=== Improve CollecTor Operation and Setup
Once the executable jar with the shutdown hook implementation is available, CollecTor should be started as a Linux service, i.e., an appropriate shell script needs to be provided.
=== Further Sketches of Areas for Improvements
- store unparsable descriptors rather than discarding them
- add local storage for descriptors that cannot be parsed for review by the service operator and later reprocessing
- synchronization between CollecTor instances, see #18910 (moved) and DescriptorDistribution
- improve the process of creating tarballs
- reduce memory consumption throughout
- consider using an embedded http server in order to reduce operating complexity
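The idea of keeping unparsable descriptors for operator review could be sketched as below. The class `UnparsableDescriptorStore`, the directory layout, and the choice of a SHA-256 digest as file name are assumptions made for this example only; CollecTor may settle on a different scheme.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: keeps raw bytes of descriptors that failed parsing. */
final class UnparsableDescriptorStore {
  private final Path dir;

  UnparsableDescriptorStore(Path dir) {
    this.dir = dir;
  }

  /** Writes the raw bytes to a file named after their SHA-256 digest. */
  Path store(byte[] rawDescriptor) throws IOException {
    Files.createDirectories(dir);
    Path target = dir.resolve(sha256Hex(rawDescriptor));
    // Identical bytes map to the same file name, so duplicates overwrite.
    Files.write(target, rawDescriptor);
    return target;
  }

  /** Returns the lowercase hex SHA-256 digest of the given bytes. */
  static String sha256Hex(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be supported by every Java platform implementation.
      throw new IllegalStateException(e);
    }
  }
}
```

A caller would invoke `store(...)` in the branch that currently discards a descriptor after a parse error, which also allows such descriptors to be counted separately from descriptors that are really missing.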
=== Bugfix Release 1.0.1, August 22, 2016
Prevent out-of-memory error, cf. #19913 (moved).
== All Tasks in Trac
=== Active Tasks
=== Completed Tasks