Raw import from Trac using Trac markup language. Authored by Alexander Hansen Færøy.
= Outline of CollecTor Descriptor Distribution =
The sync process will be available for the modules relaydescs, bridgedescs, exitlists, and torperf.
The additional functionality should be generalized as far as possible, and module-dependent functionality should be part of the module's code.
== Configuration ==
1. General settings:
Add the properties
`SyncRelayDescriptors`, `SyncBridgeDescriptors`, `SyncExitLists`, and `SyncTorperfFiles`
to the respective properties sections. These properties have the enum type `SyncType` with the
following values: `Sync`, `NoSync`, and `SyncOnly`.
The property `SyncFolder` specifies the top-level directory for storing the downloaded descriptors.
1. Choice of sync-sources:
The properties `SyncSourcesRelayDescriptors`, `SyncSourcesBridgeDescriptors`, and `SyncSourcesExitLists` are added to the respective properties sections.
Each contains an array of strings specifying a source name and a source URL for each CollecTor instance to retrieve descriptors from.
1. Choice of descriptors:
The entire substructure of 'recent' will be fetched, i.e. `recent/exit-lists/*` for exitlists, `recent/relay-descriptors/**/*` for relaydescs, and `recent/bridge-descriptors/**/*` for bridgedescs.
1. Backup of replaced local files:
There won't be a backup of replaced local files.
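The following Java sketch shows one way a module might read these properties with `java.util.Properties`. The three `SyncType` values come from this outline; the helper class `SyncConfig`, the default of `NoSync` for a missing key, and the comma-separated `name=url` encoding of the source array are assumptions made for illustration only.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Values of the proposed SyncType enum. */
enum SyncType {
  Sync,     // run the module, then sync
  NoSync,   // do not sync
  SyncOnly  // do not run the module, only sync
}

/** Hypothetical helper that reads the proposed sync properties. */
final class SyncConfig {

  private SyncConfig() {
  }

  /**
   * Reads a property such as SyncExitLists. Treating a missing key as
   * NoSync is an assumption of this sketch.
   */
  static SyncType syncType(Properties props, String key) {
    String value = props.getProperty(key, SyncType.NoSync.name()).trim();
    // Enum.valueOf throws IllegalArgumentException for unknown values.
    return SyncType.valueOf(value);
  }

  /**
   * Reads a property such as SyncSourcesExitLists into an ordered map from
   * source name to source URL. The "name=url, name=url" syntax is a
   * hypothetical encoding of the proposed array of strings.
   */
  static Map<String, URI> syncSources(Properties props, String key) {
    Map<String, URI> sources = new LinkedHashMap<>();
    String value = props.getProperty(key, "").trim();
    if (value.isEmpty()) {
      return sources;
    }
    for (String rawEntry : value.split(",")) {
      String entry = rawEntry.trim();
      int separator = entry.indexOf('=');
      if (separator <= 0 || separator == entry.length() - 1) {
        throw new IllegalArgumentException("Malformed sync source: " + entry);
      }
      String name = entry.substring(0, separator).trim();
      URI url = URI.create(entry.substring(separator + 1).trim());
      sources.put(name, url);
    }
    return sources;
  }
}
```

With this hypothetical encoding, a properties file could contain `SyncExitLists = SyncOnly` and `SyncSourcesExitLists = main=https://collector.torproject.org`. A `LinkedHashMap` keeps the sources in configuration order, which may matter if instances are contacted one after another.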
== Fetching and Merging ==
If `Sync*` has the value `NoSync`, nothing is done. `SyncOnly` will not start the module but will immediately begin fetching from the instances configured in `SyncSources*`. `Sync` will first run the module and then begin to sync.
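The order of work for each `SyncType` value can be sketched in Java as below. `SyncLauncher` and the two `Runnable` parameters are hypothetical stand-ins for the real module and sync code. The outline only states that `NoSync` does no syncing; that the module still runs as usual in that case is an assumption of this sketch.

```java
/** Values of the proposed SyncType enum. */
enum SyncType { Sync, NoSync, SyncOnly }

/** Hypothetical launcher applying a module's SyncType. */
final class SyncLauncher {

  private SyncLauncher() {
  }

  /**
   * Runs the module and/or the sync step in the order described above.
   * runModule and sync are placeholders for the real module and sync code.
   */
  static void launch(SyncType type, Runnable runModule, Runnable sync) {
    switch (type) {
      case NoSync:
        // No syncing. Running the module as usual is an assumption.
        runModule.run();
        break;
      case SyncOnly:
        // The module is not started; fetching begins right away.
        sync.run();
        break;
      case Sync:
        // Run the module first, then sync.
        runModule.run();
        sync.run();
        break;
      default:
        throw new IllegalArgumentException("Unknown SyncType: " + type);
    }
  }
}
```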
=== Processing ===
a. Retrieve descriptors from the CollecTor instances defined in `SyncSources*`. These descriptors are stored in `SyncFolder` under the host part of the instance's URL, e.g. {{{my-sync-folder/collector.torproject.org/recent/exit-lists}}} for exitlists from the main instance.
b. After retrieval, the fetched descriptors are examined:
i. discard descriptor files that do not contain what they should (see comment:11) and log a warning with the sync-source info and the reason (see criteria).
i. copy valid descriptors (see criteria) that have no pre-existing local copy to the local `*OutDirectory` (cf. [https://gitweb.torproject.org/collector.git/tree/src/main/resources/collector.properties collector.properties]) and to the 'recent' structure.
i. if a local copy already exists, decide which copy to keep (see criteria):
I. the local copy is kept: log a debug message with source and reason.
I. the local and fetched copies are identical: log a debug message with source and reason.
I. Maybe later: the fetched copy replaces the local descriptor. Copy the fetched descriptor to the local `*OutDirectory` and 'recent'. In all cases, log a debug message with source and reason.
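A minimal Java sketch of steps a and b for a single file, assuming the fetched file is already on disk. `SyncMerger`, `MergeOutcome`, and the `isValid` predicate are hypothetical names. The sketch logs via `java.util.logging`, where `FINE` stands in for the debug level, and copies to a single local path, whereas the outline copies to both `*OutDirectory` and 'recent'. Because the only initial `ReplaceCriterium` never allows replacing, the "Maybe later" case is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.logging.Logger;

/** Outcomes of examining one fetched descriptor file. */
enum MergeOutcome { DISCARDED, COPIED, KEPT_LOCAL, IDENTICAL }

/** Hypothetical helper for the retrieval and merge steps. */
final class SyncMerger {

  private static final Logger LOG = Logger.getLogger(SyncMerger.class.getName());

  private SyncMerger() {
  }

  /**
   * Step a: directory for files fetched from one instance, e.g.
   * my-sync-folder/collector.torproject.org/recent/exit-lists.
   */
  static Path syncDirectory(Path syncFolder, URI instance, String recentPath) {
    return syncFolder.resolve(instance.getHost()).resolve(recentPath);
  }

  /**
   * Step b for one fetched file and its local counterpart. isValid is a
   * stand-in for the KeepCriterium check.
   */
  static MergeOutcome merge(String source, Path fetched, Path local,
      Predicate<byte[]> isValid) {
    try {
      byte[] fetchedBytes = Files.readAllBytes(fetched);
      if (!isValid.test(fetchedBytes)) {
        // Discard: the file is not copied; warn with source and reason.
        LOG.warning("Discarding " + fetched + " from " + source
            + ": failed validity check");
        return MergeOutcome.DISCARDED;
      }
      if (Files.notExists(local)) {
        // No local copy yet, so copy the fetched file.
        Path parent = local.toAbsolutePath().getParent();
        if (parent != null) {
          Files.createDirectories(parent);
        }
        Files.copy(fetched, local);
        return MergeOutcome.COPIED;
      }
      // A local copy exists: compare byte by byte.
      if (Arrays.equals(fetchedBytes, Files.readAllBytes(local))) {
        LOG.fine("Identical copy of " + local + " from " + source);
        return MergeOutcome.IDENTICAL;
      }
      // Replacing is not allowed initially, so the local copy is kept.
      LOG.fine("Keeping local " + local + " instead of copy from " + source);
      return MergeOutcome.KEPT_LOCAL;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```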
== Replacement criteria ==
Since the replacement criteria are not yet fully defined and more criteria will very likely be added in the future, a modular/pluggable approach seems useful, i.e.:
1. define `KeepCriterium` and `ReplaceCriterium` interfaces
1. register implementing classes with CollecTor in order to facilitate the selection steps described above.
The only initial `ReplaceCriterium` never allows replacing.
The only initial `KeepCriterium` checks that the descriptor file contains a valid descriptor.
For the initial implementation it suffices to hard-code the `*Criterium` classes with the option to easily make that configurable later.
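The pluggable approach could look like the following Java sketch. Only the interface names `KeepCriterium` and `ReplaceCriterium` come from this outline; their method signatures, the `CriteriaRegistry` class, the rule that all registered criteria must agree, and the leading `@type ` annotation check used as a placeholder for real descriptor validation are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Decides whether a descriptor file is kept at all. */
interface KeepCriterium {
  boolean keep(byte[] descriptorFile);
}

/** Decides whether a fetched file may replace an existing local copy. */
interface ReplaceCriterium {
  boolean replace(byte[] localCopy, byte[] fetchedCopy);
}

/** The only initial ReplaceCriterium: never allow replacing. */
final class NeverReplace implements ReplaceCriterium {
  @Override
  public boolean replace(byte[] localCopy, byte[] fetchedCopy) {
    return false;
  }
}

/**
 * The only initial KeepCriterium: the file contains a valid descriptor.
 * Checking for a leading "@type " annotation is a placeholder only; a real
 * check would parse the descriptor.
 */
final class ContainsValidDescriptor implements KeepCriterium {
  private static final byte[] TYPE_ANNOTATION =
      "@type ".getBytes(StandardCharsets.US_ASCII);

  @Override
  public boolean keep(byte[] descriptorFile) {
    if (descriptorFile.length <= TYPE_ANNOTATION.length) {
      return false;
    }
    for (int i = 0; i < TYPE_ANNOTATION.length; i++) {
      if (descriptorFile[i] != TYPE_ANNOTATION[i]) {
        return false;
      }
    }
    return true;
  }
}

/** Hypothetical registry holding the hard-coded criteria. */
final class CriteriaRegistry {
  private final List<KeepCriterium> keepCriteria = new ArrayList<>();
  private final List<ReplaceCriterium> replaceCriteria = new ArrayList<>();

  void addKeepCriterium(KeepCriterium criterium) {
    keepCriteria.add(criterium);
  }

  void addReplaceCriterium(ReplaceCriterium criterium) {
    replaceCriteria.add(criterium);
  }

  /** Keep a file only if every registered KeepCriterium agrees. */
  boolean keep(byte[] file) {
    return keepCriteria.stream().allMatch(c -> c.keep(file));
  }

  /** Replace only if at least one ReplaceCriterium is registered and all agree. */
  boolean replace(byte[] localCopy, byte[] fetchedCopy) {
    return !replaceCriteria.isEmpty()
        && replaceCriteria.stream().allMatch(c -> c.replace(localCopy, fetchedCopy));
  }

  /** Hard-coded initial setup; could later be read from configuration. */
  static CriteriaRegistry initial() {
    CriteriaRegistry registry = new CriteriaRegistry();
    registry.addKeepCriterium(new ContainsValidDescriptor());
    registry.addReplaceCriterium(new NeverReplace());
    return registry;
  }
}
```

Keeping the hard-coded setup in a single factory method such as `initial()` is one way to leave room for making the criteria configurable later without touching the merge logic.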