INSTALL.md +51 −24

@@ -34,7 +34,9 @@ a CollecTor instance. In that case it's sufficient to use a library like

 ## Setting up the host

-You'll need a host with at least 150G disk space and 4G RAM.
+You'll need a host with at least 150G disk space and 4G RAM. The available disk space will need closer monitoring when the disk has less than 200G and also accommodates the log files.

 In the following we'll assume that your host runs Debian stable as operating system. CollecTor should run on any other Linux or possibly even *BSD, though

@@ -65,20 +67,36 @@
 This concludes the host setup. Later in the process you'll once more need root privileges to configure Apache to serve CollecTor files. But until then you can do all setup steps with the non-privileged user account.

-## Planning the Service Setup
+## Setting up the service

-By default, CollecTor is configured to do nothing at all. The reason is that new operators should first understand its capabilities and make a plan for configuring their new CollecTor instance. Let's do that now.
-
-CollecTor releases are available at:
+In order to have the latest information at hand first download a CollecTor release available at:

 ```https://dist.torproject.org/collector/```

-Download the latest tarball and signature file, verify the signature on the tarball using `gpg`, and extract the tarball in the working directory of your CollecTor instance.
+Choose the latest tarball and signature file, verify the signature on the tarball using `gpg`, and extract the tarball in a directory of your CollecTor instance, let's name it 'basedir'. Extracting will create a subdirectory ```collector-<version>``` inside 'basedir'.
+
+By default, CollecTor is configured to do nothing at all. The reason is that new operators should first understand its capabilities and make a plan for configuring their new CollecTor instance. Let's do that now.
+
+Part of the tarball is an executable jar, i.e.
+```collector-<version>/generated/dist/collector-<version>.jar```
+
+Copy this jar file into the working directory you want to use for CollecTor, let's name it 'workdir', which should be empty except for the jar. Run
+
+```java -jar collector-<version>.jar```
+
+This will print some text about not being able to find a config file, which is fine as there was none, and exit. When you list the directory contents again you ought to find a fresh default configuration file ```collector.properties``` in addition to the jar copied earlier.
+
+Read through the default properties file to learn about all available configuration options.

 CollecTor consists of a background updater with an internal scheduler and several data-collecting modules that write data to local directories which are

@@ -89,17 +107,18 @@ authority.

 You'll have to decide which of the data-collecting modules you want to activate, how often to execute these modules, and which data sources to collect data from.

-Read through the default properties file to learn about all available configuration options:
-
-```collector-<version>/src/main/resources/collector.properties```
-
-## Setting up the service
+Now the properties in ```collector.properties``` are all set according to your data needs. One final decision to make: run once or repeatedly? Let's try run once first:

-When you have made a plan how to configure your CollecTor instance, copy the `collector.properties` file to the working directory. Edit that file, set it to run only once, activate all relevant modules, check and possibly edit other options as needed, and save the file.
-Run the Java process using:
+`collector.properties` can be renamed and put in any place as long as it is readable by the user running the jar and the path is given as command line argument:

-```java -Xmx2g -DLOGBASE=log -jar collector-<version>/collector-<version>.jar```
+```java -Xmx2g -DLOGBASE=log -jar collector-<version>/collector-<version>.jar </path/to/config.file>```

 This may take a while, depending on which modules you activated. Read the logs in `log/` to learn if the run was successful. If it wasn't, go back to editing

@@ -112,13 +131,17 @@
 subdirectory in the working directory, and executing that shell script from the working directory. Note that this script will at least partly fail if one or more modules are deactivated.

+### Scheduled CollecTor
+
 The next step in setting up the CollecTor instance is to start the updater with its internal scheduler and let it run continuously in the background. In order to do so, make sure the run-once property is set to `false`, possibly adapt the scheduling properties, and execute the .jar file using the same command as above but this time in the background or in a `screen` session. Also put that same line into an `@reboot` line in the user's crontab, so that it will be started
-automaticlly after a reboot.
+automatically after a reboot. Or, use some other means to ensure the proper restart.

 Set up a crontab entry to execute the `create-tarballs.sh` script at least every three days, but no more than once per day.

@@ -164,20 +187,24 @@
 without stopping and restarting the Java process. Scheduling settings are exempt from this, but all general and module settings may be changed at run-time. Just edit the config file, and the changes will become effective in the next execution of a module. Changes to the scheduler, however, require
-stopping and restarting the Java update process.
+stopping and restarting the Java update process.
+
+Configuration changes are logged at info level once shortly after their made and when a module uses a new configuration for the first time.

 If you need to stop the background updater for some reason, like rebooting the host, there is a way to do that gracefully: kill the `java` process, and a
-shutdown hook will stop the internal scheduler and wait for up to 10 minutes for all currently running updates to be finished. However, if you must stop the process immediately, use `kill -9`, though you might have to clean up manually.
+shutdown hook will stop the internal scheduler and wait for up to 10 minutes (or the amount of time configured) for all currently running updates to be finished. However, if you must stop the process immediately, use `kill -9`, though you might have to clean up manually.

 You should try to avoid rebooting while tarballs are being created.

 If you need to upgrade to a newer release or downgrade to a previous release, download that tarball to the same place as the initial tarball and extract it. Stop the current service version as described above, possibly adapt your
-`collector.properties` file as necessary, and restart the Java process using the new .jar file. Also update `@reboot` crontab entries to execute the new Java
+`collector.properties` file as necessary, copy the new jar to 'workdir' directory and restart the Java process using the new .jar file. Don't forget to update `@reboot` crontab entries to execute the new Java
 process and tarball-creating script. Watch the logs to see if the upgrade or downgrade was successful.
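
The crontab setup described in the `### Scheduled CollecTor` hunk could look roughly like the fragment below. This is a sketch that is not part of the commit. `/srv/collector/workdir` is a hypothetical stand-in for 'workdir', the `<version>` and `</path/to/config.file>` placeholders are left unfilled exactly as in INSTALL.md and must be replaced before use, and the location of `create-tarballs.sh` is an assumption.

```
# Hypothetical crontab for the non-privileged CollecTor user.
# Replace /srv/collector/workdir and the <...> placeholders before installing;
# a literal "<" would be treated as a shell redirection.

# Restart the background updater after a reboot, using the same command
# as for the manual run.
@reboot cd /srv/collector/workdir && java -Xmx2g -DLOGBASE=log -jar collector-<version>/collector-<version>.jar </path/to/config.file>

# Create tarballs at 03:00 on every second day of the month.
# Assumes create-tarballs.sh is reachable from the working directory;
# adjust the path to where the script actually lives.
0 3 */2 * * cd /srv/collector/workdir && ./create-tarballs.sh
```

The `*/2` day-of-month step is only one schedule that meets the stated bounds of at least every three days and no more than once per day. It fires on odd days of the month, so it can run on consecutive days across a month boundary but never twice on the same day.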