Commit c3253e09 authored by Iain R. Learmonth

metrics tocs and monitoring

parent b120576a
# exit-ops
Exit Scanner, TorDNSEL and Tor Check Operations
[[!toc levels=3]]
# Overview
While the three services described in this document could be implemented as discrete components,
......
# Table of Contents
1. [Synopsis](#orgb3a4817)
2. [Usage of AWS for Tor Metrics Development](#orgb76cd81)
    1. [CloudFormation Templates](#orgee150b1)
        1. [Quickstart: Deploying a template](#org7813a03)
        2. [SSH Key Selection](#orgdc7711c)
    2. [Templates and Stacks](#org19b1306)
        1. [`billing-alerts`](#org1b9ae57)
        2. [`metrics-vpc`](#org2c178f5)
        3. [Typical Dev/Testing Stacks](#org97f9e67)
    3. [Linting](#orga89e157)
3. [Ansible Playbooks](#org8371364)
    1. [Inventories and site.yml](#org81a0dc9)
    2. [`metrics-common`](#org55e2902)
    3. [Service roles](#org7050aae)
    4. [Linting](#org9684f51)
4. [Common Tasks](#org8267248)
    1. [Add a new member to the team](#org9040a14)
    2. [Update an SSH key for a team member](#org97696ab)
    3. [Deploy and provision a development environment for a service](#org400659a)
<a id="orgb3a4817"></a>
# DONE Synopsis
# Overview
The metrics-cloud framework aims to enable:
......@@ -43,9 +17,7 @@ The CloudFormation templates are relevant only to testing and development, while
to both environments.
<a id="orgb76cd81"></a>
# DONE Usage of AWS for Tor Metrics Development
# Usage of AWS for Tor Metrics Development
Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice,
we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is
......@@ -53,9 +25,7 @@ still important to minimize costs when using AWS and the use of CloudFormation t
rapid creation, provisioning and destruction should help with this.
<a id="orgee150b1"></a>
## DONE CloudFormation Templates
## CloudFormation Templates
CloudFormation is an AWS service allowing the definition of *stacks*. These stacks describe a series of AWS services
using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources
......@@ -71,9 +41,7 @@ tracking of spending in the billing portal through the tags.
Documentation for CloudFormation, including an API reference, can be found at: <https://docs.aws.amazon.com/cloudformation/>.
<a id="org7813a03"></a>
### DONE Quickstart: Deploying a template
### Quickstart: Deploying a template
Each template begins with comments with any relevant notes about the template, and a deployment command that will upload
and deploy the template on AWS. The commands will look something like:
......@@ -88,9 +56,7 @@ Once the stack has been deployed from the template, you can view its resources a
the [CloudFormation management console](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false).
<a id="orgdc7711c"></a>
### DONE SSH Key Selection
### SSH Key Selection
The [identify\_user.sh](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh) script prints out the name of the SSH public key to be used based on either:
......@@ -104,9 +70,7 @@ If you change the default key you would like to use, update the mapping in this
SSH keys are managed through the [EC2 management console](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:) and are not (currently) managed by a CloudFormation template.
<a id="org19b1306"></a>
## DONE Templates and Stacks
## Templates and Stacks
There is no directory hierarchy for the templates in the `cloudformation` folder of the repository. There are a couple of naming
conventions used though:
......@@ -115,17 +79,13 @@ conventions used though:
- Long-term and shared templates/stacks start with `metrics-`
<a id="org1b9ae57"></a>
### DONE `billing-alerts`
### `billing-alerts`
The [`billing-alerts` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml) sends notifications to the subscribed individuals whenever the predicted spend for the month will be
over 50USD. Email addresses can be added here if other people should be notified too.
<a id="org2c178f5"></a>
### DONE `metrics-vpc`
### `metrics-vpc`
The [`metrics-vpc` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml) contains shared resources for Tor Metrics development templates. This includes:
......@@ -180,9 +140,7 @@ The [`metrics-vpc` template](https://gitweb.torproject.org/metrics-cloud.git/tre
These domain names should **never** appear on anything user facing and are for **development purposes only**.
<a id="org97f9e67"></a>
### DONE Typical Dev/Testing Stacks
### Typical Dev/Testing Stacks
A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have
a second volume attached for the data storage.
......@@ -243,9 +201,7 @@ It's not common to use other AWS services as part of these templates as the goal
deployed on TPA managed hosts.
<a id="orga89e157"></a>
## DONE Linting
## Linting
[`cfn-lint`](https://github.com/aws-cloudformation/cfn-python-lint) is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
......@@ -254,17 +210,13 @@ things efficiently and correctly.
This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
<a id="org8371364"></a>
# TODO Ansible Playbooks
# Ansible Playbooks
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python,
is mature, and has an extensive selection of modules for almost everything we could need.
<a id="org81a0dc9"></a>
## TODO Inventories and site.yml
## Inventories and site.yml
In general, there are two inventories: [production](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production) and dev. Only the production inventory is committed to git; the dev inventory will
vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default
......@@ -277,9 +229,7 @@ Inside the inventory, hosts are grouped by their purpose. For each group there i
allow multiple hosts to be provisioned together.
<a id="org55e2902"></a>
## TODO `metrics-common`
## `metrics-common`
The [`metrics-common`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common) role allows us to have a consistent environment between services, and closely matches the environment that
would be provided by a TSA managed machine. The role handles:
......@@ -299,14 +249,10 @@ This is all configured via group variables in the [`ansible/group_vars/`](https:
these work. These override the [defaults](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml) set in the role.
<a id="org7050aae"></a>
## TODO Service roles
## Service roles
<a id="org9684f51"></a>
## DONE Linting
## Linting
[`ansible-lint`](https://docs.ansible.com/ansible-lint/) is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
......@@ -315,22 +261,14 @@ things efficiently and correctly.
This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
<a id="org8267248"></a>
# TODO Common Tasks
<a id="org9040a14"></a>
## TODO Add a new member to the team
# Common Tasks
<a id="org97696ab"></a>
## Add a new member to the team
## TODO Update an SSH key for a team member
## Update an SSH key for a team member
<a id="org400659a"></a>
## TODO Deploy and provision a development environment for a service
## Deploy and provision a development environment for a service
# monitoring
[[!toc levels=3]]
## CollecTor
This is a TSA host, so it already has a bunch of ping and NRPE checks. Application-specific
checks are mostly looking at the index file:
* That there is an index file that parses and:
    * it was recently updated
    * it contains a recent run for:
        * bridge descriptors
        * relay descriptors
        * exit lists
The old check uses bushel's CollecTor index parser, but we could equally hack
up a single python script to do this with the JSON at a lower level. In the
end it looks a lot like the Onionoo plugin on the TSA Nagios.
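
As a rough illustration of the lower-level approach, the following is a minimal sketch in Python that works directly on the parsed index JSON. It is not the existing check. The field names (`index_created`, `directories`, `files`, `path`, `last_modified`), the timestamp layout, the `recent/...` path prefixes and the three-hour threshold are all assumptions made for the example and should be verified against a real index file.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layout for "index_created" and "last_modified".
TIME_FORMAT = "%Y-%m-%d %H:%M"

# Hypothetical path prefixes for the descriptor types we care about.
REQUIRED = {
    "bridge descriptors": "recent/bridge-descriptors/",
    "relay descriptors": "recent/relay-descriptors/",
    "exit lists": "recent/exit-lists/",
}


def parse_time(value):
    """Parse an index timestamp as a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


def iter_files(node, prefix=""):
    """Yield (full_path, last_modified) for every file below node."""
    for entry in node.get("files", []):
        yield prefix + entry["path"], parse_time(entry["last_modified"])
    for directory in node.get("directories", []):
        yield from iter_files(directory, prefix + directory["path"] + "/")


def check_index(index, now, max_age=timedelta(hours=3)):
    """Return a list of problems; an empty list means the check passes."""
    problems = []
    if now - parse_time(index["index_created"]) > max_age:
        problems.append("index is stale")
    # Track the newest file seen for each required descriptor type.
    newest = {}
    for path, modified in iter_files(index):
        for name, prefix in REQUIRED.items():
            if path.startswith(prefix):
                newest[name] = max(newest.get(name, modified), modified)
    for name in REQUIRED:
        if name not in newest:
            problems.append(f"no {name} found")
        elif now - newest[name] > max_age:
            problems.append(f"{name} are stale")
    return problems
```

A Nagios plugin built on this would load the index with `json.load`, call `check_index(index, datetime.now(timezone.utc))`, and map an empty result to OK and anything else to a warning or critical state.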
## Onionoo
We have a Python script that runs on the TSA Nagios to check Onionoo.
<https://gitweb.torproject.org/admin/tor-nagios.git/tree/tor-nagios-checks/checks/tor-check-onionoo>
### Bonus Points
This would be a quick win for someone with some time. I had started extending the
check to report a relay's status (with a relay ops hat on):
* Onionoo is unhappy => UNKNOWN (because we're monitoring the relay not Onionoo)
* Tor version number not recommended => WARN
* Last changed address recently => WARN
* BadExit flag is present => WARN
* Not running => CRIT
* Rate of change of consensus weight is large => WARN
* Rate of change of bandwidth usage is large => WARN
* Otherwise => OK
If it's OK, output the current set of flags alphabetically sorted (or at least
consistently sorted) and include the current consensus weight and bandwidth
values in Nagios performance data format.
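
The mapping above could be sketched roughly as follows. This is an illustration only, not the unfinished extension itself: the keys of the `relay` dictionary, the `MAX_CHANGE` threshold and the decision to let "not running" take precedence over the warnings are all assumptions, and the code that would fill the dictionary from an Onionoo response is left out. The exit codes 0 to 3 are the standard Nagios plugin return codes.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical threshold: relative change treated as "large".
MAX_CHANGE = 0.5


def evaluate_relay(relay, onionoo_ok=True):
    """Return (status, message) for a dict of pre-digested relay facts.

    The dict keys used here are invented for this sketch.
    """
    if not onionoo_ok or relay is None:
        # We are monitoring the relay, not Onionoo, so this is UNKNOWN.
        return UNKNOWN, "could not get relay details from Onionoo"
    if not relay["running"]:
        return CRITICAL, "relay is not running"
    warnings = []
    if not relay["recommended_version"]:
        warnings.append("Tor version not recommended")
    if relay["address_changed_recently"]:
        warnings.append("address changed recently")
    if "BadExit" in relay["flags"]:
        warnings.append("BadExit flag is present")
    if abs(relay["weight_change"]) > MAX_CHANGE:
        warnings.append("consensus weight is changing quickly")
    if abs(relay["bandwidth_change"]) > MAX_CHANGE:
        warnings.append("bandwidth usage is changing quickly")
    if warnings:
        return WARNING, "; ".join(warnings)
    # Sort flags so the OK output is stable between runs.
    flags = " ".join(sorted(relay["flags"]))
    perf = (f"consensus_weight={relay['consensus_weight']} "
            f"bandwidth={relay['bandwidth']}")
    return OK, f"flags: {flags} | {perf}"


def format_output(status, message):
    """Build the single status line a Nagios plugin prints."""
    return f"RELAY {LABELS[status]}: {message}"
```

A plugin wrapper would print `format_output(status, message)` and exit with `status` as its return code.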
## OnionPerf
The primary issue with OnionPerf instances is that they run out of disk space. A decent
set of ping and NRPE checks should cover most of the common issues we've had.
Application specific checks would include:
* that a file is available in the webserver root for the last analysis run
* that there is something listening on the tgen connect port
    * also on the onion service
* that the HTTPS certificate is valid and not about to expire (on port 8443)
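
Two of these checks can be sketched with the Python standard library alone. The block below is a minimal sketch, not an existing plugin: the function names are made up for illustration, and the onion service check is not covered because it would need a connection through Tor's SOCKS port. Certificate validity is checked implicitly, since the default TLS context refuses the handshake for an invalid certificate.

```python
import socket
import ssl


def port_is_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_not_after(host, port=8443, timeout=5.0):
    """Return the certificate's notAfter string.

    The default context verifies the certificate chain and hostname,
    so an invalid certificate raises ssl.SSLError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now):
    """Whole days between now (epoch seconds) and the notAfter time."""
    return int((ssl.cert_time_to_seconds(not_after) - now) // 86400)
```

A check script could call `days_until_expiry(fetch_not_after(host), time.time())` and warn when the result drops below whatever threshold the team settles on.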