Commit b120576a authored by Iain R. Learmonth

metrics convert all to markdown now

parent ea0dd76d
# Table of Contents
1. [Synopsis](#orgb3a4817)
2. [Usage of AWS for Tor Metrics Development](#orgb76cd81)
    1. [CloudFormation Templates](#orgee150b1)
        1. [Quickstart: Deploying a template](#org7813a03)
        2. [SSH Key Selection](#orgdc7711c)
    2. [Templates and Stacks](#org19b1306)
        1. [`billing-alerts`](#org1b9ae57)
        2. [`metrics-vpc`](#org2c178f5)
        3. [Typical Dev/Testing Stacks](#org97f9e67)
    3. [Linting](#orga89e157)
3. [Ansible Playbooks](#org8371364)
    1. [Inventories and site.yml](#org81a0dc9)
    2. [`metrics-common`](#org55e2902)
    3. [Service roles](#org7050aae)
    4. [Linting](#org9684f51)
4. [Common Tasks](#org8267248)
    1. [Add a new member to the team](#org9040a14)
    2. [Update an SSH key for a team member](#org97696ab)
    3. [Deploy and provision a development environment for a service](#org400659a)
<a id="orgb3a4817"></a>
# DONE Synopsis
The metrics-cloud framework aims to enable:
- reproducible deployments of software
- consistency between those software deployments
Side-effects of these goals are:
- reproducible experiments (good science)
- reduced maintenance costs
- reduced human error
There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks.
The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks apply
to both development and production environments.
<a id="orgb76cd81"></a>
# DONE Usage of AWS for Tor Metrics Development
Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice,
we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is
still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for
rapid creation, provisioning and destruction should help with this.
<a id="orgee150b1"></a>
## DONE CloudFormation Templates
CloudFormation is an AWS service allowing the definition of *stacks*. These stacks describe a series of AWS services
using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources
in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all
resources terminated together easily, allowing stacks to exist for only as long as they are needed.
The CloudFormation templates used in the framework can be found in the [cloudformation](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation) folder of the repository.
It may be that for some services the templates are very simple, and others may be more complex. No matter the level of
complexity we still want to use the templates to ensure we are meeting the key goals of the framework and also to simplify
tracking of spending in the billing portal through the tags.
Documentation for CloudFormation, including an API reference, can be found at: <https://docs.aws.amazon.com/cloudformation/>.
<a id="org7813a03"></a>
### DONE Quickstart: Deploying a template
Each template begins with comments with any relevant notes about the template, and a deployment command that will upload
and deploy the template on AWS. The commands will look something like:

    aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"

You'll notice that the command includes a call to `whoami` to prefix the stack name with your current username, and also
that the `identify_user.sh` script is used to determine which SSH key to use for new instances.
You do not need to modify this command line before running it.
Once the stack has been deployed from the template, you can view its resources and delete it through
the [CloudFormation management console](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false).
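
The same inspection and clean-up can also be sketched with the AWS CLI instead of the console. The helper names `stack_name` and `stack_commands` below are made up for this illustration and are not part of the repository. The functions only print the commands, so nothing is deleted by accident.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the metrics-cloud repository.

# Build the per-user stack name the same way the deploy command does:
# <username>-<service>-dev
stack_name() {
    printf '%s-%s-dev\n' "$(whoami)" "$1"
}

# Print (but do not run) CLI equivalents of viewing a stack's resources
# and deleting the stack.
stack_commands() {
    name="$(stack_name "$1")"
    printf 'aws cloudformation describe-stack-resources --region us-east-1 --stack-name %s\n' "$name"
    printf 'aws cloudformation delete-stack --region us-east-1 --stack-name %s\n' "$name"
}

stack_commands exit-scanner
```

Copy the printed command you want and run it once you have checked the stack name.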
<a id="orgdc7711c"></a>
### DONE SSH Key Selection
The [identify\_user.sh](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh) script prints out the name of the SSH public key to be used based on either:
- the `TOR_METRICS_SSH_KEY` environment variable, or
- the current user name.
The environment variable takes precedence over the username to key mapping.
If you change the default key you would like to use, update the mapping in this shell script.
SSH keys are managed through the [EC2 management console](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:) and are not (currently) managed by a CloudFormation template.
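
The precedence rule can be illustrated with a minimal sketch. This is not the contents of `identify_user.sh`: the function name `identify_key`, the user names and the key names are all hypothetical. Only the `TOR_METRICS_SSH_KEY` variable and the "environment first, then username mapping" order come from the description above.

```shell
#!/bin/sh
# Hypothetical sketch; the real identify_user.sh may be written differently.

# Print the SSH key name for a user.
# $1: user name to look up (defaults to the current user)
identify_key() {
    user="${1:-$(whoami)}"
    # The environment variable takes precedence when set and non-empty.
    if [ -n "${TOR_METRICS_SSH_KEY:-}" ]; then
        printf '%s\n' "$TOR_METRICS_SSH_KEY"
        return 0
    fi
    # Otherwise fall back to the username-to-key mapping (made-up entries).
    case "$user" in
        alice) printf '%s\n' "alice-laptop" ;;
        bob)   printf '%s\n' "bob-desktop" ;;
        *)     printf 'no SSH key mapping for user %s\n' "$user" >&2
               return 1 ;;
    esac
}

# Usage: identify_key            (current user)
#        identify_key alice      (explicit user)
```

Failing loudly for an unmapped user, rather than printing an empty key name, keeps a deployment from starting with an invalid `myKeyPair` parameter.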
<a id="org19b1306"></a>
## DONE Templates and Stacks
There is no directory hierarchy for the templates in the `cloudformation` folder of the repository. There are a couple of naming
conventions used though:
- Development/testing templates/stacks use a `-dev` suffix after the service name
- Long-term and shared templates/stacks start with `metrics-`
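
As a rough illustration of these conventions, a template or stack name could be classified as sketched below. The function `classify_template` and its output labels are assumptions made for this example, not something the repository provides.

```shell
#!/bin/sh
# Hypothetical helper illustrating the naming conventions.

# Print "shared", "dev" or "other" for a template or stack name.
# A trailing ".yml" is ignored so file names work too.
classify_template() {
    name="${1%.yml}"
    case "$name" in
        metrics-*) echo shared ;;  # long-term and shared templates/stacks
        *-dev)     echo dev ;;     # development/testing templates/stacks
        *)         echo other ;;   # e.g. billing-alerts
    esac
}

# Usage: classify_template exit-scanner-dev.yml
```

The `metrics-` prefix is checked first, so a name that matches both patterns is reported as shared. That ordering is an arbitrary choice for this sketch.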
<a id="org1b9ae57"></a>
### DONE `billing-alerts`
The [`billing-alerts` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml) sends notifications to the subscribed individuals whenever the predicted spend for the month will be
over 50USD. Email addresses can be added here if other people should be notified too.
<a id="org2c178f5"></a>
### DONE `metrics-vpc`
The [`metrics-vpc` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml) contains shared resources for Tor Metrics development templates. This includes:
1.  MetricsVPC and MetricsSubnet

    The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we
    share the AWS account with other Tor teams.

    For example, to create an EC2 instance:

        Instance:
          Type: AWS::EC2::Instance
          Properties:
            AvailabilityZone: !Select [ 0, !GetAZs ]
            ImageId: ami-01db78123b2b99496
            InstanceType: t2.large
            SubnetId:
              Fn::ImportValue: 'MetricsSubnet'
            KeyName: !Ref myKeyPair
            SecurityGroupIds:
              - Fn::ImportValue: 'MetricsInternetSecurityGroup'
              - Fn::ImportValue: 'MetricsPingableSecurityGroup'
              - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'

    Note also that the availability zone is not hardcoded, which allows for portability between regions if we ever want that.

2.  Various security groups

    The EC2 example above uses some of the security groups from the `metrics-vpc` template. Refer to the template source
    for details on each group's rules.

3.  The development DNS zone

    Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted
    using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is:
    `tm-dev-aws.safemetrics.org`.

    As an example, creating an A record for an EC2 instance with the subdomain of the stack name:

        DNSName:
          Type: AWS::Route53::RecordSet
          Properties:
            HostedZoneName: tm-dev-aws.safemetrics.org.
            Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
            Type: A
            TTL: '300'
            ResourceRecords:
              - !GetAtt Instance.PublicIp

    Q: *Can we use the MetricsDevZone export from `metrics-vpc` instead of explicitly defining the zone name every time?*

    These domain names should **never** appear on anything user facing and are for **development purposes only**.

<a id="org97f9e67"></a>
### DONE Typical Dev/Testing Stacks
A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have
a second volume attached for the data storage.
An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name:

    ---
    # CloudFormation Stack for example development instance
    # This stack will only deploy on us-east-1 and will deploy in the Metrics VPC
    # aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
    AWSTemplateFormatVersion: 2010-09-09
    Parameters:
      myKeyPair:
        Description: Amazon EC2 Key Pair
        Type: "AWS::EC2::KeyPair::KeyName"
    Resources:
      Instance:
        Type: AWS::EC2::Instance
        Properties:
          AvailabilityZone: !Select [ 0, !GetAZs ]
          ImageId: ami-01db78123b2b99496
          InstanceType: t2.large
          SubnetId:
            Fn::ImportValue: 'MetricsSubnet'
          KeyName: !Ref myKeyPair
          SecurityGroupIds:
            - Fn::ImportValue: 'MetricsInternetSecurityGroup'
            - Fn::ImportValue: 'MetricsPingableSecurityGroup'
            - Fn::ImportValue: 'MetricsHTTPSecurityGroup'
            - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
      ServiceVolume:
        Type: AWS::EC2::Volume
        Properties:
          AvailabilityZone: !Select [ 0, !GetAZs ]
          Size: 15
          VolumeType: gp2
      ServiceVolumeAttachment:
        Type: AWS::EC2::VolumeAttachment
        Properties:
          Device: /dev/sdb
          InstanceId: !Ref Instance
          VolumeId: !Ref ServiceVolume
      DNSName:
        Type: AWS::Route53::RecordSet
        Properties:
          HostedZoneName: tm-dev-aws.safemetrics.org.
          Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
          Type: A
          TTL: '300'
          ResourceRecords:
            - !GetAtt Instance.PublicIp
    Outputs:
      PublicIp:
        Description: "Instance public IP"
        Value: !GetAtt Instance.PublicIp

It's not common to use other AWS services as part of these templates as the goal is usually to have these services
deployed on TPA managed hosts.
<a id="orga89e157"></a>
## DONE Linting
[`cfn-lint`](https://github.com/aws-cloudformation/cfn-python-lint) is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
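
To run the same check locally before pushing, something like the following sketch could be used. It assumes `cfn-lint` is installed and that templates use the `.yml` extension. The function name `lint_templates` and the `CFN_LINTER` override variable are hypothetical and exist only to keep the example testable.

```shell
#!/bin/sh
# Hypothetical helper; the CI configuration in the repository may differ.

# Lint every .yml template in a directory and report overall success.
# $1: directory containing templates (defaults to "cloudformation")
lint_templates() {
    dir="${1:-cloudformation}"
    linter="${CFN_LINTER:-cfn-lint}"   # assumed override, for illustration
    status=0
    for template in "$dir"/*.yml; do
        # Skip the unexpanded pattern when no templates are present.
        [ -e "$template" ] || continue
        "$linter" "$template" || status=1
    done
    return "$status"
}

# Usage: lint_templates cloudformation
```

Linting each template separately means one failing template does not stop the remaining ones from being checked.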
<a id="org8371364"></a>
# TODO Ansible Playbooks
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python,
is mature, and has an extensive selection of modules for almost everything we could need.
<a id="org81a0dc9"></a>
## TODO Inventories and site.yml
In general, there are two inventories: [production](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production) and dev. Only the production inventory is committed to git; the dev inventory will
vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default
inventory in the `ansible.cfg` file, so you must specify an inventory for every invocation of `ansible-playbook` using the `-i` flag:

    ansible-playbook -i dev ...

Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the
`ansible` directory that specifies a playbook for the group. All of these files are included in the `site.yml` master playbook to
allow multiple hosts to be provisioned together.
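
A dev inventory for a single instance might be generated as sketched below. The helper `dev_inventory` and the group name `exit-scanner` are assumptions for illustration. The host name follows the `<stack name>.tm-dev-aws.safemetrics.org` pattern used for the development DNS zone, and a real inventory may need further variables.

```shell
#!/bin/sh
# Hypothetical helper; group names must match the playbooks in the
# ansible directory, which are not shown here.

# Print an INI-style inventory with one group containing one dev host.
# $1: inventory group (assumed name), $2: CloudFormation stack name
dev_inventory() {
    printf '[%s]\n' "$1"
    printf '%s.tm-dev-aws.safemetrics.org\n' "$2"
}

dev_inventory exit-scanner "$(whoami)-exit-scanner-dev"
```

Redirecting the output to a file named `dev` gives an inventory that can be passed as `ansible-playbook -i dev site.yml`.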
<a id="org55e2902"></a>
## TODO `metrics-common`
The [`metrics-common`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common) role allows us to have a consistent environment between services, and closely matches the environment that
would be provided by a TSA managed machine. The role handles:
- installs dependency packages from Debian (optionally from the backports repository)
- formats additional volumes attached to the instance using the specified filesystem
- sets the timezone to UTC (Q: *is this what TSA do?*)
- creates user accounts for each member of the team
    - all team members can perform unlimited passwordless sudo (TSA hosts require a password)
    - SSH password authentication is disabled
    - all user account passwords are removed/disabled
- creates service user accounts as specified
    - home directories are created as specified, and linked from `/home/$user`
    - lingering is enabled for service users
This is all configured via group variables in the [`ansible/group_vars/`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars) folder. Examples there should help you to understand how
these work. These override the [defaults](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml) set in the role.
<a id="org7050aae"></a>
## TODO Service roles
<a id="org9684f51"></a>
## DONE Linting
[`ansible-lint`](https://docs.ansible.com/ansible-lint/) is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
<a id="org8267248"></a>
# TODO Common Tasks
<a id="org9040a14"></a>
## TODO Add a new member to the team
<a id="org97696ab"></a>
## TODO Update an SSH key for a team member
<a id="org400659a"></a>
## TODO Deploy and provision a development environment for a service
#+TITLE: metrics-cloud: Scripts for orchestrating Tor Metrics services
#+OPTIONS: ^:nil
* DONE Synopsis
The metrics-cloud framework aims to enable:
- reproducible deployments of software
- consistency between those software deployments
Side-effects of these goals are:
- reproducible experiments (good science)
- reduced maintenance costs
- reduced human error
There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks.
The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks apply
to both development and production environments.
* DONE Usage of AWS for Tor Metrics Development
Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice,
we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is
still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for
rapid creation, provisioning and destruction should help with this.
** DONE CloudFormation Templates
CloudFormation is an AWS service allowing the definition of /stacks/. These stacks describe a series of AWS services
using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources
in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all
resources terminated together easily, allowing stacks to exist for only as long as they are needed.
The CloudFormation templates used in the framework can be found in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation][cloudformation]] folder of the repository.
It may be that for some services the templates are very simple, and others may be more complex. No matter the level of
complexity we still want to use the templates to ensure we are meeting the key goals of the framework and also to simplify
tracking of spending in the billing portal through the tags.
Documentation for CloudFormation, including an API reference, can be found at: https://docs.aws.amazon.com/cloudformation/.
*** DONE Quickstart: Deploying a template
Each template begins with comments with any relevant notes about the template, and a deployment command that will upload
and deploy the template on AWS. The commands will look something like:
#+BEGIN_SRC shell
aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
#+END_SRC
You'll notice that the command includes a call to ~whoami~ to prefix the stack name with your current username, and also
that the ~identify_user.sh~ script is used to determine which SSH key to use for new instances.
You do not need to modify this command line before running it.
Once the stack has been deployed from the template, you can view its resources and delete it through
the [[https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false][CloudFormation management console]].
*** DONE SSH Key Selection
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh][identify_user.sh]] script prints out the name of the SSH public key to be used based on either:
- the ~TOR_METRICS_SSH_KEY~ environment variable, or
- the current user name.
The environment variable takes precedence over the username to key mapping.
If you change the default key you would like to use, update the mapping in this shell script.
SSH keys are managed through the [[https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:][EC2 management console]] and are not (currently) managed by a CloudFormation template.
** DONE Templates and Stacks
There is no directory hierarchy for the templates in the ~cloudformation~ folder of the repository. There are a couple of naming
conventions used though:
- Development/testing templates/stacks use a ~-dev~ suffix after the service name
- Long-term and shared templates/stacks start with ~metrics-~
*** DONE ~billing-alerts~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml][~billing-alerts~ template]] sends notifications to the subscribed individuals whenever the predicted spend for the month will be
over 50USD. Email addresses can be added here if other people should be notified too.
*** DONE ~metrics-vpc~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml][~metrics-vpc~ template]] contains shared resources for Tor Metrics development templates. This includes:
**** MetricsVPC and MetricsSubnet
The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we
share the AWS account with other Tor teams.
For example, to create an EC2 instance:
#+BEGIN_SRC yaml
Instance:
  Type: AWS::EC2::Instance
  Properties:
    AvailabilityZone: !Select [ 0, !GetAZs ]
    ImageId: ami-01db78123b2b99496
    InstanceType: t2.large
    SubnetId:
      Fn::ImportValue: 'MetricsSubnet'
    KeyName: !Ref myKeyPair
    SecurityGroupIds:
      - Fn::ImportValue: 'MetricsInternetSecurityGroup'
      - Fn::ImportValue: 'MetricsPingableSecurityGroup'
      - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
#+END_SRC
Note also that the availability zone is not hardcoded, which allows for portability between regions if we ever want that.
**** Various security groups
The EC2 example above uses some of the security groups from the ~metrics-vpc~ template. Refer to the template source
for details on each group's rules.
**** The development DNS zone
Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted
using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is:
~tm-dev-aws.safemetrics.org~.
As an example, creating an A record for an EC2 instance with the subdomain of the stack name:
#+BEGIN_SRC yaml
DNSName:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneName: tm-dev-aws.safemetrics.org.
    Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
    Type: A
    TTL: '300'
    ResourceRecords:
      - !GetAtt Instance.PublicIp
#+END_SRC
:FUTUREQUESTION:
Q: /Can we use the MetricsDevZone export from ~metrics-vpc~ instead of explicitly defining the zone name every time?/
:END:
These domain names should *never* appear on anything user facing and are for *development purposes only*.
*** DONE Typical Dev/Testing Stacks
A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have
a second volume attached for the data storage.
An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name:
#+BEGIN_SRC yaml
---
# CloudFormation Stack for example development instance
# This stack will only deploy on us-east-1 and will deploy in the Metrics VPC
# aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
AWSTemplateFormatVersion: 2010-09-09
Parameters:
  myKeyPair:
    Description: Amazon EC2 Key Pair
    Type: "AWS::EC2::KeyPair::KeyName"
Resources:
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs ]
      ImageId: ami-01db78123b2b99496
      InstanceType: t2.large
      SubnetId:
        Fn::ImportValue: 'MetricsSubnet'
      KeyName: !Ref myKeyPair
      SecurityGroupIds:
        - Fn::ImportValue: 'MetricsInternetSecurityGroup'
        - Fn::ImportValue: 'MetricsPingableSecurityGroup'
        - Fn::ImportValue: 'MetricsHTTPSecurityGroup'
        - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
  ServiceVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: !Select [ 0, !GetAZs ]
      Size: 15
      VolumeType: gp2
  ServiceVolumeAttachment:
    Type: AWS::EC2::VolumeAttachment
    Properties:
      Device: /dev/sdb
      InstanceId: !Ref Instance
      VolumeId: !Ref ServiceVolume
  DNSName:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: tm-dev-aws.safemetrics.org.
      Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
      Type: A
      TTL: '300'
      ResourceRecords:
        - !GetAtt Instance.PublicIp
Outputs:
  PublicIp:
    Description: "Instance public IP"
    Value: !GetAtt Instance.PublicIp
#+END_SRC
It's not common to use other AWS services as part of these templates as the goal is usually to have these services
deployed on TPA managed hosts.
** DONE Linting
[[https://github.com/aws-cloudformation/cfn-python-lint][~cfn-lint~]] is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI.
* TODO Ansible Playbooks
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python,
is mature, and has an extensive selection of modules for almost everything we could need.
** TODO Inventories and site.yml
In general, there are two inventories: [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production][production]] and dev. Only the production inventory is committed to git; the dev inventory will
vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default
inventory in the ~ansible.cfg~ file, so you must specify an inventory for every invocation of ~ansible-playbook~ using the ~-i~ flag:
#+BEGIN_SRC shell
ansible-playbook -i dev ...
#+END_SRC
Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the
~ansible~ directory that specifies a playbook for the group. All of these files are included in the ~site.yml~ master playbook to
allow multiple hosts to be provisioned together.
** TODO ~metrics-common~
The [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common][~metrics-common~]] role allows us to have a consistent environment between services, and closely matches the environment that
would be provided by a TSA managed machine. The role handles:
- installs dependency packages from Debian (optionally from the backports repository)
- formats additional volumes attached to the instance using the specified filesystem
- sets the timezone to UTC (Q: /is this what TSA do?/)
- creates user accounts for each member of the team
  - all team members can perform unlimited passwordless sudo (TSA hosts require a password)
  - SSH password authentication is disabled
  - all user account passwords are removed/disabled
- creates service user accounts as specified
  - home directories are created as specified, and linked from ~/home/$user~
  - lingering is enabled for service users
This is all configured via group variables in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars][~ansible/group_vars/~]] folder. Examples there should help you to understand how
these work. These override the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml][defaults]] set in the role.
** TODO Service roles
** DONE Linting
[[https://docs.ansible.com/ansible-lint/][~ansible-lint~]] is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
things efficiently and correctly.
This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI.
* TODO Common Tasks
** TODO Add a new member to the team
** TODO Update an SSH key for a team member
** TODO Deploy and provision a development environment for a service