|
|
# metrics-cloud
|
|
|
|
|
|
[[!toc levels=3]]
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
The metrics-cloud framework aims to enable:
|
|
|
|
|
|
- reproducible deployments of software
|
|
|
- consistency between those software deployments
|
|
|
|
|
|
Side-effects of these goals are:
|
|
|
|
|
|
- reproducible experiments (good science)
|
|
|
- reduced maintenance costs
|
|
|
- reduced human error
|
|
|
|
|
|
There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks.
The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks are applicable
to both development/testing and production environments.
|
|
|
|
|
|
|
|
|
## Usage of AWS for Tor Metrics Development
|
|
|
|
|
|
Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice,
we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is
still important to minimize costs when using AWS, and the use of CloudFormation templates and Ansible playbooks for
rapid creation, provisioning, and destruction should help with this.
|
|
|
|
|
|
|
|
|
### CloudFormation Templates
|
|
|
|
|
|
CloudFormation is an AWS service allowing the definition of *stacks*. These stacks describe a series of AWS services
|
|
|
using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources
|
|
|
in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all
|
|
|
resources terminated together easily, allowing stacks to exist for only as long as they are needed.
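
As a rough illustration of the tagging described above: CloudFormation adds an `aws:cloudformation:stack-name` tag to the resources it creates, so the instances belonging to one stack can be listed by filtering on that tag. The function names below are hypothetical helpers, not part of the repository, and the region is assumed to be `us-east-1` as in the rest of this document.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Build the --filters argument that selects resources tagged with a stack name.
stack_tag_filter() {
    printf 'Name=tag:aws:cloudformation:stack-name,Values=%s\n' "$1"
}

# List instance IDs and states for one stack.
# Requires the AWS CLI and configured credentials.
list_stack_instances() {
    aws ec2 describe-instances \
        --region us-east-1 \
        --filters "$(stack_tag_filter "$1")" \
        --query 'Reservations[].Instances[].[InstanceId,State.Name]' \
        --output text
}
```

For example, `list_stack_instances "$(whoami)-exit-scanner-dev"` would show whether any instances from that stack are still running and accruing cost.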
|
|
|
|
|
|
The CloudFormation templates used in the framework can be found in the [cloudformation](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation) folder of the repository.
|
|
|
|
|
|
The templates for some services are very simple, while others are more complex. No matter the level of
complexity, we still want to use the templates to ensure we are meeting the key goals of the framework and to simplify
tracking of spending in the billing portal through the tags.
|
|
|
|
|
|
Documentation for CloudFormation, including an API reference, can be found at: <https://docs.aws.amazon.com/cloudformation/>.
|
|
|
|
|
|
|
|
|
#### Quickstart: Deploying a template
|
|
|
|
|
|
Each template begins with comments containing any relevant notes about the template, and a deployment command that will upload
and deploy the template on AWS. The commands will look something like:

    aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
|
|
|
|
|
|
You'll notice that the command includes a call to `whoami` to prefix the stack name with your current username, and also
|
|
|
that the `identify_user.sh` script is used to determine which SSH key to use for new instances.
|
|
|
You do not need to modify this command line before running it.
|
|
|
|
|
|
Once the stack has been deployed from the template, you can view its resources and delete it through
|
|
|
the [CloudFormation management console](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false).
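
The same inspection and clean-up can also be done from the command line. The following is a minimal sketch, assuming the stack was deployed with the `whoami`-prefixed name from the deployment command; the function names are hypothetical and not part of the repository.

```shell
#!/bin/sh
# Sketch only: helper names are made up for illustration.

# Derive the stack name the same way the deployment command does.
stack_name() {
    printf '%s-%s\n' "$(whoami)" "$1"
}

# Show the resources that belong to a stack.
show_stack_resources() {
    aws cloudformation describe-stack-resources \
        --region us-east-1 \
        --stack-name "$(stack_name "$1")"
}

# Delete a stack and block until the deletion has finished.
delete_stack() {
    aws cloudformation delete-stack \
        --region us-east-1 \
        --stack-name "$(stack_name "$1")"
    aws cloudformation wait stack-delete-complete \
        --region us-east-1 \
        --stack-name "$(stack_name "$1")"
}
```

Calling `delete_stack exit-scanner-dev` would remove every resource in your `exit-scanner-dev` stack, so double-check the name before running it.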
|
|
|
|
|
|
|
|
|
#### SSH Key Selection
|
|
|
|
|
|
The [identify\_user.sh](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh) script prints out the name of the SSH public key to be used based on either:
|
|
|
|
|
|
- the `TOR_METRICS_SSH_KEY` environment variable, or
|
|
|
- the current user name.
|
|
|
|
|
|
The environment variable takes precedence over the username-to-key mapping.
|
|
|
|
|
|
To change the default key used for your username, update the mapping in this shell script.
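
The precedence rule can be sketched as follows. This is not the contents of `identify_user.sh`; it is a hypothetical reimplementation of the behaviour described above, and the usernames and key names in the mapping are invented.

```shell
#!/bin/sh
# Hypothetical sketch of the selection logic; the real script may differ.
select_ssh_key() {
    # 1. An explicitly set environment variable always wins.
    if [ -n "${TOR_METRICS_SSH_KEY:-}" ]; then
        printf '%s\n' "$TOR_METRICS_SSH_KEY"
        return 0
    fi
    # 2. Otherwise fall back to the username-to-key mapping.
    #    The entries below are made-up examples.
    case "${1:-$(whoami)}" in
        alice) printf '%s\n' "alice-laptop" ;;
        bob)   printf '%s\n' "bob-yubikey" ;;
        *)
            echo "no SSH key mapped for this user" >&2
            return 1
            ;;
    esac
}
```

Failing loudly for an unmapped user is a design choice in this sketch: it stops a deployment from silently proceeding with an empty key name.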
|
|
|
|
|
|
SSH keys are managed through the [EC2 management console](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:) and are not (currently) managed by a CloudFormation template.
|
|
|
|
|
|
|
|
|
### Templates and Stacks
|
|
|
|
|
|
There is no directory hierarchy for the templates in the `cloudformation` folder of the repository. There are a couple of naming
conventions used, though:
|
|
|
|
|
|
- Development/testing templates/stacks use a `-dev` suffix after the service name
|
|
|
- Long-term and shared templates/stacks start with `metrics-`
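
To make the two conventions concrete, here is a small sketch that classifies a template file name. The function and the category labels are hypothetical; the repository does not necessarily contain such a check.

```shell
#!/bin/sh
# Sketch only: classify a template by the naming conventions above.
classify_template() {
    name="${1%.yml}"    # drop the file extension if present
    case "$name" in
        metrics-*) echo "shared" ;;   # long-term and shared stacks
        *-dev)     echo "dev" ;;      # development/testing stacks
        *)         echo "other" ;;    # e.g. billing-alerts
    esac
}
```

Note that the `metrics-` prefix is checked first, so a name matching both patterns would be reported as shared in this sketch.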
|
|
|
|
|
|
|
|
|
#### `billing-alerts`
|
|
|
|
|
|
The [`billing-alerts` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml) sends notifications to the subscribed individuals whenever the predicted spend for the month
exceeds 50USD. Email addresses can be added to this template if other people should be notified too.
|
|
|
|
|
|
|
|
|
#### `metrics-vpc`
|
|
|
|
|
|
The [`metrics-vpc` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml) contains shared resources for Tor Metrics development templates. This includes:
|
|
|
|
|
|
1. MetricsVPC and MetricsSubnet
|
|
|
|
|
|
The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we
|
|
|
share the AWS account with other Tor teams.
|
|
|
|
|
|
For example, to create an EC2 instance:

    Instance:
      Type: AWS::EC2::Instance
      Properties:
        AvailabilityZone: !Select [ 0, !GetAZs ]
        ImageId: ami-01db78123b2b99496
        InstanceType: t2.large
        SubnetId:
          Fn::ImportValue: 'MetricsSubnet'
        KeyName: !Ref myKeyPair
        SecurityGroupIds:
          - Fn::ImportValue: 'MetricsInternetSecurityGroup'
          - Fn::ImportValue: 'MetricsPingableSecurityGroup'
          - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
|
|
|
|
|
|
Note also that the availability zone is not hardcoded, which keeps the template portable between regions should we ever want that.
|
|
|
|
|
|
2. Various security groups
|
|
|
|
|
|
The EC2 example above uses some of the security groups from the `metrics-vpc` template. Refer to the template source
|
|
|
for details on each group's rules.
|
|
|
|
|
|
3. The development DNS zone
|
|
|
|
|
|
Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted
|
|
|
using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is:
|
|
|
`tm-dev-aws.safemetrics.org`.
|
|
|
|
|
|
As an example, the following creates an A record for an EC2 instance, using the stack name as the subdomain:

    DNSName:
      Type: AWS::Route53::RecordSet
      Properties:
        HostedZoneName: tm-dev-aws.safemetrics.org.
        Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
        Type: A
        TTL: '300'
        ResourceRecords:
          - !GetAtt Instance.PublicIp
|
|
|
|
|
|
Q: *Can we use the MetricsDevZone export from `metrics-vpc` instead of explicitly defining the zone name every time?*
|
|
|
|
|
|
These domain names should **never** appear on anything user-facing and are for **development purposes only**.
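
Because the record name is simply the stack name joined to the zone, the hostname of a dev instance can be derived locally. A minimal sketch, assuming the record was created with the `!Join` pattern shown in this section (the helper name is hypothetical):

```shell
#!/bin/sh
# Sketch only: compute the development hostname for a given stack name.
DEV_ZONE="tm-dev-aws.safemetrics.org"

dev_fqdn() {
    printf '%s.%s\n' "$1" "$DEV_ZONE"
}
```

For instance, `ssh "$(dev_fqdn "$(whoami)-example-dev")"` would connect to the instance created by your `example-dev` stack.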
|
|
|
|
|
|
|
|
|
#### Typical Dev/Testing Stacks
|
|
|
|
|
|
A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have
|
|
|
a second volume attached for the data storage.
|
|
|
|
|
|
An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name:

    ---
    # CloudFormation Stack for example development instance
    # This stack will only deploy on us-east-1 and will deploy in the Metrics VPC
    # aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
    AWSTemplateFormatVersion: 2010-09-09
    Parameters:
      myKeyPair:
        Description: Amazon EC2 Key Pair
        Type: "AWS::EC2::KeyPair::KeyName"
    Resources:
      Instance:
        Type: AWS::EC2::Instance
        Properties:
          AvailabilityZone: !Select [ 0, !GetAZs ]
          ImageId: ami-01db78123b2b99496
          InstanceType: t2.large
          SubnetId:
            Fn::ImportValue: 'MetricsSubnet'
          KeyName: !Ref myKeyPair
          SecurityGroupIds:
            - Fn::ImportValue: 'MetricsInternetSecurityGroup'
            - Fn::ImportValue: 'MetricsPingableSecurityGroup'
            - Fn::ImportValue: 'MetricsHTTPSecurityGroup'
            - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
      ServiceVolume:
        Type: AWS::EC2::Volume
        Properties:
          AvailabilityZone: !Select [ 0, !GetAZs ]
          Size: 15
          VolumeType: gp2
      ServiceVolumeAttachment:
        Type: AWS::EC2::VolumeAttachment
        Properties:
          Device: /dev/sdb
          InstanceId: !Ref Instance
          VolumeId: !Ref ServiceVolume
      DNSName:
        Type: AWS::Route53::RecordSet
        Properties:
          HostedZoneName: tm-dev-aws.safemetrics.org.
          Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
          Type: A
          TTL: '300'
          ResourceRecords:
            - !GetAtt Instance.PublicIp
    Outputs:
      PublicIp:
        Description: "Instance public IP"
        Value: !GetAtt Instance.PublicIp
|
|
|
|
|
|
It's not common to use other AWS services as part of these templates, as the goal is usually to have these services
deployed on TPA-managed hosts.
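
Once a stack like the example above has been deployed, its `PublicIp` output can be read back with the AWS CLI. The sketch below assumes the output key is named `PublicIp` as in the example template; the function name and the `AWS` override variable are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch only: read the PublicIp output of a deployed stack.
# AWS can be overridden to point at a different aws binary.
AWS="${AWS:-aws}"

stack_public_ip() {
    "$AWS" cloudformation describe-stacks \
        --region us-east-1 \
        --stack-name "$1" \
        --query "Stacks[0].Outputs[?OutputKey=='PublicIp'].OutputValue" \
        --output text
}
```

Running `stack_public_ip "$(whoami)-example-dev"` should print the address that the stack's A record points to.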
|
|
|
|
|
|
|
|
|
### Linting
|
|
|
|
|
|
[`cfn-lint`](https://github.com/aws-cloudformation/cfn-python-lint) is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation
|
|
|
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
|
|
|
things efficiently and correctly.
|
|
|
|
|
|
`cfn-lint` is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
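
To catch problems before pushing, the linter can be run locally over every template. This is a minimal sketch that assumes `cfn-lint` is installed and that templates live in a `cloudformation` directory with a `.yml` extension; the function name and the `CFN_LINT` override are hypothetical, and the CI configuration may invoke the tool differently.

```shell
#!/bin/sh
# Sketch only: lint every template in a directory and report overall status.
CFN_LINT="${CFN_LINT:-cfn-lint}"

lint_templates() {
    dir="${1:-cloudformation}"
    status=0
    for template in "$dir"/*.yml; do
        [ -e "$template" ] || continue        # skip if the glob matched nothing
        "$CFN_LINT" "$template" || status=1   # remember failures, keep going
    done
    return "$status"
}
```

Continuing after a failure means one run reports problems in all templates rather than stopping at the first.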
|
|
|
|
|
|
|
|
|
## Ansible Playbooks
|
|
|
|
|
|
Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python,
|
|
|
is mature, and has an extensive selection of modules for almost everything we could need.
|
|
|
|
|
|
|
|
|
### Inventories and site.yml
|
|
|
|
|
|
In general, there are two inventories: [production](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production) and dev. Only the production inventory is committed to git; the dev inventory will
vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default
inventory in the `ansible.cfg` file, so you must specify an inventory for every invocation of `ansible-playbook` using the `-i` flag:

    ansible-playbook -i dev ...
|
|
|
|
|
|
Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the
|
|
|
`ansible` directory that specifies a playbook for the group. All of these files are included in the `site.yml` master playbook to
|
|
|
allow multiple hosts to be provisioned together.
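
Since the dev inventory is not committed, each team member creates their own. The sketch below writes a one-host INI-style inventory; the function name, the group name `exit-scanners`, and the hostname are hypothetical examples, and the group must match one that the playbooks actually define.

```shell
#!/bin/sh
# Sketch only: write a minimal INI-style dev inventory.
# Arguments: group name, host name, output path (defaults to "dev").
write_dev_inventory() {
    group="$1"
    host="$2"
    path="${3:-dev}"
    printf '[%s]\n%s\n' "$group" "$host" > "$path"
}
```

After something like `write_dev_inventory exit-scanners "$(whoami)-exit-scanner-dev.tm-dev-aws.safemetrics.org"` in the `ansible` directory, `ansible-playbook -i dev site.yml` would provision that host, assuming the group name matches.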
|
|
|
|
|
|
|
|
|
### `metrics-common`
|
|
|
|
|
|
The [`metrics-common`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common) role allows us to have a consistent environment between services, and closely matches the environment that
would be provided by a TSA-managed machine. The role handles:
|
|
|
|
|
|
- installing dependency packages from Debian (optionally from the backports repository)
- formatting additional volumes attached to the instance using the specified filesystem
- setting the timezone to UTC (Q: *is this what TSA do?*)
- creating user accounts for each member of the team
    - all team members can perform unlimited passwordless sudo (TSA hosts require a password)
    - SSH password authentication is disabled
    - all user account passwords are removed/disabled
- creating service user accounts as specified
    - home directories are created as specified, and linked from `/home/$user`
    - lingering is enabled for service users
|
|
|
|
|
|
This is all configured via group variables in the [`ansible/group_vars/`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars) folder. Examples there should help you to understand how
|
|
|
these work. These override the [defaults](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml) set in the role.
|
|
|
|
|
|
|
|
|
### Service roles
|
|
|
|
|
|
|
|
|
### Linting
|
|
|
|
|
|
[`ansible-lint`](https://docs.ansible.com/ansible-lint/) is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible
|
|
|
so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using
|
|
|
things efficiently and correctly.
|
|
|
|
|
|
`ansible-lint` is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI.
|
|
|
|
|
|
|
|
|
## Common Tasks
|
|
|
|
|
|
|
|
|
### Add a new member to the team
|
|
|
|
|
|
|
|
|
### Update an SSH key for a team member
|
|
|
|
|
|
|
|
|
### Deploy and provision a development environment for a service
|
|
|
|