From b120576ae8e70ad634d05481425ab0cf1903e373 Mon Sep 17 00:00:00 2001 From: "Iain R. Learmonth" Date: Wed, 22 Apr 2020 14:41:57 +0100 Subject: [PATCH] metrics convert all to markdown now --- metrics/ops/{exit-ops.md => exit-ops.mdwn} | 0 metrics/ops/metrics-cloud.html | 702 --------------------- metrics/ops/metrics-cloud.mdwn | 336 ++++++++++ metrics/ops/metrics-cloud.org | 269 -------- 4 files changed, 336 insertions(+), 971 deletions(-) rename metrics/ops/{exit-ops.md => exit-ops.mdwn} (100%) delete mode 100644 metrics/ops/metrics-cloud.html create mode 100644 metrics/ops/metrics-cloud.mdwn delete mode 100644 metrics/ops/metrics-cloud.org diff --git a/metrics/ops/exit-ops.md b/metrics/ops/exit-ops.mdwn similarity index 100% rename from metrics/ops/exit-ops.md rename to metrics/ops/exit-ops.mdwn diff --git a/metrics/ops/metrics-cloud.html b/metrics/ops/metrics-cloud.html deleted file mode 100644 index ba0c6cc7..00000000 --- a/metrics/ops/metrics-cloud.html +++ /dev/null @@ -1,702 +0,0 @@ - - - - - - - -metrics-cloud: Scripts for orchestrating Tor Metrics services - - - - - - -
-

metrics-cloud: Scripts for orchestrating Tor Metrics services

- - -
-

1 DONE Synopsis

-
-

-The metrics-cloud framework aims to enable: -

- -
    -
  • reproducible deployments of software
  • -
  • consistency between those software deployments
  • -
- -

-Side-effects of these goals are: -

- -
    -
  • reproducible experiments (good science)
  • -
  • reduced maintenance costs
  • -
  • reduced human error
  • -
- -

-There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks. -The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks are applicable -to both development and production environments. -

-
-
- -
-

2 DONE Usage of AWS for Tor Metrics Development

-
-

-Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice, -we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is -still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for -rapid creation, provisioning and destruction should help with this. -

-
- -
-

2.1 DONE CloudFormation Templates

-
-

-CloudFormation is an AWS service allowing the definition of stacks. These stacks describe a series of AWS services -using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources -in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all -resources terminated together easily, allowing stacks to exist for only as long as they are needed. -

- -

-The CloudFormation templates used in the framework can be found in the cloudformation folder of the repository. -

- -

-It may be that for some services the templates are very simple, and others may be more complex. No matter the level of -complexity we still want to use the templates to ensure we are meeting the key goals of the framework and also to simplify -tracking of spending in the billing portal through the tags. -

- -

-Documentation for CloudFormation, including an API reference, can be found at: https://docs.aws.amazon.com/cloudformation/. -

-
- -
-

2.1.1 DONE Quickstart: Deploying a template

-
-

-Each template begins with comments with any relevant notes about the template, and a deployment command that will upload -and deploy the template on AWS. The commands will look something like: -

- -
-
aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
-
-
- -

-You'll notice that the command includes a call to whoami to prefix the stack name with your current username, and also -that the identify_user.sh script is used to determine which SSH key to use for new instances. -You do not need to modify this command line before running it. -

- -

-Once the stack has been deployed from the template, you can view its resources and delete it through -the CloudFormation management console. -
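
The console actions above also have command-line counterparts. The following is a minimal sketch and not part of the repository: it rebuilds the stack name the same way the quickstart command does, then only prints the AWS CLI calls for inspecting and deleting that stack, so running it changes nothing. The `service` variable is an assumption made for illustration.

```shell
#!/bin/sh
# Rebuild the per-user stack name used by the quickstart command.
service="exit-scanner-dev"         # assumed: template file name without ".yml"
stack_name="$(whoami)-${service}"  # same prefix as `whoami`-exit-scanner-dev

# Print, rather than execute, the CLI equivalents of the console actions.
echo "aws cloudformation describe-stack-resources --region us-east-1 --stack-name ${stack_name}"
echo "aws cloudformation delete-stack --region us-east-1 --stack-name ${stack_name}"
```

Printing the commands first makes it easy to check that the stack name is the one you expect before deleting anything.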

-
-
- -
-

2.1.2 DONE SSH Key Selection

-
-

-The identify_user.sh script prints out the name of the SSH public key to be used based on either: -

- -
    -
  • the TOR_METRICS_SSH_KEY environment variable, or
  • -
  • the current user name.
  • -
- -

-The environment variable takes precedence over the username-to-key mapping. -

- -

-If you change the default key you would like to use, update the mapping in this shell script. -

- -

-SSH keys are managed through the EC2 management console and are not (currently) managed by a CloudFormation template. -
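
The selection order described above can be sketched as follows. This is an illustration of the precedence rule only, not the contents of the real identify_user.sh; the function name `identify_key`, its optional user argument, and the mapping entries are all hypothetical.

```shell
#!/bin/sh
# Sketch of the selection order only; the real identify_user.sh may differ.
identify_key() {
    user="${1:-$(whoami)}"   # optional argument so the mapping can be tried out

    # 1. The environment variable takes precedence when set and non-empty.
    if [ -n "${TOR_METRICS_SSH_KEY:-}" ]; then
        printf '%s\n' "$TOR_METRICS_SSH_KEY"
        return 0
    fi

    # 2. Otherwise fall back to the username-to-key mapping (hypothetical entries).
    case "$user" in
        alice) printf '%s\n' "alice-laptop" ;;
        bob)   printf '%s\n' "bob-workstation" ;;
        *)     echo "no SSH key mapped for user: $user" >&2
               return 1 ;;
    esac
}
```

With TOR_METRICS_SSH_KEY exported, the function prints that value regardless of the user name; without it, an unmapped user results in an error rather than a silently wrong key.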

-
-
-
- -
-

2.2 DONE Templates and Stacks

-
-

-There is no directory hierarchy for the templates in the cloudformation folder of the repository. There are a couple of naming -conventions used though: -

- -
    -
  • Development/testing templates/stacks use a -dev suffix after the service name
  • -
  • Long-term and shared templates/stacks start with metrics-
  • -
-
- -
-

2.2.1 DONE billing-alerts

-
-

-The billing-alerts template sends notifications to the subscribed individuals whenever the predicted spend for the month will be -over 50USD. Email addresses can be added here if other people should be notified too. -

-
-
- -
-

2.2.2 DONE metrics-vpc

-
-

-The metrics-vpc template contains shared resources for Tor Metrics development templates. This includes: -

-
- -
    -
  1. MetricsVPC and MetricsSubnet
    -
    -

    -The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we -share the AWS account with other Tor teams. -

    - -

    -For example, to create an EC2 instance: -

    - -
    -
    Instance:
    -  Type: AWS::EC2::Instance
    -  Properties:
    -    AvailabilityZone: !Select [ 0, !GetAZs ]
    -    ImageId: ami-01db78123b2b99496
    -    InstanceType: t2.large
    -    SubnetId:
    -      Fn::ImportValue: 'MetricsSubnet'
    -    KeyName: !Ref myKeyPair
    -    SecurityGroupIds:
    -      - Fn::ImportValue: 'MetricsInternetSecurityGroup'
    -      - Fn::ImportValue: 'MetricsPingableSecurityGroup'
    -      - Fn::ImportValue: 'MetricsHTTPASecurityGroup'
    -
    -
    - -

    -Note also that the availability zone is not hardcoded, which allows for portability between regions if we ever want that. -

    -
    -
  2. - -
  3. Various security groups
    -
    -

    -The EC2 example above uses some of the security groups from the metrics-vpc template. Refer to the template source -for details on each group's rules. -

    -
    -
  4. - -
  5. The development DNS zone
    -
    -

    -Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted -using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is: -tm-dev-aws.safemetrics.org. -

    - -

    -As an example, creating an A record for an EC2 instance with the subdomain of the stack name: -

    - -
    -
    DNSName:
    -  Type: AWS::Route53::RecordSet
    -  Properties:
    -    HostedZoneName: tm-dev-aws.safemetrics.org.
    -    Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
    -    Type: A
    -    TTL: '300'
    -    ResourceRecords:
    -    - !GetAtt Instance.PublicIp
    -
    -
    - -

    -Q: Can we use the MetricsDevZone export from metrics-vpc instead of explicitly defining the zone name every time? -

    - -

    -These domain names should never appear on anything user facing and are for development purposes only. -

    -
    -
  6. -
-
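
To make the DNS naming concrete: the record name in the example above is the stack name joined with the development zone. The small sketch below computes the name an instance would get, which is handy for `ssh` or for a dev inventory. The `dev_hostname` helper is hypothetical and the trailing dot of the zone is omitted on purpose.

```shell
#!/bin/sh
# Mirror the !Join in the record set: <stack name> + "." + development zone.
dev_zone="tm-dev-aws.safemetrics.org"

dev_hostname() {
    # $1: stack name, for example "$(whoami)-example-dev"
    printf '%s.%s\n' "$1" "$dev_zone"
}

dev_hostname "$(whoami)-example-dev"
```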
- -
-

2.2.3 DONE Typical Dev/Testing Stacks

-
-

-A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have -a second volume attached for the data storage. -

- -

-An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name: -

- -
-
---
-# CloudFormation Stack for example development instance
-# This stack will only deploy on us-east-1 and will deploy in the Metrics VPC
-# aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)"
-AWSTemplateFormatVersion: 2010-09-09
-Parameters:
-  myKeyPair:
-    Description: Amazon EC2 Key Pair
-    Type: "AWS::EC2::KeyPair::KeyName"
-Resources:
-  Instance:
-    Type: AWS::EC2::Instance
-    Properties:
-      AvailabilityZone: !Select [ 0, !GetAZs ]
-      ImageId: ami-01db78123b2b99496
-      InstanceType: t2.large
-      SubnetId:
-        Fn::ImportValue: 'MetricsSubnet'
-      KeyName: !Ref myKeyPair
-      SecurityGroupIds:
-        - Fn::ImportValue: 'MetricsInternetSecurityGroup'
-        - Fn::ImportValue: 'MetricsPingableSecurityGroup'
-        - Fn::ImportValue: 'MetricsHTTPSecurityGroup'
-        - Fn::ImportValue: 'MetricsHTTPSSecurityGroup'
-  ServiceVolume:
-    Type: AWS::EC2::Volume
-    Properties: 
-      AvailabilityZone: !Select [ 0, !GetAZs ]
-      Size: 15
-      VolumeType: gp2
-  ServiceVolumeAttachment:
-    Type: AWS::EC2::VolumeAttachment
-    Properties:
-      Device: /dev/sdb
-      InstanceId: !Ref Instance
-      VolumeId: !Ref ServiceVolume
-  DNSName:
-    Type: AWS::Route53::RecordSet
-    Properties:
-      HostedZoneName: tm-dev-aws.safemetrics.org.
-      Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]]
-      Type: A
-      TTL: '300'
-      ResourceRecords:
-      - !GetAtt Instance.PublicIp
-Outputs:
-  PublicIp:
-    Description: "Instance public IP"
-    Value: !GetAtt Instance.PublicIp
-
-
- -

-It's not common to use other AWS services as part of these templates as the goal is usually to have these services -deployed on TPA managed hosts. -

-
-
-
- -
-

2.3 DONE Linting

-
-

-cfn-lint is used to ensure we are complying with best practices. None of the team have formal training in the use of CloudFormation -so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using -things efficiently and correctly. -

- -

-This is also run as part of the continuous integration checks on Travis CI. -

-
-
-
- -
-

3 TODO Ansible Playbooks

-
-

-Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python, -is mature, and has an extensive selection of modules for almost everything we could need. -

-
- -
-

3.1 TODO Inventories and site.yml

-
-

-In general, there are two inventories: production and dev. Only the production inventory is committed to git; the dev inventory will -vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default -inventory in the ansible.cfg file, so you must specify an inventory for every invocation of ansible-playbook using the -i flag: -

- -
-
ansible-playbook -i dev ...
-
-
- -

-Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the -ansible directory that specifies a playbook for the group. All of these files are included in the site.yml master playbook to -allow multiple hosts to be provisioned together. -
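
Since the dev inventory is personal and not committed, it may help to see what one could look like. The sketch below writes a throwaway INI-style inventory and prints the matching provisioning command without running it. The group name `exit_scanner` and the host name are assumptions for illustration; the real group names are those used in the production inventory.

```shell
#!/bin/sh
# Write a throwaway dev inventory; the group and host names are hypothetical.
inventory="$(mktemp)"
cat > "$inventory" <<EOF
[exit_scanner]
$(whoami)-exit-scanner-dev.tm-dev-aws.safemetrics.org
EOF

# Print, rather than run, the provisioning command for that inventory.
echo "ansible-playbook -i ${inventory} site.yml"
```

In practice the inventory would live in a file named dev next to the production inventory, matching the `-i dev` invocation shown above.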

-
-
- -
-

3.2 TODO metrics-common

-
-

-The metrics-common role allows us to have a consistent environment between services, and closely matches the environment that -would be provided by a TSA managed machine. The role handles: -

- -
    -
  • installs dependency packages from Debian (optionally from the backports repository)
  • -
  • formats additional volumes attached to the instance using the specified filesystem
  • -
  • sets the timezone to UTC (Q: is this what TSA do?)
  • -
  • creates user accounts for each member of the team -
      -
    • all team members can perform unlimited passwordless sudo (TSA hosts require a password)
    • -
    • SSH password authentication is disabled
    • -
    • all user account passwords are removed/disabled
    • -
  • -
  • creates service user accounts as specified -
      -
    • home directories are created as specified, and linked from /home/$user
    • -
    • lingering is enabled for service users
    • -
  • -
- -

-This is all configured via group variables in the ansible/group_vars/ folder. Examples there should help you to understand how -these work. These override the defaults set in the role. -

-
-
- -
-

3.3 TODO Service roles

-
- -
-

3.4 DONE Linting

-
-

-ansible-lint is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible -so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using -things efficiently and correctly. -

- -

-This is also run as part of the continuous integration checks on Travis CI. -

-
-
-
- -
-

4 TODO Common Tasks

-
-
-
-

4.1 TODO Add a new member to the team

-
- -
-

4.2 TODO Update an SSH key for a team member

-
- -
-

4.3 TODO Deploy and provision a development environment for a service

-
-
-
-
-

Author: Iain Learmonth

-

Created: 2020-04-02 Thu 14:12

-


-
- - diff --git a/metrics/ops/metrics-cloud.mdwn b/metrics/ops/metrics-cloud.mdwn new file mode 100644 index 00000000..44367f8e --- /dev/null +++ b/metrics/ops/metrics-cloud.mdwn @@ -0,0 +1,336 @@ + +# Table of Contents + +1. [Synopsis](#orgb3a4817) +2. [Usage of AWS for Tor Metrics Development](#orgb76cd81) + 1. [CloudFormation Templates](#orgee150b1) + 1. [Quickstart: Deploying a template](#org7813a03) + 2. [SSH Key Selection](#orgdc7711c) + 2. [Templates and Stacks](#org19b1306) + 1. [`billing-alerts`](#org1b9ae57) + 2. [`metrics-vpc`](#org2c178f5) + 3. [Typical Dev/Testing Stacks](#org97f9e67) + 3. [Linting](#orga89e157) +3. [Ansible Playbooks](#org8371364) + 1. [Inventories and site.yml](#org81a0dc9) + 2. [`metrics-common`](#org55e2902) + 3. [Service roles](#org7050aae) + 4. [Linting](#org9684f51) +4. [Common Tasks](#org8267248) + 1. [Add a new member to the team](#org9040a14) + 2. [Update an SSH key for a team member](#org97696ab) + 3. [Deploy and provision a development environment for a service](#org400659a) + + + + + +# DONE Synopsis + +The metrics-cloud framework aims to enable: + +- reproducible deployments of software +- consistency between those software deployments + +Side-effects of these goals are: + +- reproducible experiments (good science) +- reduced maintenance costs +- reduced human error + +There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks. +The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks are applicable +to both development and production environments. + + + + +# DONE Usage of AWS for Tor Metrics Development + +Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice, +we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. 
It is +still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for +rapid creation, provisioning and destruction should help with this. + + + + +## DONE CloudFormation Templates + +CloudFormation is an AWS service allowing the definition of *stacks*. These stacks describe a series of AWS services +using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources +in a stack are tagged with the stack name which allows for tracking of costs per project. Each stack can also have all +resources terminated together easily, allowing stacks to exist for only as long as they are needed. + +The CloudFormation templates used in the framework can be found in the [cloudformation](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation) folder of the repository. + +It may be that for some services the templates are very simple, and others may be more complex. No matter the level of +complexity we still want to use the templates to ensure we are meeting the key goals of the framework and also to simplify +tracking of spending in the billing portal through the tags. + +Documentation for CloudFormation, including an API reference, can be found at: . + + + + +### DONE Quickstart: Deploying a template + +Each template begins with comments with any relevant notes about the template, and a deployment command that will upload +and deploy the template on AWS. The commands will look something like: + + aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)" + +You'll notice that the command includes a call to `whoami` to prefix the stack name with your current username, and also +that the `identify_user.sh` script is used to determine which SSH key to use for new instances. +You do not need to modify this command line before running it. 
+ +Once the stack has been deployed from the template, you can view its resources and delete it through +the [CloudFormation management console](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false). + + + + +### DONE SSH Key Selection + +The [identify\_user.sh](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh) script prints out the name of the SSH public key to be used based on either: + +- the `TOR_METRICS_SSH_KEY` environment variable, or +- the current user name. + +The environment variable takes precedence over the username-to-key mapping. + +If you change the default key you would like to use, update the mapping in this shell script. + +SSH keys are managed through the [EC2 management console](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:) and are not (currently) managed by a CloudFormation template. + + + + +## DONE Templates and Stacks + +There is no directory hierarchy for the templates in the `cloudformation` folder of the repository. There are a couple of naming +conventions used though: + +- Development/testing templates/stacks use a `-dev` suffix after the service name +- Long-term and shared templates/stacks start with `metrics-` + + + + +### DONE `billing-alerts` + +The [`billing-alerts` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml) sends notifications to the subscribed individuals whenever the predicted spend for the month will be +over 50USD. Email addresses can be added here if other people should be notified too. + + + + +### DONE `metrics-vpc` + +The [`metrics-vpc` template](https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml) contains shared resources for Tor Metrics development templates. This includes: + +1. MetricsVPC and MetricsSubnet + + The subnet should be referenced by any resource that requires it. 
Use of the default VPC should be avoided as we + share the AWS account with other Tor teams. + + For example, to create an EC2 instance: + + Instance: + Type: AWS::EC2::Instance + Properties: + AvailabilityZone: !Select [ 0, !GetAZs ] + ImageId: ami-01db78123b2b99496 + InstanceType: t2.large + SubnetId: + Fn::ImportValue: 'MetricsSubnet' + KeyName: !Ref myKeyPair + SecurityGroupIds: + - Fn::ImportValue: 'MetricsInternetSecurityGroup' + - Fn::ImportValue: 'MetricsPingableSecurityGroup' + - Fn::ImportValue: 'MetricsHTTPASecurityGroup' + + Note also that the availability zone is not hardcoded to allow for portability between regions if we ever want that. + +2. Various security groups + + The EC2 example above uses some of the security groups from the `metrics-vpc` template. Refer to the template source + for details on each group's rules. + +3. The development DNS zone + + Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted + using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is: + `tm-dev-aws.safemetrics.org`. + + As an example, creating an A record for an EC2 instance with the subdomain of the stack name: + + DNSName: + Type: AWS::Route53::RecordSet + Properties: + HostedZoneName: tm-dev-aws.safemetrics.org. + Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]] + Type: A + TTL: '300' + ResourceRecords: + - !GetAtt Instance.PublicIp + + Q: *Can we use the MetricsDevZone export from `metrics-vpc` instead of explicitly defining the zone name every time?* + + These domain names should **never** appear on anything user facing and are for **development purposes only**. + + + + +### DONE Typical Dev/Testing Stacks + +A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have +a second volume attached for the data storage. 
+ +An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name: + + --- + # CloudFormation Stack for example development instance + # This stack will only deploy on us-east-1 and will deploy in the Metrics VPC + # aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)" + AWSTemplateFormatVersion: 2010-09-09 + Parameters: + myKeyPair: + Description: Amazon EC2 Key Pair + Type: "AWS::EC2::KeyPair::KeyName" + Resources: + Instance: + Type: AWS::EC2::Instance + Properties: + AvailabilityZone: !Select [ 0, !GetAZs ] + ImageId: ami-01db78123b2b99496 + InstanceType: t2.large + SubnetId: + Fn::ImportValue: 'MetricsSubnet' + KeyName: !Ref myKeyPair + SecurityGroupIds: + - Fn::ImportValue: 'MetricsInternetSecurityGroup' + - Fn::ImportValue: 'MetricsPingableSecurityGroup' + - Fn::ImportValue: 'MetricsHTTPSecurityGroup' + - Fn::ImportValue: 'MetricsHTTPSSecurityGroup' + ServiceVolume: + Type: AWS::EC2::Volume + Properties: + AvailabilityZone: !Select [ 0, !GetAZs ] + Size: 15 + VolumeType: gp2 + ServiceVolumeAttachment: + Type: AWS::EC2::VolumeAttachment + Properties: + Device: /dev/sdb + InstanceId: !Ref Instance + VolumeId: !Ref ServiceVolume + DNSName: + Type: AWS::Route53::RecordSet + Properties: + HostedZoneName: tm-dev-aws.safemetrics.org. + Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]] + Type: A + TTL: '300' + ResourceRecords: + - !GetAtt Instance.PublicIp + Outputs: + PublicIp: + Description: "Instance public IP" + Value: !GetAtt Instance.PublicIp + +It's not common to use other AWS services as part of these templates as the goal is usually to have these services +deployed on TPA managed hosts. + + + + +## DONE Linting + +[`cfn-lint`](https://github.com/aws-cloudformation/cfn-python-lint) is used to ensure we are complying with best practices. 
None of the team have formal training in the use of CloudFormation +so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using +things efficiently and correctly. + +This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI. + + + + +# TODO Ansible Playbooks + +Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python, +is mature, and has an extensive selection of modules for almost everything we could need. + + + + +## TODO Inventories and site.yml + +In general, there are two inventories: [production](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production) and dev. Only the production inventory is committed to git, the dev inventory will +vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default +inventory in the `ansible.cfg` file, so you must specify an inventory for every invocation of `ansible-playbook` using the `-i` flag: + + ansible-playbook -i dev ... + +Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the +`ansible` directory that specifies a playbook for the group. All of these files are included in the `site.yml` master playbook to +allow multiple hosts to be provisioned together. + + + + +## TODO `metrics-common` + +The [`metrics-common`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common) role allows us to have a consistent environment between services, and closely matches the environment that +would be provided by a TSA managed machine. 
The role handles: + +- installation of dependency packages from Debian (optionally from the backports repository) +- formats additional volumes attached to the instance using the specified filesystem +- sets the timezone to UTC (Q: *is this what TSA do?*) +- creates user accounts for each member of the team + - all team members can perform unlimited passwordless sudo (TSA hosts require a password) + - SSH password authentication is disabled + - all user account passwords are removed/disabled +- creates service user accounts as specified + - home directories are created as specified, and linked from `/home/$user` + - lingering is enabled for service users + +This is all configured via group variables in the [`ansible/group_vars/`](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars) folder. Examples there should help you to understand how +these work. These override the [defaults](https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml) set in the role. + + + + +## TODO Service roles + + + + +## DONE Linting + +[`ansible-lint`](https://docs.ansible.com/ansible-lint/) is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible +so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using +things efficiently and correctly. + +This is also run as part of the [continuous integration checks](https://travis-ci.org/github/torproject/metrics-cloud/) on Travis CI. 
+ + + + +# TODO Common Tasks + + + + +## TODO Add a new member to the team + + + + +## TODO Update an SSH key for a team member + + + + +## TODO Deploy and provision a development environment for a service + diff --git a/metrics/ops/metrics-cloud.org b/metrics/ops/metrics-cloud.org deleted file mode 100644 index 17eb3752..00000000 --- a/metrics/ops/metrics-cloud.org +++ /dev/null @@ -1,269 +0,0 @@ -#+TITLE: metrics-cloud: Scripts for orchestrating Tor Metrics services -#+OPTIONS: ^:nil - -* DONE Synopsis - -The metrics-cloud framework aims to enable: - -- reproducible deployments of software -- consistency between those software deployments - -Side-effects of these goals are: - -- reproducible experiments (good science) -- reduced maintenance costs -- reduced human error - -There are currently two components to the metrics-cloud framework: CloudFormation templates and Ansible playbooks. -The CloudFormation templates are relevant only to testing and development, while the Ansible playbooks are applicable -to both development and production environments. - -* DONE Usage of AWS for Tor Metrics Development - -Each member of the Tor Metrics team has a standing allowance of 100USD/month for development using AWS. In practice, -we have not used more than 50USD/month for the team in any one month and generally sit around 25USD/month. It is -still important to minimize costs when using AWS and the use of CloudFormation templates and Ansible playbooks for -rapid creation, provisioning and destruction should help with this. - -** DONE CloudFormation Templates - -CloudFormation is an AWS service allowing the definition of /stacks/. These stacks describe a series of AWS services -using a domain-specific language and allow for the easy creation of a number of interconnected resources. All resources -in a stack are tagged with the stack name which allows for tracking of costs per project. 
Each stack can also have all -resources terminated together easily, allowing stacks to exist for only as long as they are needed. - -The CloudFormation templates used in the framework can be found in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation][cloudformation]] folder of the repository. - -It may be that for some services the templates are very simple, and others may be more complex. No matter the level of -complexity we still want to use the templates to ensure we are meeting the key goals of the framework and also to simplify -tracking of spending in the billing portal through the tags. - -Documentation for CloudFormation, including an API reference, can be found at: https://docs.aws.amazon.com/cloudformation/. - -*** DONE Quickstart: Deploying a template - -Each template begins with comments with any relevant notes about the template, and a deployment command that will upload -and deploy the template on AWS. The commands will look something like: - -#+BEGIN_SRC shell -aws cloudformation deploy --region us-east-1 --stack-name `whoami`-exit-scanner-dev --template-file exit-scanner-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)" -#+END_SRC - -You'll notice that the command includes a call to ~whoami~ to prefix the stack name with your current username, and also -that the ~identify_user.sh~ script is used to determine which SSH key to use for new instances. -You do not need to modify this command line before running it. - -Once the stack has been deployed from the template, you can view its resources and delete it through -the [[https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks?filteringText=&filteringStatus=active&viewNested=true&hideStacks=false][CloudFormation management console]]. 
- -*** DONE SSH Key Selection - -The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/identify_user.sh][identify_user.sh]] script prints out the name of the SSH public key to be used based on either: - -- the ~TOR_METRICS_SSH_KEY~ environment variable, or -- the current user name. - -The environment variable takes precedence over the username-to-key mapping. - -If you change the default key you would like to use, update the mapping in this shell script. - -SSH keys are managed through the [[https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#KeyPairs:][EC2 management console]] and are not (currently) managed by a CloudFormation template. - -** DONE Templates and Stacks - -There is no directory hierarchy for the templates in the ~cloudformation~ folder of the repository. There are a couple of naming -conventions used though: - -- Development/testing templates/stacks use a ~-dev~ suffix after the service name -- Long-term and shared templates/stacks start with ~metrics-~ - -*** DONE ~billing-alerts~ - -The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/billing-alerts.yml][~billing-alerts~ template]] sends notifications to the subscribed individuals whenever the predicted spend for the month will be -over 50USD. Email addresses can be added here if other people should be notified too. - -*** DONE ~metrics-vpc~ - -The [[https://gitweb.torproject.org/metrics-cloud.git/tree/cloudformation/metrics-vpc.yml][~metrics-vpc~ template]] contains shared resources for Tor Metrics development templates. This includes: - -**** MetricsVPC and MetricsSubnet - -The subnet should be referenced by any resource that requires it. Use of the default VPC should be avoided as we -share the AWS account with other Tor teams. 
- -For example, to create an EC2 instance: - -#+BEGIN_SRC yaml - Instance: - Type: AWS::EC2::Instance - Properties: - AvailabilityZone: !Select [ 0, !GetAZs ] - ImageId: ami-01db78123b2b99496 - InstanceType: t2.large - SubnetId: - Fn::ImportValue: 'MetricsSubnet' - KeyName: !Ref myKeyPair - SecurityGroupIds: - - Fn::ImportValue: 'MetricsInternetSecurityGroup' - - Fn::ImportValue: 'MetricsPingableSecurityGroup' - - Fn::ImportValue: 'MetricsHTTPASecurityGroup' -#+END_SRC - -Note also that the availability zone is not hardcoded to allow for portability between regions if we ever want that. - -**** Various security groups - -The EC2 example above uses some of the security groups from the ~metrics-vpc~ template. Refer to the template source -for details on each group's rules. - -**** The development DNS zone - -Often services require TLS certificates, or require DNS names for other reasons. To facilitate this, a zone is hosted -using Route53 allowing for DNS records to be created in CloudFormation templates. This zone is: -~tm-dev-aws.safemetrics.org~. - -As an example, creating an A record for an EC2 instance with the subdomain of the stack name: - -#+BEGIN_SRC yaml - DNSName: - Type: AWS::Route53::RecordSet - Properties: - HostedZoneName: tm-dev-aws.safemetrics.org. - Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]] - Type: A - TTL: '300' - ResourceRecords: - - !GetAtt Instance.PublicIp -#+END_SRC - -:FUTUREQUESTION: -Q: /Can we use the MetricsDevZone export from ~metrics-vpc~ instead of explicitly defining the zone name every time?/ -:END: - -These domain names should *never* appear on anything user facing and are for *development purposes only*. - -*** DONE Typical Dev/Testing Stacks - -A typical test/dev stack will consist of an EC2 instance and a DNS name. Some services store a lot of data and may have -a second volume attached for the data storage. 
- -An example template with one t2.large EC2 instance, a 15GB additional disk, and a DNS name: - -#+BEGIN_SRC yaml ---- -# CloudFormation Stack for example development instance -# This stack will only deploy on us-east-1 and will deploy in the Metrics VPC -# aws cloudformation deploy --region us-east-1 --stack-name `whoami`-example-dev --template-file example-dev.yml --parameter-overrides myKeyPair="$(./identify_user.sh)" -AWSTemplateFormatVersion: 2010-09-09 -Parameters: - myKeyPair: - Description: Amazon EC2 Key Pair - Type: "AWS::EC2::KeyPair::KeyName" -Resources: - Instance: - Type: AWS::EC2::Instance - Properties: - AvailabilityZone: !Select [ 0, !GetAZs ] - ImageId: ami-01db78123b2b99496 - InstanceType: t2.large - SubnetId: - Fn::ImportValue: 'MetricsSubnet' - KeyName: !Ref myKeyPair - SecurityGroupIds: - - Fn::ImportValue: 'MetricsInternetSecurityGroup' - - Fn::ImportValue: 'MetricsPingableSecurityGroup' - - Fn::ImportValue: 'MetricsHTTPSecurityGroup' - - Fn::ImportValue: 'MetricsHTTPSSecurityGroup' - ServiceVolume: - Type: AWS::EC2::Volume - Properties: - AvailabilityZone: !Select [ 0, !GetAZs ] - Size: 15 - VolumeType: gp2 - ServiceVolumeAttachment: - Type: AWS::EC2::VolumeAttachment - Properties: - Device: /dev/sdb - InstanceId: !Ref Instance - VolumeId: !Ref ServiceVolume - DNSName: - Type: AWS::Route53::RecordSet - Properties: - HostedZoneName: tm-dev-aws.safemetrics.org. - Name: !Join ['', [!Ref 'AWS::StackName', .tm-dev-aws.safemetrics.org.]] - Type: A - TTL: '300' - ResourceRecords: - - !GetAtt Instance.PublicIp -Outputs: - PublicIp: - Description: "Instance public IP" - Value: !GetAtt Instance.PublicIp -#+END_SRC - -It's not common to use other AWS services as part of these templates as the goal is usually to have these services -deployed on TPA managed hosts. - -** DONE Linting - -[[https://github.com/aws-cloudformation/cfn-python-lint][~cfn-lint~]] is used to ensure we are complying with best practices. 
None of the team have formal training in the use of CloudFormation -so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using -things efficiently and correctly. - -This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI. - -* TODO Ansible Playbooks - -Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It's written in Python, -is mature, and has an extensive selection of modules for almost everything we could need. - -** TODO Inventories and site.yml - -In general, there are two inventories: [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/production][production]] and dev. Only the production inventory is committed to git; the dev inventory will -vary between members of the team, referencing their own dev instances as created by CloudFormation. We do not specify a default -inventory in the ~ansible.cfg~ file, so you must specify an inventory for every invocation of ~ansible-playbook~ using the ~-i~ flag: - -#+BEGIN_SRC shell -ansible-playbook -i dev ... -#+END_SRC - -Inside the inventory, hosts are grouped by their purpose. For each group there is a corresponding YAML file in the root of the -~ansible~ directory that specifies a playbook for the group. All of these files are included in the ~site.yml~ master playbook to -allow multiple hosts to be provisioned together. - -** TODO ~metrics-common~ - -The [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common][~metrics-common~]] role allows us to have a consistent environment between services, and closely matches the environment that -would be provided by a TSA managed machine. 
The role handles: - -- installation of dependency packages from Debian (optionally from the backports repository) -- formats additional volumes attached to the instance using the specified filesystem -- sets the timezone to UTC (Q: /is this what TSA do?/) -- creates user accounts for each member of the team - - all team members can perform unlimited passwordless sudo (TSA hosts require a password) - - SSH password authentication is disabled - - all user account passwords are removed/disabled -- creates service user accounts as specified - - home directories are created as specified, and linked from ~/home/$user~ - - lingering is enabled for service users - -This is all configured via group variables in the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/group_vars][~ansible/group_vars/~]] folder. Examples there should help you to understand how -these work. These override the [[https://gitweb.torproject.org/metrics-cloud.git/tree/ansible/roles/metrics-common/defaults/main.yml][defaults]] set in the role. - -** TODO Service roles - -** DONE Linting - -[[https://docs.ansible.com/ansible-lint/][~ansible-lint~]] is used to ensure we are complying with best practices. None of the team have formal training in the use of Ansible -so we are really making it up as we go along. Other tools may be used in the future, as we learn about them, to make sure we are using -things efficiently and correctly. - -This is also run as part of the [[https://travis-ci.org/github/torproject/metrics-cloud/][continuous integration checks]] on Travis CI. - -* TODO Common Tasks - -** TODO Add a new member to the team - -** TODO Update an SSH key for a team member - -** TODO Deploy and provision a development environment for a service -- GitLab