# Onion Services Site Reliability Engineering

[[_TOC_]]

The SRE role aims to provide tools and tech support to deploy and monitor
high-availability Onion Services sites.

## Oniongroove

The public-facing details of the software suite to manage Onion Service sites
are available at the [Oniongroove repository](https://gitlab.torproject.org/rhatto/oniongroove/).

### Objectives and key results (OKRs)

**Objective:** this role is part of the Onion Support Group's mission to increase
the adoption of Onion Services, for which we can select the following goals:

0. Provide easy ways to set up and maintain Onion Services. How to measure this?

1. With sane defaults. How to measure this?

2. That can be configurable and extensible. How to measure this?

## Kickstarting

### Initial plan

These are the proposed kickstarting steps for this role:

0. Meet with dgoulet, hiro and anarcat to get advice on kickstarting the project:
   what to look for, and where, regarding specs, tools, goals, security checklists, limits, etc.
   Check the meeting notes
   [here](https://gitlab.torproject.org/tpo/onion-support/-/wikis/Meetings/2022-02-08-Onion-Services-SRE-Kickstart) and
   [here](https://lists.torproject.org/pipermail/tor-project/2022-February/003288.html).

1. Research all relevant deployment technologies: build a first matrix.

2. Then meet with the media organizations: inventory, compliance checks, etc.

3. Build the second matrix (use cases).

### Initial considerations

While brainstorming about this role, the following considerations were
sketched:

0. Software suite: the Sponsor 123 project includes provisioning and monitoring
   Onion Services as deliverables, but the effort could be used to create a generic
   product (a "suite") which would include an Onionbalance deployer.

1. External instance(s): for the Sponsor 123 contract, a single instance of this
   "CDN" solution could be used to manage all sites, instead of having to
   manage many instances (and dashboards) in parallel.

   Future contracts with other third parties could either be managed using that
   same instance or have their own instances (isolation).

2. Internal instance: another, internal instance could be set up to manage all
   sites listed at https://onion.torproject.org if TPA likes and decides to
   adopt the solution :)

3. Existing considerations: see the [Oniongroove
   Scope](https://gitlab.torproject.org/rhatto/oniongroove/-/blob/main/specs.md#Scope).

4. Other considerations: see [rhatto's skill-test
   research](https://gitlab.torproject.org/tpo/tpa/skill-test-onion-sre-candidate-sr/-/blob/main/research.md).

### Questions

General:

0. If you were the Onion Services SRE, how would you implement this project?

1. What existing solutions to look at, and what to avoid?

Architecture:

0. What do people think about the architecture proposed by rhatto during his
   skill-test (without paying attention to the improvised implementation he
   coded)?

   https://gitlab.torproject.org/tpo/tpa/skill-test-onion-sre-candidate-sr/-/blob/main/README.md#chosen-architecture

1. Which other limits are important to consider in the scope of this project,
   like the current upper bound of 8 Onionbalance backend servers?

Implementation:

0. What are the dimensions for the comparison matrix of existing DevOps solutions
   such as Puppet, Ansible, Terraform and Salt (and specific
   modules/recipes/cookbooks/roles)?

1. Is this list complete for the second matrix (initial use cases survey)?
   https://gitlab.torproject.org/tpo/onion-support/-/wikis/What-we-need-to-know-about-each-setup

2. How does TPA manage passphrases and secrets for existing systems and keys?

   Answer: check [evaluate password management options (#29677)](https://gitlab.torproject.org/tpo/tpa/team/-/issues/29677).

3. What (if any) TPA (or other) security policies should be observed in this
   project?

   Answer: check [Tor security policy (#41)](https://gitlab.torproject.org/tpo/team/-/issues/41).

4. Which solutions are in use to manage the sites listed at
   https://onion.torproject.org/?

   Answer: custom Puppet modules (currently not public).

5. How does the Tor daemon currently scale? How many connections can it
   support at the same time?

Management:

0. The Sponsor 123 Project Plan timeline predicts setup of the first .onion sites
   on M1 and M2, with 2-5 business days to set up a single .onion site.
   But coding a solution could take longer. How should we proceed then?

   Answer: the suggested approach is to have a detailed discovery phase while
   coding the initial solution in parallel. Some rework might be needed, but we
   can gain time overall.