The Tor Project / TPA / Wiki Replica
Commit 8377ddb4 (Verified), authored 10 months ago by anarcat
more tpa-rfc-33 ideas (team#40755)
Parent: e322f978
Pipeline #167161 passed with warnings 10 months ago (stages: build, test)
Showing 1 changed file with 18 additions and 9 deletions:
policy/tpa-rfc-33-monitoring.md
...
...
@@ -332,6 +332,11 @@ monitoring system, as provided by TPA.
# Personas
TODO: document impact on personas
TODO: review previous policies sections on personas to see if we're
missing anything
## Ethan, the TPA admin
Ethan is a member of the TPA team. He has access to the Puppet
...
...
@@ -504,6 +509,8 @@ services:
### Planned
The eventual architecture for the system might look something like this:

...
...
@@ -544,16 +551,16 @@ setup. Each server has its own set of services running:
* **Karma**: alerting dashboard which pulls alerts from Alertmanager
  and can issue silences.
-The current prometheus1/prometheus2 server will actually be retired in
-favor of two *new* servers which will be rebuilt from scratch, entirely
-from Puppet, LDAP, and GitLab repository, ensuring they are properly
-reproducible.
+The current prometheus1/prometheus2 server may actually be retired in
+favor of two *new* servers to be rebuilt from scratch, entirely from
+Puppet, LDAP, and GitLab repository, ensuring they are properly
+reproducible.
Experiments can be done manually on the current servers to speed up
development and replacement of the legacy infrastructure, but the goal
-is to merge the two current server in a single cluster. TODO: start
-with a single merged server at first and HA later?
+is to merge the two current server in a single cluster. This might
+also be accomplished by retiring one of the two servers and migrating
+everything on the other.
## Metrics: Prometheus
...
...
@@ -895,8 +902,10 @@ TODO: review https://gitlab.com/gitlab-com/gl-infra/helicopter
* turn off the Icinga server
* remove all traces of NRPE on all nodes
TODO: how to merge the two databases? maybe adopt the prom2 data and
drop old TPA data?
TODO: multiple stages; emergency buster retirement, then alerting
improvements, then HA, then long term retention
TODO: consider merging prom2 into prom1
## Timeline
...
...
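The Karma entry in the third hunk describes a dashboard that pulls alerts from Alertmanager and can issue silences. To make that interaction concrete, here is a minimal Python sketch against Alertmanager's v2 HTTP API (`GET /api/v2/alerts` and `POST /api/v2/silences`). It is not Karma's code and not part of this commit. The address `http://localhost:9093` (Alertmanager's default port on a hypothetical local host) and all helper function names are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical endpoint: Alertmanager's default port on localhost.
ALERTMANAGER_URL = "http://localhost:9093"


def fetch_alerts(base_url=ALERTMANAGER_URL):
    """Pull the current alert list from Alertmanager (GET /api/v2/alerts)."""
    with urllib.request.urlopen(f"{base_url}/api/v2/alerts", timeout=10) as resp:
        return json.load(resp)


def active_alert_names(alerts):
    """Sorted alertname labels of alerts in the "active" state.

    Alerts muted by a silence or an inhibition are reported as "suppressed"
    and are therefore excluded here.
    """
    return sorted(
        alert["labels"].get("alertname", "")
        for alert in alerts
        if alert["status"]["state"] == "active"
    )


def silence_payload(alertname, hours, created_by, comment):
    """Build a request body for POST /api/v2/silences matching one alertname."""
    now = datetime.now(timezone.utc)
    return {
        "matchers": [{"name": "alertname", "value": alertname, "isRegex": False}],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


def create_silence(payload, base_url=ALERTMANAGER_URL):
    """Submit a silence; Alertmanager replies with {"silenceID": "..."}."""
    request = urllib.request.Request(
        f"{base_url}/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)["silenceID"]
```

For example, `create_silence(silence_payload("HostDown", 2, "ethan", "planned maintenance"))` would silence a hypothetical `HostDown` alert for two hours. The payload builder is kept separate from the HTTP call so that it can be checked without a running Alertmanager.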