Commit 553139e7 (Verified), authored 3 weeks ago by lelutin
prometheus: Fill in the TODO left in the page.
refs: team#41655
parent 71b91270
Pipeline #253277 passed with warnings 3 weeks ago (stages: build, test)
service/prometheus.md: 42 additions, 10 deletions
...
...
@@ -2696,7 +2696,19 @@ retention periods](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40330) fo
## Queues
<!-- TODO email queues, job queues, schedulers -->
There are a couple of places where things happen automatically on a schedule in
the monitoring infrastructure:
- Prometheus schedules scrape jobs (pulling metrics) according to rules that can
  differ for each scrape job. Each job can define its own `scrape_interval`. The
  default is to scrape every 15 seconds, but some jobs are currently configured
  to scrape once every minute.
- Each Alertmanager alert rule can define its own evaluation interval and delay
  before triggering. See [Adding alerts](#writing-an-alert).
- Prometheus can automatically discover scrape targets through different means.
  We currently don't fully use the auto-discovery feature since we create
  targets through files created by Puppet, so any interval for this feature does
  not affect our setup.
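
As a rough illustration of where these intervals live, the fragments below sketch a Prometheus configuration and a rule file. This is only a sketch under stated assumptions: the job names, file paths, alert name, and the `interval` and `for` values are hypothetical and not taken from TPA's actual configuration. Only the 15-second default and the one-minute override come from the text above. Note that in upstream Prometheus the evaluation interval is set per rule group, while the delay before triggering (`for`) is set per rule.

```yaml
# --- prometheus.yml (fragment; job names and paths are hypothetical) ---
global:
  scrape_interval: 15s          # default applied to jobs without their own value

scrape_configs:
  - job_name: example_default   # hypothetical job, inherits the 15s default
    file_sd_configs:
      - files:
          # target files written by configuration management (e.g. Puppet)
          - /etc/prometheus/targets.d/example_default_*.yaml

  - job_name: example_slow      # hypothetical job scraped once every minute
    scrape_interval: 1m
    file_sd_configs:
      - files:
          - /etc/prometheus/targets.d/example_slow_*.yaml

---
# --- rules.d/example.rules (a separate file; hypothetical) ---
groups:
  - name: example
    interval: 1m                # evaluation interval for this rule group
    rules:
      - alert: ExampleTargetDown
        expr: up == 0
        for: 5m                 # delay before the alert fires
        labels:
          severity: warning
```

The two parts are shown in one block for brevity; Prometheus reads them as separate files, with the rule file referenced from `rule_files` in the main configuration.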
## Interfaces
...
...
@@ -3002,16 +3014,12 @@ This was performed in [TPA-RFC-33][], over the course of 2024 and 2025.
## Security and risk assessment
<!-- TODO: risk assessment
There has been no security review yet.
5. When was the last security review done on the project? What was
   the outcome? Are there any security issues currently? Should it
   have another security review?
The shared password for accessing the web interface is a challenge. We intend to
replace it soon with individual user accounts.
6. When was the last risk assessment done? Something that would cover
   risks from the data stored, the access required, etc.
-->
No risk assessment has been done yet.
## Technical debt and next steps
...
...
@@ -3024,7 +3032,31 @@ In progress projects:
### TPA-RFC-33
TODO: document the TPA-RFC-33 history here. see overlap with above
TPA's monitoring infrastructure was originally set up with
[Nagios](https://en.wikipedia.org/wiki/Nagios) and [Munin][]. Nagios was
eventually [removed from Debian in 2016][] and replaced with Icinga 1. Munin
somehow "died in a fire" some time before anarcat joined TPA in 2019.

At that point, the lack of trending infrastructure was seen as a serious
problem, so [Prometheus][] and [Grafana][] were [deployed in 2019][] as
a stopgap measure.

A secondary Prometheus server (`prometheus2`) was set up with stronger
authentication for service admins. The rationale was that those
services were more privacy-sensitive and the primary TPA setup
(`prometheus1`) was too open to the public, which could allow for
side-channel attacks.

Those tools have been used for trending ever since, while keeping Icinga
for monitoring.

During the March 2021 hack week, Prometheus' [Alertmanager][] was
deployed on the secondary Prometheus server to provide alerting to the
Metrics and Anti-Censorship teams.

[Munin]: https://en.wikipedia.org/wiki/Munin_(software)
[removed from Debian in 2016]: https://tracker.debian.org/news/818363/removed-351dfsg-22-from-unstable/
[deployed in 2019]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/29681
### Munin replacement
...
...