Commit ebb4fc90 (Verified), authored 3 years ago by anarcat

long term prometheus metrics storage

parent 14d30eee
1 changed file: howto/prometheus.md, with 55 additions and 0 deletions
...
...
@@ -823,6 +823,61 @@ would still be able to deduce some activity patterns from the metrics
generated by Prometheus, and use it to leverage side-channel attacks,
which is why the external Prometheus server access is restricted.
### Long term metrics storage
Metrics are held for about a year or less, depending on the server;
see [ticket 29388][] for storage requirements and possible
alternatives for data retention policies.
Note that extra long-term data retention might be possible [using
the remote read functionality](https://www.robustperception.io/looking-beyond-retention), which enables the primary server to
read metrics from a secondary, longer-term server transparently,
keeping graphs working without having to change data source, for
example.
That way you could have a short-term server which keeps lots of
metrics and polls every minute or even every 15 seconds, but keeps
(say) only 30 days of data, and a long-term server which would poll
the short-term server every (say) 5 minutes but keep (say) 5 years of
metrics. But how much data would that be?
The [last time we made an estimate, in May 2020](https://gitlab.torproject.org/tpo/tpa/team/-/issues/31244#note_2541965), we had the
following calculation for a 1 minute polling interval over a year:
```
> 365d×1.3byte/(1min)×2000×78 to Gibyte
99,271238 gibibytes
```
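The same arithmetic can be sketched in Python, which makes it easier to try other values. This is only an illustration: the parameter names are hypothetical, and reading the unlabeled factors 2000 and 78 as "metrics per host" and "number of hosts" is an assumption, not something stated above.

```python
GIB = 2**30  # bytes in one gibibyte


def estimate_storage_gib(retention_days, scrape_interval_seconds,
                         bytes_per_sample=1.3,   # from the estimate above
                         metrics_per_host=2000,  # assumed meaning of "2000"
                         hosts=78):              # assumed meaning of "78"
    """Rough disk usage for a given retention and scrape interval."""
    # Number of samples stored per time series over the retention period.
    samples_per_series = retention_days * 86400 / scrape_interval_seconds
    total_bytes = samples_per_series * bytes_per_sample * metrics_per_host * hosts
    return total_bytes / GIB


# One year at a 1 minute interval, as in the calculation above.
print(f"{estimate_storage_gib(365, 60):.2f} GiB")  # 99.27 GiB
```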
At the time of writing (August 2021), that is still the configured
interval, and the disk usage roughly matches that (98GB used). This
implies that we could store about 5 years of metrics with a 5 minute
polling interval in the same amount of disk space:
```
> 5*365d×1.3byte/(5min)×2000×78 to Gibyte
99,271238 gibibytes
```
... or 15 years with 15 minutes, etc. As a rule of thumb, if we
multiply the scrape interval by some factor, we can multiply the
retention period by the same factor without using more disk space.
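The rule of thumb can be checked by turning the estimate around: given a fixed disk budget, how long can we keep data at a given interval? The sketch below assumes the same constants as the estimate (1.3 bytes per sample, and the factors 2000 and 78, whose names here are guesses); `max_retention_days` is a hypothetical helper, not an existing tool.

```python
GIB = 2**30  # bytes in one gibibyte


def max_retention_days(disk_gib, scrape_interval_seconds,
                       bytes_per_sample=1.3,
                       metrics_per_host=2000,  # assumed meaning of "2000"
                       hosts=78):              # assumed meaning of "78"
    """Retention (in days) that fits in disk_gib at the given interval."""
    # Bytes written per second across all series.
    bytes_per_second = (bytes_per_sample * metrics_per_host * hosts
                        / scrape_interval_seconds)
    return disk_gib * GIB / bytes_per_second / 86400


# With the same ~99 GiB budget, retention grows linearly with the interval.
for minutes in (1, 5, 15):
    years = max_retention_days(99.27, minutes * 60) / 365
    print(f"{minutes:>2} min interval: about {years:.1f} years")
```

With these assumptions the loop prints roughly 1, 5 and 15 years, which matches the figures above.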
On the other hand, we might be able to increase granularity quite a
bit by lowering the retention to (say) 30 days with a 5 second polling
interval, which would give us:
```
> 30d*1.3byte/(5 second)*2000*78 to Gibyte
97,911358 gibibytes
```
That might be a bit aggressive though: the default Prometheus
`scrape_interval` is 15 seconds, not 5 seconds... With the defaults
(15 seconds scrape interval, 30 days retention), we'd be at about
30GiB disk usage, which makes for a quite reasonable and easy to
replicate primary server.
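To put the two-server idea in numbers, here is a small sketch that sizes a short-term and a long-term server side by side. It reuses the constants from the estimates in this section; the tier names and the interpretation of 2000 and 78 as metrics per host and host count are assumptions made for illustration, and it ignores that a long-term server might scrape fewer metrics.

```python
GIB = 2**30  # bytes in one gibibyte

# Constants from the estimates in this section; the names are assumptions.
BYTES_PER_SAMPLE = 1.3
METRICS_PER_HOST = 2000
HOSTS = 78


def storage_gib(retention_days, scrape_interval_seconds):
    """Rough disk usage for one server with the given settings."""
    samples = retention_days * 86400 / scrape_interval_seconds
    return samples * BYTES_PER_SAMPLE * METRICS_PER_HOST * HOSTS / GIB


# Hypothetical tiers: (name, retention in days, scrape interval in seconds).
tiers = [
    ("short-term", 30, 15),        # 30 days at 15 seconds
    ("long-term", 5 * 365, 300),   # 5 years at 5 minutes
]

for name, days, interval in tiers:
    print(f"{name}: {storage_gib(days, interval):.1f} GiB")
```

Under these assumptions the short-term server needs a little under 33 GiB and the long-term one about 99 GiB.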
## Backups
Prometheus servers should be fully configured through Puppet and
...
...