From e322f978962ee0e8e13a633a76a2352809760288 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Antoine=20Beaupr=C3=A9?= <anarcat@debian.org>
Date: Wed, 8 May 2024 22:14:28 -0400
Subject: [PATCH] remove fantasy scrape intervals that can never work
 (tpo/tpa/team#40755)

---
 policy/tpa-rfc-33-monitoring.md | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/policy/tpa-rfc-33-monitoring.md b/policy/tpa-rfc-33-monitoring.md
index 6d18e271..fc25703c 100644
--- a/policy/tpa-rfc-33-monitoring.md
+++ b/policy/tpa-rfc-33-monitoring.md
@@ -673,14 +673,34 @@ A few more sample calculations:
 |     5 min |    5 year |  60 GiB |
 |     5 min |   10 year | 100 GiB |
 |     5 min |  100 year |   1 TiB |
-|    15 min |    1 year |   7 GiB |
-|    15 min |    5 year |  33 GiB |
-|    15 min |   10 year |  66 GiB |
-|    15 min |  100 year | 662 GiB |
-|    1 hour |    1 year |   2 GiB |
-|    1 hour |    5 year |   8 GiB |
-|    1 hour |   10 year |  17 GiB |
-|    1 hour |  100 year | 167 GiB |
+
+Note that scrape intervals at or above 5 minutes are unlikely to work
+at all, as they will [trigger Prometheus' stale data detection](https://utcc.utoronto.ca/~cks/space/blog/sysadmin/MetricsHowFarBackDepends?showcomments):
+by default, a series with no sample in the last 5 minutes is marked
+stale and disappears from query results.
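+
+As a rough sketch of that interaction (a hypothetical example, not
+part of this RFC): the staleness window is set by Prometheus'
+`--query.lookback-delta` flag, which defaults to 5 minutes, so a
+scrape interval at or above it leaves gaps in every query unless
+that flag is raised as well.
+
+```yaml
+# prometheus.yml -- hypothetical job illustrating the problem: with a
+# 15m scrape_interval, each sample ages out of the default 5m lookback
+# window long before the next scrape, so queries mostly see the series
+# as stale.
+global:
+  scrape_interval: 15m
+scrape_configs:
+  - job_name: node   # made-up job name
+    static_configs:
+      - targets: ["localhost:9100"]
+```
+
+A deployment insisting on such intervals would also have to start the
+server with something like `--query.lookback-delta=30m`, delaying
+staleness detection accordingly.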
 
 Naturally, those are going to scale up with service complexity and
 fleet size, so they should be considered just to be an order of
@@ -694,10 +714,6 @@ would use 1TiB of data after one year, with the option of scaling by
 TODO: name each server according to retention? say mon-short-01 and
 the other mon-long-02?
 
-TODO: there's something about an upper limit to scrape interval, check
-https://utcc.utoronto.ca/~cks/space/blog/sysadmin/MetricsHowFarBackDepends?showcomments
-and source
-
 TODO: remote write can't downsample, *can* we remote read if the other
 server is scraping on its own? to be tested?
 
-- 
GitLab