TPA issues
https://gitlab.torproject.org/groups/tpo/tpa/-/issues
2024-02-14T15:38:44Z

metricsdb-01 is out of disk space on /
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41514
2024-02-14T15:38:44Z, Kez

Roger reported metrics.tpo as being down (website returning 503). I checked nagios, and it looks like metricsdb-01 is out of disk space on the root partition. No other metrics-related issues are being reported in nagios, so I assume this is what's causing the metrics.tpo outage.

Assigned to Hiro

Simplify onionoo architecture
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41512
2024-03-26T15:45:11Z, Hiro

Currently onionoo is a service comprised of 4 VMs: two backends with the onionoo java apps serving and updating the data, and two frontends.
At the time the service was launched this architecture made a lot of sense, but I think now we could simplify its maintenance by reducing it to a backend with a web server (like nginx) with some aggressive caching.
I was hoping that we would get sooner to the point where onionoo would be retired, but given the current pace of development of the metrics pipeline, I personally think it makes sense to reduce this service now so that it is easier to maintain for metrics and tpa.
What do you think?

Assigned to Hiro

metricsdb-01 out of swap
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41483
2024-02-17T00:06:09Z, Kez

Nagios has an alert for metricsdb-01: SWAP CRITICAL - 4% free (65MB out of 2047MB). It's almost exclusively because of a victoria-metric process: `victoria-metric 1800892 kB`.
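The issue does not say how the per-process figure was obtained. As a minimal sketch of one way to get a comparable listing on Linux, the snippet below reads the `Name` and `VmSwap` fields from `/proc/<pid>/status`; the function names `parse_status` and `top_swap_users` are made up for this illustration.

```python
from pathlib import Path


def parse_status(text):
    """Return (name, swap_kb) from the text of a /proc/<pid>/status file."""
    name, swap_kb = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmSwap":
            # The value looks like "  1800892 kB"; keep only the number.
            swap_kb = int(value.split()[0])
    return name, swap_kb


def top_swap_users(proc_root="/proc", limit=5):
    """List the processes using the most swap, largest first."""
    usage = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            name, swap_kb = parse_status(status.read_text())
        except OSError:
            continue  # the process exited, or we may not read it
        if name is not None and swap_kb:
            usage.append((name, swap_kb))
    return sorted(usage, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    for name, swap_kb in top_swap_users():
        print(f"{name} {swap_kb} kB")
```

Kernel threads have no `VmSwap` line, so they are counted as using 0 kB and left out of the listing.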
@hiro I'm assigning this to you because you'll probably know what to do with it better than me.

Assigned to Hiro

Migrate metrics-store-01 to object storage
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41454
2024-01-04T19:34:23Z, Hiro

We have agreed we can migrate metrics-store-01 to object storage.

Assigned to Hiro

Move collector.torproject.org to serve files stored in object storage
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41450
2024-01-04T19:33:19Z, Hiro

In https://gitlab.torproject.org/tpo/tpa/team/-/issues/41416 we have discussed how we can move the tarballs from metrics-store-01 and those collector creates to object storage.
For metrics-store-01 we can just move the files, and once we have the bucket, we can just update the links in the wiki where we list our archives.
For collector we need a way for people to browse the archives and download tarballs recursively if needed. I am thinking that we should preserve what we serve on collector.tpo, just have the links point to the buckets.
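As a rough illustration of keeping a browsable listing while the links point at a bucket, the sketch below renders an HTML list from a set of archive paths. The bucket URL, the example path, and the function name `render_index` are all hypothetical; nothing here reflects how collector generates its pages.

```python
from html import escape
from urllib.parse import quote


def render_index(paths, bucket_url):
    """Render an HTML list whose links point at objects in a bucket."""
    base = bucket_url.rstrip("/")
    items = []
    for path in sorted(paths):
        # quote() keeps "/" by default, so the directory layout survives.
        href = f"{base}/{quote(path)}"
        items.append(f'<li><a href="{escape(href)}">{escape(path)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical archive path and bucket URL, for illustration only.
    print(render_index(
        ["archive/example/example-2024-01.tar.xz"],
        "https://objects.example.org/collector",
    ))
```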
Once this is done, we can also discuss how we could generate the tarballs and move them to minio.

estimate hardware requirements to host collector and metrics store in object storage / minio
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41449
2024-03-26T15:44:07Z, anarcat

In #41416, we have agreed to start moving storage from a filesystem into object storage for collector and metrics-store-01. This involves creating a separate bucket for each service and access tokens for each (which is easy enough) but we also need to consider the impact of the object storage server, since this is kind of a big deal.
Right now, the storage usage is as follows:
| machine | used | free |
|----------------|---------|---------|
| colchicifolium | 819GiB | 1.65TiB |
| collector-02 | 55GiB | 255GiB |
| metrics-store | 742GiB | 1.54GiB |
| **total** | 1.51TiB | 3.14TiB |
Source:
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&var-class=All&var-instance=colchicifolium.torproject.org&var-instance=collector-02.torproject.org&var-instance=metrics-store-01.torproject.org&from=now-1y&to=now&refresh=5s
Note that the total includes all disk partitions, including `/`, so it might inflate the total a bit.
We need to figure out whether we can host this in the current object storage infrastructure, including backups (#41415), and if not, how much it will cost to deploy new resources to do so.
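To make the sizing arithmetic easy to repeat, here is a minimal sketch that sums the "used" column of the table above after converting everything to GiB. The helper `to_gib` is a name invented for this example. With the figures as listed it gives about 1.58 TiB, a little above the 1.51 TiB in the total row; the issue does not say how that row was computed.

```python
UNITS_IN_GIB = {"GiB": 1, "TiB": 1024}


def to_gib(size):
    """Convert a string such as '819GiB' or '1.65TiB' to a number of GiB."""
    for unit, factor in UNITS_IN_GIB.items():
        if size.endswith(unit):
            return float(size[: -len(unit)]) * factor
    raise ValueError(f"unknown unit in {size!r}")


# "used" column copied from the table in the issue.
used = {
    "colchicifolium": "819GiB",
    "collector-02": "55GiB",
    "metrics-store": "742GiB",
}

total_gib = sum(to_gib(value) for value in used.values())
print(f"{total_gib:.0f} GiB = {total_gib / 1024:.2f} TiB")
# prints "1616 GiB = 1.58 TiB"
```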
/cc @lavamind

Assigned to anarcat

pg backups filling up on bungei
https://gitlab.torproject.org/tpo/tpa/team/-/issues/41372
2024-03-26T15:15:15Z, anarcat

similar to #41361 except now it's the `/srv/backups/pg` partition that's filling up...
1 year graph:
![image](/uploads/6500ce9736e25737fd16357e8d1f0d19/image.png)
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-1y&to=now&var-class=All&var-instance=bungei.torproject.org
30 days:
![image](/uploads/8b193a1cc848d97cde37ab43b49d2c77/image.png)
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-30d&to=now&var-class=All&var-instance=bungei.torproject.org
change rate is -1TB per month according to grafana.
/cc @gk

Assigned to anarcat
2024-03-21