The Tor Project issues
https://gitlab.torproject.org/groups/tpo/-/issues
2024-03-27T21:45:05Z

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/80
Enhanced Grafana dashboard (2024-03-27T21:45:05Z, Silvio Rhatto)

Enhance the sample [exportable](https://grafana.com/docs/grafana/latest/dashboards/export-import/) Grafana Dashboard for Onion Services monitoring, including:
* [ ] Lists of expiring X.509 certificates (next days/weeks/month/quarter; current quarter; etc).
* [ ] Enhanced metrics from tpo/onion-services/onionprobe#78.

Onionprobe 1.2.0, Silvio Rhatto, 2024-05-16

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/78
Enhanced metrics for Onion Service descriptors (2024-03-27T21:44:54Z, Silvio Rhatto)

Implement additional metrics for Onion Service descriptors.
This requires:
* A better way to parse descriptors, which would enable many other metrics.
* Some patches sent upstream to Stem.
Some fields that could be measured:
* From the outer descriptor wrapper:
  * [ ] "descriptor-lifetime".
  * [ ] "revision-counter".
* From the first layer of encryption:
  * [ ] "[caa-critical](https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/343-rend-caa.txt)".
* From the second layer of encryption:
  * [ ] "single-onion-service".
  * [ ] "pow-params": an indirect way to measure DoS for PoW-enabled
    services (by measuring the PoW settings in the descriptor),
    which depends on tpo/core/tor#40634 to be implemented.
  * [ ] "[caa](https://gitlab.torproject.org/tpo/core/torspec/-/blob/main/proposals/343-rend-caa.txt)".
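
As a rough illustration of the two outer-wrapper fields listed above: the outer wrapper is plain text, so `descriptor-lifetime` and `revision-counter` can be read with a simple keyword scan. This is only a sketch, not Onionprobe's or Stem's implementation; `parse_outer_wrapper` and the sample descriptor are hypothetical, and the fields inside the encrypted layers would still require decryption first.

```python
# Hypothetical helper: reads two integer fields from the plain-text outer
# wrapper of a v3 Onion Service descriptor.
OUTER_FIELDS = ("descriptor-lifetime", "revision-counter")


def parse_outer_wrapper(descriptor_text):
    """Return the integer values of the outer-wrapper fields that are present."""
    fields = {}
    for line in descriptor_text.splitlines():
        # Each field line has the form "<keyword> <value>".
        keyword, _, value = line.partition(" ")
        if keyword in OUTER_FIELDS:
            fields[keyword] = int(value)
    return fields


# Made-up sample; the certificate and encrypted blob are elided placeholders.
SAMPLE_DESCRIPTOR = """hs-descriptor 3
descriptor-lifetime 180
descriptor-signing-key-cert
-----BEGIN ED25519 CERT-----
(elided)
-----END ED25519 CERT-----
revision-counter 42
superencrypted
-----BEGIN MESSAGE-----
(elided)
-----END MESSAGE-----
"""

metrics = parse_outer_wrapper(SAMPLE_DESCRIPTOR)
print(metrics)
```

The resulting dictionary could then be exported as gauges by whatever metrics library the tool already uses.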
Other measurements:
* [ ] Metrics for the descriptor and inner layer sizes.

Onionprobe 1.2.0, Silvio Rhatto, 2024-05-16

https://gitlab.torproject.org/tpo/onion-services/onionspray-log-parser/-/issues/11
Slowness on onionspray-get-logs-from-s3fs (2024-03-28T13:25:06Z, Silvio Rhatto)

# Tasks
* [ ] Investigate why [onionspray-get-logs-from-s3fs][] is slow, and how that can be fixed.
* [ ] If it can't be fixed easily, recommend that users try [onionspray-get-logs-from-s3][] first.
[onionspray-get-logs-from-s3fs]: https://gitlab.torproject.org/tpo/onion-services/onionspray-log-parser/-/blob/main/onionspray-get-logs-from-s3fs
[onionspray-get-logs-from-s3]: https://gitlab.torproject.org/tpo/onion-services/onionspray-log-parser/-/blob/main/onionspray-get-logs-from-s3
# Time estimation
* Complexity: very small (0.5 day)
* Uncertainty: low (x1.1)
* [Reference](https://jacobian.org/2021/may/25/my-estimation-technique/) (adapted)

Silvio Rhatto

https://gitlab.torproject.org/tpo/onion-services/onionspray-log-parser/-/issues/10
Output template (2024-03-28T14:23:08Z, Silvio Rhatto)

# Tasks
* [ ] Support for output with custom templating.
* [ ] Support for Markdown table output.
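
The two output modes in the task list could look roughly like the sketch below. It assumes the parser ends up with rows of (path, hit count); the `to_markdown_table` helper, the row data, and the template string are all hypothetical and not part of onionspray-log-parser.

```python
from string import Template


def to_markdown_table(headers, rows):
    """Render headers and rows as a Markdown table (cells are not escaped)."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)


# Hypothetical parsed results: (path, hit count).
rows = [("/", 120), ("/about", 7)]

# Markdown table output.
table = to_markdown_table(["path", "hits"], rows)
print(table)

# Custom templating: the user supplies the template string.
line_template = Template("$path: $hits hits")
rendered = [line_template.substitute(path=path, hits=hits) for path, hits in rows]
print("\n".join(rendered))
```

`string.Template` keeps the user-supplied template free of arbitrary code execution, which is one reason to prefer it over `eval`-style formatting for this kind of option.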
# Time estimation
* Complexity: very small (0.5 day)
* Uncertainty: low (x1.1)
* [Reference](https://jacobian.org/2021/may/25/my-estimation-technique/) (adapted)

Silvio Rhatto, 2024-04-01

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41514
metricsdb-01 is out of disk space on / (2024-02-14T15:38:44Z, Kez)

Roger reported metrics.tpo as being down (website returning 503). I checked nagios, and it looks like metricsdb-01 is out of disk space on the root partition. No other metrics-related issues are being reported in nagios, so I assume this is what's causing the metrics.tpo outage.

Hiro

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41512
Simplify onionoo architecture (2024-03-26T15:45:11Z, Hiro)

Currently onionoo is a service comprised of 4 VMs: two backends with the onionoo java apps serving and updating the data, and two frontends.
At the time the service was launched this architecture made a lot of sense, but I think now we could simplify its maintenance by reducing it to a backend with a web server (like nginx) with some aggressive caching.
I was hoping we would reach the point of retiring onionoo sooner, but given the current pace of development of the metrics pipeline, I personally think it makes sense to reduce this service now so that it is easier for metrics and tpa to maintain.
What do you think?

Hiro

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41483
metricsdb-01 out of swap (2024-02-17T00:06:09Z, Kez)

Nagios has an alert for metricsdb-01: SWAP CRITICAL - 4% free (65MB out of 2047MB). It's almost exclusively because of a victoria-metric process: `victoria-metric 1800892 kB`.
@hiro I'm assigning this to you because you'll probably know what to do with it better than me

Hiro

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41454
Migrate metrics-store-01 to object storage (2024-01-04T19:34:23Z, Hiro)

We have agreed we can migrate metrics-store-01 to object storage.

Hiro

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41450
Move collector.torproject.org to serve files stored in object storage (2024-01-04T19:33:19Z, Hiro)

In https://gitlab.torproject.org/tpo/tpa/team/-/issues/41416 we have discussed how we can move the tarballs from metrics-store-01 and those collector creates to object storage.
For metrics-store-01 we can just move the files, and once we have the bucket, we can just update the links in the wiki where we list our archives.
For collector we need a way for people to browse the archives and download tarballs recursively if needed. I am thinking that we should preserve what we serve on collector.tpo, just have the links point to the buckets.
Once this is done, we can also discuss how we could generate the tarballs and move them to minio.

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41449
estimate hardware requirements to host collector and metrics store in object storage / minio (2024-03-26T15:44:07Z, anarcat)

In #41416, we have agreed to start moving storage from a filesystem into object storage for collector and metrics-store-01. This involves creating a separate bucket for each service and access tokens for each (which is easy enough) but we also need to consider the impact of the object storage server, since this is kind of a big deal.
Right now, the storage usage is as follows:
| machine | used | free |
|----------------|---------|---------|
| colchicifolium | 819GiB | 1.65TiB |
| collector-02 | 55GiB | 255GiB |
| metrics-store | 742GiB | 1.54GiB |
| **total** | 1.51TiB | 3.14TiB |
Source:
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&var-class=All&var-instance=colchicifolium.torproject.org&var-instance=collector-02.torproject.org&var-instance=metrics-store-01.torproject.org&from=now-1y&to=now&refresh=5s
Note that the figures include all disk partitions, including `/`, so the total may be slightly inflated.
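
Since the rows mix GiB and TiB, re-adding them by hand is error-prone. A minimal sketch of that arithmetic follows, assuming binary units (1 TiB = 1024 GiB) and using the per-machine values exactly as written in the table; `to_gib` and the `usage` dictionary are illustrative names only.

```python
GIB_PER_UNIT = {"GiB": 1, "TiB": 1024}


def to_gib(text):
    """Convert a size such as '819GiB' or '1.65TiB' to a float in GiB."""
    number, unit = text[:-3], text[-3:]
    return float(number) * GIB_PER_UNIT[unit]


# (used, free) per machine, copied from the table as written.
usage = {
    "colchicifolium": ("819GiB", "1.65TiB"),
    "collector-02": ("55GiB", "255GiB"),
    "metrics-store": ("742GiB", "1.54GiB"),
}

used_gib = sum(to_gib(used) for used, _ in usage.values())
free_gib = sum(to_gib(free) for _, free in usage.values())
print(f"used: {used_gib / 1024:.2f} TiB, free: {free_gib / 1024:.2f} TiB")
```

Running this whenever a row changes gives a quick cross-check of the total row.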
We need to figure out whether we can host this in the current object storage infrastructure, including backups (#41415), and if not, how much it will cost to deploy new resources to do so.
/cc @lavamind

anarcat

https://gitlab.torproject.org/tpo/tpa/team/-/issues/41372
pg backups filling up on bungei (2024-03-26T15:15:15Z, anarcat)

similar to #41361 except now it's the `/srv/backups/pg` partition that's filling up...
1 year graph:
![image](/uploads/6500ce9736e25737fd16357e8d1f0d19/image.png)
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-1y&to=now&var-class=All&var-instance=bungei.torproject.org
30 days:
![image](/uploads/8b193a1cc848d97cde37ab43b49d2c77/image.png)
https://grafana.torproject.org/d/zbCoGRjnz/disk-usage?orgId=1&from=now-30d&to=now&var-class=All&var-instance=bungei.torproject.org
change rate is -1TB per month according to grafana.
/cc @gk

anarcat, 2024-03-21

https://gitlab.torproject.org/tpo/core/tor/-/issues/40863
Add TLS cipher suite stats to MetricsPort (2023-09-22T23:50:12Z, Alexander Færøy, ahf@torproject.org)

It would be nice to be able to learn which cipher suites are chosen for our incoming and outgoing TLS connections on relays/bridges/clients. It would probably be even more useful if we also fix #40715 together with this.

https://gitlab.torproject.org/tpo/core/arti/-/issues/1003
Observability and metrics (2023-12-07T14:46:24Z, gabi-250)

We will eventually need to add metrics to Arti. We will need to come up with a plan regarding what needs to be measured and how, which libraries to use, etc.
When we start working on this project, let's make sure we have measurements for the metrics that proved useful (or would've been useful) for debugging C Tor issues. See https://gitlab.torproject.org/tpo/core/tor/-/issues/40717#note_2930954

https://gitlab.torproject.org/tpo/anti-censorship/team/-/issues/112
Reported as offline in metrics, some bridges are online and running (2024-02-29T15:22:53Z, Gus)

Since last week, some bridge operators are reporting that their bridge is 'offline' in Metrics, but they are online and running.
I can confirm that this is happening. One of my bridges is marked as [offline](https://metrics.torproject.org/rs.html#details/25A5B3BB5449EC5A0D4AE4DB657899C02C186EBE), but in the tor logs I see:
>Nov 28 12:02:57.000 [notice] Heartbeat: Since last heartbeat message, I have seen 200 unique clients.

Other messages in the logs:
```
Nov 20 12:23:29.000 [notice] Guard bauruine ($5B83DC983406651A0B4F6AE1940793CDD6A6F92E) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 198/283. Use counts are 63/63. 227 circuits completed, 0 were unusable, 30 collapsed, and 5 timed out. For reference, your timeout cutoff is 324 seconds.
Nov 20 23:04:10.000 [notice] Our directory information is no longer up-to-date enough to build circuits: We're missing descriptors for 1/3 of our primary entry guards (total microdescriptors: 5983/6034). That's ok. We will try to fetch missing descriptors soon.
Nov 21 03:24:31.000 [notice] Guard rixtyminutes ($01AE2DE314276C82FCCC3603A1C2F3238E6544C9) is failing more circuits than usual. Most likely this means the Tor network is overloaded. Success counts are 109/156. Use counts are 37/37. 132 circuits completed, 0 were unusable, 23 collapsed, and 5 timed out. For reference, your timeout cutoff is 324 seconds.
```
Reddit: https://www.reddit.com/r/TOR/comments/z2o7ro/bridge_metrics_showing_offline/

meskio, meskio@torproject.org

https://gitlab.torproject.org/tpo/core/tor/-/issues/40715
MetricsPort: inbound ORPort connections: relays vs. non-relay connections (2023-09-22T23:50:13Z, cypherpunks)

this got previously submitted on 2022-10-24 https://gitlab.torproject.org/tpo/core/tor/-/issues/40194#note_2849481
but that issue was closed, with a request for a specific ticket for each new metric:
From last week's relay meetup we know that tor knows whether an incoming OR connection is from a client or from a relay without looking at the source IP address.
https://pad.riseup.net/p/tor-relay-op-meetup-o22-keep
From the metrics added in !625 (merged) we know that increased CPU load correlates with an increase in the rate of new inbound OR connections. This rate increases when CPU load increases on exits:
```
rate(tor_relay_connections{type="OR",state="created",direction="received"}[$__rate_interval])
```
Could you please add a label for OR connections coming from clients vs. OR connections coming from other relays?
This would allow us to confirm that exits get more new inbound connections from clients when CPU load increases.
That new label could be `src`:
```
tor_relay_connections_total{type="OR",state="created",direction="received",src="relay"}
tor_relay_connections_total{type="OR",state="created",direction="received",src="non-relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="relay"}
tor_relay_connections{type="OR",state="opened",direction="received",src="non-relay"}
```

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/59
Onionprobe audit and metrics review (2023-08-09T00:33:17Z, Silvio Rhatto)

## About
This issue originated during a [Hackweek 2022 brainstorm for Onion Services](tpo/onion-services/onion-support#116). It was not adopted at the time, but it is converted here into an Onionprobe issue.
## Abstract
[Onionprobe](https://gitlab.torproject.org/tpo/onion-services/onionprobe) is a new and flexible tool for Onion Services monitoring and metrics aggregation.
This activity is intended to try out Onionprobe and discover how it can help with analysis, such as detecting network events and bottlenecks and evaluating general quality of service.
## Description
Possible activities include:
* Review Onionprobe metrics: are they good, are they enough?
* Explore the aggregated data on Prometheus using Grafana or your favourite tool.
* Build and share Grafana dashboards and visualizations.
* Write a small project that sets Onion Services with faulty configurations, useful for testing and debugging.
A sample dataset is already being collected by monitoring the so-called [Real-World Onion Sites](https://github.com/alecmuffett/real-world-onion-sites/) and can be used for analysis and to build visualizations.
## Needed skills
While this activity is not strict about the required skill set, it would benefit from people having one or more of these abilities:
* Previous Onion Services knowledge.
* Experience with metrics.
* UX skills.
* Familiarity with Prometheus and Grafana.
* Comfort with Docker Compose usage.
* Optional coding skills.

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/56
Support for User-Agent HTTP Header (2023-08-09T00:29:57Z, Silvio Rhatto)

Implement configurable support for the `User-Agent` HTTP Header, so users can customize if (and which) user agent information they want to submit at each request.
As an example, this is useful for filtering out Onionprobe requests when gathering page count statistics on Onion Service sites.

Silvio Rhatto

https://gitlab.torproject.org/tpo/onion-services/onion-support/-/issues/93
Setup Onionprobe visualizations for TPO (2023-10-26T10:25:14Z, Silvio Rhatto)

Setup Onionprobe visualizations for TPO in the Prometheus or Grafana dashboard.

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/13
Add options to limit the number of measurements (2023-08-09T00:27:17Z, Silvio Rhatto)

* [x] Option to run just for a fixed number of rounds (iterations).
* [ ] Option to run just for a definite amount of time.
* [ ] Option to run until a given date.

https://gitlab.torproject.org/tpo/onion-services/onionprobe/-/issues/6
Additional metrics (2023-08-09T00:27:16Z, Silvio Rhatto)

* [ ] Current introduction points (info metric type?).
* [ ] Response size.
* [ ] Response content type.
* [ ] Other relevant response metadata and headers.
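
For the response size and content type items, one possible shape is sketched below. It assumes the probe already holds the response headers as a dictionary and the body as bytes; `response_metrics` and the metric names are hypothetical, not existing Onionprobe identifiers, and the header lookup assumes the exact key `Content-Type`.

```python
from email.message import Message


def response_metrics(headers, body):
    """Derive size and content-type values from an HTTP response."""
    # Reuse the stdlib header parser to strip parameters such as charset.
    msg = Message()
    msg["Content-Type"] = headers.get("Content-Type", "application/octet-stream")
    return {
        "response_size_bytes": len(body),
        "response_content_type": msg.get_content_type(),
    }


# Made-up response data for illustration.
sample = response_metrics(
    {"Content-Type": "text/html; charset=utf-8"},
    b"<html></html>",
)
print(sample)
```

The size fits a gauge, while the content type is a string and would more naturally be exposed as a label or an info-style metric.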