# Website issues

https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues (feed last updated 2024-03-27)

---

**[#40111: metrics.tpo not producing 'userstats-bridge-country' and 'userstats-bridge-combined' graphs since Mar 15](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40111)** (Hiro, 2024-03-27)

@gus noticed we are not producing graphs for 'userstats-bridge-country' and 'userstats-bridge-combined' since Mar 15th. The data is in the db, but the CSVs served by R are not being written.

---

**[#40110: Obsolete versions of bridges are not triggering the upgrade alert on relay-search](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40110)** (Georg Koppen, 2024-03-20)

We have
```
This relay is running a version of Tor that is
too old and may be missing important security fixes. If this is your relay, you
should update it as soon as possible.
```
which is shown for relays when they run obsolete Tor versions. However, even though Tor versions are marked as obsolete for bridges too, as in
```
{"nickname":"Yuccahimsa","hashed_fingerprint":"252DDE4EF4464904CB1CB6C45BB35ECB5AD2E1B0","or_addresses":["10.250.126.28:49191","[fd9f:2e19:3bcf::3e:1d27]:49191"],"last_seen":"2024-03-20 12:00:33","first_seen":"2022-01-13 00:00:00","running":true,"flags":["Running","V2Dir","Valid"],"last_restarted":"2023-07-25 00:02:03","advertised_bandwidth":8359955,"contact":"yuccahimsa@protonmail.com","platform":"Tor 0.4.7.7 on Linux","version":"0.4.7.7","version_status":"obsolete","recommended_version":false,"transports":["obfs4"],"bridgedb_distributor":"moat"},
```
(0.4.7.x is EOL right now), no red banner with the text above shows up when looking at the details page of a bridge.
/cc @gus

---

**[#40099: Remove version 2 only onion addresses from metrics page](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40099)** (Gaba, 2024-03-05)

When you access https://metrics.torproject.org/hidserv-dir-onions-seen.html the first tab you see is 'Unique .onion addresses (version 2 only)' and it has no data. Can we remove it?

---

**[#40098: Update and improve frac calculation documentation on reproducible-metrics website](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40098)** (Georg Koppen, 2024-03-27)

@dcf [noted a while back in a thread on tor-dev@](https://lists.torproject.org/pipermail/tor-dev/2022-April/014724.html) that our `frac` calculation is both hard to understand and wrong as stated on our reproducible-metrics website. We should fix both. FWIW: I think this would be a valuable thing to do during our documentation hackweek later this year in November.

---

**[#40095: uptime is somewhat misleading](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40095)** (trinity-1686a, 2023-10-02)

metrics.tpo shows the uptime of relays, which it defines as `The time since this relay is online.`. I find it somewhat misleading.
This uptime is computed solely from the last restart of the relay. Yesterday (2023-08-05), due to a network incident, [my relay](https://metrics.torproject.org/rs.html?#details/A8503903F97FF27F5D1C3CA38817329F581925E6) went offline for about 16 hours. So yesterday it rightfully showed as "offline", but today, now that the incident is resolved, it is shown as having a 20-day uptime, like nothing ever happened.

---

**[#40091: Provide metrics relevant to relay operators](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40091)** (Georg Koppen, 2023-10-05)

We should provide metrics on our infrastructure which are relevant for our relay operators. There are third-party tools like [OrNetStats](https://nusenu.github.io/OrNetStats/), but they might be focusing more on particular areas (like authenticated operator ids and related metrics), while we should have the network as a whole in our focus and provide metrics and graphs, as well as we can, for all the things relay operators are concerned about or interested in.
Ideally, that would live on metrics.tpo at some point but we might want to start experimenting soon via a different venue first, now that our database containing descriptors etc. is almost set up and a [networkstatus API](https://gitlab.torproject.org/tpo/network-health/metrics/networkstatusapi) is being built during this year's GSoC project.
I'll ping the tor-relays@ crowd so we can start gathering feedback as needed.
/cc @gus @hiro

---

**[#40082: Find optimizations for the userstats and ipv6servers databases on meronense](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40082)** (Hiro, 2024-01-16)

**Optimize the userstats DB queries**
**Here are a few potential optimization strategies that you could consider for the `merge()` function:**
- **Use a temporary index on the merged_part table:** Since you are joining the imported and merged_part tables multiple times, an index on the merged_part table can potentially speed up the queries. You can create a temporary index on the merged_part table with a CREATE INDEX statement before the FOR loop, and drop it after the loop with DROP INDEX.
- **Use a LATERAL join instead of a correlated subquery:** In the SELECT statement inside the FOR loop, you are using a correlated subquery to find the val of the preceding entry in the merged_part table. A LATERAL join achieves the same result and can be more efficient, because it allows the optimizer to use an index scan instead of a sequential scan.
- **Use a BETWEEN condition instead of two AND conditions in the ON clause of the JOIN:** In the JOIN inside the FOR loop, you are using two AND conditions to find adjacent intervals in the merged_part table. A single BETWEEN condition may be more efficient.
- **Use a CASE expression instead of multiple IF statements:** In the FOR loop, you are using multiple IF statements to handle different cases. A single CASE expression may be more efficient because it avoids unnecessary branching.
- **Insert multiple rows at once:** Instead of inserting each row one at a time inside the FOR loop, batch the inserts, e.g. with a set-based INSERT ... SELECT or a multi-row VALUES list. (BULK COLLECT and FORALL are Oracle PL/SQL constructs; the PostgreSQL/PLpgSQL analogue is set-based DML.) This reduces the overhead of repeatedly calling INSERT.
- **Use the ON CONFLICT clause in the INSERT statement to avoid unnecessary updates:** In the FOR loop, you are using an UPDATE statement followed by an INSERT statement to handle conflicts. The ON CONFLICT clause of INSERT handles conflicts in a single statement and avoids the separate UPDATE (see the sketch after this list).
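For illustration, here is a minimal sketch of the ON CONFLICT approach. The column list and the conflict target below are assumptions made for illustration, not the actual userstats schema:

```sql
-- Hypothetical sketch: a single upsert instead of UPDATE-then-INSERT.
-- All names below are assumptions, not the real userstats schema.
-- ON CONFLICT needs a unique index on the conflict target, e.g.:
--   CREATE UNIQUE INDEX merged_part_key ON merged_part
--     (fingerprint, node, metric, country, transport, version, stats_start);
INSERT INTO merged_part (fingerprint, node, metric, country, transport,
                         version, stats_start, stats_end, val)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9)
ON CONFLICT (fingerprint, node, metric, country, transport, version,
             stats_start)
DO UPDATE SET stats_end = EXCLUDED.stats_end,
              val       = EXCLUDED.val
WHERE merged_part.stats_end < EXCLUDED.stats_end;
```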
**Here are a few optimization strategies for the `aggregate()` function:**
- **Use a temporary index on the update table:** Since you are performing multiple GROUP BY and JOIN operations on the update table, an index on it can potentially speed up the queries. Create a temporary index with CREATE INDEX before the INSERT and UPDATE statements, and drop it afterwards with DROP INDEX.
- **Use a LATERAL join instead of a correlated subquery:** In the SELECT inside the INSERT into the aggregated table, you are using a correlated subquery to find the val of the preceding entry in the update_no_dimensions table. A LATERAL join achieves the same result and can be more efficient, because it allows the optimizer to use an index scan instead of a sequential scan (see the sketch after this list).
- **Use a CASE expression instead of multiple IF statements:** In the UPDATE statements, you are using multiple IF statements to handle different cases. A single CASE expression may be more efficient because it avoids unnecessary branching.
- **Insert or update multiple rows at once:** Instead of inserting or updating each row one at a time in a FOR loop, batch the writes with set-based statements. This reduces the overhead of repeatedly calling INSERT or UPDATE.
- **Use the ON CONFLICT clause in the INSERT statement to avoid unnecessary updates:** In the INSERT into the aggregated table, you are using an UPDATE statement followed by an INSERT statement to handle conflicts. The ON CONFLICT clause handles conflicts within the INSERT itself and avoids the separate UPDATE.
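A minimal sketch of the LATERAL rewrite. The table name follows the description above, but the key columns (fingerprint, stats_start) are assumptions:

```sql
-- Hypothetical sketch: fetch the preceding entry's val with LATERAL
-- instead of a correlated subquery. Column names are assumptions.
SELECT u.fingerprint, u.stats_start, prev.val AS preceding_val
FROM update_no_dimensions u
LEFT JOIN LATERAL (
    SELECT p.val
    FROM update_no_dimensions p
    WHERE p.fingerprint = u.fingerprint
      AND p.stats_start < u.stats_start
    ORDER BY p.stats_start DESC
    LIMIT 1
) prev ON TRUE;
```

With an index on (fingerprint, stats_start), the lateral subquery can become a single index probe per outer row.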
**Here are a few suggestions for optimizing the `combine()` function:**
- Index the imported table on the stats_start column to improve the performance of the DELETE statement that uses this column in its WHERE clause.
- Consider using a JOIN instead of multiple subqueries in the INSERT statement. This may improve performance by reducing the number of scans and reads.
- Use the LEAST and GREATEST functions only when necessary, as they can be computationally expensive. For example, in the INSERT statement you could compute the low and high values with CASE expressions instead of LEAST and GREATEST.
- Use the EXPLAIN ANALYZE command to understand the performance characteristics of the combine() function and identify further optimization opportunities.
Here's an example of how the INSERT statement could be rewritten using a JOIN and the CASE statement:
```sql
INSERT INTO combined_country_transport
SELECT country.date AS date, country.country AS country,
transport.transport AS transport,
SUM(CASE WHEN transport.val + country.val - total.val > 0 THEN transport.val + country.val - total.val ELSE 0 END) AS low,
SUM(CASE WHEN transport.val < country.val THEN transport.val ELSE country.val END) AS high
FROM update2 country
JOIN update2 transport ON country.date = transport.date AND country.fingerprint = transport.fingerprint AND country.nickname = transport.nickname
JOIN update2 total ON total.date = transport.date AND total.fingerprint = transport.fingerprint AND total.nickname = transport.nickname
WHERE country.country <> ''
AND transport.transport <> ''
AND total.country = ''
AND total.transport = ''
AND country.val > 0
AND transport.val > 0
AND total.val > 0
GROUP BY country.date, country.country, transport.transport;
```
**Here is how the `combined` VIEW could be optimized:**
- Consider using a materialized view instead of a regular view. The view's results are then pre-computed and stored, which can improve the performance of queries that use the view (see the sketch after this list).
- If you frequently query the view with a specific set of filters, you can create a function that accepts those filters as arguments and dynamically generates the view's query with the filters applied. This can improve the performance of querying the view with the same filters multiple times.
- Consider creating indexes on the columns used in the JOIN and WHERE clauses of the view's query. This lets the database retrieve the relevant rows from the underlying tables more quickly.
- If you are using PostgreSQL 12 or later, note that the NOT MATERIALIZED hint applies to CTEs (WITH queries), not to view creation; keeping the view a plain (non-materialized) view is what ensures it always reflects the latest data when the underlying tables are frequently updated.
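A minimal sketch of the materialized-view variant, assuming the existing view is named `combined`; the name `combined_mat` and the key columns in the unique index are made up for illustration:

```sql
-- Hypothetical sketch: materialize the combined view and refresh it
-- after each import. Names and key columns are assumptions.
CREATE MATERIALIZED VIEW combined_mat AS
SELECT * FROM combined;

-- REFRESH ... CONCURRENTLY requires a unique index:
CREATE UNIQUE INDEX combined_mat_key
    ON combined_mat (date, country, transport);

-- Run after each data import:
REFRESH MATERIALIZED VIEW CONCURRENTLY combined_mat;
```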
**Further optimize the ipv6servers db queries**
**Here is how the `aggregate()` function could be optimized:**
- Index the tables you are joining and grouping by. This makes the queries faster by allowing the database to quickly locate the rows it needs to process.
- Use a multi-column index on the status_entries and server_descriptors tables, with the status_id, flag_id, and version_id columns as the leading columns, so the database can efficiently locate the rows needed for the aggregated_flags and aggregated_versions INSERT statements.
- Use the EXISTS operator instead of IN in the WHERE clause of the DELETE statement. The database can then stop searching for a matching row as soon as it finds one, rather than checking all rows in the table (see the sketch after this list).
- Use the ON CONFLICT clause of the INSERT statement to update existing rows rather than deleting and re-inserting them. This reduces the number of writes to the database and can improve performance.
- Consider using a temporary table to store intermediate results, rather than performing multiple joins and aggregations on the same tables. This can reduce the amount of data that needs to be processed.
- Use the EXPLAIN ANALYZE command to understand how the database executes the queries in the function and to identify potential performance bottlenecks.
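A minimal sketch of the EXISTS rewrite; the table and join column follow the names above, while the condition inside the subquery is a made-up placeholder:

```sql
-- Hypothetical sketch: DELETE with EXISTS instead of IN (...).
-- The valid_after cutoff is a placeholder condition.
DELETE FROM status_entries se
WHERE EXISTS (
    SELECT 1
    FROM statuses s
    WHERE s.status_id = se.status_id
      AND s.valid_after >= '2023-01-01'::timestamp
);
```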
**Here is how the `MATERIALIZED VIEW grouped_by_status_ipv6` could be optimized:**
- Use the WHERE clause to filter rows as early as possible in the query, to minimize the number of rows the rest of the query has to process. For example, you could turn the `WHERE statuses.status_id IN (SELECT included_statuses_ipv6.status_id FROM included_statuses_ipv6)` condition into a join in the FROM clause, so that rows from the statuses table are filtered before they are joined with the aggregated_ipv6 table (see the sketch after this list).
- Consider using a JOIN rather than a subquery in the WHERE clause. This is often more efficient, especially if the subquery returns a large number of rows.
- Use the GROUP BY clause to group rows before applying the aggregation functions. This can be more efficient than grouping rows after they have been aggregated.
- Use the EXPLAIN ANALYZE command to see how the query is being executed, and identify any bottlenecks or areas for optimization.
- If the view is used frequently, consider creating an index on the statuses table to improve the performance of the JOIN with the aggregated_ipv6 table.
- If the view is used infrequently and the data is not expected to change often, consider a materialized view. It stores the results of the view in a table, which can improve the performance of queries against the view.
- If the data in the statuses and aggregated_ipv6 tables is very large, consider partitioning the tables on the valid_after column, so that queries only need to access a subset of the data.
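A minimal sketch combining the first two points, i.e. filtering statuses through a JOIN on included_statuses_ipv6 instead of an IN (...) subquery; the aggregated output columns are placeholders, since the view's real column list isn't shown here:

```sql
-- Hypothetical sketch: early filtering via JOIN instead of IN (...).
-- reachable/announced are placeholder aggregates, not the real columns.
SELECT s.valid_after,
       SUM(a.reachable) AS reachable,
       SUM(a.announced) AS announced
FROM statuses s
JOIN included_statuses_ipv6 i ON i.status_id = s.status_id
JOIN aggregated_ipv6 a        ON a.status_id = s.status_id
GROUP BY s.valid_after;
```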
It's worth noting that these optimization strategies may not necessarily result in a significant improvement in performance, and you should profile the function to determine which strategies are most effective in your specific case.

---

**[#40078: Updated metrics page for Tor applications](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40078)** (richard, 2023-03-31)

Tor Browser has seen a few changes with the 12.0 release which affect the usefulness of the current https://metrics.torproject.org/webstats-tb.html page:
- Applications team now only publishes 1 Tor Browser version per `OS|arch` pair, rather than the ~36 we had before, because all locales are now packaged in a single binary. Therefore the `Tor Browser downloads and updates by locale` tab can be deprecated (as you can see, all the stats for individual locales have dropped to ~0 while `other` is approaching the sum of the previous curves).
- We would like to track downloads and update pings per desktop Tor Browser `OS|arch|channel` tuple.
- The currently supported `OS|arch` pairs are (see https://www.torproject.org/download/languages/ ):
- `Linux|x86`
- `Linux|x86_64`
- `Windows|x86`
- `Windows|x86_64`
  - `macOS|universal` // aarch64 and x86_64 are bundled together for macOS
- The currently supported `channels` are `release` and `alpha`
- (dupe of https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/29835 ) We would like to track downloads for our Android Tor Browser releases by arch (`aarch64`, `arm`, `x86`, and `x86_64`) and channel (`release` and `alpha`) pair (see https://www.torproject.org/download/#android and https://www.torproject.org/download/alpha/) (ie `aarch64|alpha` or `x86|release`)
- (dupe of https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/26030 ) We can deprecate the `Tor Messenger downloads and updates` tab
- We would like to add a tab for downloads of our tor expert bundle (see https://www.torproject.org/download/tor/ ). These can be organized into `OS|arch|channel` tuples (ie `macOS|universal|alpha` or `linux|x86|release`).
/cc @hiro

---

**[#40074: plot snowflake proxy metrics](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40074)** (meskio, 2023-09-25)

It would be nice to have snowflake proxy metrics on the website. We could provide them by country, by NAT type, and by implementation.

We already have that kind of graph in Grafana, but those are not publicly visible.

---

**[#40071: Measure "time before new onion service becomes available"](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40071)** (holmesworcester, 2023-05-03)

# Background
In recent DoS attacks onion service connection times suffered generally, but one especially strong effect was on the delay before new onion services became available, which grew to multiple minutes in some cases. This delay impacts testing of onion services and applications like [Briar](https://briarproject.org/), [Cwtch](https://cwtch.im/), and [Quiet](https://www.tryquiet.org) where users create onion services and almost immediately invite users to connect to them.
(I'm not sure about this, but it may have also affected the time for an onion service to become available after the Tor client hosting it went offline and returned online.)
# Problem
"Time before new onion service becomes available" is not currently measured and reported on Tor Metrics. As a result, Tor developers, developers building on Tor, and onion service operators currently lack visibility into this problem when it arises. This lack of information limits their ability to diagnose problems, and it delays fixes.
# Solution
Tor Metrics should measure "time before new onion service becomes available."
Also, if "time before existing but newly reconnected onion service becomes available" is a variable that behaves independently of "time before new onion service becomes available", then Tor Metrics should measure that too.

---

**[#40064: Estimate for V3 onion services uses network fractions for V2](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40064)** (TTH, 2023-07-03)

I think the current estimate for the number of unique V3 onion addresses still uses the network fractions for V2.
The structure of the hidden service directory changed between version 2 and 3. V2 used the fingerprint to determine the descriptors the relay is responsible for, while [V3](https://gitweb.torproject.org/torspec.git/tree/rend-spec-v3.txt#n807) uses a hash of the ed25519 fingerprint, the current shared-random-value and the current time period like this:
```
hsdir_index(node) = H("node-idx" | node_identity |
                      shared_random_value |
                      INT_8(period_num) |
                      INT_8(period_length))
```
Since the string "node-idx" never occurs within the source code of this project, I'm fairly certain that the correct network fractions for version 3 are never calculated.
More importantly, the hsdir_spread was changed from 3 to 4 for V3 onion services. There is a comment in [ComputedNetworkFractions.java:48](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/master/src/main/java/org/torproject/metrics/stats/hidserv/ComputedNetworkFractions.java#L48) that says that the fraction should be divided by 8 instead of 3 (not sure why 8, I would have divided it by 4), but as far as I can see, this never happens. Every follow-up calculation uses the same network fraction that was divided by 3 (see [Parser.java:433](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/blob/master/src/main/java/org/torproject/metrics/stats/hidserv/Parser.java#L433)).
Could someone with a better understanding of the source code please double check my analysis and confirm that this is indeed an issue?
_Disclaimer: I'm not a Tor developer, just a researcher who looked into Tor onion services previously. (I wrote an [entry on the Tor Blog](https://blog.torproject.org/v3-onion-services-usage/) about a preliminary estimate of unique V3 onion addresses. I noticed that my estimate was too far off from the official one by Tor Metrics, so I tried to figure out what was wrong. Turns out that my calculation was wrong in a different way, but while trying to reproduce the official numbers, I stumbled across this issue.)_

---

**[#40054: Relays sometimes get the overloaded notification bar without timestamp](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40054)** (Georg Koppen, 2022-10-17)

I saw it last week and just again: sometimes relays get the overloaded notification bar *without* a timestamp included:

![najdorf_2022-06-13-07-36-00_overload](/uploads/e20dbfe4e4f21a95ef1a4c837b406828/najdorf_2022-06-13-07-36-00_overload.png)
but they are not shown as overloaded otherwise:
![najdorf_2022-06-13-07-36-00_no_overload](/uploads/81d6b602fdcc947a1910891aa946e82f/najdorf_2022-06-13-07-36-00_no_overload.png)
In fact, when that state kicks in it seems that *all* non-overloaded relays get tagged that way. The actually overloaded ones are both shown as overloaded and their notification bar contains a timestamp, too.

---

**[#40048: Find a better way to visualize statistics](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40048)** (Hiro, 2022-12-13)

While working on https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40009 I have noticed that our current graphs don't help us understand trends and seasonality in our series.

As a test, I ran the following simple decomposition analysis on bridge clients connecting from Russia between February and the end of March.
```python
import pandas as pd
df = pd.read_csv('userstats-combined.csv')
```
```python
df.info()
```
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2457009 entries, 0 to 2457008
Data columns (total 8 columns):
 #   Column     Dtype
---  ------     -----
 0   date       object
 1   node       object
 2   country    object
 3   transport  object
 4   version    float64
 5   frac       int64
 6   low        int64
 7   high       int64
dtypes: float64(1), int64(3), object(4)
memory usage: 150.0+ MB
```
```python
threshold = 100 # Anything that occurs less than this will be removed.
df = df[df.high >= threshold]
df = df[df.country != "??"]
date_th = '2022-02-01'
df = df[df.date >= date_th]
```
```python
df
```
```
              date    node country  transport  version  frac   low  high
2409766 2022-02-01  bridge      ae      obfs4      NaN    85   201   217
2409793 2022-02-01  bridge      ar      obfs4      NaN    85   113   124
2409799 2022-02-01  bridge      at      obfs4      NaN    85   184   199
2409805 2022-02-01  bridge      au      obfs4      NaN    85   410   438
2409824 2022-02-01  bridge      bd      obfs4      NaN    85   105   110
...            ...     ...     ...        ...      ...   ...   ...   ...
2456964 2022-03-30  bridge      us      obfs4      NaN    92  6474  6577
2456966 2022-03-30  bridge      us  snowflake      NaN    92   681   682
2456974 2022-03-30  bridge      uz      obfs4      NaN    92   124   130
2456987 2022-03-30  bridge      vn      obfs4      NaN    92   195   199
2457000 2022-03-30  bridge      za      obfs4      NaN    92   162   169

[3837 rows x 8 columns]
```
```python
ru_ts = df[df['country']=='ru']
# Transports and metrics to plot
transports=['<OR>','obfs4','meek', 'snowflake']
metrics=['frac', 'high']
```
```python
ru_ts
```
```
              date    node country  transport  version  frac    low   high
2410472 2022-02-01  bridge      ru       <OR>      NaN    85   1335   1514
2410473 2022-02-01  bridge      ru       meek      NaN    85   2113   2120
2410475 2022-02-01  bridge      ru      obfs3      NaN    85    336    351
2410476 2022-02-01  bridge      ru      obfs4      NaN    85  24723  24918
2410478 2022-02-01  bridge      ru  snowflake      NaN    85   2456   2456
...            ...     ...     ...        ...      ...   ...    ...    ...
2456826 2022-03-30  bridge      ru       <OR>      NaN    92   1700   1881
2456827 2022-03-30  bridge      ru       meek      NaN    92   1668   1675
2456828 2022-03-30  bridge      ru      obfs3      NaN    92    434    437
2456829 2022-03-30  bridge      ru      obfs4      NaN    92  32814  32994
2456831 2022-03-30  bridge      ru  snowflake      NaN    92   5164   5165

[288 rows x 8 columns]
```
First I plot statistics per transport, showing the frac and high metrics for each of them.
```python
import matplotlib.pyplot as plt

# Plot the frac and high time series for each transport,
# with a dashed vertical line marking each observed date
for t in transports:
    serie = ru_ts[ru_ts.transport == t]
    for m in metrics:
        _ = plt.figure(figsize=(18,3))
        _ = plt.plot(serie.date, serie[m], color='blue')
        _ = plt.title("{} - {}".format(t, m))
        _ = plt.gcf().autofmt_xdate()
        for xc in serie.date:
            _ = plt.axvline(x=xc, color='black', linestyle='--')
        plt.show()
```
![output_6_0](/uploads/14e2c45f9d76d524acc9386de2018a2f/output_6_0.png)
![output_6_1](/uploads/baaad1ab15e67e6addb6ddb7cf4af8c2/output_6_1.png)
![output_6_2](/uploads/86c81b99998b5c84ffe6b8ab11f6d506/output_6_2.png)
![output_6_3](/uploads/4436565a4e09894e12c131e430f52440/output_6_3.png)
![output_6_4](/uploads/158beb4a36f01f4eadd6857ca1d0f2d2/output_6_4.png)
![output_6_5](/uploads/22eeabbbf4695c89583178ec74d34694/output_6_5.png)
![output_6_6](/uploads/d8e2f5da8f7ddf680d3ff8d31f98f753/output_6_6.png)
![output_6_7](/uploads/9858555f92e341fc47aeccb8fb3e2937/output_6_7.png)
Now I run a seasonal decomposition on the snowflake transport and the high metric. I use a period of 8 days, since in this paper https://arxiv.org/pdf/1507.05819.pdf a weekly seasonality was identified for Tor users (which is generally the case for internet users).
```python
from statsmodels.tsa.seasonal import seasonal_decompose
serie = pd.DataFrame(ru_ts[ru_ts.transport == 'snowflake']['high'])
decompose_result_mult = seasonal_decompose(serie, model="multiplicative", extrapolate_trend='freq', period=8)
trend = decompose_result_mult.trend
seasonal = decompose_result_mult.seasonal
residual = decompose_result_mult.resid
_ = plt.figure(figsize=(18,10))
_ = plt.title("trend")
_ = trend.plot()
plt.show()
_ = plt.figure(figsize=(18,10))
_ = plt.title("seasonal")
_ = seasonal.plot()
plt.show()
_ = plt.figure(figsize=(18,10))
_ = plt.title("residual")
_ = residual.plot()
plt.show()
```
![output_7_0](/uploads/6e7b154501219271f89f8dc87e9b797b/output_7_0.png)
![output_7_1](/uploads/3956d7c83b05d23f30c1dbd8ca04e8c9/output_7_1.png)
![output_7_2](/uploads/1e5f80a70f909094e594ebd79f61dd70/output_7_2.png)
This last bit is some day-to-day differencing I was playing with. It should be polished, but it gives an idea of how things change between one day and the next.
```python
for t in transports:
serie = ru_ts[ru_ts.transport == t]
for m in metrics:
_ = plt.figure(figsize=(18,3))
X = serie[m].values
diff = list()
for i in range(1, len(X)):
value = X[i] - X[i - 1]
diff.append(value)
_ = plt.plot(diff, color='blue')
_ = plt.title("{} - {}".format(t,m))
_ = plt.gcf().autofmt_xdate()
plt.show()
```
![output_8_0](/uploads/32e9e4ac72a2b0f8aaca68fb5814e105/output_8_0.png)
![output_8_1](/uploads/b8fa1d802e00e07223415ec4c2954462/output_8_1.png)
![output_8_2](/uploads/5cc2d509c785e2cbcfcb344668882fa3/output_8_2.png)
![output_8_3](/uploads/b611d66db2528371dc92c0c53ea7e471/output_8_3.png)
![output_8_4](/uploads/913ed4d87d38ee2a4ca40bb75b8becde/output_8_4.png)
![output_8_5](/uploads/60c0ef256b16e348757e57dad2091d76/output_8_5.png)
![output_8_6](/uploads/58c4765e4fac0d93dfabae5675939735/output_8_6.png)
![output_8_7](/uploads/7861799fe6e7132edde4f599c134f539/output_8_7.png)
---

**[#40046: Display ratelimit and file descriptor overload information on relay search as well](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40046)** (Hiro, 2022-10-17)

We are currently displaying only the general overload information on relay search, and it has been brought to our attention that we should probably display the overload-fd-exhaustion and overload-ratelimits information too.

---

**[#40027: Use onionperf to graph onionperf data](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40027)** (Hiro, 2023-01-23)

Onionperf has a visualization module that could be used to graph onionperf data directly and produce the graphs we need for the website.

---

**[#40026: Some changes in the website require reprocessing the entire history of tarballs](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40026)** (Hiro, 2022-03-21; milestone: Metrics OKRs 2021)

Enabling/deploying some new features in the website requires changes that involve the step between reading descriptors and putting data *into* the database. This would require reprocessing the entire history of tarballs to ensure that all the previous dates/times had the new calculation.

---

**[#40017: Add Directory traffic graphs to Relay Detail metrics site](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40017)** (pseudonymisaTor, 2021-09-28)

Add Directory traffic graphs to the Relay Detail metrics site, at least for Relays that report them,
that is:
`dirreq-write-history` & `dirreq-read-history`
(non-zero) and have `DirCache` (consensus `tunnelled-dir-server`) enabled or Flag V2Dir.
This way you as a Relay operator can see how much usage you actually get from answering Directory Requests.
This will help us answer questions like:
"How many bytes of Traffic from my Relay are used for Directory answers, in percentage compared to relayed data?"
[`DirReqStatistics`](http://jqyzxhjk6psc6ul5jnfwloamhtyh7si74b4743k2qgpskwwxrzhsxmad.onion/docs/tor-manual-dev.html.en#DirReqStatistics)
> Relays and bridges only. When this option is enabled, a Tor directory writes statistics on the number and response time of network status requests to disk every 24 hours. Enables relay and bridge operators to monitor how much their server is being used by clients to learn about the Tor network. If ExtraInfoStatistics is enabled, it will be published as part of the extra-info document. (Default: 1)

---

**[#40016: Add IPv6 traffic graphs to Relay Detail metrics site](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40016)** (pseudonymisaTor, 2021-09-28)
Add IPv6 traffic graphs to Relay Detail metrics site, at least for Relays that report them,
that is:
`ipv6-write-history` & `ipv6-read-history`
(non-zero) and have IPv6 OrPort or Flags; ReachableIPv6 or IPv6 Exit.
If you as a Relay operator can see how much usage you actually get over IPv6, it could hopefully increase the migration to IPv6.
This will help us answer questions like:
"How many bytes of Traffic from my Relay travel over IPv6 in percentage compared to IPv4?"
Related: #23761

---

**[#40015: add ExitPortStatistics graph](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40015)** (pseudonymisaTor, 2021-09-28)

Since Relay Operators can opt in to report their exit usage per port, show some metrics about it.
Add ExitPortStatistics graph to the relay page of Relay Search Details for: Exit
And globally?
[`ExitPortStatistics`](http://jqyzxhjk6psc6ul5jnfwloamhtyh7si74b4743k2qgpskwwxrzhsxmad.onion/docs/tor-manual-dev.html.en#ExitPortStatistics)
> Exit relays only. When this option is enabled, Tor writes statistics on the number of relayed bytes and opened stream per exit port to disk every 24 hours. Enables exit relay operators to measure and monitor amounts of traffic that leaves Tor network through their exit node. If ExtraInfoStatistics is enabled, it will be published as part of extra-info document.
This will help us answer questions like:
- What percentage of Exit Traffic is "website"-only traffic?
- What is the biggest share of Traffic per destination port? Not torrents!
- Which exit country shares the most Traffic for which destination ports?

---

**[#40014: Add Exit Port Policy graph](https://gitlab.torproject.org/tpo/network-health/metrics/website/-/issues/40014)** (pseudonymisaTor, 2021-09-28)

To get the Exit flag, a relay only needs to allow exits to Ports 80 & 443. Basically, the metric answers how many Exit Nodes my Tor Browser may possibly use right now.
Add Exit Port graphed by Consensus Weight and relay count.
This will help us answer questions like:
How many Exit Relays use [ReducedExitPolicy](http://jqyzxhjk6psc6ul5jnfwloamhtyh7si74b4743k2qgpskwwxrzhsxmad.onion/docs/tor-manual-dev.html.en#ReducedExitPolicy)?
- How many Exit Relays may my Port 25, 465, 587, 110, 143, 993, 995 E-Mail Circuits choose from?
- How many Exit Relays may my Port 6667 & 6697 IRC Circuits choose from?
- How many Exit Relays may my Port 22 SSH Circuits choose from?
... DNS (53), FTP (21) ...
Per Country? Per IP 4&6 Version?
Users can learn whether all of their connections to a special service port will go through only a low fraction of Exits or, as for web traffic, over a large set of Exit Relays.