Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues
Updated: 2020-06-13T18:12:42Z

metrics/data page needs to link to format docs
https://gitlab.torproject.org/legacy/trac/-/issues/3856
Updated: 2020-06-13T18:12:42Z
Author: Roger Dingledine

Periodically I tell people about datasets on the metrics page. But the text on the data page just says things are "sanitized", not how.
For example, in https://metrics.torproject.org/data.html#bridgeassignments what is the hash in the file?
We should document the formats of each of our data files, and especially point out where to learn about privacy-preserving changes we make.
We probably have much of this documentation already written, and scattered in various places. But it's not clear from the webpage where to look.
Assignee: Karsten Loesing

relay-search queries never answer
https://gitlab.torproject.org/legacy/trac/-/issues/3836
Updated: 2020-06-13T18:12:42Z
Author: Roger Dingledine

Go to https://metrics.torproject.org/relay-search.html and ask it about 'moria1'. It never answers.
Is the db screwed up somehow? Or is something else the matter?
Assignee: Karsten Loesing

Should we not publish data points when the ratio of stat-reporting relays is too low?
https://gitlab.torproject.org/legacy/trac/-/issues/3822
Updated: 2020-06-13T18:12:41Z
Author: Roger Dingledine

In #3338 we discovered that some of the big spikes on the usage graphs are due to dips in the fraction of directory mirrors reporting stats.
Perhaps we should go through and take out the data points for the most extreme dips? Right now we show usage spikes that are false positives, which distract from the usage spikes that actually reflect an increase in users.
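A filter of this kind could be sketched as follows. This is only an illustration: the `DataPoint` record, its field names, and the `min_fraction` default of 0.1 are assumptions, since the ticket leaves the actual cutoff open.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    # Hypothetical record; the field names are assumptions for illustration.
    date: str                  # e.g. "2011-06-01"
    estimated_users: float     # user estimate derived from reported stats
    reporting_fraction: float  # share of directory mirrors reporting, 0.0 to 1.0


def drop_unreliable(points, min_fraction=0.1):
    """Keep only the points whose reporting fraction meets the cutoff."""
    return [p for p in points if p.reporting_fraction >= min_fraction]


points = [
    DataPoint("2011-06-01", 40000.0, 0.35),
    DataPoint("2011-06-02", 95000.0, 0.04),  # spike caused by a reporting dip
    DataPoint("2011-06-03", 41000.0, 0.33),
]
kept = drop_unreliable(points)
```

With the placeholder cutoff, the middle point is dropped and the other two are kept.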
If we should, what fraction of reporting nodes should be the cutoff?
Assignee: Karsten Loesing

"no mans land" user usage stats points to wrong url?
https://gitlab.torproject.org/legacy/trac/-/issues/3821
Updated: 2020-06-13T18:12:41Z
Author: Roger Dingledine

https://metrics.torproject.org/users.html#direct-users-table
includes 1.96% at "no-man's-land", and the link points to
https://metrics.torproject.org/users.html?graph=direct-users&country=??#direct-users
which doesn't appear to be handled by the main users graph.
Similarly, in the censorship detector, the link points to
https://metrics.torproject.org/users.html?graph=direct-users&country=ap&events=on#direct-users
which is also not handled.
So should it be ?? or ap? Once we pick one, we should also make it work.
Assignee: Karsten Loesing

metrics.tpo user graphs are very similar for european countries
https://gitlab.torproject.org/legacy/trac/-/issues/3736
Updated: 2020-06-13T18:12:40Z
Author: Andrew Lewman

Perhaps just a coincidence, but Germany, France, Switzerland, Austria, and the US all have very similar usage graphs.
See https://metrics.torproject.org/users.html?graph=direct-users&start=2011-05-16&end=2011-07-10&country=de&dpi=72#direct-users
versus
https://metrics.torproject.org/users.html?graph=direct-users&start=2011-05-16&end=2011-07-10&country=fr&dpi=72#direct-users
versus
https://metrics.torproject.org/users.html?graph=direct-users&start=2011-05-16&end=2011-07-10&country=us&dpi=72#direct-users
Assignee: Karsten Loesing

Create graph of top ten countries by connection count
https://gitlab.torproject.org/legacy/trac/-/issues/3624
Updated: 2020-06-13T18:12:40Z
Author: Andrew Lewman

Can we have an automatically updating table of top ten countries by Tor usage with a second column of percent of total?
Something like:
Country, # of connections seen in past 24 hours, % of total connections
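As a rough sketch of how such a table could be computed, assuming per-country connection counts are already available as a mapping from country code to count (the function name and the sample numbers are made up for illustration):

```python
def top_countries(counts, n=10):
    """Return (country, connections, percent_of_total) rows, largest first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]
    return [
        (country, seen, 100.0 * seen / total if total else 0.0)
        for country, seen in ranked
    ]


# Made-up counts for the past 24 hours, for illustration only.
sample = {"us": 500, "de": 300, "fr": 200}
rows = top_countries(sample, n=2)
```

The percentage is taken over all countries, not only the ones shown, so the column need not add up to 100.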
Being able to choose timelines would also be good for this table.
Assignee: Karsten Loesing

Make relay-search look at exit list data
https://gitlab.torproject.org/legacy/trac/-/issues/3567
Updated: 2020-06-13T18:12:39Z
Author: Roger Dingledine

When people are searching by IP address, they often want to know if a given IP address "was" a Tor relay. If they look up by the IP address they see in their weblogs, but the relay is multihomed, they could miss it.
Assignee: Karsten Loesing

order more metrics data newest to oldest
https://gitlab.torproject.org/legacy/trac/-/issues/3397
Updated: 2020-06-13T18:12:39Z
Author: Roger Dingledine

http://metrics.torproject.org/data.html#bridgedesc should probably list the most recent month first.
Probably there are other data lists that could use similar reordering.
Assignee: Karsten Loesing

Make labels in relay version graph more useful
https://gitlab.torproject.org/legacy/trac/-/issues/3364
Updated: 2020-06-13T18:12:38Z
Author: Karsten Loesing

Runa says new users have trouble making sense of the labels "0.2.1," "0.2.2," etc. in the [relay version graph](https://metrics.torproject.org/network.html#versions). They need something like "0.2.1 (stable)" and "0.2.2 (unstable)," or they don't understand the graph.
Here are two ways to implement this:
- Add labels to the current stable and unstable version in the legend. This might be "0.2.1 (stable)," "0.2.2 (unstable)," and "0.2.3 (experimental)" right now. Don't add labels to older versions. However, this may be confusing if people look at a graph from, say, 2009, seeing 0.2.1 as the "stable" version. Also, we'd have to update the labels whenever a new version becomes stable. A graph from 2009 would look even funnier with 0.2.2 as the "stable" version.
- Add a comment to the text above the graph saying which versions are stable, unstable, and experimental right now. This works for people reading text, but who does that when there's a graph right below?
Right now I prefer the second approach. Are there other approaches?
Assignee: Karsten Loesing

metrics graphs shmush their labels together
https://gitlab.torproject.org/legacy/trac/-/issues/3295
Updated: 2020-06-13T18:12:37Z
Author: Roger Dingledine

http://metrics.torproject.org/network.html?graph=bandwidth&start=2010-05-26&end=2011-05-26&dpi=72#bandwidth
has x axis labels like "Jun-2010" and "Jul-2010" that are run together.
Assignee: Karsten Loesing

Provide per-country graphs for multiple countries at once
https://gitlab.torproject.org/legacy/trac/-/issues/3239
Updated: 2020-06-13T18:12:37Z
Author: Karsten Loesing

We have a few graphs showing [statistics per country](https://metrics.torproject.org/users.html?graph=direct-users&country=de#direct-users). We should allow users to select multiple countries to generate a graph with statistics for all selected countries.
The first question is what an intuitive interface for selecting multiple countries would look like. Is a multi-select list box with 247 entries even manageable? What other options are there that don't require JavaScript?
The next question is whether lines should be drawn in different colors using a single y axis or as multiple stacked y axes. The former makes it easier to compare absolute numbers to each other; the latter allows comparing trends even if absolute numbers differ by orders of magnitude. Both approaches are probably limited to around 5 countries at a time before graphs become incomprehensible. Of course, we might let the user select one of the graph types without much additional effort.
Assignee: Karsten Loesing

Provide better descriptor lookup on metrics website
https://gitlab.torproject.org/legacy/trac/-/issues/3238
Updated: 2020-06-13T18:12:37Z
Author: Karsten Loesing

The metrics website provides a lookup function for [network status consensuses](https://metrics.torproject.org/consensus?valid-after=2011-05-19-11-00-00), [server descriptors](https://metrics.torproject.org/serverdesc?desc-id=7521462be2bb7c5892d2d47516444491ea8cd240), and [extra-info descriptors](https://metrics.torproject.org/extrainfodesc?desc-id=2a06c04652bc89288bcc291ac70b07b280d7e677). It does not, however, implement a lookup for network status votes, key certificates, or other directory protocol documents.
In theory, it should be possible to look up most or even all metrics data contained in the tarballs on the [Data page](https://metrics.torproject.org/data.html). A unified descriptor lookup interface wouldn't hurt, either.
Assignee: Karsten Loesing

Re-configure Apache running the metrics website to redirect 404s to Tomcat
https://gitlab.torproject.org/legacy/trac/-/issues/3235
Updated: 2020-06-13T18:12:36Z
Author: Karsten Loesing

The metrics website consists of an Apache that serves some static content and redirects requests for dynamic content to Tomcat. But when Apache encounters a problem with static content, e.g., when requesting a [non-existing file](https://metrics.torproject.org/data/exit-list-2021-05.tar.bz2), it returns a 404 itself. Instead, Apache should show the same 404 error page that Tomcat would display. Users shouldn't even notice that the website is run by Apache and Tomcat. Very low prio.
Assignee: Karsten Loesing

Improve searching for relays in metrics database
https://gitlab.torproject.org/legacy/trac/-/issues/2922
Updated: 2020-06-13T18:12:34Z
Author: Karsten Loesing

Our [relay search](https://metrics.torproject.org/relay-search.html) function on the metrics website has serious performance problems. Some searches return in under a second, but some searches take 2 minutes or longer. It's okay for a search to take a few seconds, but there shouldn't be a variance this high.
All searches are based on a single (very large) table that contains one row per relay listed in a network status consensus. Our current assumption why searches are slow is that indexes have grown too large.
Sebastian and I tried to create separate tables for the fields that users can search for, which looked promising. But after one of the steps to populate these helper tables did not finish after five days, we gave up.
Someone should brainstorm about redesigning our [database schema](https://gitweb.torproject.org/metrics-web.git/blob/HEAD:/db/tordir.sql) and try out a couple of approaches to search for relays with a couple months of data. Once it turns out that one approach is better than the current one, we also need a migration strategy to convert our database to the new schema.
Assignee: Karsten Loesing

Tag and release metrics-web version 0.0.1
https://gitlab.torproject.org/legacy/trac/-/issues/2913
Updated: 2020-06-13T18:12:33Z
Author: Karsten Loesing

We should start tagging metrics-web versions and write a ChangeLog.
It may seem strange to tag a website, but I think we need stable versions for the underlying database and R parts. For example, people might want to reuse the database parts, throw away the JSP/servlet parts, and write their own web front-end. Or they want to keep the R parts to generate graphs for their own website. Changes to these parts should be listed in the ChangeLog, in contrast to content changes.
Before we can tag we need to write down some documentation for setting up metrics-web. Ugh.
Assignee: Karsten Loesing

Implement node churn and uptime statistics
https://gitlab.torproject.org/legacy/trac/-/issues/1841
Updated: 2020-06-13T18:12:31Z
Author: Trac

I spent some time this summer designing a schema to support tracking of relay uptime and churn statistics. The relay churn statistic should be split up by platform, version, and guard/exit status for a more fine-tuned insight into the network. The uptime statistic should be split into guard/exit status and version, as it only sees individual platforms. Also, the data the query returns currently is good for a time graph (similar to Karsten's [Windows relay uptime](https://trac.torproject.org/projects/tor/ticket/1721) graph), but it could be portrayed as a box-plot distribution.
Relay churn is calculated from the unique routers from one week/month/year that appear in the following week/month/year, and is relatively straightforward to calculate. However, this query could use some optimization because it takes a very long time to group individual routers by the times they appear.
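Outside the database, the churn calculation described above might look like the following sketch. It assumes each period is given as a collection of relay fingerprints and defines churn as the share of relays from one period that do not reappear in the next. The function name and the sample fingerprints are illustrative, not taken from the schema.

```python
def churn(current, following):
    """Fraction of relays seen in `current` that are absent from `following`.

    Both arguments are iterables of relay fingerprints (an assumption made
    for this sketch). Returns 0.0 when `current` is empty.
    """
    current_ids = set(current)
    if not current_ids:
        return 0.0
    gone = current_ids - set(following)
    return len(gone) / len(current_ids)


# Made-up fingerprints for two consecutive weeks.
week_1 = {"A", "B", "C", "D"}
week_2 = {"B", "C", "D", "E"}
weekly_churn = churn(week_1, week_2)
```

Here one of the four relays from the first week is missing in the second, so the churn is 0.25. Relays that are new in the second week ("E") do not affect the value.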
Relay uptime is more difficult to calculate with a database query because "uptime sessions" need to be calculated in order to get a correct average. This is nearly impossible to do with a database query, and must be done programmatically (with cursors in pl/pgsql or elsewhere).
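A programmatic version of the session calculation could be sketched like this. It rests on several assumptions not stated in the ticket: a relay's sightings are available as a list of consensus timestamps, consensuses are one hour apart, and each sighting counts as one full interval of uptime. All names are hypothetical.

```python
from datetime import datetime, timedelta


def uptime_sessions(seen_at, interval=timedelta(hours=1)):
    """Group sighting times into (start, end) sessions.

    A gap larger than `interval` between two sightings starts a new session.
    """
    sessions = []
    for moment in sorted(seen_at):
        if sessions and moment - sessions[-1][1] <= interval:
            sessions[-1][1] = moment  # extend the running session
        else:
            sessions.append([moment, moment])  # open a new session
    return [(start, end) for start, end in sessions]


def average_session_length(seen_at, interval=timedelta(hours=1)):
    """Mean session length, counting each sighting as one interval of uptime."""
    sessions = uptime_sessions(seen_at, interval)
    if not sessions:
        return timedelta(0)
    total = sum((end - start + interval for start, end in sessions), timedelta(0))
    return total / len(sessions)


# Made-up sightings: three consecutive hours, a gap, then two more hours.
day = datetime(2010, 8, 1)
sightings = [day + timedelta(hours=h) for h in (0, 1, 2, 5, 6)]
sessions = uptime_sessions(sightings)
average = average_session_length(sightings)
```

The sample yields two sessions of three and two hours, so the average is two and a half hours.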
**Trac**:
**Username**: kjbbb
Assignee: Karsten Loesing

Define criteria for sending out welcome mails using Onionoo's details documents
https://gitlab.torproject.org/legacy/trac/-/issues/11084
Updated: 2020-06-13T18:11:36Z
Author: Karsten Loesing

The current Weather sends out welcome mails to operators of new relays. From the design document:
"""
The database backing Tor Weather stores information about every router seen in consensus within the past year (or for as long as Tor Weather has been running, whichever is shorter). If a router in the database is flagged as stable, its operator has not been sent a welcome email, and its operator is not subscribed to Tor Weather, the router operator's email is parsed from the contact field in the descriptor file for that node. A welcome email containing information about Tor Weather is sent to these node operators if an email address was successfully parsed.
The welcome email thanks the operator for his/her contribution and encourages the operator to subscribe to Tor Weather. If the router allows exits to port 80, information is appended to the email containing links to legal help for exit relay operators.
The welcome emails are not a subscribable notification type and are sent to all new, stable relay operators who a) haven't subscribed to Tor Weather and b) provide a parsable (by our standards) email address in the contact field of their configuration file.
The welcome email is intended for new stable relay operators. To avoid sending the welcome email to long-term relay operators at startup, a 48-hour delay period has been implemented immediately following deployment. Any relays added to the database within the first 48 hours following deployment are exempt from the welcome email. That way, the operators who are emailed should largely be new to the network, and relay operators who have been running for a while shouldn't get the welcome email.
"""
I _think_ we can implement this with Onionoo's current details documents:
- Only consider relays with `first_seen > our_deployment_time` to exclude relays that were already around before Weather was (re-)deployed.
- Only consider relays with `Stable` in `flags` to exclude (currently) non-stable relays.
- Attempt to parse the email address from `contact`.
- Only send out a new welcome mail if we didn't send one before. Remember that we sent a welcome mail now.
- Delete all entries from the database with a timestamp more than 6 months ago.
- In fact, remember that cutoff time and only consider relays with `first_seen > max(our_deployment_time, cutoff_time)`.
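The criteria above can be sketched as follows. This is a minimal illustration, not the Weather implementation. It assumes relay entries are dicts using the `first_seen`, `flags`, `contact`, and `fingerprint` fields of Onionoo details documents, with `first_seen` formatted as `YYYY-MM-DD hh:mm:ss`. The in-memory `already_sent` dict stands in for the database, "6 months" is approximated as 182 days, and the regular expression only catches plainly written addresses, not obfuscated ones.

```python
import re
from datetime import datetime, timedelta

# Deliberately simple pattern; obfuscated contacts ("alice at example dot org")
# are not handled in this sketch.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse_email(contact):
    """Return the first plain email address in a contact string, or None."""
    match = EMAIL_RE.search(contact or "")
    return match.group(0) if match else None


def relays_to_welcome(relays, deployed_at, now, already_sent,
                      retention=timedelta(days=182)):
    """Return (fingerprint, email) pairs that should get a welcome mail.

    `already_sent` maps fingerprint -> time the mail was sent; it is pruned
    and updated in place (a stand-in for the database).
    """
    oldest_kept = now - retention
    for fingerprint in [f for f, sent in already_sent.items() if sent < oldest_kept]:
        del already_sent[fingerprint]
    cutoff = max(deployed_at, oldest_kept)

    recipients = []
    for relay in relays:
        first_seen = datetime.strptime(relay["first_seen"], "%Y-%m-%d %H:%M:%S")
        if first_seen <= cutoff:
            continue  # was around before (re-)deployment or before the cutoff
        if "Stable" not in relay.get("flags", []):
            continue  # currently not stable
        if relay["fingerprint"] in already_sent:
            continue  # welcome mail was sent before
        email = parse_email(relay.get("contact"))
        if email is None:
            continue  # no parsable address
        already_sent[relay["fingerprint"]] = now
        recipients.append((relay["fingerprint"], email))
    return recipients


# Made-up relays for illustration.
relays = [
    {"fingerprint": "AAAA", "first_seen": "2014-02-01 00:00:00",
     "flags": ["Running", "Stable"], "contact": "Alice <alice@example.org>"},
    {"fingerprint": "BBBB", "first_seen": "2013-06-01 00:00:00",
     "flags": ["Running", "Stable"], "contact": "bob@example.org"},
    {"fingerprint": "CCCC", "first_seen": "2014-02-10 00:00:00",
     "flags": ["Running"], "contact": "carol@example.org"},
]
sent_log = {}
first_run = relays_to_welcome(relays, datetime(2014, 1, 1), datetime(2014, 3, 1), sent_log)
second_run = relays_to_welcome(relays, datetime(2014, 1, 1), datetime(2014, 3, 2), sent_log)
```

In the sample, only the first relay qualifies: the second predates deployment and the third lacks the `Stable` flag. The second run sends nothing because the first relay is already recorded.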
So, I guess what I want to say is that we probably don't need a special field in Onionoo that says when the relay first obtained the Stable flag. Unless we need it?
Let's keep this ticket open until we have good criteria and maybe even until we have implemented them in a welcome-mail-sending script.
Assignee: Karsten Loesing

Find out how many users are currently subscribed to Weather
https://gitlab.torproject.org/legacy/trac/-/issues/10699
Updated: 2020-06-13T18:11:32Z
Author: Karsten Loesing

Blocking on #10698. Once I have access, I'll post some numbers here. These numbers are going to help us decide whether we should put more effort on the part of Weather that informs relay operators about problems with their relay, or on the part that emails people (who are not yet subscribed) that they qualify for a t-shirt.
Assignee: Karsten Loesing

Make consensus-health message about missing signatures clearer
https://gitlab.torproject.org/legacy/trac/-/issues/9197
Updated: 2020-06-13T18:10:57Z
Author: Karsten Loesing

weasel was rightly confused about the message: "NOTICE: The consensuses downloaded from the following authorities are missing signatures from previously voting authorities: Faravahar, gabelmoo, maatuska, moria1, tor26, turtles"
This message appeared because urras didn't sign the consensus, not because one of the listed authorities did something wrong. Rephrase this.
Assignee: Karsten Loesing

Don't warn when dirauths run a version that's too new
https://gitlab.torproject.org/legacy/trac/-/issues/8393
Updated: 2020-06-13T18:10:56Z
Author: Sebastian Hahn

I'm running master on gabelmoo currently, and consensus-health is complaining about an unrecommended version. The check should probably be adapted to only list versions which are too old, like Tor does when deciding whether to warn or just give a notice. I'd prefer not to be notified at all in such a case, tho.
Assignee: Karsten Loesing
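One way to express a "too old only" check is sketched below. It is a simplification and not Tor's own version logic: it compares only the numeric part of a version string, ignores status tags such as `-alpha`, and treats a version as too old when it is not recommended and sorts below the newest recommended version. The function names and sample versions are illustrative.

```python
import re


def version_key(version):
    """Numeric part of a version such as '0.2.4.10-alpha' as a tuple of ints."""
    numeric = re.match(r"\d+(?:\.\d+)*", version)
    if numeric is None:
        raise ValueError("not a version string: %r" % version)
    return tuple(int(part) for part in numeric.group(0).split("."))


def is_too_old(version, recommended):
    """True if `version` is unrecommended and older than the newest recommended one."""
    if not recommended or version in recommended:
        return False
    newest = max(version_key(v) for v in recommended)
    return version_key(version) < newest


# Made-up recommended list for illustration.
recommended = ["0.2.3.25", "0.2.4.9-alpha"]
warn_for_master = is_too_old("0.2.4.10-alpha-dev", recommended)
warn_for_old = is_too_old("0.2.2.39", recommended)
```

With these sample values, an authority running a build newer than anything on the list is not flagged, while one running an older unrecommended release is.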