# Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues (2020-06-13)

## #2966: Include bridge country codes in sanitized bridge descriptors
https://gitlab.torproject.org/legacy/trac/-/issues/2966 (Karsten Loesing, 2020-06-13)

Before May 2010, we resolved bridge IP addresses to country codes in the sanitizing process and appended them to the nickname, e.g., UnnamedDE. This was possible, because we sanitized descriptors in batches which was a semi-automatic process.

When we switched to sanitizing descriptors automatically, we took country codes out, because the code was too complex and broke all the time. The difficulties were that descriptors can arrive in any order (while we still need to come up with the same set of sanitized descriptors) and that we need to update the GeoIP database periodically. That's not impossible to solve, but also not easy to get right.

We should find a way to make this work, maybe by relaxing the requirement to update the GeoIP database at all. The old code that we took out in May 2010 is [here](https://gitweb.torproject.org/collector.git/commit/?id=1622004ac7f272c33103ad54c92dd3516303f5d2).

## #4943: Make network graphs with X / consensus weight?
https://gitlab.torproject.org/legacy/trac/-/issues/4943 (Sebastian Hahn, 2020-06-13)

It'd be cool if we could get all the graphs that show a number of relays also with the consensus weight of those relays.

## #5830: Write tool to automate web queries to Tor; and use Stem to track stream/circ allocation and results
https://gitlab.torproject.org/legacy/trac/-/issues/5830 (Roger Dingledine, 2020-06-13)

As part of #5752 we need to know how many circuits we're making now, how many we're discarding early because a stream didn't work, etc.
This is a two-part project: first is a tool to automatically make a series of requests to Tor, in a repeatable way, and second is a Tor controller script, probably using Stem, that watches stream and circuit events (and maybe more), and tracks which streams get allocated to which circuits, how many total circuits are made, how quickly results return, and other statistics. Then we would change the underlying Tor, replay the same set of requests, and know what circuit behaviors to expect.
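The tracking part of the controller script could be sketched roughly as follows. This is only a sketch: the Stem wiring shown in comments is an assumption and untested against a running tor, and the event fields are simplified.

```python
from collections import defaultdict

# Sketch of the bookkeeping side only; the Stem hookup below is indicated
# in comments so the tracking logic can be exercised without a running tor.
class CircuitTracker:
    def __init__(self):
        self.streams_per_circuit = defaultdict(list)
        self.circuits_built = 0

    def on_circ(self, circ_id, status):
        # Count circuits that finish building.
        if status == "BUILT":
            self.circuits_built += 1

    def on_stream(self, stream_id, status, circ_id):
        # Record which circuit each successful stream was attached to.
        if status == "SUCCEEDED" and circ_id:
            self.streams_per_circuit[circ_id].append(stream_id)

# With Stem, the handlers would be wired up roughly like this (untested):
#   from stem.control import Controller, EventType
#   with Controller.from_port(port=9051) as controller:
#       controller.authenticate()
#       controller.add_event_listener(
#           lambda e: tracker.on_circ(e.id, e.status), EventType.CIRC)
#       controller.add_event_listener(
#           lambda e: tracker.on_stream(e.id, e.status, e.circ_id),
#           EventType.STREAM)

tracker = CircuitTracker()
tracker.on_circ("1", "BUILT")
tracker.on_stream("5", "SUCCEEDED", "1")
print(tracker.circuits_built, dict(tracker.streams_per_circuit))
```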
I expect we'll also discover that we don't export enough info via the control protocol to make good conclusions; in that case we'll also want to modify Tor to export this info.

## #6369: Gather empirical data on AES/RSA operations performed by typical relays or bridges
https://gitlab.torproject.org/legacy/trac/-/issues/6369 (Karsten Loesing, 2020-06-13)

At the Florence hackfest I was asked for the typical number of AES/RSA operations performed by a relay or bridge, say, per day. This information is relevant, e.g., for designing the hardware capabilities of a Torouter device.
Here's my plan to find out:
- We can easily derive the number of AES operations by looking at the total traffic pushed by relays and bridges. Extra-info descriptors contain bandwidth histories that we could use here. If we assume 1 AES operation per 16 written or read bytes, we should be quite close to reality.
- Nick suggested sending a USR1 signal to a tor process to make it write lines like this to its log (this line comes from a client):
`Jul 10 18:31:22.904 [info] PK operations: 0 directory objects signed, 0 directory objects verified, 0 routerdescs signed, 2968 routerdescs verified, 216 onionskins encrypted, 0 onionskins decrypted, 30 client-side TLS handshakes, 0 server-side TLS handshakes, 0 rendezvous client operations, 0 rendezvous middle operations, 0 rendezvous server operations.`
We could ask a few friendly relay and bridge operators on tor-talk to tell us the number of encrypted and decrypted onionskins together with the fingerprint. We can then look at the uptime of that relay or bridge in the descriptor archives and compute the average number of operations per day.
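The AES half of the plan is simple arithmetic; here is a sketch under the ticket's 1-operation-per-16-bytes assumption (the function name is illustrative):

```python
# Back-of-the-envelope estimate: 1 AES operation per 16 written or read
# bytes, as assumed in the first bullet above.
AES_BLOCK_SIZE = 16  # bytes processed per AES operation

def aes_ops_per_day(bytes_read, bytes_written):
    """Estimate daily AES operations from a relay's bandwidth history."""
    return (bytes_read + bytes_written) // AES_BLOCK_SIZE

# Example: a relay pushing 1 MiB/s in each direction for a full day.
daily_bytes = 1024 * 1024 * 86400
print(aes_ops_per_day(daily_bytes, daily_bytes))  # 11324620800, ~11 billion
```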
Is there an easier way to find out how many AES/RSA operations a relay or bridge does per day?

## #6450: Compass' command-line script can't encode unicode characters
https://gitlab.torproject.org/legacy/trac/-/issues/6450 (Karsten Loesing, 2020-06-13)

Today I found that `tail` and `less` are unhappy about the task #6329 script printing out unicode characters. When piping its output into `tail` or `less`, the script exits with a traceback. When writing to stdout directly, Python is happy.
Here's how to reproduce the problem:
- Clone the metrics-tasks repository.
- Navigate to the #6329 script and make it download required data: `cd task-6329/; ./tor-relays-stats.py -d`
- Find a unicode character in an AS name: `grep -B1 "as_name.*\\\\u" details.json`
- Display relays in that AS, e.g. AS28548: `./tor-relays-stats.py -i -a 28548 | tail`
Python should print out the following traceback:
```
Traceback (most recent call last):
  File "./tor-relays-stats.py", line 197, in <module>
    short=70 if options.short else None)
  File "./tor-relays-stats.py", line 110, in print_groups
    print formatted_group[:short]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 144: ordinal not in range(128)
```
I found that a possible solution is to replace all Unicode characters with '?'s, but that doesn't seem very elegant:
```
- exit, guard, country, as_number, as_name)
+ exit, guard, country, as_number, as_name.encode('ascii', 'replace'))
```
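One alternative to replacing characters with '?' is to encode explicitly with a lossless error handler instead of relying on the implicit ASCII codec. A sketch only: the original script is Python 2, shown here in Python 3-compatible form with a hypothetical helper name.

```python
import sys

def printable(text, encoding=None):
    """Return `text` re-encoded for the output stream, escaping any
    characters the stream's codec can't represent instead of raising
    UnicodeEncodeError (as the ASCII codec does when piping to tail)."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "backslashreplace").decode(encoding)

print(printable(u"Telef\xf3nica", encoding="ascii"))  # Telef\xf3nica
```

Unlike `'replace'`, the `'backslashreplace'` handler keeps the original code point visible in the output.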
Are there better solutions?

## #6473: Add research idea for bandwidth related anonymity set reduction
https://gitlab.torproject.org/legacy/trac/-/issues/6473 (proper, 2020-06-13)

Attack:
* The target hosts a hidden service.
* A linguist determines, the target is living in country X.
* Or it's a blog about things in country X.
* Thus, the assumption that the target's hidden service is running in country X has a high probability to be true.
* Easy to research (example): the fastest A Mbps line is only available in very few parts of the country. Maybe only in one city. Most people have B Mbps, and a few still have an old contract with the slow C Mbps.
* The adversary buys lots of servers in different countries, installs Tor on those servers and uses Tor as a client.
* The adversary can now build lots of circuits from geographically diverse places and probe the server by connecting to its hidden service. The adversary can thereby measure how much down/upload speed the hidden service can provide.
* Thus, the adversary now knows something more about his target, and if A Mbps is only available in a few places, he has narrowed down the set of suspects.
Another unrelated open question:
* Preliminary consideration: Unless stream isolation is used, exit relays can correlate different activity from one user.
* Can exit nodes differentiate "this is the user who keeps reading some.site with an A Mbps line" from "this is the user who keeps reading some.site with a C Mbps line"?

## #6619: Make AS number clickable when grouping by AS
https://gitlab.torproject.org/legacy/trac/-/issues/6619 (Sathyanarayanan Gunasekaran, 2020-06-13)

When grouping by AS, if the AS number is clicked, the list of relays under this AS should be shown.

## #6639: Rename navigation bar entries, add About page and logo
https://gitlab.torproject.org/legacy/trac/-/issues/6639 (Karsten Loesing, 2020-06-13)

Please see [branch footers in my public repo](https://gitweb.torproject.org/user/karsten/compass.git/shortlog/refs/heads/footers). If you like it, please merge into origin master. Thanks!

## #6648: Update to latest Bootstrap version
https://gitlab.torproject.org/legacy/trac/-/issues/6648 (Sathyanarayanan Gunasekaran, 2020-06-13)

Twitter just updated Bootstrap to 2.1 - http://twitter.github.com/bootstrap/index.html We should make Compass use this. Currently Compass is using 2.0.4. Setting priority to trivial since we don't want to break anything with this change.

## #6662: Support grouping by family
https://gitlab.torproject.org/legacy/trac/-/issues/6662 (cypherpunks, 2020-06-13)

Currently https://compass.torproject.org/ offers to group by country or AS number. It would be nice to have an additional check box for family.

## #6675: Support grouping by contact
https://gitlab.torproject.org/legacy/trac/-/issues/6675 (cypherpunks, 2020-06-13)

I'd suggest adding a feature to group relays by contact.
Only non-empty contacts should be grouped together.
It will likely result in similar groups as defined in #6662.

## #6677: Add a 'total' line at the bottom of the table
https://gitlab.torproject.org/legacy/trac/-/issues/6677 (cypherpunks, 2020-06-13)

Please add a bottom line to the table that sums up all entries in the table. This is useful on different occasions (i.e. viewing all relays of a country).

## #6682: Merge Atlas and Compass into a single tool
https://gitlab.torproject.org/legacy/trac/-/issues/6682 (cypherpunks, 2020-06-13)

In the table on atlas [1] I see:
- nickname
- bandwidth
- uptime
- Country (flag)
- IP address
- Flags
- ORPort
- DirPort
In the table on compass [2] I see:
- #
- CW
- adv. bandw. (%)
- guard probability
- middle probability
- exit probability
- nickname
- fingerprint
- exit [y/n]
- guard [y/n]
- cc
- AS number
- AS name
It would be nice to have a single place where all information can be found in a single table, maybe even with configurable columns [3] and advanced search [4]. I was always looking for a table that contained
[1] https://atlas.torproject.org/#search/foo
[2] https://compass.torproject.org/result?top=10
[3] http://torstatus.blutmagie.de/column_set.php
[4] http://torstatus.blutmagie.de/index.php#CustomQuery

## #6703: Replace Exit and Guard flag checkboxes with radio buttons (off, yes, no)
https://gitlab.torproject.org/legacy/trac/-/issues/6703 (cypherpunks, 2020-06-13)

Currently you can filter for hosts that have
- exit flag (regardless of guard flag)
- guard flag (regardless of exit flag)
- exit & guard flag
Replacing the two checkboxes with two radio buttons like in [1] would make a more fine-grained filter option possible.
Exit Flag
() off () yes () no
Guard Flag
() off () yes () no
These radio buttons would allow more fine grained search filters to get a list of the following relay groups:
- exit only nodes
- nodes that do not have the exit flag
- guard only nodes
- nodes that do not have the guard flag
- middle only nodes
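The relay groups above fall out of a simple three-state predicate; a minimal sketch (function and parameter names are illustrative, not from any existing Compass code):

```python
# Three-state filter: None = off, True = yes, False = no.
def matches(relay_flags, exit_filter=None, guard_filter=None):
    if exit_filter is not None and ("Exit" in relay_flags) != exit_filter:
        return False
    if guard_filter is not None and ("Guard" in relay_flags) != guard_filter:
        return False
    return True

# Middle-only nodes: Exit = no, Guard = no.
relays = [{"Exit"}, {"Guard"}, {"Exit", "Guard"}, set()]
print([r for r in relays if matches(r, exit_filter=False, guard_filter=False)])
# [set()]
```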
[1] http://torstatus.blutmagie.de/index.php#CustomQuery
https://metrics.torproject.org/network.html#bwhist-flagshttps://gitlab.torproject.org/legacy/trac/-/issues/6713Update the README2020-06-13T17:53:10ZcypherpunksUpdate the READMEhttps://gitlab.torproject.org/legacy/trac/-/issues/6728Explain why a relay is an 'almost-fast-exit' and not a 'fast-exit'2020-06-13T17:53:11ZSathyanarayanan GunasekaranExplain why a relay is an 'almost-fast-exit' and not a 'fast-exit'We should provide more information on why a relay is in the 'almost-fast-exit' and not a 'fast-exit'. This will help relay operators check their relay status and hopefully correct it and make it a 'fast-exit'.
Maybe we should provide t...We should provide more information on why a relay is in the 'almost-fast-exit' and not a 'fast-exit'. This will help relay operators check their relay status and hopefully correct it and make it a 'fast-exit'.
Maybe we should provide this info when someone clicks on a relay when filtering by 'almost-fast-exit'?

## #6730: Display newly added or recently disappeared relays or bridges
https://gitlab.torproject.org/legacy/trac/-/issues/6730 (Moritz Bartl, 2020-06-13)

It would be nice to have a way to track new nodes appearing in the datasets. At the moment, I compare "by eye" whether there are any new relays popping up.
For example, a color bar for nodes that Compass saw for the first time for a few days or a specified timeframe would be great.

## #6818: Display a message if JavaScript is disabled
https://gitlab.torproject.org/legacy/trac/-/issues/6818 (Andrew Lewman, 2020-06-13)

Someone emailed me directly to tell me that compass.torproject.org doesn't work as claimed. Here's their workflow:
1. Go to https://compass.torproject.org/
2. In the 'Country Code' field, enter 'ro'
3. Hit submit.
Compass seems to ignore the country code setting and instead returns the default result set.
Upon further testing, compass appears to ignore all input.

## #6855: Support aggregation by tor version (exact and match on first three numbers)
https://gitlab.torproject.org/legacy/trac/-/issues/6855 (Trac, 2020-06-13)

The metrics page shows [1] the number of relays running different versions of tor (0.2.1, 0.2.2, ...) over time. It would be nice to know what portion of the overall traffic is handled by which tor version. I could see this feature in two different granularities:
- group relay versions based on the first three numbers (like [1] does)
- group relay versions based on exact matches (example: 0.2.2.38 != 0.2.2.39)
[1] https://metrics.torproject.org/network.html#versions
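The two granularities could be sketched as follows (illustrative helper names, not existing metrics code):

```python
from collections import defaultdict

def major_version(version):
    # Group on the first three version components: "0.2.2.38" -> "0.2.2".
    return ".".join(version.split(".")[:3])

def traffic_by_version(relays, exact=False):
    """relays: iterable of (version, bandwidth) pairs."""
    totals = defaultdict(int)
    for version, bandwidth in relays:
        key = version if exact else major_version(version)
        totals[key] += bandwidth
    return dict(totals)

relays = [("0.2.2.38", 100), ("0.2.2.39", 50), ("0.2.3.20", 300)]
print(traffic_by_version(relays))              # {'0.2.2': 150, '0.2.3': 300}
print(traffic_by_version(relays, exact=True))  # exact matches kept apart
```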
Related feature requests:
#6662
#6675
**Trac**:
**Username**: cypherpunkx

## #6856: Add graph on bandwidth by major Tor version and bandwidth by recommended flag
https://gitlab.torproject.org/legacy/trac/-/issues/6856 (Trac, 2020-06-13)

It would be nice to have a graph similar to [1] but replacing the number of relays with overall traffic share.
example (numbers completely made up):
0.2.4 is handling 15% of the traffic
0.2.3 is handling 40% of the traffic
...
[1] https://metrics.torproject.org/network.html#versions
**Trac**:
**Username**: cypherpunkx

## #7330: Make metrics.torproject.org link back to www.torproject.org
https://gitlab.torproject.org/legacy/trac/-/issues/7330 (bastik, 2020-06-13)

Once you are on research.tpo.org and/or metrics you can't click anywhere to navigate to tpo.org.
I'd like to see a footer to get to tpo.org (or something else).
(to be fair metrics links to:
https://www.torproject.org/docs/trademark-faq.html.en, which could be used to get to tpo.org/index)

## #7568: Investigate use of datatables in compass
https://gitlab.torproject.org/legacy/trac/-/issues/7568 (Sathyanarayanan Gunasekaran, 2020-06-13)

http://datatables.net/ seems to provide certain nifty features, but I'm not sure how sorting et al. would work with Compass, since we'd have to make new requests each time we sort.

Placeholder ticket to collect thoughts on this.

## #7640: Remove Compass' navigation bar
https://gitlab.torproject.org/legacy/trac/-/issues/7640 (Karsten Loesing, 2020-06-13)

Since Compass is a single-page application, it doesn't really need a navigation bar anymore. Can we remove it?

## #7744: Make output smaller by removing comments and unnecessary data
https://gitlab.torproject.org/legacy/trac/-/issues/7744 (grarpamp, 2020-06-13)

Can someone remove this comment from the compass output?
It will save 120kB per full query, which via Tor is no small amount.
<!-- it's not a fingerprint -->

## #7834: Add "total bandwidth" column
https://gitlab.torproject.org/legacy/trac/-/issues/7834 (Sathyanarayanan Gunasekaran, 2020-06-13)

This could be done by summing up the bandwidth entries from the bandwidth files from Onionoo, or we could change the Onionoo spec to include some amount of bandwidth data in the details files.

## #7879: Improve compass performance
https://gitlab.torproject.org/legacy/trac/-/issues/7879 (Sathyanarayanan Gunasekaran, 2020-06-13)

Compass seems to be slower than usual and seems to block the UI, freezing the browser for significant amounts of time. We need to figure out why this is happening and fix it.

## #8054: Remove command-line interface
https://gitlab.torproject.org/legacy/trac/-/issues/8054 (Sathyanarayanan Gunasekaran, 2020-06-13)

Now that we have a web frontend, I don't think anyone uses the CLI at all. The CLI printing code is a major pain if we want to extend Compass, since the printing code is a mess.

I suggest we entirely drop the CLI code, if it isn't going to bother anyone.

## #8105: Provide an overview of Analysis ticket results for researchers
https://gitlab.torproject.org/legacy/trac/-/issues/8105 (Karsten Loesing, 2020-06-13)

peer came up with a list of analysis ticket results and asks in #7241:
> How are the blurbs at [doc/AnalysisTicketResults](doc/AnalysisTicketResults) (for metrics-tasks)? Were any tickets misinterpreted?
Looks great! I didn't read all descriptions in detail, but I think it's a good start for people, and if something is wrong, they can always fix it.
A few ideas:
- Should tickets be listed in descending order, from newest to oldest?
- Should we add a header saying that code for most of the tickets is available in the metrics-tasks.git repository?
- Should we link to this wiki page from http://research.torproject.org/index.html, in particular in the "We're building a repository of tools" part, and throw out the link to the tools.html page?

## #8127: Bring back the relays-by-country graph
https://gitlab.torproject.org/legacy/trac/-/issues/8127 (Karsten Loesing, 2020-06-13)

Aggregating data for the [relays-by-country graph](https://metrics.torproject.org/network.html#relaycountries) has become prohibitively expensive. It keeps the server busy for 2 hours every day, affecting more important tasks like downloading descriptors. That's why I disabled this aggregation step on February 1 to the effect that relays-by-country graphs are still available but won't receive new data. The problem is the PostgreSQL-based IP-to-country lookup. I should look into making this lookup much, much faster. Creating this ticket so I don't forget.

## #8305: Use clearer colors in the "Consumed bandwidth by Exit/Guard flag combination" graph
https://gitlab.torproject.org/legacy/trac/-/issues/8305 (Roger Dingledine, 2020-06-13)

https://metrics.torproject.org/bwhist-flags.html
You see the three blue-or-sort-of-blue lines? Which one is which in the graph?
There are other colors in the world besides blue-or-sort-of-blue, and we should consider using some of them. :)

## #8461: Make some minor UI tweaks
https://gitlab.torproject.org/legacy/trac/-/issues/8461 (grarpamp, 2020-06-13)

> As I said, I don't want to choose defaults for CC, FP or AS.
There was no suggestion to choose any defaults.
> Placeholders are the "web development" equivalent of examples.
I suggested removing the example text from the boxes, leaving nothing in the boxes at all. Put the examples in the text of the page. And make at least the fingerprint box wide enough to display a full fingerprint.
>> Also list the ISO CC codes in the drop down as 'CC, desc'.
> This feels like feature creep. But I've had to look up CC values
> multiple times myself, so this wouldn't be a bad thing to add.
Tor rightly does not accept full country names in its config, so CCs are useful there.

## #8498: Don't load all the data at once
https://gitlab.torproject.org/legacy/trac/-/issues/8498 (Sathyanarayanan Gunasekaran, 2020-06-13)

Hellais had the idea of loading more data into Compass as you scroll down, instead of loading all the data at once. Loading the data as a whole hangs up the browser.

The alternative would be to implement pagination.

## #8667: Distinguish between permanent and temporary Onionoo errors
https://gitlab.torproject.org/legacy/trac/-/issues/8667 (Karsten Loesing, 2020-06-13)

As of now, Onionoo has a "maintenance mode" that I'm planning to use very rarely. But sometimes it's necessary to shut down Onionoo to support new features or fix bugs. During this time, Onionoo responds to all requests with a 503 Service Unavailable status code. Atlas should respect this code and display a different warning than:
"Backend error! The backend server replied with an error to your query. This probably means that you did not properly format your query. If your query was properly formatted it may mean that there is an issue with your browser/add-ons. Please report which browser/addons/etc. you're using to the bug tracker."
How about this warning?
"Backend temporarily unavailable! The backend server is temporarily unavailable. If this issue persists for more than a few hours, please report it using the <a href="https://trac.torproject.org/projects/tor/newticket?component=Atlas">bug tracker</a>."

## #9350: Sorting by number of relays in Compass is alphabetically, not alphanumerically
https://gitlab.torproject.org/legacy/trac/-/issues/9350 (Karsten Loesing, 2020-06-13)

Way to reproduce this:
- go to https://compass.torproject.org/
- group relays by country
- hit submit
- click on fingerprint table header to sort in ascending order
- click on fingerprint table header again to sort in descending order
- note how, e.g., (96 relays) comes before (887 relays)

## #9778: Add votes document type
https://gitlab.torproject.org/legacy/trac/-/issues/9778 (Karsten Loesing, 2020-06-13)

A few weeks ago, I changed the web output of the consensus-health checker to not display relay flags anymore. Here's an archived example (with broken CSS, sorry for that):
https://people.torproject.org/~karsten/volatile/consensus-health-2013-08-12-07-00-00.html
The main reason was that the page loads forever, which makes the remaining information on the page less useful. Though this will soon be irrelevant, because I'm planning to shut down the consensus health website entirely in favor of Damian's Python consensus-health checker.
We should try to rescue all information from that page that we want to keep. Roger says on IRC that he's interested in the flags that the directory authorities assign to a relay in a vote.
I wonder if we should add a new document type to Onionoo that contains useful information from votes, like flags and measured bandwidths. (I briefly thought about adding more stuff to details documents, but they're already pretty overloaded right now.) Here's how a votes document could look like for gabelmoo:
```
{
  "relays_published": "2013-09-19 09:00:00",
  "relays": [
    {
      "nickname": "gabelmoo",
      "fingerprint": "F2044413DAC2E02E3D6BCF4735A19BCA1DE97281",
      "flags": {
        "Faravahar": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "dannenberg": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "dizum": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "gabelmoo": [
          "Authority",
          "Named",
          "Running",
          "V2Dir",
          "Valid"
        ],
        "maatuska": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "moria1": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "tor26": [
          "Authority",
          "HSDir",
          "Named",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ],
        "urras": [
          "Authority",
          "HSDir",
          "Running",
          "Stable",
          "V2Dir",
          "Valid"
        ]
      },
      "measured": {
        "moria1": 9
      }
    }
  ],
  "bridges_published": "2013-09-19 08:37:03",
  "bridges": []
}
```
What else might be worth adding to these documents? Here's what moria1 thinks about gabelmoo:
```
r gabelmoo 8gREE9rC4C49a89HNaGbyh3pcoE goM6QETDkFBEjjhRURvUUn2QDkU 2013-09-18 22:01:15 212.112.245.170 443 80
s Authority HSDir Running Stable V2Dir Valid
v Tor 0.2.5.0-alpha-dev
w Bandwidth=20 Measured=9
p reject 1-65535
m 8,9,10,11,12,13,14,15 sha256=lLaUe8rgCb1VRPKWKSC7dP+DLae6r62FyIOh/6+Mm8c
m 16,17 sha256=he0Eot/jfJv/uxqCnUFjkTLLbLbiwRtWIae9s0gyUYU
```
phw, rndm: If I add these documents to Onionoo, would you want to include them in relay detail pages of Atlas and Globe? Would you expect the data format to be different? Would you want less/more information from votes?
wfn: I think it makes sense to add these documents to good-old-Java-Onionoo, so that there's just one more document type that your shiny-new-Python-Onionoo can treat like summary, details, bandwidth, and weights documents. Does that work for you?
arma: Is there anything missing in this proposal that you'd want to have included?

## #9814: Atlas should make clear when relay details come from outdated consensus
https://gitlab.torproject.org/legacy/trac/-/issues/9814 (wfn, 2020-06-13)

This is relevant to both Atlas and Globe, as far as I gather. Wasn't entirely sure whom to CC, sorry if too many recipients.
Two very much related things:
1. Onionoo-using tools should make clear to the user when relay flags come from an outdated consensus.
2. Onionoo-using tools should be careful using the keyword _uptime_, especially when a relay is not running (not present in the last consensus).
Example: [relay 0FB356FB... on Atlas](https://atlas.torproject.org/#details/0FB356FBE623B3F6C8A2CC3CA42E0E7681F72FE7) (if the relay gets included in the newest consensus, see attached image (atlas_details_relay_running.png)). Here the problem is that Atlas ([and Globe as well](http://globe.rndm.de/#/relay/0FB356FBE623B3F6C8A2CC3CA42E0E7681F72FE7)) may show the _Running_ flag to be present, but show the overall relay as not running (_running: false_).
The latter is because the relay wasn't featured in the last consensus (as reported by Onionoo); the former is because Onionoo is returning the last known flags (from the last consensus in which the relay was featured) for this relay, among which the _Running_ flag is present.
As Karsten said,
> The flags thing is a presentation problem, not a data problem.
> Onionoo should include the latest flags, but Atlas (and Globe?) should
> present them in a non-confusing way. Atlas (and Globe) can learn from
> last_seen when these flags were contained in a consensus.
Flags could either get some simple indication of being old / 'not fresh' (simply when _running: false_), and/or there could be a small field indicating where these flags are coming from ("reported by last available consensus" in green, vs. "reported by authorities at <Onionoo:last_seen> (outdated by <in_hours(Onionoo:relays_published - Onionoo:last_seen)> hours)", or somesuch.)
Likewise with the _uptime_ keyword: I suppose that either the name for that field (which is derived from Onionoo's _last_restarted_) should be changed (I guess _last restarted_ doesn't sound intuitive? It would actually make sense to me at least / would be more honest), or it should be removed when _running: false_.
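A minimal sketch of the suggested presentation logic, in Python for brevity (field names follow the Onionoo protocol; the exact label wording here is just an illustration):

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def flags_label(relay, relays_published):
    """Say where a relay's flags come from: the latest consensus, or an
    older one (when the relay is not in the last consensus)."""
    if relay.get("running"):
        return "reported by last available consensus"
    last_seen = datetime.strptime(relay["last_seen"], TIME_FORMAT)
    published = datetime.strptime(relays_published, TIME_FORMAT)
    hours = int((published - last_seen).total_seconds() // 3600)
    return ("reported by authorities at %s (outdated by %d hours)"
            % (relay["last_seen"], hours))
```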
Re: fixing this:
* re: 2., if it's only about removing the uptime field/span when relay not running, it'd just be about inserting a simple conditional at https://gitweb.torproject.org/atlas.git/blob/HEAD:/templates/details/main.html#l101 I suppose?
* re: 1., depends on decision, but should be similarly simple.
* haven't looked at Globe's code

https://gitlab.torproject.org/legacy/trac/-/issues/9993
Explain better what "Fast exits relays any network" means in Compass
2020-06-13T17:53:24Z, Karsten Loesing

Reported by cypherpunks:
In compass, the third option is listed as "Fast exits relays any network." It's not clear to me whether this means "Fast exits (plural), PLUS relays any network" or "Fast exit relays." Having seen the code in git and the discussion above, I'm even less clear now on what that third option does. Maybe it's just the extra 's' appended to 'exits,' but could that third descriptor be clarified?

https://gitlab.torproject.org/legacy/trac/-/issues/10001
Unable to bookmark / reload a resultpage
2020-06-13T17:53:24Z, Trac

Probably caused by putting all parameters in the anchor; when bookmarking or reloading a result page, you end up on the default form with no results.
It would be nice if the parameters would be put in the form and results were shown.
A (obvious) non-working example is https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=AF
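The underlying problem can be illustrated with Python's standard library: parameters placed after '#' live in the URL fragment, which the browser never sends to the server, so a reload has nothing server-side to restore the form from (a sketch, not Compass code):

```python
from urllib.parse import parse_qs, urlparse

def params_from_fragment(url):
    """Extract Compass-style parameters from the URL fragment; only
    client-side JavaScript ever sees these, the server does not."""
    fragment = urlparse(url).fragment  # e.g. "?exit_filter=all_relays&..."
    return parse_qs(fragment.lstrip("?"), keep_blank_values=True)
```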
**Trac**:
**Username**: Spider.007

https://gitlab.torproject.org/legacy/trac/-/issues/10222
Implement network wide and router specific views of malicious BGP route events
2020-06-13T17:47:46Z, Trac

Display the quantity, duration, and detail (TBD) of malicious BGP route attacks affecting any addresses used by active routers in the Tor network for the time period in question.
"Malicious route attack" is explicitly intended here as distinct from anomalous route changes or advertisement behavior, nor does it encompass benign incompetence affecting widespread route behavior of an indiscriminate nature.
See also:
http://www.renesys.com/2013/11/mitm-internet-hijacking/
http://www.renesys.com/2010/11/chinas-18-minute-mystery/
**Trac**:
**Username**: anon

https://gitlab.torproject.org/legacy/trac/-/issues/10223
Add BGP route attack email notification service to metrics utilities / tools as descriptor endpoints are announced
2020-06-13T17:56:04Z, Trac

Add a new opt-in/out? email notification service for relay operators for endpoints in descriptors as they are announced, before they are added to the consensus, as authorities may one day opt not to publish descriptors with endpoints under active attack.
As new descriptors are processed, send an email alert to designated recipients if the listed endpoint is under an "active attack", as distinct from anomalous route changes or advertisement behavior; this also excludes benign incompetence that indiscriminately affects widespread route behavior.
Note that third parties may opt to receive email alerts of BGP attacks for arbitrary relay identities; this should be supported, and perhaps linked to via Atlas?
Rate limiting and/or summaries should be implemented to avoid sending excessive numbers of email notifications to the same recipient.
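A minimal per-recipient rate limiter along those lines (a sketch; the one-hour interval is an arbitrary assumption):

```python
class NotificationRateLimiter:
    """Allow at most one alert per recipient per `interval` seconds;
    anything arriving sooner should be held back for a summary instead."""

    def __init__(self, interval=3600):
        self.interval = interval
        self.last_sent = {}  # recipient -> time of last alert

    def allow(self, recipient, now):
        last = self.last_sent.get(recipient)
        if last is not None and now - last < self.interval:
            return False
        self.last_sent[recipient] = now
        return True
```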
**Trac**:
**Username**: anon

https://gitlab.torproject.org/legacy/trac/-/issues/10306
Show relays by nickname substring
2020-06-13T17:53:25Z, Roger Dingledine

https://atlas.torproject.org/#search/silkroad shows ten-ish relays that have that substring in their nickname.
https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=&family=silkroad shows no results.
I wonder if there's an easy way to get that answer?

https://gitlab.torproject.org/legacy/trac/-/issues/10680
Provide more statistics on current public bridges
2020-06-13T18:09:02Z, Matthew Finkel

It will be very useful to understand the attributes of the current public bridges, most importantly the platforms on which they run, available pluggable transports, and which tor versions are being used. We should use the sanitized descriptors to obtain this information. Do we want anything else?

https://gitlab.torproject.org/legacy/trac/-/issues/10859
Make URL reusable even without explicitly specifying the number of results
2020-06-13T17:53:26Z, Roger Dingledine

Go to https://compass.torproject.org/ and then click submit. You get a list of the top ten relays, and your URL changes to "https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=". Great.
Then open a...Go to https://compass.torproject.org/ and then click submit. You get a list of the top ten relays, and your URL changes to "https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=". Great.
Then open a new tab and paste the above long URL into it. No relays listed!
If you type in '10' rather than leaving it to be the default, then you get a URL of "https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=&top=10", which does work when you paste it into a new tab.

https://gitlab.torproject.org/legacy/trac/-/issues/11430
Add new field last_running for "seen in a network status with the Running flag" in addition to last_seen for "seen in a network status"
2020-06-13T17:59:47Z, Karsten Loesing

There are two fields in relay details documents:
> `last_seen`: UTC timestamp (YYYY-MM-DD hh:mm:ss) when this relay was last seen in a network status consensus.
> `first_seen`: UTC timestamp (YYYY-MM-DD hh:mm:ss) when this relay was first seen in a network status consensus.
And there are two similar fields in bridge details documents:
> `last_seen`: UTC timestamp (YYYY-MM-DD hh:mm:ss) when this bridge was last seen in a bridge network status.
> `first_seen`: UTC timestamp (YYYY-MM-DD hh:mm:ss) when this bridge was first seen in a bridge network status.
Turns out that these definitions are confusing. We're not really interested in whether a relay or bridge was _seen_ in a network status, but whether the directory authorities or the bridge authority thought it was _running_. So, whether the relay or bridge had the `Running` flag.
For relays this doesn't matter, because the consensus only contains running relays. (It _will_ matter though once we add votes to Onionoo, because those include relays that don't have the `Running` flag.)
But for bridges it matters. Tonga includes all bridges in its network status that send it descriptors, including those without the `Running` flag.
Once we change the meaning of `last_seen`, we'll also stop giving out bridges that haven't been running for a week but that kept sending descriptors to Tonga. Should be fine.
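The distinction boils down to tracking two maxima instead of one; a sketch in Python (the input format here is illustrative, not Onionoo's internal representation):

```python
def seen_and_running(statuses):
    """Compute last_seen (any status) and last_running (only statuses
    listing the Running flag) from (timestamp, flags) pairs."""
    last_seen = max(timestamp for timestamp, _ in statuses)
    running = [timestamp for timestamp, flags in statuses
               if "Running" in flags]
    last_running = max(running) if running else None
    return last_seen, last_running
```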
This is going to fix #11410.

https://gitlab.torproject.org/legacy/trac/-/issues/11434
Make various UI improvements
2020-06-13T17:53:26Z, grarpamp

Would be nice to have a checkbox:
'Select only non-exit relays'
I'd also rework the speed/network radio to be independent of whichever non-exit/exit relay type is chosen. Since non-exits don't have ports, ports should be unrolled from speed/network to fit that context as well, i.e.:
```
heading: params (was: exits)
  radio: all relays (default)
  radio: select relays by param
  subheading: speeds
    radio: rateA (default)
    radio: rateB
  subheading: port sets
    radio: full (default)
    radio: reduced
    radio: sponsored
  subheading: /netsize restrictions
    radio: ...
```
'K' is not a valid SI/ISO-IEC prefix; please correct this to use 'k' (SI, x10^3^) or 'Ki' (IEC, x2^10^).
https://en.wikipedia.org/wiki/Binary_prefix
'95+ <rate1>, 5000+ <rate2>': add a description of what these two paired numbers mean (currently seen in three places).
Add a blurb that the terms '[non-]exit', 'running/inactive', and 'guard' refer to specific values of 'flags'.

https://gitlab.torproject.org/legacy/trac/-/issues/11573
Store pre-generated response parts in a database rather than in plain files
2020-06-13T17:59:49Z, Karsten Loesing

With #11350, we're almost crossing the line where keeping indexes in memory and contents on disk doesn't scale anymore. In the long term we should consider using a database to handle requests.

https://gitlab.torproject.org/legacy/trac/-/issues/12131
Measure connectivity patterns between relays
2020-06-13T17:47:49Z, Roger Dingledine

https://lists.torproject.org/pipermail/tor-relays/2014-May/004598.html makes me wonder how many relays are firewalling certain outbound ports (and thus messing with connectivity inside the Tor network). It would be great if somebody would start scanning pairs of relays to see which of them can reach each other and which can't, with the goal of understanding how far from a clique our network topology actually is, and then helping with an awareness campaign to correct it if it's a problem.
Tools that might be helpful building blocks here:
- Meejah's exitscanner builds circuits, and makes sure it isn't building too many at once. Uses txtorcon and thus twisted. https://github.com/meejah/txtorcon/blob/exit_scanner/apps/exit_scanner/guard-exit-coverage.py
- phw's exitmap does something similar, but with stem rather than txtorcon. https://gitweb.torproject.org/user/phw/exitmap.git/tree
Other thoughts:
- You likely want to turn on FastFirstHopPK on the client, so it doesn't waste cpu power on handshakes at the first relay.
- If you make each relay connect to 6000 other relays in succession, and some of the relays can't handle 6000 open file descriptors at once, then you might mistakenly misinterpret "could not extend to that relay" as a property of the link between the relays when actually it's a property of the first relay. One option is to scan 500 and then move on to another first hop. Another option is to declare this a feature, and try to detect which relays can and which can't handle 6000 open file descriptors at once.
- n^2^ where n is 5000 is actually a heck of a lot of circuits. Should you just build circuits forever in the background, or are there some smarter algorithms for finding interesting patterns without making all 25 million circuits? In particular, there will be a background failure rate anyway, from e.g. relays that happen to be overloaded at that moment. So even 25 million circuits won't be enough.

https://gitlab.torproject.org/legacy/trac/-/issues/12522
Add sitemap.xml to make Relay Search pages indexed by Google et al.
2020-06-13T18:13:20Z, Karsten Loesing

Right now, Google doesn't index Relay Search details pages.
One way to provide this file is to fetch the full `https://onionoo.torproject.org/summary` and generate the details page URLs from that. Caching the Onionoo summary file and using the `If-Modified-Since` header would be very much appreciated here.
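A sketch of rendering such a file from the summary document's fingerprints (the details page URL pattern below is an assumption, and how crawlers treat fragment URLs would need checking):

```python
def sitemap_xml(fingerprints,
                base_url="https://atlas.torproject.org/#details/"):
    """Render a minimal sitemap.xml with one <url> entry per relay
    fingerprint; base_url is a placeholder, not the final URL scheme."""
    entries = "\n".join("  <url><loc>%s%s</loc></url>" % (base_url, fp)
                        for fp in fingerprints)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + entries + "\n</urlset>\n")
```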
The specification for sitemap.xml can be found at:
https://www.sitemaps.org/index.html

https://gitlab.torproject.org/legacy/trac/-/issues/13137
Provide more historical data to facilitate debugging network problems
2020-06-13T18:00:10Z, Sebastian Hahn

It'd be great if it was possible to get an overview of historical data (for example, the past month). When was a node in the consensus, with what flags, bw, etc.

https://gitlab.torproject.org/legacy/trac/-/issues/13424
Add new `descriptor` parameter that returns relays or bridges by digest of recently published descriptors
2020-06-13T18:00:15Z, Karsten Loesing

This suggestion is based on discussions with Sebastian related to #13135.
We should add a new parameter, let's call it `descriptor`, that returns relays or bridges by the digest of recently published descriptors. This includes all kind...This suggestion is based on discussions with Sebastian related to #13135.
We should add a new parameter, let's call it `descriptor`, that returns relays or bridges by the digest of recently published descriptors. This includes all kinds of descriptors published by relays/bridges, so server descriptors and extra-info descriptors, and descriptors derived from those by the directory authorities, namely microdescriptors. We probably don't have to support partial digests, but we might want to support both hex-encoded and base64-encoded digests.
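Supporting both encodings would mostly be a matter of normalizing one to the other, e.g. with Python's standard library (a sketch):

```python
import base64
import binascii

def hex_to_base64(hex_digest):
    """Convert a hex-encoded digest to the base64 form used in
    microdescriptor references, with trailing '=' stripped."""
    raw = binascii.unhexlify(hex_digest)
    return base64.b64encode(raw).decode("ascii").rstrip("=")

def base64_to_hex(b64_digest):
    """Inverse conversion; base64 padding is restored before decoding."""
    padded = b64_digest + "=" * (-len(b64_digest) % 4)
    return binascii.hexlify(base64.b64decode(padded)).decode("ascii")
```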
Example request (not working yet):
https://onionoo.torproject.org/details?descriptor=S+wzq/7JtC5fWkQmH3pxVprpdeWpGRLvrmKr6ZO/+3Y

https://gitlab.torproject.org/legacy/trac/-/issues/13425
Add new document type `debug` that includes digests of recently published descriptors and statuses they're referenced from
2020-06-13T18:00:16Z, Karsten Loesing

This suggestion is based on discussions with Sebastian related to #13135.
We should add a new document type, let's call it `debug`, that includes digests of recently published descriptors together with the valid-after/published times of statuses referencing them. This includes all kinds of descriptors published by relays/bridges, so server descriptors and extra-info descriptors, and descriptors derived from those by the directory authorities, namely microdescriptors. We should only include recently published/referenced descriptors here, which could mean in the past 24 hours.
The reason for creating a new document type is that most Onionoo clients won't care, and we shouldn't make them download something they're not interested in. There's other stuff that would go into the new document type, like vote details that can be used to debug problems with the voting process.
Example request and response (not working yet):
https://onionoo.torproject.org/debug?search=F2044413DAC2E02E3D6BCF4735A19BCA1DE97281
```
{
"version": "1.1",
"next_major_version_scheduled": "2014-11-15",
"relays_published": "2014-10-15 15:00:00",
"relays": [
{
"fingerprint": "F2044413DAC2E02E3D6BCF4735A19BCA1DE97281",
"server_descriptors": {
"351CB6585CF9BA311204B4E93A509183B85EBFD7": [
"2014-10-14 16:00:00",
"2014-10-14 17:00:00",
"2014-10-14 18:00:00",
"2014-10-14 19:00:00",
"2014-10-14 20:00:00",
"2014-10-14 21:00:00",
"2014-10-14 22:00:00",
"2014-10-14 23:00:00",
"2014-10-15 00:00:00",
"2014-10-15 01:00:00",
"2014-10-15 02:00:00",
"2014-10-15 03:00:00",
"2014-10-15 04:00:00",
"2014-10-15 05:00:00",
"2014-10-15 06:00:00",
"2014-10-15 07:00:00",
"2014-10-15 08:00:00"
],
"A37A87670740D9DF0FF05EB47872D955A730FDE2": [
"2014-10-15 09:00:00",
"2014-10-15 10:00:00",
"2014-10-15 11:00:00",
"2014-10-15 12:00:00",
"2014-10-15 13:00:00",
"2014-10-15 14:00:00",
"2014-10-15 15:00:00"
]
},
"extrainfo_descriptors": {
"F96656C6ED7E3334C9253E7B4F91D6ECCE854DA1": [
"2014-10-14 16:00:00",
"2014-10-14 17:00:00",
"2014-10-14 18:00:00",
"2014-10-14 19:00:00",
"2014-10-14 20:00:00",
"2014-10-14 21:00:00",
"2014-10-14 22:00:00",
"2014-10-14 23:00:00",
"2014-10-15 00:00:00",
"2014-10-15 01:00:00",
"2014-10-15 02:00:00",
"2014-10-15 03:00:00",
"2014-10-15 04:00:00",
"2014-10-15 05:00:00",
"2014-10-15 06:00:00",
"2014-10-15 07:00:00",
"2014-10-15 08:00:00"
],
"A8DC5276EEE740F210276DEC49026270FB2B1437": [
"2014-10-15 09:00:00",
"2014-10-15 10:00:00",
"2014-10-15 11:00:00",
"2014-10-15 12:00:00",
"2014-10-15 13:00:00",
"2014-10-15 14:00:00",
"2014-10-15 15:00:00"
]
},
"micro_descriptors": {
"S+wzq/7JtC5fWkQmH3pxVprpdeWpGRLvrmKr6ZO/+3Y": [
"2014-10-14 16:00:00",
"2014-10-14 17:00:00",
"2014-10-14 18:00:00",
"2014-10-14 19:00:00",
"2014-10-14 20:00:00",
"2014-10-14 21:00:00",
"2014-10-14 22:00:00",
"2014-10-14 23:00:00",
"2014-10-15 00:00:00",
"2014-10-15 01:00:00",
"2014-10-15 02:00:00",
"2014-10-15 03:00:00",
"2014-10-15 04:00:00",
"2014-10-15 05:00:00",
"2014-10-15 06:00:00",
"2014-10-15 07:00:00",
"2014-10-15 08:00:00",
"2014-10-15 09:00:00",
"2014-10-15 10:00:00",
"2014-10-15 11:00:00",
"2014-10-15 12:00:00",
"2014-10-15 13:00:00",
"2014-10-15 14:00:00",
"2014-10-15 15:00:00"
]
}
}
],
"bridges_published": "2014-10-15 14:37:04",
"bridges": []
}
```

https://gitlab.torproject.org/legacy/trac/-/issues/13562
Add more detailed logging to backend and frontend components
2020-06-13T18:00:16Z, iwakeh

Derive new log messages (warning/error/info) from the current statistics log statements.
This entails adding new log statements to 'ResourceServlet' and 'core.Main'
(see parent issue comment 7 for details).

https://gitlab.torproject.org/legacy/trac/-/issues/13600
Improve bulk imports of descriptor archives
2020-06-13T18:00:19Z, Karsten Loesing

We need to improve bulk imports of descriptor archives. Whenever somebody wants to initialize Onionoo with existing data, they'll need to process years of descriptors. The current code is not at all optimized for that, but it's designed for running once per hour and updating things as quickly as possible. Let's fix that and support bulk imports better.
Here's what we should do:
- We define a new directory `in/archive/` where operators can put descriptor archives fetched from CollecTor. Whenever there are files in that directory we import them first (before descriptors in `in/recent/`). In particular, we iterate over files twice: in the first iteration we look at the first contained descriptor to determine its type, and in the second iteration we parse files containing server descriptors and then files containing other descriptors. (This order is important for computing advertised bandwidth fractions, which only works if we parse server descriptors before consensuses.) This process will take very long, so we should log whenever we complete a tarball, and ideally we'd print out how many tarballs we already parsed and how many more we need to parse.
- We add a new command-line switch `--update-only` for only updating status files and not downloading descriptors or writing document files. Operators could then import archives, which would take days or even weeks, and then switch to downloading and processing recent descriptors. My branch task-12651-2 is a major improvement here, because it ensures that _all_ documents will be written once the bulk import is done, not just the ones for relays and bridges that were contained in recent descriptors. Future command-line options would be `--download-only` and `--write-only` for the other two phases and `--single-run` that does what's the current default but once we switch from being called by cron every hour to scheduling our own hourly runs internally.
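The ordering rule from the first bullet could look roughly like this (a sketch; `first_descriptor_type` stands in for the first-iteration type check and is a hypothetical helper):

```python
def archive_import_order(tarballs, first_descriptor_type):
    """Order tarballs so that those whose first descriptor is a server
    descriptor are parsed before all others, since advertised bandwidth
    fractions require server descriptors to be parsed before consensuses."""
    server_first = [t for t in tarballs
                    if first_descriptor_type(t) == "server-descriptor"]
    rest = [t for t in tarballs
            if first_descriptor_type(t) != "server-descriptor"]
    return server_first + rest
```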
I somewhat expect us to run into memory problems when importing months or even years of data at once. So, part of the challenge here will be to keep an eye on memory usage and fix any memory issues.

https://gitlab.torproject.org/legacy/trac/-/issues/14940
Make compatible with GNU LibreJS
2020-06-13T18:13:24Z, cypherpunks

Currently LibreJS blocks the Javascript used on atlas.torproject.org; that can be prevented by adding licensing information either through a Web Labels page or various other means that are explained on https://www.gnu.org/software/librejs/free-your-javascript.html

https://gitlab.torproject.org/legacy/trac/-/issues/15594
Add graph with new relays per day (identified by fingerprint)
2020-06-13T18:13:26Z, cypherpunks

It is not too easy to spot days that have an unusually high rate of newly added relays by looking at [1]. What about a new graph that shows only the number of new unique relay fingerprints first seen on a given day?
[1] https://metrics.torproject.org/networksize.html

https://gitlab.torproject.org/legacy/trac/-/issues/15799
Find out why different instances may report different timestamps in last_changed_address_or_port
2020-06-13T18:00:34Z, cypherpunks

Karsten asked me to open a ticket for this, so I do.
diff between onionoo.tpo vs. onionoo.thecthulhu.com
```
< 259D44BDF3734077902CD71606BAD95F994A606B"2015-04-13 08:00:00
---
> 259D44BDF3734077902CD71606BAD95F994A606B"2015-04-11 12:00:00
< 3737F4542BBA0C43345BCD91C4F1E194418B313F"2015-02-14 12:00:00
---
> 3737F4542BBA0C43345BCD91C4F1E194418B313F"2015-02-15 12:00:00
< 9F938AE96C6B63F726BB885E4F2D1319C84A25BB"2015-04-12 14:00:00
---
> 9F938AE96C6B63F726BB885E4F2D1319C84A25BB"2015-04-11 12:00:00
< 4E8CE6F5651E7342C1E7E5ED031E82078134FB0D"2015-01-28 11:00:00
---
> 4E8CE6F5651E7342C1E7E5ED031E82078134FB0D"2015-01-26 03:00:00
< 73AB1555F0DA2E6D6B2AB2A603A8CB34F2981B3D"2014-12-30 20:00:00
---
> 73AB1555F0DA2E6D6B2AB2A603A8CB34F2981B3D"2014-12-30 13:00:00
```

https://gitlab.torproject.org/legacy/trac/-/issues/15844
Develop database schema to support Onionoo's search parameter efficiently
2020-06-13T18:00:38Z, Karsten Loesing
After some experimenting with database schemas it seems that supporting the `search` parameter efficiently will be most difficult. Here's what it does (from https://onionoo.torproject.org/protocol.html):
_Return only (1) relays with the parameter value matching (part of a) nickname, (possibly $-prefixed) beginning of a hex-encoded fingerprint, beginning of a base64-encoded fingerprint without trailing equal signs, or beginning of an IP address, (2) bridges with (part of a) nickname or (possibly $-prefixed) beginning of a hashed hex-encoded fingerprint, and (3) relays and/or bridges matching a given qualified search term. Searches by relay IP address include all known addresses used for onion routing and for exiting to the Internet. Searches for beginnings of IP addresses are performed on textual representations of canonical IP address forms, so that searches using CIDR notation or non-canonical forms will return empty results. Searches are case-insensitive, except for base64-encoded fingerprints. If multiple search terms are given, separated by spaces, the intersection of all relays and bridges matching all search terms will be returned. Complete hex-encoded fingerprints should always be hashed using SHA-1, regardless of searching for a relay or a bridge, in order to not accidentally leak non-hashed bridge fingerprints in the URL. [...]_
Before providing my experimental code (which I'd have to clean up anyway), I'd want to keep this discussion as open as possible and only present my general ideas how this could be implemented:
- In the following I'll assume that we're going to use PostgreSQL as database. I already have some experience with it from other Tor-related projects, and our sysadmins like it more than other SQL databases. If NoSQL turns out to be superior for this use case based on some actual performance evaluations, I'm happy to consider that.
- We'll have to support three comparison modes for the `search` parameter: "starts with", "starts with ignore case", and "contains as substring".
- PostgreSQL does not support substring searches (`LIKE '%foo%'`) out of the box, at least not efficiently, but there's a package called `pg_trgm` that can "determine the similarity of text based on trigram matching". It's contained in Debian's `postgresql-contrib` package, so it should be available to us.
- Right now, search terms are supported starting at a minimum length of 1 character. I could imagine raising that to 3 characters if it has major benefits to search efficiency. Though if it doesn't, let's keep supporting searches for 1 or 2 characters.
- I briefly experimented with a normalized database schema with a `servers` table containing one row per relay or bridge, an `addresses` table with one or more addresses per server, and a `fingerprints` table with original and hashed fingerprint per server. The performance was not very promising, because searches would have to happen in all three tables. Happy to try again if somebody has hints what I could have done wrong.
- I also considered (but did not test) a schema with a single `servers` table that encodes all fields that are relevant for the `search` parameter in a single string with format `"lower-case-nickname#base64-fingerprint|lower-case-hex-fingerprint|lower-case-hashed-hex-fingerprint|lower-case-address1|lower-case-address2"`. For example, Tonga would have the combined string `"tonga#SgzNLdx5lQg9c/XWZxAMilgx8W0|4a0ccd2ddc7995083d73f5d667100c8a5831f16d|e654ae16b76cf002bd26adaf060f8a9c5d333cc9|82.94.251.203"`, and searches for `Tonga` would use the following condition: `WHERE search LIKE '%tonga%#' OR search LIKE '%#Tonga%' OR search LIKE '%|tonga%'`.
- There may be variants of these two schemas that have advantages that I didn't think of yet. Suggestions very welcome.
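To make the second schema concrete, here is how the combined string for Tonga could be assembled (a sketch following the single-column format proposed above):

```python
import base64
import binascii

def combined_search_string(nickname, fingerprint, hashed_fingerprint,
                           addresses):
    """Build the single search column proposed above: lower-case nickname,
    '#', base64 fingerprint, then '|'-separated lower-case hex fingerprint,
    hashed hex fingerprint, and addresses."""
    b64 = (base64.b64encode(binascii.unhexlify(fingerprint))
           .decode("ascii").rstrip("="))
    fields = [fingerprint.lower(), hashed_fingerprint.lower()]
    fields += [address.lower() for address in addresses]
    return "%s#%s|%s" % (nickname.lower(), b64, "|".join(fields))
```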
If we can find a good database schema for the `search` parameter, implementing the other parameters should be relatively easy.
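To make the single-table idea concrete, here is a minimal Java sketch (class and method names are hypothetical, not Onionoo code) that builds the combined search string from the example above; the `WHERE` clause is just the parameterized form of the example condition:

```java
public class CombinedSearchString {

  // Hypothetical sketch of the single-column schema idea: encode all
  // search-relevant fields into one string, as described above. The base64
  // fingerprint keeps its case; everything else is lower-cased.
  static String encode(String nickname, String base64Fingerprint,
      String hexFingerprint, String hashedHexFingerprint, String[] addresses) {
    StringBuilder sb = new StringBuilder();
    sb.append(nickname.toLowerCase()).append('#').append(base64Fingerprint)
        .append('|').append(hexFingerprint.toLowerCase())
        .append('|').append(hashedHexFingerprint.toLowerCase());
    for (String address : addresses) {
      sb.append('|').append(address.toLowerCase());
    }
    return sb.toString();
  }

  // The LIKE conditions from the example above, as they might appear in a
  // prepared statement; pg_trgm would make the leading-% patterns efficient.
  static String whereClause() {
    return "WHERE search LIKE '%' || ? || '%#' "
        + "OR search LIKE '%#' || ? || '%' "
        + "OR search LIKE '%|' || ? || '%'";
  }
}
```

With Tonga's data this reproduces the combined string given above.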
Here's Tonga's search data for a very first sample:
```
{
  "t": true,
  "f": "4A0CCD2DDC7995083D73F5D667100C8A5831F16D",
  "n": "Tonga",
  "ad": [
    "82.94.251.203"
  ],
  "cc": "nl",
  "as": "AS3265",
  "fs": "2007-10-27 12:00:00",
  "ls": "2015-04-18 13:00:00",
  "rf": [
    "Authority",
    "Fast",
    "HSDir",
    "Running",
    "Stable",
    "V2Dir",
    "Valid"
  ],
  "cw": 20,
  "r": true,
  "c": "4096/fd3428b4 lucky green <shamrock@cypherpunks.to>"
}
```
I also uploaded more [sample search data](https://people.torproject.org/~karsten/volatile/summary.xz) in case that helps the discussion.

---

#15846: Publish (hashes of) historic Onionoo details documents (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/15846

Nusenu [writes on tor-dev@](https://lists.torproject.org/pipermail/tor-dev/2015-April/008726.html): "I might want to prove to third parties that I'm indeed processing/providing authentic historic onionoo documents from onionoo.tpo. What do you think of signing them?"
Let's consider signing responses. What would we gain, and how would we do it?
Pros:
- We already provide Onionoo via https. I guess including a signature in the response would enable people to archive that signature and verify it later, which is not possible with https. That's what Nusenu has in mind, I think.
Cons:
- Signing responses causes some computation overhead, and it makes responses larger.
- Where in the JSON document would we add the signature? Are there standards for this, and are there tools supporting them that can be found in Debian stable? This is a con, because it's probably non-trivial to do.
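For the "how" question, one option would be a detached signature over the exact response bytes, shipped e.g. in an HTTP header or a sibling file rather than inside the JSON. A sketch using only the JDK (names are hypothetical, and RSA is chosen arbitrarily here):

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import java.util.Base64;

public class ResponseSigner {

  // Hypothetical sketch, not Onionoo code: clients archive the response bytes
  // plus the base64 signature and can verify them later, which plain https
  // does not allow.
  static KeyPair newKeyPair() {
    try {
      KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
      gen.initialize(2048);
      return gen.generateKeyPair();
    } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
    }
  }

  // Sign the exact bytes of a response with a long-term service key.
  static String sign(KeyPair keys, byte[] responseBytes) {
    try {
      Signature signer = Signature.getInstance("SHA256withRSA");
      signer.initSign(keys.getPrivate());
      signer.update(responseBytes);
      return Base64.getEncoder().encodeToString(signer.sign());
    } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
    }
  }

  // Verify an archived response against its detached signature.
  static boolean verify(KeyPair keys, byte[] responseBytes, String signature) {
    try {
      Signature verifier = Signature.getInstance("SHA256withRSA");
      verifier.initVerify(keys.getPublic());
      verifier.update(responseBytes);
      return verifier.verify(Base64.getDecoder().decode(signature));
    } catch (GeneralSecurityException e) {
      throw new RuntimeException(e);
    }
  }
}
```

A detached signature would avoid the question of where to embed the signature inside the JSON, at the cost of clients having to archive two artifacts.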
Similar to #15845, I'm leaning towards no, but maybe I'm overlooking something.

---

#15848: Update details documents in a single, atomic step (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/15848

Right now, we're writing details documents in the last phase of the hourly updater, but one after the other. And when we're done, we're writing the `relays_published` timestamp indicating which consensus these details documents are based on.
[Nusenu points out on tor-dev@](https://lists.torproject.org/pipermail/tor-dev/2015-April/008738.html) that this may create a situation when a details response is returned that contains details documents based on a later consensus than what's contained in the `relays_published` timestamp that is given in the response.
Another downside of not updating all documents at once is that the sum of network fractions is not always exactly 100%, depending on which documents have already been updated.
A possible workaround would be to include a timestamp in each details document referencing the last known consensus at the time of writing the document. That would fix the problem pointed out by Nusenu, but not the problem with fractions. That timestamp would also be a potential source of confusion, because the details documents of non-running relays or bridges are not rewritten every hour. Also, another workaround for checking whether a details document was updated recently might be to check whether `last_seen > relays_published || (last_seen == relays_published && !running)` (untested).
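As code, the untested check would look like this; this is a literal transcription of the condition above, with no claim that its semantics are right:

```java
public class DetailsFreshness {

  // Literal transcription of the (untested) condition discussed above;
  // timestamps are assumed to be comparable numbers, e.g. epoch millis.
  static boolean updatedRecently(long lastSeen, long relaysPublished,
      boolean running) {
    return lastSeen > relaysPublished
        || (lastSeen == relaysPublished && !running);
  }
}
```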
A possible real fix would be to store all documents in a database and update them in a single transaction.
Not sure if there are other (simple) solutions available that don't require switching to a database just yet.

---

#16225: Unify exception/error handling in metrics-lib (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16225

There are now four different descriptor sources in metrics-lib: `DescriptorParser`, `DescriptorReader`, `DescriptorCollector`, and `DescriptorDownloader`.
We should think about unifying how we're handling exceptions and errors and telling the user about them. These thoughts should include whether or not to log exceptions and errors, though just doing that is probably not sufficient.
Let's go through the interfaces one by one:
```
public interface DescriptorParser {
  public List<Descriptor> parseDescriptors(byte[] rawDescriptorBytes,
      String fileName, boolean failUnrecognizedDescriptorLines)
      throws DescriptorParseException;
}
```
The (slightly modified, as compared to current master) `parseDescriptors` method handles a single file and throws an exception whenever something goes wrong. That works quite well, because that method blocks the caller, so they can easily wrap the call inside a `try/catch` block.
That's somewhat different in the next interface:
```
public interface DescriptorCollector {
  public void collectRemoteFiles(String collecTorBaseUrl,
      String[] remoteDirectories, long minLastModified,
      File localDirectory, boolean deleteExtraneousLocalFiles);
}
```
This method (which we should have called `collectDescriptors`, in retrospect) blocks the caller, too, but it's unclear whether it should abort its operation when it hits the first exception. Right now it's rather silent about problems and only prints stack traces to `System.err`. But that doesn't enable the caller to handle problems, either. At least the method makes sure that it doesn't delete extraneous local files that might still exist remotely in case of an error.
The next interface is trickier:
```
public interface DescriptorReader {
  public Iterator<Descriptor> readDescriptors(File[] directories,
      File[] tarballs, SortedMap<String, Long> excludeFiles,
      boolean failUnrecognizedDescriptorLines, int maxDescriptorsInQueue);
}
```
Note that this interface does not yet exist, but it's what I could imagine doing to simplify the current interface and make it more similar to the `DescriptorCollector` interface. I could also imagine overloading this method.
The idea here is that the map passed in `excludeFiles` would be updated while reading descriptors, so that the caller could use that to update a local history file. (This would be documented, of course.)
However, it's unclear how we would handle problems at all here. What we _could_ do is extend the implementation of the returned `Iterator<Descriptor>` to throw a (runtime) exception whenever there was a problem reading the descriptor that would be returned next. Or maybe we should write our own `Iterator`-like interface for returning descriptors and have its `hasNextDescriptor()` and `nextDescriptor()` methods throw `DescriptorParseException`.
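A sketch of the first option, with toy stand-ins for `Descriptor` and the parser (none of this is the actual metrics-lib code):

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ThrowingDescriptorIterator {

  // Stand-ins for the real metrics-lib types, just for this sketch.
  interface Descriptor { }

  static class DescriptorParseException extends Exception {
    DescriptorParseException(String message) { super(message); }
  }

  // Runtime wrapper so it can escape Iterator.next(); callers may catch it
  // per descriptor and keep iterating.
  static class DescriptorReadException extends RuntimeException {
    DescriptorReadException(DescriptorParseException cause) { super(cause); }
  }

  static Iterator<Descriptor> readDescriptors(final byte[][] rawDescriptors) {
    return new Iterator<Descriptor>() {
      private int position = 0;
      public boolean hasNext() {
        return position < rawDescriptors.length;
      }
      public Descriptor next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        byte[] raw = rawDescriptors[position++];
        try {
          return parse(raw);
        } catch (DescriptorParseException e) {
          throw new DescriptorReadException(e); // caller may catch and continue
        }
      }
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  // Toy stand-in for the real parser: empty input counts as unparseable.
  static Descriptor parse(byte[] raw) throws DescriptorParseException {
    if (raw.length == 0) {
      throw new DescriptorParseException("empty descriptor");
    }
    return new Descriptor() { };
  }
}
```

The point of the wrapper is that a failure in one descriptor doesn't abort the whole iteration.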
By the way, the current `DescriptorReader` interface has a `getExceptions()` method in the additional `DescriptorFile` interface that is returned instead of `Descriptor`. But that's something I'd like to get rid of, too. I also don't expect many users to look at those exceptions.
The last interface is the mostly dysfunctional `DescriptorDownloader` interface. I could imagine that we handle exceptions/errors there similar to `DescriptorReader`. Or we throw out that class, because it's only used by CollecTor, and generalizing its functionality might take more effort than writing it cleanly as part of CollecTor.
Thinking about the future, we might want to add another interface `DescriptorWriter`, and we should think about ways to handle exceptions/errors there, too.
I mostly sketched out the problem here, without having good solutions. But maybe we can come up with good answers in this discussion?

Milestone: metrics-lib 3.0.0

---

#16426: Add parse history for tarballs in archive directory (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16426

In #13600 we improved support for importing descriptor archives in bulk, but we didn't include a parse history for tarballs in the archive directory. Let's add one, so that service operators don't need to watch out and remove or move away tarballs after importing them themselves.

---

#16520: Add research idea to Run some onion services to observe crawling trends (Roger Dingledine, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16520

We know some research groups that are doing full crawling of onion services. We also know that Ahmia et al are doing it. I keep hearing these days about big security companies selling "onion intelligence" or the like.
What are the characteristics of these crawls? Are many of them one level deep, or k levels deep, or full crawls? Do they obey robots.txt? Do they identify themselves by their user agent? Do they visit urls that are embedded in html comments that humans would never find? Do they de-obfuscate urls and visit those? Do they get suckered by web tarpits that produce infinite pages? Are the crawling trends going up quickly or slowly?
We should consider running a couple of onion services with various characteristics, and monitor their usage and see if we learn anything.

---

#16553: Add support for searching by (partial) host name (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16553

Searching by host name is currently not supported, but it might be useful to add that. I could imagine supporting it by returning all host names ending with a given search term. For example, `sampo.ru` would return that relay and all others in the same domain. It would be a new qualified search term though, that is, one would have to search for it in Atlas like this: `"hostname:sampo.ru"`.
This enhancement request originates from #10128 and has low priority.

Milestone: Onionoo-1.7.0

---

#16555: Make user statistics more robust against outliers (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16555

**tl;wr:** From June 11 to 13, 2015, the [number of bridge users](https://metrics.torproject.org/userstats-bridge-country.html?graph=userstats-bridge-country&start=2015-06-01&end=2015-06-30&country=all) briefly went up from around 20k to 140k. A closer investigation of the underlying data revealed that the aggregate statistics reported by a single bridge were responsible for this major spike. The [estimation method used for user statistics](https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf) should be made robust against outliers, possibly by applying the more recently developed [techniques that are used to extrapolate hidden-service statistics](https://research.torproject.org/techreports/extrapolating-hidserv-stats-2015-01-31.pdf).
Here are more details about that single bridge reporting almost unbelievably high statistics: It's the bridge with nickname "solemnizersfiaun" and hashed fingerprint [420C39C86B0E71F653E18552B28B9189DA2F1377](https://globe.torproject.org/#/bridge/420C39C86B0E71F653E18552B28B9189DA2F1377) that reported to have served up to 80k users. But from the bandwidth statistics it looks like that bridge actually answered a huge number of consensus requests during those days in June. It pushed up to 20 MB/s, which is probably rather unusual for a bridge. A closer look at the descriptor tells us that most of these bytes were used to answer directory requests. (I didn't do the math whether such a burst over a few hours would be sufficient to write 800k compressed consensuses.) So, either the bridge is telling us the truth, or it's lying to us in a very sophisticated way.
And it's not only that bridge that reported very high statistics in June. There's another bridge with nickname "Unnamed" and hashed fingerprint [82F37B9A8400A1E0C0730D8E4639150AE11AC640](https://globe.torproject.org/#/bridge/82F37B9A8400A1E0C0730D8E4639150AE11AC640) that reported to have served around 10k users on June 18 and 22. Similarly, that bridge reported extremely high traffic during those days. I didn't look for more bridges, but it's possible that there were more that reported unusual numbers that didn't stand out as much as these.
So, I'm not sure if we'll find out what exactly happened there, but it seems very unrealistic that these directory requests were generated by actual human users. That's why I think we should remove these outliers in our estimation method.

---

#16590: Attempt to parse dates in non-ISO 8601 formats (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16590

We currently only accept dates in ISO 8601 format and print out a warning if the user puts in a date in a different format. We could make this part more user-friendly by guessing the date format of a non-ISO 8601 date and including a link to results using that guessed date.
Example:
"Sorry, '3/24/15' is not a valid date. The expected date format is 'YYYY-MM-DD'. Example: '2015-07-02'. Did you mean: 2015-03-24?"
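A minimal sketch of the guessing step, with a hypothetical `DateGuesser` helper and only a handful of example patterns; a real implementation would need a much larger pattern list covering common per-country formats:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateGuesser {

  // A few example non-ISO patterns to try, in order; hypothetical selection.
  static final String[] PATTERNS = { "M/d/yy", "d.M.yyyy", "d/M/yyyy" };

  // Return the first successful parse reformatted as ISO 8601 for a
  // "Did you mean" hint, or null if no pattern matches.
  static String guessIsoDate(String input) {
    for (String pattern : PATTERNS) {
      SimpleDateFormat parser = new SimpleDateFormat(pattern, Locale.US);
      parser.setTimeZone(TimeZone.getTimeZone("UTC"));
      parser.setLenient(false); // reject impossible dates like 24/24/15
      try {
        Date parsed = parser.parse(input);
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));
        return iso.format(parsed);
      } catch (ParseException e) {
        // try the next pattern
      }
    }
    return null; // no guess; show only the error message
  }
}
```

Ambiguous inputs like `3/4/15` would match the first pattern that applies, so the pattern order itself encodes a guess.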
Fortunately, Wikipedia has a list of [common date formats used in the world](https://en.wikipedia.org/wiki/Date_format_by_country).

---

#16594: Decide about adding more translations (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16594

Right now, ExoneraTor only supports English and German. We should think about adding more translations. Some parts of this process are probably easy: we could probably put up the [English file](https://gitweb.torproject.org/exonerator.git/tree/res/ExoneraTor.properties) on Transifex and manually add or update translation files like we did with the [German translation](https://gitweb.torproject.org/exonerator.git/tree/res/ExoneraTor_de.properties).
The harder part is that we want to be really sure that translations are not misleading for law enforcement people. Maybe that means that we'll want translators to be lawyers or to have consulted a lawyer in their country before submitting translations. Or maybe we'll just want a second person to confirm that a translation is correct before making it available.
What do our lawyers say?

---

#16659: Add research idea for Linux TCP Initial Sequence Numbers may aid correlation (Trac, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16659

TCP sequence numbers seem to be one more way to leak the host clock on GNU/Linux systems. It's the last major vector in the literature that's not addressed yet. [1] The kernel embeds the system time in microseconds in TCP connections. Some opinions say the TCP ISNs are salted hashes and can't be abused, but my impression from Steve Murdoch's papers is that it's feasible and already carried out in his tests. [2][3]
There is no sysctl option to disable it, so it must be patched upstream. [4][5]
Nick has done exceptional work to get OpenSSL upstream to throw out mandatory timestamping in the protocol. TAILS and Whonix disable TCP Timestamps in the kernel sysctl. TCP Timestamps are a different vector from TCP ISNs discussed here - it would be great if upstream kernel disables this as well so all distros have it.
[1] https://www.cl.cam.ac.uk/~sjm217/papers/ccs06hotornot.pdf
[2] http://caia.swin.edu.au/talks/CAIA-TALK-080728A.pdf
[3] http://www.cl.cam.ac.uk/~sjm217/papers/ih05coverttcp.pdf
[4] https://stackoverflow.com/a/12232126
[5] http://lxr.free-electrons.com/source/net/core/secure_seq.c?v=3.16
**Trac**:
**Username**: source

---

#16843: Add all bwauth measurements (from votes) (cypherpunks, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16843

As discussed in #16020, it would be handy to have all measurements from bwauths included in onionoo data.

The main concern is the additional workload. Karsten wrote:

> Including all bwauth measurements would certainly be handy, but that would require parsing votes which we don't do right now. Onionoo is already choking on parsing all the descriptors published every hour, and votes are not exactly tiny. I'd say don't expect this to happen anytime soon. But I agree that it would be really useful to have.

---

#16884: Extend "Relays by relay flags" graph to display flag combinations (Nima Fatemi, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/16884

When I select 'Running', 'Exit', and 'Fast' flags on [this graph](https://metrics.torproject.org/relayflags.html?graph=relayflags&start=2015-03-12&end=2015-06-10&flag=Running&flag=Exit&flag=Fast), I'm assuming it should return the number of "fast and currently running exit nodes", but instead of showing me a mix of all those things, it shows me the list of those flags separately. Hence it's unclear whether the exits I'm seeing are all running or not.
Maybe we should have an AND/OR kind of filtering here?

---

#17430: Add graph with top-10 countries by directly connecting users (Nima Fatemi, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/17430

On the main [user stat page](https://metrics.torproject.org/userstats-relay-table.html), we have the list of top ten countries connecting directly to Tor, and on the left side of the table there's a huge white space.
Now I don't know if it's been left blank intentionally or not, but it'd be great to have a graph showing all the top ten countries with different colors on it.

---

#17488: ExoneraTor hangs forever on old known-positive test (starlight, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/17488

98.113.149.36
2011-04-29

---

#17738: Remove obsolete Advertised Bandwidth column (teor, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/17738

In the Compass country table[0], the advertised bandwidth percentages are all 0. I would expect them to be the sum of the advertised bandwidth in the country.
[0]: https://compass.torproject.org/#?exit_filter=all_relays&links&sort=cw&sort_reverse&country=&by_country&top=-1

---

#17861: Consider adding a new interface RelayNetworkStatusMicrodescConsensus (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/17861

There are currently three different version 3 network status document types, identified by the following `@type` annotations:
1. `@type network-status-vote-3 1.0`: these are votes exchanged by directory authorities;
2. `@type network-status-consensus-3 1.0`: these are (unflavored) consensuses based on votes and published by directory authorities;
3. `@type network-status-microdesc-consensus-3 1.0`: these are the same consensuses as before but using a specific flavor, in this case one that references microdescriptors rather than server descriptors.
So, while we're using a separate interface for the first type (`RelayNetworkStatusVote`), we're using the same interface (`RelayNetworkStatusConsensus`) for the second and third type. This only works, because the only differences between unflavored and microdesc-flavored consensuses can be found in the network status entries, and we're using the generic `NetworkStatusEntry` for those. But as soon as there will be new keywords in either unflavored or microdesc-flavored consensuses, we'll have to add support for them to `RelayNetworkStatusConsensus`, even though the other flavor doesn't support them.
We might consider adding another interface `RelayNetworkStatusMicrodescConsensus` for microdesc-flavored consensuses, which would probably just extend `RelayNetworkStatusConsensus` for now (or copy over everything from it? uhhh). But if microdesc-flavored consensuses ever diverge from unflavored consensuses in header or footer, we'll be able to model those changes correctly.
Related to this suggestion, we might consider making `NetworkStatusEntry` less generic by using a separate `Entry` interface like we just did for `ExitList`. Following that model, `RelayNetworkStatusVote.Entry` would keep its `getMicrodescriptorDigests()` method, `RelayNetworkStatusConsensus.Entry` would drop that method, and `RelayNetworkStatusMicrodescConsensus.Entry` would get a `getMicrodescriptorDigest()` (singular) method.
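To make the proposal concrete, here is a compilable sketch of the suggested hierarchy; the names follow this ticket's discussion, not the actual metrics-lib API, and the dummy entry exists only for illustration:

```java
public class ConsensusInterfaces {

  // Generic entry type, heavily reduced for this sketch.
  interface NetworkStatusEntry {
    String getFingerprint();
  }

  interface RelayNetworkStatusConsensus {
    // Entries of unflavored consensuses carry no microdescriptor digests.
    interface Entry extends NetworkStatusEntry {
    }
  }

  // For now this would just extend the unflavored consensus interface, but it
  // leaves room for microdesc-flavored consensuses to diverge later.
  interface RelayNetworkStatusMicrodescConsensus
      extends RelayNetworkStatusConsensus {
    interface Entry extends NetworkStatusEntry {
      String getMicrodescriptorDigest(); // singular, unlike the vote's plural
    }
  }

  // Minimal dummy entry, only to make the sketch testable.
  static class DemoEntry implements RelayNetworkStatusMicrodescConsensus.Entry {
    public String getFingerprint() {
      return "4A0CCD2DDC7995083D73F5D667100C8A5831F16D";
    }
    public String getMicrodescriptorDigest() {
      return "SgzNLdx5lQg9c/XWZxAMilgx8W0";
    }
  }
}
```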
I don't think these changes are super urgent, but I wanted to write them down before resolving #17000 where they first came up.

---

#17939: Optimize the construction of details documents with field constraints (Trac, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/17939

In a [recent post to metrics-team@](https://lists.torproject.org/pipermail/metrics-team/2015-December/000026.html), Karsten pointed toward an expensive operation within the response builder:
> Once per hour, the updater fetches new data and in the end produces JSON-formatted strings that it writes to disk. The servlet reads a (comparatively) small index to memory that it uses to handle requests, and when it builds responses, it tries hard to avoid (de-)serializing JSON.
>
> The only situation where this fails is when [a] request [to the /details endpoint] contains the fields parameter. Only in that case we'll have to deserialize, pick the fields we want, and serialize again. I could imagine that this shows up in profiles pretty badly, and I'd love to fix this, I just don't know how.
I think we can exploit a few properties of the updater to handle this case in a more efficient manner.
It seems safe to assume that: (1) the produced response is always the concatenation of a sequence of substrings within the written document [^1]; (2) that the documents on disk are legal JSON and correctly typed (having been written by the updater, which we trust and control); and (3) that the contents of the file are trivially parsed (belonging to a restriction of JSON with known and non-redundant keys, the grammar is at most context-free).
I believe these conditions admit introducing a relatively efficient parser generator pair, one that avoids request-time de-serialisation. Given a request, the result of the parser would be a sequence of pairs of indices marking the boundaries of each field. The generator would reproduce the input, but for excluding text regions corresponding to fields excluded by the request.
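For illustration, a hedged Java sketch of the same index-and-cut idea, simplified to flat documents with unescaped keys (hypothetical names, not the Onionoo code):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldSlicer {

  // Scan a trusted, flat details document once and record the (start, end)
  // span of each top-level "key": value pair. Keys are assumed to contain no
  // escape sequences; values may be strings, numbers, booleans, or arrays.
  static Map<String, int[]> indexFields(String json) {
    Map<String, int[]> spans = new LinkedHashMap<String, int[]>();
    int i = json.indexOf('{') + 1;
    while (i < json.length()) {
      while (i < json.length() && (Character.isWhitespace(json.charAt(i))
          || json.charAt(i) == ',')) {
        i++;
      }
      if (i >= json.length() || json.charAt(i) == '}') {
        break;
      }
      int fieldStart = i;                    // at the opening quote of the key
      int keyEnd = json.indexOf('"', i + 1); // keys contain no escapes here
      String key = json.substring(i + 1, keyEnd);
      i = keyEnd + 1;
      int depth = 0;
      boolean inString = false;
      while (i < json.length()) {            // scan to the end of the value
        char c = json.charAt(i);
        if (inString) {
          if (c == '\\') i++;
          else if (c == '"') inString = false;
        } else if (c == '"') inString = true;
        else if (c == '[' || c == '{') depth++;
        else if (c == ']') depth--;
        else if (c == '}' && depth > 0) depth--;
        else if ((c == ',' || c == '}') && depth == 0) break;
        i++;
      }
      spans.put(key, new int[] { fieldStart, i });
    }
    return spans;
  }

  // Rebuild a response from the requested fields by concatenating substrings,
  // without any JSON (de-)serialization at request time.
  static String slice(String json, Map<String, int[]> spans,
      List<String> fields) {
    StringBuilder sb = new StringBuilder("{");
    for (String field : fields) {
      int[] span = spans.get(field);
      if (span == null) continue;            // nulled fields are simply absent
      if (sb.length() > 1) sb.append(",");
      sb.append(json, span[0], span[1]);
    }
    return sb.append("}").toString();
  }
}
```

The index could be built once per hourly update, so a `fields` request reduces to a handful of substring copies.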
No patch yet, but I've hacked together a small (inefficient mess of a..) proof of concept that hopefully illustrates the basic idea:
http://hack.rs/~vi/onionoo/IndexJSON.hs
sha256: 14a09f26fadab8d989263dc76d368e41e63ba6c5279d37443878d6c1d0c87834
http://www.webcitation.org/6e3NEOLJg
```
% jq . 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE
{
  "nickname": "Unnamed",
  "hashed_fingerprint": "96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE",
  "or_addresses": [
    "10.103.224.131:443"
  ],
  "last_seen": "2015-11-23 03:40:44",
  "first_seen": "2015-11-20 04:38:22",
  "running": false,
  "flags": [
    "Valid"
  ],
  "last_restarted": "2015-11-22 01:23:06",
  "advertised_bandwidth": 49168,
  "platform": "Tor 0.2.4.22 on Windows 8"
}
% index-json 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE
("nickname",(2,21,22))
("hashed_fingerprint",(23,85,86))
("or_addresses",(87,123,124))
("last_seen",(125,157,158))
("first_seen",(159,192,193))
("running",(194,208,209))
("flags",(210,226,227))
("last_restarted",(228,265,266))
("advertised_bandwidth",(267,294,295))
("platform",(296,333,333))
% cut -c1 -c23-158 -c194- 96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE | jq .
{
  "hashed_fingerprint": "96B16C78BB54BA0F56EEA8721781C9BD01B7E9AE",
  "or_addresses": [
    "10.103.224.131:443"
  ],
  "last_seen": "2015-11-23 03:40:44",
  "running": false,
  "flags": [
    "Valid"
  ],
  "last_restarted": "2015-11-22 01:23:06",
  "advertised_bandwidth": 49168,
  "platform": "Tor 0.2.4.22 on Windows 8"
}
```
What do you think?
[^1]: There's a factor of surprise in the treatment of nullable properties, but it turns out that the existing behaviour works in our favour. GSON removes 'null'ed fields in writing documents to disk; e.g. note the absence of an AS number here:
```
% pwd
/srv/onionoo.torproject.org/onionoo/out/details
% jq . $(ls | shuf -n1)
{
  "nickname": "Unnamed",
  "hashed_fingerprint": "CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE",
  "or_addresses": [
    "10.190.9.13:443"
  ],
  "last_seen": "2015-12-16 22:41:56",
  "first_seen": "2015-11-11 21:01:43",
  "running": true,
  "flags": [
    "Fast",
    "Valid"
  ],
  "last_restarted": "2015-12-16 02:13:40",
  "advertised_bandwidth": 59392,
  "platform": "Tor 0.2.4.23 on Windows 8"
}
```
But it *also* excludes them from /details responses, even when specified by name using the 'fields' parameter:
```
% curl -s 'http://onionoo.local/details?lookup=CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE&fields=hashed_fingerprint,as_number' | jq .bridges[]
{
  "hashed_fingerprint": "CE0A4E1B6C545FF9F25A9CAF5926732559A2C0FE"
}
```
So it doesn't seem necessary to add any text atop the persisted serialisation, even in this case.
**Trac**:
**Username**: fmap

---

#18167: Don't trust "bridge-ips" blindly for user number estimates (Karsten Loesing, 2020-06-13)
https://gitlab.torproject.org/legacy/trac/-/issues/18167
When I developed the [algorithm for estimating user numbers](https://research.to...I think I found a bug in the user number estimates that led to the [confusion on #13171](https://trac.torproject.org/projects/tor/ticket/13171#comment:14).
When I developed the [algorithm for estimating user numbers](https://research.torproject.org/techreports/counting-daily-bridge-users-2012-10-24.pdf), bridges only reported how many directory requests they responded to (`"dirreq-v3-resp"`), but not how these directory requests were distributed to countries (`"dirreq-v3-reqs"`). What they did report was how many different IP addresses by country connected to the bridge (`"bridge-ips"`). The goal back then was to provide better user numbers per country, so I put in the assumption that the geographic distributions of directory responses and connecting IP addresses would be roughly the same. And I think that assumption is still valid for most cases.
However, the meek version _before_ the #13171 fix broke this assumption. Here's an example from a meek bridge that didn't have this fix yet (descriptor digest `462a2bcc..`):
```
extra-info UtahMeekBridge 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2015-12-09 22:53:48
dirreq-v3-resp ok=17656,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6160,busy=0
bridge-ips de=16,cn=8,us=8
```
It's rather unlikely that 17656 responses were sent back to 32 IP addresses or less. Still, following the assumption above, we're saying that half of those 17656 responses were sent back to Germany and one quarter each to China and the U.S.A., and that seems dangerously wrong.
I'm going to attach a scatter plot in a minute, `dirreq-resp-by-bridge-ips-2016-01-27.png`, that puts the numbers of `"dirreq-v3-resp ok=..."` and `"bridge-ips"` in relation for statistics reported between December 1, 2015 and last week. The two meek bridges `88F7..` and `AA03..` stand out quite a bit there as clusters close to the y axis.
I have a few possible fixes in mind. The first part would be to ignore all statistics where 1 unique IP address was reported to make, say, 10 directory requests or more. That would remove all dots to the left of the dashed line in the graph.
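This first part of the fix is essentially a one-line plausibility check; a sketch with a hypothetical helper name:

```java
public class DirreqStatsFilter {

  // Sketch of the first part of the fix: ignore a bridge's directory-request
  // statistics if one reported unique IP address would, on average, account
  // for 10 or more directory requests (the dashed line in the scatter plot).
  static boolean keepStatistics(long dirreqV3RespOk, long bridgeIpsTotal) {
    return dirreqV3RespOk < 10L * bridgeIpsTotal;
  }
}
```

For the UtahMeekBridge example above, 17656 responses against at most 32 unique addresses fails this check.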
The second part of the fix would be to stop combining `"dirreq-v3-resp"` and `"bridge-ips"` numbers and instead use the reported distributions of directory requests to countries (`"dirreq-v3-reqs"`) that were not available 3.5 years ago. But [starting roughly 2 years ago](https://trac.torproject.org/projects/tor/ticket/5824#comment:17), these statistics are being published by more and more bridges.
Here's a descriptor (`fe171d40..`) that was published last week by the same bridge as above, now named `MeekGoogle`, which was after the meek-specific #13171 fix:
```
extra-info MeekGoogle 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2016-01-22 13:11:10
dirreq-v3-reqs us=7200,ru=1576,de=1520,[..],cn=88,[..]
dirreq-v3-resp ok=22016,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6016,busy=0
bridge-ips us=3016,ru=632,gb=536,de=528,[..],cn=40,[..]
bridge-ip-versions v4=8752,v6=64
bridge-ip-transports <OR>=8,meek=8808
```
I'm attaching a second scatter plot, `dirreq-resp-by-dirreq-reqs-2016-01-27.png`, that compares the numbers of `"dirreq-v3-resp ok=..."` to `"dirreq-v3-reqs"`. The correlation is close to linear, which makes sense, because the number of directory requests should roughly match the number of directory responses. I think we can make the user number estimates a bit more accurate by making this switch. We would still fall back to `"bridge-ips"` if `"dirreq-v3-reqs"` is empty, but that would mostly affect older statistics.
Part three of the plan would be to remove the `"bridge-ips"` line entirely from little-t-tor, because we wouldn't use it anymore. It's worth noting that we'd lose the ability to filter out meek bridges that don't have the #13171 fix and that don't report usable `"dirreq-v3-reqs"` statistics. Or rather, we wouldn't spot future meek-like bridges affected by a similar bug.
Here's why. The first bridge descriptor above also contained a `"dirreq-v3-reqs"` line that I left out before:
```
extra-info UtahMeekBridge 88F745840F47CE0C6A4FE61D827950B06F9E4534
published 2015-12-09 22:53:48
dirreq-v3-resp ok=17656,not-enough-sigs=0,unavailable=0,not-found=0,not-modified=6160,busy=0
dirreq-v3-reqs us=17648,cn=8
bridge-ips de=16,cn=8,us=8
```
We wouldn't be able to filter out this bridge without the `"bridge-ips"` line. We would have to assume that the vast majority of requests to this bridge came from the U.S.A., and a tiny minority from China.
I think this is acceptable, because the purpose of statistics shouldn't be to validate the correctness of other statistics.
To summarize my plan, here's what I'd like to do:
1. If a bridge reports both a `"dirreq-v3-resp"` and a `"bridge-ips"` line, check if the first number is smaller than 10 times the second number; if not, ignore these directory-request statistics reported by this bridge.
2. If a bridge only reports a `"bridge-ips"` line and no `"dirreq-v3-reqs"` line, assume that the country distributions are the same, which is what we're doing right now.
3. If a bridge reports a `"dirreq-v3-reqs"` line, use that for user number estimates and ignore the `"bridge-ips"` line in case it's present.
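The three steps above could be sketched as follows, under the assumption that the per-country counts have already been parsed into maps (method and class names are hypothetical, not actual Tor Metrics code):

```java
import java.util.Map;
import java.util.Optional;

public class DirreqStatsSelector {

  /** Pick the per-country input for the user estimate: step 1 drops
   * implausible statistics, step 3 prefers "dirreq-v3-reqs", and step 2
   * falls back to "bridge-ips". Returns empty if the stats should be
   * ignored entirely. */
  static Optional<Map<String, Integer>> selectCountryStats(long okResponses,
      Map<String, Integer> dirreqReqs, Map<String, Integer> bridgeIps) {
    if (bridgeIps != null && !bridgeIps.isEmpty()) {
      long uniqueIps = bridgeIps.values().stream()
          .mapToLong(Integer::longValue).sum();
      // Step 1: 10 or more requests per unique IP looks implausible; ignore.
      if (okResponses >= 10L * uniqueIps) {
        return Optional.empty();
      }
    }
    // Step 3: prefer the reported per-country request counts when available.
    if (dirreqReqs != null && !dirreqReqs.isEmpty()) {
      return Optional.of(dirreqReqs);
    }
    // Step 2: otherwise assume the same country distribution as "bridge-ips".
    return Optional.ofNullable(bridgeIps);
  }

  public static void main(String[] args) {
    // The pre-fix meek bridge: 17656 responses but only 32 unique IPs.
    System.out.println(selectCountryStats(17656,
        Map.of("us", 17648, "cn", 8),
        Map.of("de", 16, "cn", 8, "us", 8)).isPresent());  // prints false
  }
}
```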
Hope this report was not too confusing. Feedback much appreciated.
https://gitlab.torproject.org/legacy/trac/-/issues/18203 Base direct user estimates on responses to directory requests, rather than responses 2020-06-13T18:14:15Z Karsten Loesing
Relays report two different statistics related to directory requests:
- `"dirreq-v3-reqs"` contain the number of _requests_ broken down by country and
- `"dirreq-v3-resp"` contain the number of _responses_ broken down by status code.
We're using the number of responses to estimate bridge users but the number of requests to estimate direct users. We should use the same input for both estimates. Using the number of responses (successful requests) seems more intuitive, because we're assuming that each client needs to fetch 10 fresh consensuses every day, and they might have to make more than 10 requests if one or more of them is not successful.
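Under the 10-fetches-per-day assumption, the estimate from responses is straightforward (a sketch with hypothetical names, not the actual metrics-web code):

```java
public class DirectUserEstimate {

  /** Assumed number of fresh consensus fetches per client per day. */
  static final double FETCHES_PER_DAY = 10.0;

  /** Estimate daily users from successful directory responses ("ok=..."),
   * i.e., completed consensus downloads, rather than from attempted
   * requests, which may overcount when some requests fail. */
  static double estimateDailyUsers(long okResponsesPerDay) {
    return okResponsesPerDay / FETCHES_PER_DAY;
  }

  public static void main(String[] args) {
    // 22016 successful responses in a day suggest roughly 2200 users.
    System.out.println(estimateDailyUsers(22016));  // prints 2201.6
  }
}
```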
I'll attach an analysis shortly.
This minor enhancement came up when working on #18167.
https://gitlab.torproject.org/legacy/trac/-/issues/18732 Document release process for Java projects 2020-06-13T18:09:52Z iwakeh
The Release Process description should be based on existing documentation:
[metrics-lib's CONTRIB.md](https://gitweb.torproject.org/metrics-lib.git/tree/CONTRIB.md#n142)
and after completion be referenced by metrics-lib's README.
https://gitlab.torproject.org/legacy/trac/-/issues/18797 Create a DescriptorGenerator for testing and maybe other purposes 2020-06-13T17:57:04Z iwakeh
Quote from #16873:
I pondered adding a `DescriptorGenerator` a while ago, where one would construct a new instance of a `Descriptor`, call its setters to give it some values, and then have the generator generate a text representation of that descriptor. This would be useful for testing and maybe other things. Now, would the Javadoc of these setters copy the Javadoc of the getters or simply link to them? Or would we use some smarter pattern for building these instances? (And is there a smarter pattern for parsed descriptors than a huge interface with dozens of getters?)
https://gitlab.torproject.org/legacy/trac/-/issues/18798 Analyze descriptor completeness 2020-06-13T17:50:21Z iwakeh
I started a wiki page [here](https://trac.torproject.org/projects/tor/wiki/doc/CollecTor/AnalysisDescriptorCompleteness).
https://gitlab.torproject.org/legacy/trac/-/issues/19169 Verify, correct, and extend runtime statistics 2020-06-13T17:50:35Z iwakeh
See [Analysis Part 2](https://trac.torproject.org/projects/tor/wiki/doc/CollecTor/AnalysisVotesAndConsensusCompleteness) for background information.
* verify current stats
* avoid ambiguous log statements
* maybe, separate stats for download and import
* ...
https://gitlab.torproject.org/legacy/trac/-/issues/19183 Add sybilhunter's visualisations to Metrics website 2020-06-13T18:13:34Z Philipp Winter (phw@torproject.org)
It would be great to have sybilhunter's [churn](https://nymity.ch/sybilhunting/churn-values/slide_2016-01.html) and [uptime](https://nymity.ch/sybilhunting/uptime-visualisation/slide_2014-01.html) visualisations on the Metrics website. The churn plots are time series, just like the ones we already have on Metrics. Uptime visualisations are jpeg images. We could have weekly or monthly uptime images, and daily churn diagrams.
Sybilhunter is a Go program that expects input files structured like CollecTor's archives. It should be straightforward to run it via cron.
Karsten, I don't know ggplot2. Could you help with plotting the churn values? The format is quite simple. Every line represents the churn changes for the current consensus, and starts with a timestamp, which is then followed by flag-specific churn values in the interval [0, 1].
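A parsing sketch for that format, under the assumption that the values after the timestamp are whitespace-separated (I haven't checked sybilhunter's exact output format, so the separator and timestamp shape are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class ChurnLineParser {

  /** Parse one churn line: a leading timestamp token followed by
   * flag-specific churn values, each expected to lie in [0, 1]. */
  static List<Double> parseChurnValues(String line) {
    String[] parts = line.trim().split("\\s+");
    List<Double> values = new ArrayList<>();
    for (int i = 1; i < parts.length; i++) {  // parts[0] is the timestamp
      double v = Double.parseDouble(parts[i]);
      if (v < 0.0 || v > 1.0) {
        throw new IllegalArgumentException("churn value out of [0, 1]: " + v);
      }
      values.add(v);
    }
    return values;
  }

  public static void main(String[] args) {
    // Hypothetical input line: a timestamp, then churn values for two flags.
    System.out.println(parseChurnValues("2016-01-31-23 0.04 0.12"));
    // prints [0.04, 0.12]
  }
}
```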
As I understand it, at least the following two steps are necessary to incorporate both visualisations:
* Modify `./website/etc/metrics.json`.
* Write a shell script for the cron job to run.
Is there anything else we need?
https://gitlab.torproject.org/legacy/trac/-/issues/19249 Onionoo server runs out of memory when importing a full month of data 2020-06-13T18:01:11Z Karsten Loesing
I had to re-import all of May on the Onionoo mirror because it was offline for more than three days. Now it's running out of memory in the shut-down process. Logs and exception below:
```
2016-06-01 09:30:33,944 INFO o.t.o.cron.Main:92 Going to run one-time updater ...
2016-06-01 09:30:34,002 INFO o.t.o.cron.Main:130 Initializing.
2016-06-01 09:30:34,005 INFO o.t.o.cron.Main:133 Acquired lock
2016-06-01 09:30:34,005 DEBUG o.t.o.cron.Main:152 Started update ...
2016-06-01 09:30:34,007 INFO o.t.o.cron.Main:155 Initialized descriptor source
2016-06-01 09:30:34,012 INFO o.t.o.cron.Main:159 Initialized document store
2016-06-01 09:30:34,029 INFO o.t.o.cron.Main:163 Initialized status update runner
2016-06-01 09:30:34,040 INFO o.t.o.cron.Main:168 Initialized document writer runner
2016-06-01 09:30:34,041 INFO o.t.o.cron.Main:176 Downloading descriptors.
2016-06-01 09:30:34,041 INFO o.t.o.u.DescriptorSource:64 Loading: RELAY_CONSENSUSES
2016-06-01 09:33:02,861 INFO o.t.o.u.DescriptorSource:64 Loading: RELAY_SERVER_DESCRIPTORS
2016-06-01 09:35:39,639 INFO o.t.o.u.DescriptorSource:64 Loading: RELAY_EXTRA_INFOS
2016-06-01 09:38:10,562 INFO o.t.o.u.DescriptorSource:64 Loading: EXIT_LISTS
2016-06-01 09:38:51,159 INFO o.t.o.u.DescriptorSource:64 Loading: BRIDGE_STATUSES
2016-06-01 09:40:29,716 INFO o.t.o.u.DescriptorSource:64 Loading: BRIDGE_SERVER_DESCRIPTORS
2016-06-01 09:41:58,737 INFO o.t.o.u.DescriptorSource:64 Loading: BRIDGE_EXTRA_INFOS
2016-06-01 09:43:44,958 INFO o.t.o.cron.Main:184 Reading descriptors.
2016-06-01 09:43:44,959 INFO o.t.o.u.DescriptorSource:153 Reading archived descriptors...
2016-06-02 02:51:09,249 INFO o.t.o.u.DescriptorSource:200 Read archived descriptors
2016-06-02 02:51:09,249 DEBUG o.t.o.u.DescriptorSource:84 Reading recent RELAY_SERVER_DESCRIPTORS ...
2016-06-02 02:53:01,224 INFO o.t.o.u.DescriptorSource:129 Read recent relay server descriptors
2016-06-02 02:53:01,224 DEBUG o.t.o.u.DescriptorSource:88 Reading recent RELAY_EXTRA_INFOS ...
2016-06-02 03:31:50,889 INFO o.t.o.u.DescriptorSource:132 Read recent relay extra-info descriptors
2016-06-02 03:31:50,890 DEBUG o.t.o.u.DescriptorSource:91 Reading recent EXIT_LISTS ...
2016-06-02 03:32:13,478 INFO o.t.o.u.DescriptorSource:135 Read recent exit lists
2016-06-02 03:32:13,479 DEBUG o.t.o.u.DescriptorSource:94 Reading recent RELAY_CONSENSUSES ...
2016-06-02 08:29:52,761 INFO o.t.o.u.DescriptorSource:126 Read recent relay network consensuses
2016-06-02 08:29:52,765 DEBUG o.t.o.u.DescriptorSource:97 Reading recent BRIDGE_SERVER_DESCRIPTORS ...
2016-06-02 08:31:32,294 INFO o.t.o.u.DescriptorSource:141 Read recent bridge server descriptors
2016-06-02 08:31:32,295 DEBUG o.t.o.u.DescriptorSource:101 Reading recent BRIDGE_EXTRA_INFOS ...
2016-06-02 09:22:29,247 INFO o.t.o.u.DescriptorSource:144 Read recent bridge extra-info descriptors
2016-06-02 09:22:29,247 DEBUG o.t.o.u.DescriptorSource:104 Reading recent BRIDGE_STATUSES ...
2016-06-02 09:23:44,681 INFO o.t.o.u.DescriptorSource:138 Read recent bridge network statuses
2016-06-02 09:23:44,682 INFO o.t.o.cron.Main:186 Updating internal status files.
2016-06-02 09:23:44,682 DEBUG o.t.o.u.StatusUpdateRunner:36 Begin update of NodeDetailsStatusUpdater
2016-06-02 09:25:02,021 INFO o.t.o.u.NodeDetailsStatusUpdater:379 Read node statuses
2016-06-02 09:25:12,500 INFO o.t.o.u.NodeDetailsStatusUpdater:381 Started reverse domain name lookups
2016-06-02 09:31:29,006 INFO o.t.o.u.NodeDetailsStatusUpdater:383 Looked up cities and ASes
2016-06-02 09:31:29,112 INFO o.t.o.u.NodeDetailsStatusUpdater:385 Calculated path selection probabilities
2016-06-02 09:31:29,224 INFO o.t.o.u.NodeDetailsStatusUpdater:387 Computed effective and extended families
2016-06-02 09:31:29,252 INFO o.t.o.u.NodeDetailsStatusUpdater:389 Finished reverse domain name lookups
2016-06-02 09:34:37,918 INFO o.t.o.u.NodeDetailsStatusUpdater:391 Updated node and details statuses
2016-06-02 09:34:37,918 INFO o.t.o.u.StatusUpdateRunner:38 NodeDetailsStatusUpdater updated status files
2016-06-02 09:34:37,918 DEBUG o.t.o.u.StatusUpdateRunner:36 Begin update of BandwidthStatusUpdater
2016-06-02 09:34:37,918 INFO o.t.o.u.StatusUpdateRunner:38 BandwidthStatusUpdater updated status files
2016-06-02 09:34:37,918 DEBUG o.t.o.u.StatusUpdateRunner:36 Begin update of WeightsStatusUpdater
2016-06-02 09:34:37,918 INFO o.t.o.u.StatusUpdateRunner:38 WeightsStatusUpdater updated status files
2016-06-02 09:34:37,919 DEBUG o.t.o.u.StatusUpdateRunner:36 Begin update of ClientsStatusUpdater
2016-06-02 09:47:57,798 INFO o.t.o.u.StatusUpdateRunner:38 ClientsStatusUpdater updated status files
2016-06-02 09:47:57,799 DEBUG o.t.o.u.StatusUpdateRunner:36 Begin update of UptimeStatusUpdater
2016-06-02 10:28:41,049 INFO o.t.o.u.StatusUpdateRunner:38 UptimeStatusUpdater updated status files
2016-06-02 10:28:41,049 INFO o.t.o.cron.Main:194 Updating document files.
2016-06-02 10:28:41,049 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing SummaryDocumentWriter
2016-06-02 10:29:03,109 INFO o.t.o.w.SummaryDocumentWriter:97 Wrote summary document files
2016-06-02 10:29:03,110 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing DetailsDocumentWriter
2016-06-02 10:33:58,870 INFO o.t.o.w.DetailsDocumentWriter:46 Wrote details document files
2016-06-02 10:33:58,870 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing BandwidthDocumentWriter
2016-06-02 11:45:04,633 INFO o.t.o.w.BandwidthDocumentWriter:54 Wrote bandwidth document files
2016-06-02 11:45:04,634 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing WeightsDocumentWriter
2016-06-02 12:19:42,480 INFO o.t.o.w.WeightsDocumentWriter:55 Wrote weights document files
2016-06-02 12:19:42,481 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing ClientsDocumentWriter
2016-06-02 12:23:22,577 INFO o.t.o.w.ClientsDocumentWriter:84 Wrote clients document files
2016-06-02 12:23:22,577 DEBUG o.t.o.w.DocumentWriterRunner:28 Writing UptimeDocumentWriter
2016-06-02 12:39:58,477 INFO o.t.o.w.UptimeDocumentWriter:57 Wrote uptime document files
2016-06-02 12:39:58,477 INFO o.t.o.cron.Main:199 Shutting down.
2016-06-02 12:39:58,477 DEBUG o.t.o.u.DescriptorSource:204 Writing parse histories for recent descriptors...
2016-06-02 12:39:58,492 INFO o.t.o.cron.Main:202 Wrote parse histories
karsten@onionoo:/srv/onionoo.thecthulhu.com/onionoo$ java -DLOGBASE=/srv/onionoo.thecthulhu.com/onionoo/log-cron/ -Xmx4g -jar dist/onionoo-3.1.0.jar --single-run && java -DLOGBASE=/srv/onionoo.thecthulhu.com/onionoo/log-cron/ -Xmx4g -jar dist/onionoo-3.1.0.jar
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
at java.lang.StringBuilder.append(StringBuilder.java:132)
at org.torproject.onionoo.docs.DocumentStore.writeNodeStatuses(DocumentStore.java:710)
at org.torproject.onionoo.docs.DocumentStore.flushDocumentCache(DocumentStore.java:669)
at org.torproject.onionoo.cron.Main.shutDown(Main.java:205)
at org.torproject.onionoo.cron.Main.run(Main.java:121)
at org.torproject.onionoo.cron.Main.runOrScheduleExecutions(Main.java:93)
at org.torproject.onionoo.cron.Main.main(Main.java:32)
karsten@onionoo:/srv/onionoo.thecthulhu.com/onionoo$ ls -lh in/archive/
total 6.9G
-rw-r--r-- 1 karsten karsten 2.9G May 31 23:40 bridge-descriptors-2016-05.tar
-rw-r--r-- 1 karsten karsten 1.1G May 30 23:19 consensuses-2016-05.tar
-rw-r--r-- 1 karsten karsten 1.5G May 31 15:57 extra-infos-2016-05.tar
-rw-r--r-- 1 karsten karsten 1.5G May 31 12:41 server-descriptors-2016-05.tar
```
I didn't really start investigating. Note that it takes over 24 hours to do the processing, so we cannot reproduce this bug as easily.
https://gitlab.torproject.org/legacy/trac/-/issues/19282 Avoid truncating descriptors while storing them 2020-06-13T17:50:39Z Karsten Loesing
Thomas/oma found this while processing archived votes:
```
org.torproject.descriptor.DescriptorParseException: r line 'r puffytor' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-07-00-00-vote-14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4-16E97744121B2FFD387C43246F22153741FD22F8
org.torproject.descriptor.DescriptorParseException: 'YbVD6iBkPXGbsypMnn4d/sQA4I3hBxF' in line 'm 18,19,20 sha256=YbVD6iBkPXGbsypMnn4d/sQA4I3hBxF' is not a valid base64-encoded 32-byte value.
in votes-2016-04/01/2016-04-01-06-00-00-vote-49015F787433103580E3B66A1707A00E60F2D15B-B98C42D8E4178E780E3DA2F048C5D25434E8B66D
org.torproject.descriptor.DescriptorParseException: r line 'r KingJasperTheGreat JW3pOXIv/G' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-08-00-00-vote-49015F787433103580E3B66A1707A00E60F2D15B-2C4AA4675C6BE8DF0D2898A42B1D83DE60FF5348
org.torproject.descriptor.DescriptorParseException: '13,14,' in line 'm 13,14,' is not a valid base64-encoded 32-byte value.
in votes-2016-04/01/2016-04-01-09-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-4BF4EEF4A8618447D6E363ED7D3E3835CEB5D379
org.torproject.descriptor.DescriptorParseException: Keyword 'directory-signature' is contained 0 times, but must be contained exactly once.
in votes-2016-04/01/2016-04-01-10-00-00-vote-E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58-81E1E9BEBAA47A305D404AFE0E54A32AD603F648
org.torproject.descriptor.DescriptorParseException: 'J35KTpJ2' in line 'm 18,19,20,22 sha256=J35KTpJ2' is not a valid base64-encoded 32-byte value.
in votes-2016-04/01/2016-04-01-05-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-636CBFED055E4C21442FC6213FBF172A34355BC4
org.torproject.descriptor.DescriptorParseException: r line 'r seele AAoQ1DAR6kkoo19hBAX5K0QztNw xgEIpqUwO/AGzMqJuybadp0MHDg 2016-04-0' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-09-00-00-vote-23D15D965BC35114467363C165C4F724B64B4F66-01917F128E43315854AC6D3CDDAF5AC0F2C6ACFC
org.torproject.descriptor.DescriptorParseException: '1iWRvv4H9t0LHHLRZL' in line 'm 13,14,15 sha256=1iWRvv4H9t0LHHLRZL' is not a valid base64-encoded 32-byte value.
in votes-2016-04/01/2016-04-01-07-00-00-vote-49015F787433103580E3B66A1707A00E60F2D15B-8DC4AB2F5E83E5117CB1BF8590658BFB2EEF40B4
org.torproject.descriptor.DescriptorParseException: Keyword 'directory-signature' is contained 0 times, but must be contained exactly once.
in votes-2016-04/01/2016-04-01-05-00-00-vote-49015F787433103580E3B66A1707A00E60F2D15B-DC09DDCD3B34DF976A0D6D1645DE8BEBAB3646B4
org.torproject.descriptor.DescriptorParseException: r line 'r Unnamed GfWunVflTNsyPhaL2FetR0MO96U 2Zw9flyVAbb0Uc13fWa2+o//rg8 2016-03-31 12:' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-09-00-00-vote-E8A9C45EDE6D711294FADF8E7951F4DE6CA56B58-1F1431417D0CEC7086442D36FB88791998CC1961
org.torproject.descriptor.DescriptorParseException: Keyword 'directory-signature' is contained 0 times, but must be contained exactly once.
in votes-2016-04/01/2016-04-01-10-00-00-vote-EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97-48C075797027F060F9C2ECA203DA83B519ADD488
org.torproject.descriptor.DescriptorParseException: r line 'r seele AAoQ1DAR6kkoo19hBAX5K0QztNw xgEIpqUwO/AGzMqJuybadp0MHDg 2016-04-0' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-10-00-00-vote-23D15D965BC35114467363C165C4F724B64B4F66-03532324381822931CBDBB8C47BBC23C13A1FDE6
org.torproject.descriptor.DescriptorParseException: Keyword 'directory-signature' is contained 0 times, but must be contained exactly once.
in votes-2016-04/01/2016-04-01-06-00-00-vote-EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97-5132376E2D44A7DEDAB25D6C2B1E5D168DC39EB5
org.torproject.descriptor.DescriptorParseException: Illegal key-value pair in line 'w Bandwi'.
in votes-2016-04/01/2016-04-01-11-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-7EF062C0774ED2AFA0200770F9DE0141006AD4EB
org.torproject.descriptor.DescriptorParseException: 'GYa' in line 'm 13,14,15 sha256=GYa' is not a valid base64-encoded 32-byte value.
in votes-2016-04/01/2016-04-01-10-00-00-vote-0232AF901C31A04EE9848595AF9BB7620D4C5B2E-19FD6206F29626F8B94EDDAC25CE17A264DCF998
org.torproject.descriptor.DescriptorParseException: r line 'r seele AAoQ1DAR6kkoo19hBAX5K0QztNw iU9ixCKwJOUEHUO1h+KIEkQf0qU 2' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-05-00-00-vote-23D15D965BC35114467363C165C4F724B64B4F66-CB23F13DC89D5F4A61FF54AC5E2D591DE06A91AE
org.torproject.descriptor.DescriptorParseException: r line 'r yashodara PW/bW8B4bVFa55aETVXD8Q8aR' has fewer space-separated elements than expected.
in votes-2016-04/01/2016-04-01-08-00-00-vote-14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4-758D6B9D8A8E586DC6E04377669AD429284F96D6
```
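The `.tmp`-file-plus-rename idea discussed in this ticket could be sketched with `java.nio.file` as follows (illustrative only, not CollecTor's actual storage code):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeDescriptorWriter {

  /** Write descriptor bytes to a temporary file first, then atomically
   * rename it into place. If the disk runs full mid-write, only the .tmp
   * file is truncated; no partial descriptor appears at the final path. */
  static void writeAtomically(Path target, byte[] descriptorBytes)
      throws IOException {
    Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
    Files.write(tmp, descriptorBytes);  // may fail, leaving only the .tmp file
    Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE,
        StandardCopyOption.REPLACE_EXISTING);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("votes");
    Path vote = dir.resolve("2016-04-01-07-00-00-vote");
    writeAtomically(vote,
        "network-status-version 3\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(Files.readAllLines(vote).get(0));
    // prints network-status-version 3
  }
}
```

The rename is atomic on the same file system, so readers see either the old file or the complete new one, never a truncated intermediate.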
The reason is that the disk ran full on that day, and apparently we only stored truncated votes to disk. We should detect that and only store complete descriptors, maybe by writing to a `.tmp` file first and then renaming it. This issue might not be limited to votes.
https://gitlab.torproject.org/legacy/trac/-/issues/19611 Make all Metrics Java projects conform to guidelines 2020-06-13T18:09:54Z iwakeh
This is a meta-issue as parent for all the separate project issues.
All Java projects should conform to this [guide](https://trac.torproject.org/projects/tor/wiki/org/teams/MetricsTeam/MetricsJavaStyleGuide).
https://gitlab.torproject.org/legacy/trac/-/issues/19616 Consider renaming metrics-lib 2020-06-13T17:57:13Z iwakeh
Applies to:
* git
* trac
* documentation
* where else?
metrics-lib 3.0.0
https://gitlab.torproject.org/legacy/trac/-/issues/19622 use java 8 in DescripTor 2020-06-13T17:57:13Z iwakeh
This should only be finished when all depending Metrics projects use Java 8.
metrics-lib 2.1.0
https://gitlab.torproject.org/legacy/trac/-/issues/19650 Keep non-printable characters out of details documents 2020-06-13T18:01:19Z cypherpunks
Future Tor will not publish non-ASCII descriptors and dir auths will reject them at some point.
Since this is quite far in the future, what do you think about removing such strings in onionoo (before it reaches tor)?
I'm not suggesting to ignore the entire descriptor in such a case but just replace such chars with "?" ?
"??B`?" -> "??B`??"
https://atlas.torproject.org/#details/21E84B294794821E2898E8ED18402E45E4FC351E
Note: I used "non-printable" (vs. non-ASCII) since onionoo data includes printable but non-ASCII chars.
https://lists.torproject.org/pipermail/tor-relays/2016-July/009667.html
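A sketch of the proposed replacement; the exact definition of "printable" below (not a control character, and a defined Unicode code point) is an assumption that would need to be agreed on:

```java
public class NonPrintableFilter {

  /** Replace control and otherwise non-printable characters with '?',
   * while keeping printable non-ASCII characters, which Onionoo data
   * legitimately contains. */
  static String replaceNonPrintable(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); ) {
      int cp = s.codePointAt(i);
      boolean printable = !Character.isISOControl(cp)
          && Character.isDefined(cp);
      sb.appendCodePoint(printable ? cp : '?');
      i += Character.charCount(cp);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The BEL control character is replaced; the printable 'é' is kept.
    System.out.println(replaceNonPrintable("ab\u0007c\u00e9"));  // prints ab?cé
  }
}
```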
related:
[1] https://trac.torproject.org/projects/tor/ticket/19647
[2] https://trac.torproject.org/projects/tor/ticket/18938
https://gitlab.torproject.org/legacy/trac/-/issues/19654 Handle situations more gracefully when SVG is disabled (like in Tor Browser's High Security mode) 2020-06-13T18:08:18Z cypherpunks
When TBB is set to high security, all graphs show 'No Data Available' (JavaScript enabled using 'Temporarily allow all this page'). Graphs work on medium-high security.
https://gitlab.torproject.org/legacy/trac/-/issues/19756 research.tp.o suggests mailing tor-assistants 2020-06-13T17:24:01Z Roger Dingledine
We should clearly have it suggest mailing someplace else.
But where?
We have methodically shut down all ways for outside people to reach us.
Should we make a new, research-oriented, list and put some caretakers on it who will then forward non-spam mails to some larger internal list?
Hm.
https://gitlab.torproject.org/legacy/trac/-/issues/19828 Extend descriptorCutOff in CollecTor's RelayDescriptorDownloader by 6 hours 2020-06-13T17:50:54Z Karsten Loesing
CollecTor's RelayDescriptorDownloader only downloads server and extra-info descriptors that have been published up to 24 hours before the current system time. This makes sense, so that missing descriptors that cannot be obtained are not retried forever.
However, there are cases when a valid consensus or vote references a server descriptor that was published over 24 hours ago:
- CollecTor may run at any time of the hour, at which point the valid-after time of current consensuses and votes may already be up to **1 hour** behind the current system time.
- The votes that a consensus is based on are generated **10 minutes** before the valid-after time, and they may contain server descriptors that have been published in the past 24 hours.
- Directory authorities may serve an older consensus than the current consensus, say, one that is already **2 hours** older than the current one.
All in all, CollecTor should attempt to fetch descriptors that are up to **27:10 hours** old, or let's say **30 hours** for simplicity and to account for cases we didn't consider here.
The downside is that missing descriptors will be retried for 6 more hours, but that doesn't seem to be that much of a problem, given that missing descriptors will be retried in batches of up to 96.
Here's a trivial patch:
```
diff --git a/src/main/java/org/torproject/collector/relaydescs/RelayDescriptorDownloader.java b/src/main/java/org/torproject/collector/relaydescs/RelayDescriptorDownloader.java
index f4e38f4..21b1ee4 100644
--- a/src/main/java/org/torproject/collector/relaydescs/RelayDescriptorDownloader.java
+++ b/src/main/java/org/torproject/collector/relaydescs/RelayDescriptorDownloader.java
@@ -185,7 +185,9 @@ public class RelayDescriptorDownloader {
/**
* Cut-off time for missing server and extra-info descriptors, formatted
* "yyyy-MM-dd HH:mm:ss". This time is initialized as the current system
- * time minus 24 hours.
+ * time minus 30 hours (24 hours for the maximum age of descriptors to be
+ * referenced plus 6 hours for the time between generating votes and
+ * processing a consensus).
*/
private String descriptorCutOff;
@@ -330,7 +332,7 @@ public class RelayDescriptorDownloader {
long now = System.currentTimeMillis();
this.currentValidAfter = format.format((now / (60L * 60L * 1000L))
* (60L * 60L * 1000L));
- this.descriptorCutOff = format.format(now - 24L * 60L * 60L * 1000L);
+ this.descriptorCutOff = format.format(now - 30L * 60L * 60L * 1000L);
this.currentTimestamp = format.format(now);
this.downloadAllDescriptorsCutOff = format.format(now
- 23L * 60L * 60L * 1000L - 30L * 60L * 1000L);
```
https://gitlab.torproject.org/legacy/trac/-/issues/19834 Rethink how we handle issues while sanitizing bridge descriptors 2021-08-23T14:43:21Z Karsten Loesing
The bridge descriptor sanitizer parses tarballs containing non-sanitized bridge descriptors, modifies their content by removing bridge IP addresses and other sensitive parts, and writes sanitized versions of those bridge descriptors to disk.
The sanitizer needs to recognize the lines contained in bridge descriptors to distinguish between lines that must be changed and others that can be kept unchanged, and it needs to be able to understand the exact format of certain lines in order to sanitize their contents.
This process can go wrong in various ways, and we need to decide how to handle those situations. Possible situations are:
1. A tarball is malformed or can otherwise not be opened.
2. A tarball contains one or more files that cannot be opened.
3. A tarball file contains an unknown descriptor type.
4. An internal problem prohibits sanitizing descriptor parts (e.g., missing secret for sanitizing IP address).
5. A descriptor is missing parts that are required for properly sanitizing its contents.
6. A descriptor contains an unrecognized line.
7. A descriptor line doesn't follow the expected format, contains fewer or more arguments, etc.
Possible ways of handling such situations are:
A. Skip a line we don't understand and keep the rest of the descriptor.
B. Skip a descriptor.
C. Skip the file contained in the tarball and continue with the next.
D. Abort processing the tarball.
E. Skip the entire tarball, including discarding any descriptors processed before running into the problem, and attempt to process the tarball again in the next execution.
F. Abstain from processing a given descriptor type until a problem has been resolved.
G. Discard any descriptors processed in a tarball until running into the problem, abort the current execution, and refuse starting the next execution until the problem has been resolved.
H. (in addition to A-G). Inform the operator by logging the problem.
I. (in addition to A-G). Warn the operator and ask them to resolve the problem.
Looking at this list, I think that my preferred ways of handling problems would be something like:
- B+H in situations 5, 6, and 7;
- E+I in situations 1, 2, and 3; and
- G+I in situation 4.
That's not exactly what we're currently doing. And I'm not even sure if somebody else operating a CollecTor instance with the bridgedescs module would have the same preferences.
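For discussion purposes, the preferred mapping above could be written down as code (a sketch of the proposal, not existing CollecTor code; the enum names are made up):

```java
public class SanitizerErrorPolicy {

  enum Handling {
    SKIP_DESCRIPTOR_AND_LOG,      // B+H: skip descriptor, log the problem
    SKIP_TARBALL_RETRY_AND_WARN,  // E+I: skip tarball, retry next run, warn
    ABORT_AND_REFUSE_RESTART      // G+I: abort and refuse to restart, warn
  }

  /** Map situation numbers 1-7 from the ticket to the preferred handling. */
  static Handling handlingFor(int situation) {
    switch (situation) {
      case 1: case 2: case 3:
        return Handling.SKIP_TARBALL_RETRY_AND_WARN;
      case 4:
        return Handling.ABORT_AND_REFUSE_RESTART;
      case 5: case 6: case 7:
        return Handling.SKIP_DESCRIPTOR_AND_LOG;
      default:
        throw new IllegalArgumentException("unknown situation: " + situation);
    }
  }

  public static void main(String[] args) {
    // An unrecognized line (situation 6) only costs us that one descriptor.
    System.out.println(handlingFor(6));  // prints SKIP_DESCRIPTOR_AND_LOG
  }
}
```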
Let's discuss!
https://gitlab.torproject.org/legacy/trac/-/issues/19836 Prepare relay descriptor downloader for consensuses published at :30 of the hour 2020-06-13T17:50:57Z Karsten Loesing
Consensuses have always been published once per hour, so with valid-after time :00 in the past. This might change in the future, as has been discussed in the context of consensus diffs, but it might also happen independently of those.
However, CollecTor is not ready for such a change, because `currentValidAfter` in `RelayDescriptorDownloader` is always initialized with :00 of the hour. This would prevent us from accepting a consensus published at :30. Of course, when we make such a change we should also accept valid-after times other than :00 and :30.
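One way to accept both :00 and :30 would be to round down to the most recent half hour instead of the most recent full hour (a sketch, not the actual `RelayDescriptorDownloader` code, which uses the full-hour variant of this rounding):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class ValidAfterRounding {

  static final long HALF_HOUR = 30L * 60L * 1000L;

  /** Round a timestamp down to the most recent half hour, so that both
   * :00 and :30 valid-after times count as "current". */
  static String currentValidAfter(long nowMillis) {
    SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    format.setTimeZone(TimeZone.getTimeZone("UTC"));
    return format.format((nowMillis / HALF_HOUR) * HALF_HOUR);
  }

  public static void main(String[] args) {
    // 2016-08-16 12:47:11 UTC rounds down to 12:30:00.
    System.out.println(currentValidAfter(1471351631000L));
    // prints 2016-08-16 12:30:00
  }
}
```

Accepting arbitrary valid-after times would instead mean comparing against the consensus's own valid-after/fresh-until interval rather than any fixed grid.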
Testing a fix might require setting up a testing network using Chutney and archiving the network data it produces. Alternatively, we could create a testing environment and write unit tests for this, which we should do anyway. But that shouldn't block us here, unless the fix is more complex than I currently anticipate.

https://gitlab.torproject.org/legacy/trac/-/issues/20053
Plan refactoring of metrics-web modules
2020-06-13T18:09:07Z, iwakeh

Many modules are very old and should be modernized and refactored.
This is a bigger task; this ticket is for planning the first steps.
Depends on #19730.
See also #20049.
Milestone: Metrics 1.0.0

https://gitlab.torproject.org/legacy/trac/-/issues/20098
Make reference checker more accurate
2020-06-13T17:51:05Z, Karsten Loesing

As of February this year, we're using a reference checker to spot missing descriptors; it reads files in `recent/relay-descriptors/` and warns if too many referenced descriptors cannot be found.
However, our reference checker has been too noisy for me to pay much attention.
I haven't looked at the logs in detail yet, but I came up with a possible improvement: we should only count an extra-info descriptor as missing if the referencing server descriptor is itself referenced from a consensus or vote. This is supposed to exclude all extra-info descriptors that are referenced from server descriptors uploaded to the directory authorities by bogus relays that don't also upload the corresponding extra-info descriptors.
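The proposed rule could be sketched roughly as follows (all of the data structures and names here are assumptions made for illustration, not the checker's actual types):

```java
import java.util.Map;
import java.util.Set;

public class ReferenceCheckSketch {

  /**
   * Count an extra-info descriptor as missing only if (a) we don't have it
   * and (b) at least one server descriptor referencing it is itself
   * referenced from a consensus or vote. Descriptors referenced only by
   * server descriptors from bogus relays are not counted.
   */
  static boolean countAsMissing(String extraInfoDigest,
      Map<String, String> extraInfoByServerDescriptor,  // server desc digest -> extra-info digest
      Set<String> serverDescriptorsInConsensusOrVote,
      Set<String> availableExtraInfos) {
    if (availableExtraInfos.contains(extraInfoDigest)) {
      return false;
    }
    return extraInfoByServerDescriptor.entrySet().stream()
        .anyMatch(e -> e.getValue().equals(extraInfoDigest)
            && serverDescriptorsInConsensusOrVote.contains(e.getKey()));
  }
}
```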
Maybe there are other tweaks that would make these warnings more accurate and again worth the operator's attention.

https://gitlab.torproject.org/legacy/trac/-/issues/20228
Append all votes with same valid-after time to a single file in `recent/`
2020-06-13T17:51:09Z, Karsten Loesing

We're currently creating a new file per vote in `recent/relay-descriptors/votes/`, which might be excessive. We could easily append all votes with the same valid-after time to a single file there, so instead of:
```
2016-09-23-14-00-00-vote-EFCBE720AB3A82B99F9E953CD5BF50F7EEFC7B97-927994982CFB4E2F24D22D0B74D693574EC04DE5
2016-09-23-14-00-00-vote-ED03BB616EB2F60BEC80151114BB25CEF515B226-915DB92F8614D94EB6390621EF4ADD65510A6AB7
2016-09-23-14-00-00-vote-D586D18309DED4CD6D57C18FDB97EFA96D330566-C77126E7595C15F242DF13F740528526E71CC063
2016-09-23-14-00-00-vote-49015F787433103580E3B66A1707A00E60F2D15B-1307B591CA002EB4FD55E5B183F8D757A64F0963
2016-09-23-14-00-00-vote-23D15D965BC35114467363C165C4F724B64B4F66-9489196D1A9647F7A0B6CEDD3E48C7CFAECC57F0
2016-09-23-14-00-00-vote-14C131DFC5C6F93646BE72FA1401C02A8DF2E8B4-8C8DEA53F89447781098CFCDA9A59ED8B6987C96
2016-09-23-14-00-00-vote-0232AF901C31A04EE9848595AF9BB7620D4C5B2E-8E11EBEEC56D1DBECC9105192F0036292B35A721
```
we'd just provide a single file:
```
2016-09-23-14-00-00-votes
```
just like we're providing a single file for the consensus:
```
2016-09-23-14-00-00-consensus
```
I just looked at the `index.json` we provide, and of the 1825 files in `recent/`, 504 are votes (28%). My rough estimate is that we'd cut down the size of `index.json` to 75% of its current size.
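Assuming seven votes per valid-after time, as in the example listing above (the actual number of voting authorities may differ), the estimate works out roughly like this:

```java
public class IndexSizeEstimate {
  public static void main(String[] args) {
    int totalFiles = 1825;        // files in recent/ per index.json
    int voteFiles = 504;          // of which are votes (28%)
    int votesPerValidAfter = 7;   // assumed, as in the example listing above
    // Combining votes leaves one file per valid-after time.
    int combinedVoteFiles = voteFiles / votesPerValidAfter;  // 72
    int newTotal = totalFiles - voteFiles + combinedVoteFiles;
    System.out.printf("%d files, %.0f%% of the current %d%n",
        newTotal, 100.0 * newTotal / totalFiles, totalFiles);
    // prints: 1393 files, 76% of the current 1825
  }
}
```

That lands close to the rough 75% estimate above.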
The question is whether anybody downloads these votes (manually) and relies on them being contained in separate files.
Note that this change would not affect how votes are stored in tarballs. They can stay in separate files there.

https://gitlab.torproject.org/legacy/trac/-/issues/20236
Make changes to bridgedescs module for bulk-processing tarballs
2020-06-13T17:51:10Z, Karsten Loesing

I recently finished re-processing the entire bridge descriptor archive for #19317. However, I had to make some changes to avoid running out of memory or wasting time on unnecessary operations. I now went through the changes and cleaned them up a bit, because I'd like to merge some/most/all (?) of them for the next time we need to bulk-process the bridge descriptor archive. I'll post a branch once I have a ticket number.
We should discuss which of these commits should go in by default (maybe ed48f03, ae5c53c, and e514d30?), which should only be enabled in a special bulk-processing mode (maybe df96751, 27cbfc8, and 68b29c2?), which should have their own config option (ugh!), and which we should drop because we don't need them badly enough for processing descriptors in bulk.
Clearly, these commits need work, but I figured it's better to clean them up a bit now than to attempt that in four or eight weeks. Branch follows in a minute.

https://gitlab.torproject.org/legacy/trac/-/issues/20325
Perform available space check using the partition recent is located on
2020-06-13T17:51:10Z, iwakeh

Currently, the root path of the configured 'recent' directory is used when measuring the available space.
This might not make sense in situations when partitions are mounted elsewhere.
For example:
```
/dev/sda1 on / (just the os)
/dev/sdb1 on /data (here the 'recent' folder resides somewhere)
```
In this case sda1 might be small and sdb1 could be the huge data partition. Thus, the measurement of free space for `/` is useless.
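In Java, the fix could be as simple as asking NIO for the file store of the configured recent path instead of the root; a minimal sketch (class and method names are made up for illustration):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpaceCheckSketch {

  /**
   * Return the usable bytes on the partition that actually contains
   * recentDir, e.g. /dev/sdb1 for /data/recent, not /dev/sda1 for /.
   */
  static long usableSpace(Path recentDir) {
    try {
      FileStore store = Files.getFileStore(recentDir);
      return store.getUsableSpace();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```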
Suggestion:
(cf. comments)
Last resort: add a `CheckSpacePath` property and default to the root of the recent path in case it doesn't exist.

https://gitlab.torproject.org/legacy/trac/-/issues/20345
Add support for synchronizing microdescriptors from another instance
2020-06-13T17:51:12Z, iwakeh

cf. comments 30+ in #18910
Microdescriptors are stored according to the valid-after time of the referencing microdescriptor consensus, which cannot be inferred from the microdescriptor itself.

https://gitlab.torproject.org/legacy/trac/-/issues/20350
Replace create-tarball.sh shell script with Java module
2020-06-13T17:51:13Z, iwakeh

This [script](https://gitweb.torproject.org/collector.git/tree/src/main/resources/create-tarballs.sh) should be transferred to Java.
The new `createtars` module should:
* provide at least the functionality of the script
* be configurable like other CollecTor modules
* not impede other modules
Please collect in the comments below more features and functionality that the script can't or doesn't provide but which should be part of this module.
Milestone: CollecTor 2.0.0

https://gitlab.torproject.org/legacy/trac/-/issues/20351
Turn the updateindex module into a function that runs after each module run
2020-06-13T17:51:13Z, iwakeh

Once #20350 is in place, updateindex can be plainly added as a function that runs after each module run.
Milestone: CollecTor 1.7.0