# Collector issues

Issue tracker: https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues

## stats/references file getting too big (#40037)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40037 (opened by Hiro, assigned to Hiro, last updated 2024-01-10)

It seems collector writes a big references file to track which archives have been created. The file is 3.4 GB and is currently kept entirely in memory. I think we should find a way to keep the file smaller, as this adds to the memory issues we have on the machine.

## re-evaluate how webstats are published (#40035)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40035 (opened by Kez, last updated 2023-12-12)

I think we should re-evaluate how webstats are published. The current method of just publishing tarballs of Apache logs is error prone, and one could accidentally create a PII leak without realizing it. I opened tpo/tpa/team#41432 because I believed I had accidentally caused one such PII leak. I confirmed that there was no leak, but I think the point remains that it's very easy to accidentally cause a problem.
I'm also not sure how useful these raw access logs are. One would have to download a tarball (possibly several), un-tar it, recursively `unxz` the contents, then parse the logs themselves before being able to do any kind of analysis with them. I think there has to be a better way to handle and publish these stats.

## Rewrite collector in rust (#40030)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40030 (opened by Hiro, last updated 2023-10-25)

Our current metrics services are often slow and run legacy Java code. A good part of our code handles routines and interactions (like log rotation and archive creation) that could be replaced with optimized libraries in a more modern (and more suitable for the task) language.
I think collector could be rewritten in Rust for the most part, especially now that arti supports parsing descriptor data. We may still have to rely on some Java code that depends on the metrics library, but we could put that in a jar and replace it over time.
This issue shall be used to track planning and progress towards rewriting collector in Rust.

## Clean up torperf archive and stop archiving "new" data (#40029)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40029 (opened by Georg Koppen, assigned to Hiro, last updated 2024-03-21)

Torperf is long gone but we are still archiving "new" data, see: https://metrics.torproject.org/collector/archive/torperf/. We should stop that and remove the "archives" from 2020-06 on (inclusive).

## Make it easier to deploy collector (#40020)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40020 (opened by Hiro, last updated 2023-01-23)

As with all metrics services, to deploy collector we currently have to create a release and copy the files to the VM where the service runs.
I think we should use puppet and git to deploy metrics services in a more automated way.
## Repeated bandwidth files with same timestamp and votes with the same bandwidth file timestamp (#40014)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40014 (opened by juga, last updated 2023-01-23)

To work on several things related to bwauths, such as tpo/network-health/metrics/analysis#33077, we need to know which bandwidth file corresponds to which bwauth. The only way to be certain so far is to look at the vote's bandwidth-file-headers timestamp and see which bandwidth file has the same timestamp (it could also be done with the bandwidth file digest), until we implement tpo/network-health/sbws#40071.
After decompressing https://collector.torproject.org/archive/relay-descriptors/bandwidths/bandwidths-2021-09.tar.xz I found that there are two files with timestamp 1630670637 (maybe this also happens with other timestamps, but I've not checked):
- 2021-09-03-12-03-57-bandwidth-1F47938127C2BC6BD2C3A43F3364153808F691ACC503995ABE5AF6085B56F21A
- 2021-09-03-12-03-57-bandwidth-E75743E733946CB570382FF3FA63702EF7B805227F69F7C93D909F985D762F78
The second one has two fewer lines, though I've not checked which ones.
While it is possible that a bwauth doesn't generate a new bandwidth file for the next vote (because the scanner might have died) and reuses the last one, I don't think it makes sense for two files with the same timestamp to differ in content.
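As a quick sanity check (mine, not from the original report), the Unix timestamp in question is exactly the UTC time embedded in those archive filenames:

```python
from datetime import datetime, timezone

# 2021-09-03-12-03-57 from the filenames, interpreted as UTC, matches
# the timestamp found in the bandwidth file headers.
t = datetime(2021, 9, 3, 12, 3, 57, tzinfo=timezone.utc)
print(int(t.timestamp()))  # → 1630670637
```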
Searching for that timestamp in the votes inside https://collector.torproject.org/archive/relay-descriptors/votes/votes-2021-09.tar.xz, I also found two:
- 2021-09-03-17-35-27-bandwidth-33F3779A0DC41794B42B507DAE65AABA0E919AFD2090528A10A41423224FABA7 (with `bandwidth-file-digest sha256=51dD5zOUbLVwOC/z+mNwLve4BSJ/affJPZCfmF12L3g`)
- 2021-09-03-17-35-57-bandwidth-DA019C95D0D4C205DD1F5170705E46470BB1A65034C8CF7F12FA82EA40D80B33 (with `bandwidth-file-digest sha256=H0eTgSfCvGvSw6Q/M2QVOAj2kazFA5lavlr2CFtW8ho`)
It's possible (I haven't checked the digests) that each bandwidth file (with the same timestamp) was used in one of the two votes.
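One way to check would be via the digests: assuming the `bandwidth-file-digest sha256=` value in a vote is the unpadded base64 encoding of the file's SHA-256 (the usual encoding for digests in votes, but worth double-checking against dir-spec), the pairing could be verified with something like:

```python
import base64
import hashlib

def bandwidth_file_digest(raw: bytes) -> str:
    """Return the unpadded base64 SHA-256 of a bandwidth file's bytes,
    for comparison against a vote's bandwidth-file-digest line."""
    digest = hashlib.sha256(raw).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

A 43-character digest such as `51dD5zOUbLVwOC/z+mNwLve4BSJ/affJPZCfmF12L3g` would then identify which of the two files each vote referenced.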
In this case the bwauth is Faravahar. It'd be great to see whether this only happens for this bwauth (a bwauth problem?) or whether the problem comes from Torflow, sbws too, or collector.

## onionperf: No longer need to download tpf files (#40007)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40007 (opened by irl, assigned to Hiro, last updated 2022-09-28)

The OnionPerf module still has residual code related to the download of *.tpf files, which are no longer produced by modern OnionPerf. This code could be removed, and in the process might make the JSON downloading code that remains more robust.
Relevant code: https://gitlab.torproject.org/tpo/metrics/collector/-/blob/master/src/main/java/org/torproject/metrics/collector/onionperf/OnionPerfDownloader.java

## Do not let appended descriptor files grow too large (#33502)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/33502 (opened by Karsten Loesing, last updated 2021-10-05)

I revisited legacy/trac#20395 last week. The issue is that metrics-lib cannot handle large descriptor files, because it first reads the entire file into memory before splitting it into single descriptors and parsing them. While it would be possible to parse large descriptor files after making some major code changes (using `FileChannel` and doing lazy parsing), I don't think that we have to do that. After all, we're writing these large descriptor files ourselves in CollecTor, and it's up to us to stop doing that.
Going back in time, the original reason for concatenating multiple descriptors into a single file was that rsyncing many tiny files from one host to another host was just slow. So we appended server descriptors and extra-info descriptors into a single file. This works well with server descriptors or extra-info descriptors published within 1 hour or even 10 hours. It does not work that well anymore with all server descriptors or extra-info descriptors synced from another CollecTor instance when starting a new instance (legacy/trac#20335). It works even less well when importing one or more monthly tarballs containing server descriptors or extra-info descriptors (legacy/trac#27716).
My suggestion is that we define a configurable limit for appended descriptor files of, say, 20 MiB. When storing a descriptor, we would check whether appending it to an existing descriptor file would exceed this limit and, if so, start a new descriptor file.
There are some technical details to work out, but I think they can be solved. I also don't expect this to produce a lot of code, not even complex code changes. The benefit would be that we could resolve legacy/trac#20395 and legacy/trac#27716 by implementing this.
Thoughts on the general idea?

## Allow pushing Metrics to CollecTor from trusted endpoints (#31695)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/31695 (opened by irl, last updated 2020-12-01)

Switch from pull to push model for archiving OnionPerf data: Another aspect related to collecting data is that, right now, data collection works by periodically pulling new .tpf files from known OnionPerf instances. This has at least two problems: there's a delay between OnionPerfs producing new files and CollecTor pulling them, and adding new instances requires editing a config file on the CollecTor host. Maybe we can switch to a push model where CollecTor accepts measurements from any OnionPerf instance, and CollecTor clients like the Tor Metrics website decide which measurements to aggregate and visualize. Note that switching to a push model requires installing some basic authentication mechanisms like cryptographic identities and signatures, in order to prevent anyone from pushing wrong data, overwriting correct data, or even storing arbitrary data.

## Extend CollecTor to fetch recent, non-current consensuses and votes (#28324)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/28324 (opened by Karsten Loesing, last updated 2023-01-23)

There are discussions to extend dir-spec to serve recent, non-current consensuses and votes (legacy/trac#21378).
As of now, only the most recent, current consensus and votes are available, as well as the next ones, 5-10 minutes before they become valid.
This extension is fantastic news, because we currently rely on CollecTor to run once per hour. And if it doesn't, we'd be missing the consensus and votes from that hour. We can compensate for temporary failures to some extent by having two CollecTor instances running and synchronizing missing descriptors. But ideally, we'd be able to fetch previous consensuses and votes from the Tor directories.
This is currently blocking on legacy/trac#21378. But as soon as that ticket is resolved, we can start extending CollecTor to fetch recent, non-current consensuses and votes.

## collect and archive DNS resolver data of tor exits (#26089)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/26089 (opened by cypherpunks, last updated 2023-01-23)

context:
https://medium.com/@nusenu/who-controls-tors-dns-traffic-a74a7632e8ca
"
5. Add DNS related information to Relay Search (a long term item)
It would be nice and probably effective to have information about DNS resolvers show up on Relay Search, because it is a popular tool for relay operators to check on their relay state. Operators could easily see if they use any less desirable DNS resolvers if that information is shown on Relay Search. That way we could even reach operators who have no or invalid ContactInfo data, but multiple steps are required before this could happen:
The currently unavailable data needs to be collected and regularly updated
"

## Provide fallback mirror lists (#24431)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/24431 (opened by iwakeh, last updated 2021-10-05)

Retrieve and provide fallback mirror lists (for details see parent).
Steps:
* determine an automated way to retrieve the latest list in a timely manner, i.e., as soon as it is used.
* create a CollecTor module for retrieving and storing the lists (a parser will be provided by metrics-lib).

## Decide what to do with UnparseableDescriptors while synchronizing from another instance (#22834)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/22834 (opened by iwakeh, last updated 2023-10-02)

Let's find out if a code change is needed or things are fine as they are.

Current situation when one CollecTor fetches data from another CollecTor:
Current situation when one CollecTor fetches data from another CollecTor:
Metrics Lib cannot parse the fetched content, provides an object of `UnparseableDescriptor`, and CollecTor simply ignores this object as unknown (only logs on trace level).
During normal collecting an unparseable descriptor is stored anyway.

## Add auxiliary data on Tor relays and bridges to CollecTor (#21515)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/21515 (opened by Karsten Loesing, last updated 2020-12-02)

This ticket is the result of a local TODO list review and combines a few related ideas. Some of the ideas here are new, but some are really old and have been sitting on my list forever.
The general idea here is that CollecTor could provide auxiliary data on Tor relays and bridges. The main goal would be that other applications like Onionoo and Metrics but also Nyx can use this data to provide richer information on relays and bridges to their users. A secondary goal would be that CollecTor would serve as an archive for this data for future applications that don't exist yet.
Auxiliary data might include:
1. GeoIP country database: This is the same data as the Tor daemon uses internally to resolve relay IP addresses to country codes. We would be able to produce historical data by extracting `src/config/geoip` files from the Tor daemon Git repository. This data could be used by Metrics to bring back the relays by country graph.
2. GeoIP city database: This data would be the same as Onionoo uses to resolve relay IP addresses to city names. The main advantage of having this file in CollecTor would be that Onionoo could automatically pull this data instead of relying on the operator to update GeoIP files.
3. GeoIP ASN database: This is similar to 2 but for ASN information.
4. Bridge GeoIP country database: Here's an idea to provide country information for bridges despite replacing IP addresses by hashes. CollecTor could keep a list of all bridge IP addresses in a given month and use the GeoIP country database from 1 to produce a custom database for resolving bridge IP addresses to country codes. Basically, that database would contain hashed fingerprints, 10.x.y.z IP addresses, and country codes. CollecTor would add a new line to this file whenever it observes a new bridge IP address, which would happen once per hour in particular at the beginning of a month. This file would change once per month when hashes for 10.x.y.z addresses change. However, this means that we'd have to reprocess the entire bridge tarball archive to generate older database files, because we have long deleted the inputs for generating those old 10.x.y.z IP addresses. Consumers of this data would be Onionoo but also Metrics for a new bridge country graph.
5. Relay reverse DNS entries: Right now, Onionoo runs its own rDNS resolver. But we could as well run that as part of CollecTor and provide the output data in a new data format to everyone who needs it. There would also be other consumers of this data, including the relay controller Nyx, which could display rDNS entries without risking a leak of who is fetching that information.
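The per-month bridge database from item 4 above might, as a purely hypothetical sketch (the field layout is mine, not CollecTor's actual format), be a simple append-only mapping:

```python
import csv
import io

def append_bridge_entry(db, hashed_fingerprint, sanitized_address, country):
    """Append one line to the hypothetical bridge country database:
    hashed bridge fingerprint, sanitized 10.x.y.z address, country code.
    A new line would be added whenever a new bridge address is observed."""
    csv.writer(db).writerow([hashed_fingerprint, sanitized_address, country])

db = io.StringIO()
append_bridge_entry(db, "4AE9" + "0" * 36, "10.1.2.3", "de")
```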
This is a lot, but maybe there's even more. It's probably useful to discuss these different new data sets together. Once we decide we want to provide some or even all of them we should switch to child tickets. And just to set expectations right, it's probably going to take months to find enough time to implement these new data sets, if we think it's a good idea.

## Separate truncated descriptor(s) from next complete descriptor (#21087)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/21087 (opened by Damian Johnson, last updated 2023-01-23)

Hi Karsten, a user reached out to me because Stem's validator warns about a [CollecTor tarball](https://collector.torproject.org/archive/relay-descriptors/server-descriptors/server-descriptors-2016-09.tar.xz). In particular it's surprised by @source annotations in the server descriptors.
Here's the *server-descriptors-2016-09/2/2/228e3ecf654e1b7b4f01a0027e599e7ba14b216c* descriptor from the tarball for an example...
```
@type server-descriptor 1.0
router sauronkingofmortor 137.74.116.214 9001 0 9030
identity-ed25519
-----BEGIN ED25519 CERT-----
AQQABkAYAXe8xhBhoRVgI2ZswouGG50gLzYsWudXIp96bCAloSStAQAgBADs9XUH
7zgiFd+mjPWwFLUpvma8qvdtChcgp4K6WDDnU6ub3BDNZ7nGTDvYPHVmq4URzobG
uAsjOIPlf1vkU3YJdpBe0KGHy5JeuJ10TDQwlK1F761pSApIdH1ocIg4oAE=
-----END ED25519 CERT-----
master-key-ed25519 7PV1B+84IhXfpoz1sBS1Kb5mvKr3bQoXIKeCulgw51M
platform Tor 0.2.8.7 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:23:41
fingerprint 2D8A FA91 2E2B 8623 BB2C DACD 1933 2209 D524 D1A3
uptime 860586
bandwidth 12288000 12288000 7792456
extra-info-digest 2017D54A2C28B100CE173351E0799E15153B703B D2vKVNwaxArp6bf11NWPRNoYGQ0lBgIwziSXNkL9TCw
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMNJzNJiDwd8y7ge4aXjkUCBKDncNhC91i5SQkxTHX4ZR/05+/liwR5O
TPgoIG0FDQSEUMYDPY92XsRmgPXkpHBSga0ojrhwnYutXAPMRuT4Dm24kpJctdbG
kwW6aovjNcoeJE3iB5ahUCv/TDnuiijioRSfjTPQsW68gHo1rOxJAgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAJhvAVj6wurlz3khW1Z/2x8sAnyr9lBdiHMp8UEAhYw+7ct1fdmuZXbA
I9aZbb7GEgR9UBW67qYd0aN1XHbDwb4OvAW+TOzcCjBmqiSLl5WACl0wIjuif7++
xNVcRw04kmmbBf7IyjmmuCc6ihjGeG02aREitZGBSkyZwt8SAz0fAgMBAAE=
-----END RSA PUBLIC KEY-----
onion-key-crosscert
-----BEGIN CROSSCERT-----
ct5RfDtMM5h5G6T6pFkRANCsJGcjwpPK+b47yWoQSdH7C0Y4yjWX5Z48l511fPK6
1v4IINEnuiCMkDp4HGpSW87aHatUaWP6MVo6pwQB2uqi8SpjPdlf6pJfSYNsvaZh
00P6ENAXzDnFFvcNla0WI7o6rIE2tuP3qd7bxazACUU=
-----END CROSSCERT-----
ntor-onion-key-crosscert 0
-----BEGIN ED25519 CERT-----
AQoABj/6Aez1dQfvOCIV36aM9bAUtSm+Zryq920KFyCngrpYMOdTANd0d0EMe6BU
CZrDB67jdOEX8P0T1MY1razuVMyvAjS1MPsM/F7uvCvgf1Su4NJFodWWPGLXWnHZ
RFSpVcHmmg8=
-----END ED25519 CERT-----
hidden-service-dir
contact luciole <luciolesauronkingofmortor@yopmail.com>
ntor-onion-key lhvzaL7Ze85GFMWMQscMgIt9IOx6srmOiXqD85kOekI=
reject *:@uploaded-at 2016-09-15 09:24:06
@source "82.1.128.70"
router torbeornottorbe 82.1.128.70 9001 0 0
platform Tor 0.2.4.27 on Linux
protocols Link 1 2 Circuit 1
published 2016-09-15 09:24:06
fingerprint C6EE 9826 7F82 962C C2FC 1E9E 2AE5 F317 B2D2 D6F0
uptime 762082
bandwidth 1024000 2048000 106721
extra-info-digest 08D7C6A9FF860F6A5D12FB43BD2051ACC06BCE52
onion-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAMb/ajivr7C1z7cnVSz4dPe+T0cOvB6ickNb8vjquDM8eZh7mLecSACT
H1D5DO97aJ0L1Bw5oOLzU77zx/2e/UUnHftiyZ8sNLmAE7smgEdUvhqNZSY+VSgN
E1Qyc6CdBpJWdSRp1+/AbYq0XWXMTrkb7YvRyR0iuYDn03s82DU/AgMBAAE=
-----END RSA PUBLIC KEY-----
signing-key
-----BEGIN RSA PUBLIC KEY-----
MIGJAoGBAK5CqRjTHbA+AHxLqSCWoEOpiVUNqpdiEUVTvdmu7aQgPcR2VI/fS/oc
tPmfC6L0l4eL0u1zZzbxJ8z5mop0M+0Wss8gWdpO7t7MNHu/GJ78gRRhb6Yz2JQf
jTZcVGyDsI8PJZoH+if3slVCUcq14zy85hb9sF9spaDhTEBbhx+rAgMBAAE=
-----END RSA PUBLIC KEY-----
family $5A8B78AB293475D6D55F1CBFA5D2A1CEEB09545B $EAE900D1DB28D56F4535C06F1BAEB92B9E3BFEE6
hidden-service-dir
contact A36F 07B9 285A C895 3E42 69F3 0CCC 0AF6 2FEC DB6E Random Person <nae AT blueyonder dot co dot uk>
ntor-onion-key 90dT83YmTzH/uojnATf+KOtwJssKGURO/qdu3SR0XgE=
reject *:*
router-signature
-----BEGIN SIGNATURE-----
PohhIu5DPg4iK+5AV3/sLMbpiwCItMbnaNVWrve9nKXyHM18eskYpL1sLyj7/3Nk
YKmFheD/alawStTr3rHkopdR8yj+1LZmWPlSHTy3x/U+uAzQl+66YcECEdw1xKMY
oaYngrHlZSrCEgwDKwIS4GJ/rOYjGUl0HCC9z0OaZ5M=
-----END SIGNATURE-----
```
Seems this is two descriptors concatenated together with a @source in the middle? Any hints on where these come from?
Thanks!

## Investigate invalid descriptors in out/ and recent/ subdirectories (#20421)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/20421 (opened by Karsten Loesing, last updated 2023-01-23)

I just found more bad descriptors like the one mentioned in legacy/trac#20412 by looking through Onionoo logs. This might be a bigger problem than first thought.
I don't have the time to investigate this now, but it looks like we might not be flushing a buffer properly somewhere. For example:
```
hidden-@uploaded-at 2016-10-19 06:53:16
```
The `hidden-` part still belongs to the previous descriptor and then we start writing `@uploaded-at 2016-10-19 06:53:16` as the first line of the next descriptor. Note that we wouldn't include that line under normal circumstances. How does it get there?
I'll attach four files from CollecTor's `recent/` subdirectory which would otherwise be deleted soon.
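A rough sketch (my own, not existing CollecTor code) of how one could scan extracted descriptor files for stray annotation tokens like the one above:

```python
def find_stray_annotations(text: str):
    """Return (line_number, line) pairs where an annotation token such
    as @source or @uploaded-at appears outside the annotation header at
    the top of the file, e.g. the fused 'hidden-@uploaded-at ...' line."""
    problems = []
    in_header = True
    for n, line in enumerate(text.splitlines(), 1):
        if in_header and line.startswith("@"):
            continue  # legitimate leading annotations (@type etc.)
        in_header = False
        if "@uploaded-at" in line or "@source" in line:
            problems.append((n, line))
    return problems
```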
I'm afraid we'll not only have to fix this bug but also go through the tarballs from the past months and see if those contain invalid descriptors. Hopefully it's a new bug.

## Replace create-tarball.sh shell script with Java module (#20350)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/20350 (opened by iwakeh, last updated 2023-01-23)

This [script](https://gitweb.torproject.org/collector.git/tree/src/main/resources/create-tarballs.sh) should be transferred to Java.
The new `createtars` module should:
* provide at least the functionality of the script
* be configurable as other CollecTor modules
* not impede other modules
Please collect in the comments below more features and functionality that the script can't or doesn't provide but which should be part of this module.

Milestone: CollecTor 2.0.0

## Make changes to bridgedescs module for bulk-processing tarballs (#20236)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/20236 (opened by Karsten Loesing, last updated 2020-12-01)

I recently finished re-processing the entire bridge descriptor archive for legacy/trac#19317. However, I had to make some changes to avoid running out of memory or wasting time on unnecessary operations. I now went through the changes and cleaned them up a bit, because I'd like to merge some/most/all (?) of them for the next time we need to bulk-process the bridge descriptor archive. I'll post a branch once I have a ticket number.
We should discuss which of these commits should go in by default (maybe ed48f03, ae5c53c, and e514d30?), which should only be enabled in a special bulk-processing mode (maybe df96751, 27cbfc8, and 68b29c2?), which should have their own config option (ugh!), or which we drop because we don't need as badly for processing descriptors in bulk.
Clearly, these commits need work, but I figured it's better to clean them up a bit now than attempt to do that in four or eight weeks. Branch follows in a minute.

## Make reference checker more accurate (#20098)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/20098 (opened by Karsten Loesing, last updated 2023-01-23)

As of February this year we're using a reference checker to spot missing descriptors. It reads files in `recent/relay-descriptors/` and warns if too many referenced descriptors cannot be found.
However, our reference checker has been too noisy for me to pay much attention to it.
I didn't look at the logs in detail yet, but I came up with a possible improvement: we should only count an extra-info descriptor as missing if the referencing server descriptor is referenced from a consensus or vote. This is supposed to exclude all extra-info descriptors that are referenced from server descriptors uploaded to the directory authorities by bogus relays without also uploading the corresponding extra-info descriptors.
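The proposed tweak could look roughly like this (the data structures are hypothetical; the real checker works on parsed descriptor files):

```python
def missing_extra_infos(server_descs, known_extra_infos, referenced_servers):
    """server_descs maps a server descriptor digest to the extra-info
    digest it references (or None); referenced_servers holds the server
    descriptor digests referenced from consensuses or votes. Only count
    an extra-info descriptor as missing when the referencing server
    descriptor itself appears in a consensus or vote."""
    missing = []
    for server_digest, extra_info_digest in server_descs.items():
        if server_digest not in referenced_servers:
            continue  # likely uploaded by a bogus relay; don't warn
        if extra_info_digest and extra_info_digest not in known_extra_infos:
            missing.append(extra_info_digest)
    return missing
```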
Maybe there are other tweaks that make these warnings more accurate and again worth checking by the operator.

## Rethink how we handle issues while sanitizing bridge descriptors (#19834)

https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/19834 (opened by Karsten Loesing, last updated 2023-01-23)

The bridge descriptor sanitizer parses tarballs containing non-sanitized bridge descriptors, modifies their content by removing bridge IP addresses and other sensitive parts, and writes sanitized versions of those bridge descriptors to disk.
The sanitizer needs to recognize the lines contained in bridge descriptors to distinguish between lines that must be changed and others that can be kept unchanged, and it needs to be able to understand the exact format of certain lines in order to sanitize their contents.
This process can go wrong in various ways, and we need to decide how to handle those situations. Possible situations are:
1. A tarball is malformed or can otherwise not be opened.
2. A tarball contains one or more files that cannot be opened.
3. A tarball file contains an unknown descriptor type.
4. An internal problem prohibits sanitizing descriptor parts (e.g., missing secret for sanitizing IP address).
5. A descriptor is missing parts that are required for properly sanitizing its contents.
6. A descriptor contains an unrecognized line.
7. A descriptor line doesn't follow the expected format, contains fewer or more arguments, etc.
Possible ways of handling such situations are:
A. Skip a line we don't understand and keep the rest of the descriptor.
B. Skip a descriptor.
C. Skip the file contained in the tarball and continue with the next.
D. Abort processing the tarball.
E. Skip the entire tarball, including discarding any descriptors processed before running into the problem, and attempt to process the tarball again in the next execution.
F. Abstain from processing a given descriptor type until a problem has been resolved.
G. Discard any descriptors processed in a tarball until running into the problem, abort the current execution, and refuse starting the next execution until the problem has been resolved.
H. (in addition to A-G). Inform the operator by logging the problem.
I. (in addition to A-G). Warn the operator and ask them to resolve the problem.
Looking at this list, I think that my preferred ways of handling problems would be something like:
- B+H in situations 5, 6, and 7;
- E+I in situations 1, 2, and 3; and
- G+I in situation 4.
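For what it's worth, that preference could be captured in a small policy table (codes as defined in the lists above; this just restates the mapping, it is not an implementation):

```python
# Maps situation number (1-7 above) to (handling, operator notification):
# B = skip descriptor, E = skip tarball and retry next execution,
# G = discard and refuse to restart until resolved;
# H = log the problem, I = warn and ask the operator to resolve it.
POLICY = {
    1: ("E", "I"),
    2: ("E", "I"),
    3: ("E", "I"),
    4: ("G", "I"),
    5: ("B", "H"),
    6: ("B", "H"),
    7: ("B", "H"),
}
```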
That's not exactly what we're currently doing. And I'm not even sure if somebody else operating a CollecTor instance with the bridgedescs module would have the same preferences.
Let's discuss!