The Tor Project issues
https://gitlab.torproject.org/groups/tpo/-/issues (feed retrieved 2022-03-04)

# Number of direct users mysteriously spikes in US & NL
https://gitlab.torproject.org/tpo/network-health/analysis/-/issues/14 · cypherpunks · updated 2022-03-04

![https://metrics.torproject.org/userstats-relay-country.png?start=2019-11-08&end=2020-02-06&country=us&events=off](https://metrics.torproject.org/userstats-relay-country.png?start=2019-11-08&end=2020-02-06&country=us&events=off)
![https://metrics.torproject.org/userstats-relay-country.png?start=2019-11-08&end=2020-02-06&country=nl&events=off](https://metrics.torproject.org/userstats-relay-country.png?start=2019-11-08&end=2020-02-06&country=nl&events=off)
Seems to follow the same pattern as well. I've checked the other top 10 countries but none seems to be affected by this.

# Number of direct users mysteriously spikes in US & NL
https://gitlab.torproject.org/tpo/network-health/team/-/issues/178 · cypherpunks · updated 2022-03-04

Duplicate of tpo/network-health/analysis#14 above (same report and graphs).
# Investigate performance issues for October 2021
https://gitlab.torproject.org/tpo/network-health/team/-/issues/130 · Georg Koppen · updated 2022-03-04

Users report Tor being particularly slow since a couple of days, resulting in slow downloads of files and generally longer waiting times when requesting web pages.
This ticket is for investigating what is actually going on and, ideally, figuring out a plan to resolve the issue.
We have a couple of indicators about what is going wrong:
1) We have OnionPerf graphs which show the download time of (in this case 5 MiB) files over Tor, both for public relays:

![torperf-public-2021-07-01-2021-10-20-5mb](/uploads/cbad7f37fdc37da68aac8786ea903803/torperf-public-2021-07-01-2021-10-20-5mb.png)
and onion services:

![torperf-onion-2021-07-01-2021-10-20-5mb](/uploads/156aa4ed699b12f4a558d98944033b5b/torperf-onion-2021-07-01-2021-10-20-5mb.png)
2) We have data about onion service traffic:

![hidserv-rend-relayed-cells-2021-07-01-2021-10-20](/uploads/c142d9d4d902bb037071a39e09248544/hidserv-rend-relayed-cells-2021-07-01-2021-10-20.png)
3) We have data about the advertised/used total bandwidth:

![bandwidth-2021-07-01-2021-10-20](/uploads/92d0588a7374a8165a7af5d3485b351a/bandwidth-2021-07-01-2021-10-20.png)

Sponsor 61 - Making the Tor network faster & more reliable for users in Internet-repressive places · assignee: Georg Koppen

# Figure out how much network churn we have, why we have it, and fix it
https://gitlab.torproject.org/tpo/network-health/team/-/issues/122 · Georg Koppen · updated 2022-03-04

We know we have network churn in our network, but it's not clear how much, why, and how to fix it. This ticket is the start to track this work.

# Get new snapshot of "stream bandwidth by relay weight"
https://gitlab.torproject.org/tpo/network-health/team/-/issues/120 · Roger Dingledine · updated 2022-03-04

Check out Figure 10 of Mike's original torflow paper:
https://research.torproject.org/techreports/torflow-2009-08-07.pdf
In that paper, Mike found that the fastest relays in the network were providing better bandwidth to clients than the slower relays. In an ideal world, by contrast, the load on the network would be balanced such that every relay offers the same experience (because every relay carries an appropriate load).
The world has moved on a lot from 2009, including newer bwauths and impacts from many other network dynamics.
We have plans in the future to make even more improvements, for example with Mike's upcoming flow control changes.
Let's:
* (a) get a baseline now, and
* (b) get the tools together to enable us to more easily take further snapshots as we proceed.

# How many relays are leaving the Tor network
https://gitlab.torproject.org/tpo/network-health/team/-/issues/113 · Gus · updated 2022-03-04

As part of understanding the Tor community and its operators, we want to know how many relays are leaving the Tor network.
If they fulfil specific criteria (e.g., good ContactInfo, running for more than 1 week, fast), we will want to contact the relay operator and find out why they stopped running their relay. FWIW, ExoneraTor is just an archive; here we want the relay operator's contact info.

# Rotate bridges distributed on frontdesk and cdr.link for Russian users
https://gitlab.torproject.org/tpo/community/support/-/issues/40067 · Gus · updated 2022-03-04

Today Roskomnadzor blocked some bridges that we were distributing to Russian users on frontdesk and cdr.link.

Sponsor 125: Rapid Response Fund for Russia censorship circumvention · assignee: Gus

# Understand how accurate the bandwidth authority estimates are
https://gitlab.torproject.org/tpo/network-health/team/-/issues/191 · Karsten Loesing · updated 2022-03-04

(Re-using text from Roger and Mike for this ticket description.)
It would be good to have a better understanding of how accurate the bandwidth authority estimates are. Why do some really fast relays get huge weights, and other really fast relays don't? Does it have to do with location of the measurers? What exactly is the trade-off between having fast nodes all nearby each other (and nearby the bandwidth authorities) in the network, and having nodes in geographically dispersed places?
We probably should figure out a way for the bandwidth authorities to utilize per-node as well as ambient circuit-failure rates (legacy/trac#7023, legacy/trac#7037).
There's a bunch of related stuff for TCP socket exhaustion, too. All of it probably involves some fairly diligent monitoring of results and experimentation, though.

# Measure how often tor clients fetch the consensus
https://gitlab.torproject.org/tpo/network-health/team/-/issues/143 · irl · updated 2022-03-04

We have an idea of how often this *should* be happening, but we don't have (as far as I can see) any data on how often it actually is happening.
This is important because the network team may use the data to identify bugs and reduce load, and because our user-count data is based on directory fetches: if we have the wrong number here, then our user count is wrong.
This could be achieved by logging fetches from OnionPerf, where we already have a long-running tor daemon, or we could run daemons independently. Running them independently would let us track how often fetches occur for different versions of tor.

# Emulate different Fast/Guard cutoffs in historical consensuses
https://gitlab.torproject.org/tpo/network-health/team/-/issues/102 · irl · updated 2022-03-04

There are many things we can tune in producing votes and consensuses that affect the ways clients use the network, and that might result in better load balancing.
We need tools for simulating what happens when we make those changes, using data (either historical or live) for the public Tor network.
We can consider the MVP for this complete once we have a tool that allows us to take server descriptors and simulate votes and consensus generation using alternate Fast/Guard cutoffs.
Extensions to this would be allowing alternative consensus methods, or other tunables.
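As a rough sketch of what such a tool's core could look like, here is a toy Fast-flag emulator (my own illustration; the simplified rule and function names are assumptions, not the actual dir-spec computation):

```python
def fast_relays(bandwidths, top_fraction=7/8, floor=102_400):
    """Relays that would keep the Fast flag under a simplified rule:
    in the fastest `top_fraction` of the network, or above `floor` B/s.
    (The real rule in dir-spec has more inputs; these are toy knobs.)"""
    ranked = sorted(bandwidths)
    cutoff_index = int(len(ranked) * (1 - top_fraction))
    # A relay is Fast if it clears either the percentile cutoff or the floor,
    # so the effective threshold is the smaller of the two.
    threshold = min(ranked[cutoff_index], floor)
    return [b for b in bandwidths if b >= threshold]

# Tightening the cutoff from 7/8 to 1/2 drops slower relays from the flag.
bws = [10_000, 20_000, 50_000, 80_000, 200_000, 500_000, 1_000_000, 5_000_000]
assert len(fast_relays(bws, top_fraction=7/8)) == 7
assert len(fast_relays(bws, top_fraction=1/2)) == 4
```

Feeding such a function bandwidths parsed from historical server descriptors, and sweeping `top_fraction` and `floor`, would give exactly the "alternate cutoffs" comparison the MVP asks for.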
By reducing the cost of performing these simulations, we enable faster iteration on ideas that will hopefully improve the user experience.

# faravahar won't connect to specific IPv6 subnets
https://gitlab.torproject.org/tpo/network-health/team/-/issues/39 · cypherpunks · updated 2022-03-04

For a long time, faravahar has been unable to connect to relays in specific IPv6 subnets.
At https://consensus-health.torproject.org/ the issue can be checked against the other authorities via the voted ReachableIPv6 flag.
Manual diff/list of relevant relays without the ReachableIPv6 flag in faravahar's vote:

```diff
-r Aleya n+cQbOAjrys3+LMlf8DDJQqYctc 84.160.43.144 10001 0 [2003:e8:5f0e:2d00:cad:d6df:f324:b4ba]:10002
+r Aleya n+cQbOAjrys3+LMlf8DDJQqYctc 84.160.43.144 10001 0
-r GoofyRooster JDmW5GIYZmwcrd4XtDDqf5UST5Y 92.223.105.93 443 0 [2a03:90c0:83:2908::19]:443
+r GoofyRooster JDmW5GIYZmwcrd4XtDDqf5UST5Y 92.223.105.93 443 0
-r KarlRanseieristTot zJd8rHkawDunQtYbmkB04aB1Caw 87.140.73.81 9001 0 [2003:a:1109:5100:6a05:caff:fe12:9111]:9001
+r KarlRanseieristTot zJd8rHkawDunQtYbmkB04aB1Caw 87.140.73.81 9001 0
-r LostArkLux1 85Nvl8FO/P0YFU30RDuR/t0HH0c 92.38.163.83 443 9030 [2a03:90c0:83:2908::1c4]:443
+r LostArkLux1 85Nvl8FO/P0YFU30RDuR/t0HH0c 92.38.163.83 443 9030
-r MikeDiaIsGone gCKKtcvQrH6RHmlbzu2GBFKP680 80.147.218.187 8001 0 [2003:a:b5a:6100:11:32ff:fe2b:36b4]:8001
+r MikeDiaIsGone gCKKtcvQrH6RHmlbzu2GBFKP680 80.147.218.187 8001 0
-r Morpheus S0g/V9lmDr8z+jzKZw/9G7sPMVo 92.223.105.117 443 0 [2a03:90c0:83:2908::7a]:443
+r Morpheus S0g/V9lmDr8z+jzKZw/9G7sPMVo 92.223.105.117 443 0
-r motor u7u61FMmPXhuw0q2igYhQoiRA0U 46.81.13.47 9321 9322 [2003:cc:6f0f:83bd:bbbb:bad4:5326:3d78]:443
+r motor u7u61FMmPXhuw0q2igYhQoiRA0U 46.81.13.47 9321 9322
-r rinderwahnRelay9L koi3W1/4hh7/Mqa+iCXMOKT5+MI 92.38.163.21 443 80 [2a03:90c0:83:2908::101]:443
+r rinderwahnRelay9L koi3W1/4hh7/Mqa+iCXMOKT5+MI 92.38.163.21 443 80
-r SysadmAtNbg bZNQPhUElrsypJEvkExOLbIFJ40 91.39.153.233 9001 9030 [2003:c2:c70f:f400:329c:23ff:fece:a96c]:9001
+r SysadmAtNbg bZNQPhUElrsypJEvkExOLbIFJ40 91.39.153.233 9001 9030
-r Unnamed K7s3ZRxoIUKRJCej8RDg+s9FKaQ 91.7.187.61 9001 0 [2003:d5:8f12:8b00:dea6:32ff:fecb:f015]:9001
+r Unnamed K7s3ZRxoIUKRJCej8RDg+s9FKaQ 91.7.187.61 9001 0
```

Assignee: Georg Koppen

# Investigate why hundreds of relays always come back at the first of a month around 00:00:00
https://gitlab.torproject.org/tpo/network-health/team/-/issues/76 · Georg Koppen · updated 2022-03-04

```
08:57 <+GeKo> not random, always the first day of the month at about 00:00:00
08:57 <+GeKo> ah, that's the "bug", oaky
08:57 <@armacake> right. they're supposed to see how long it took them to use it
all last time, and then assume they'll take the same interval
this time,
08:57 <@armacake> and then randomly place that interval in the month
08:58 <@armacake> so it will likely skew early, but it should not be 00:00:00,
unless it took them approximately the whole month to use it last
time
09:06 <@armacake> see the file comment at the top of
src/feature/hibernate/hibernate.c for how it's supposed to behave
```

# Investigate non-exit general overload
https://gitlab.torproject.org/tpo/network-health/team/-/issues/175 · Georg Koppen · updated 2022-03-04

We have [some](https://gitlab.torproject.org/tpo/network-health/team/-/issues/66#note_2770466) [indicators](https://lists.torproject.org/pipermail/tor-relays/2022-January/020184.html) about serious non-exit relay general overload going on. We "solved" the *exit* relay issues by simply not using the DNS failure metric anymore (see #139 for some analysis of the problem). We might need to tune the metrics that get triggered by non-exits as well.
We should probably use our network-health relays in testing and figuring out what is going on.
/cc @dgoulet

Network Health OKRs 2022 Q1-Q2 (Metrics excluded) · assignee: Georg Koppen

# The Exit DNS Timeout Problem
https://gitlab.torproject.org/tpo/network-health/team/-/issues/139 · David Goulet (dgoulet@torproject.org) · updated 2022-03-04

Since we added the DNS timeout overload line in relays, it has been popping up on the majority of Exits. The current parameters in 0.4.7.2-alpha are that, over 10 minutes, if 1% of all DNS queries time out, the line is triggered.
Looking at the top-10 list of Exits, almost half of them are overloaded, likely due to DNS timeouts: https://metrics.torproject.org/rs.html#search/flag:exit%20
Two Exit relay operators contacted us about this problem and helped chase it down. I'll go into detail on one operator's story and make a note about the second.
## AndersTrier
This operator is a well-known Exit operator based in .dk with a large set of Exits. Last week, he showed up with almost 5% DNS timeouts reported by his Exits.
The setup here is that the tor exit node sends its DNS queries to a local Unbound server and so we were able to get a lot of information from Unbound.
The average resolving time was around 9.8 seconds, but with a median of 0.07 seconds. In other words, 50% of the queries completed normally, in under a second, but it appears that about 5% took so long that they dragged the average up to almost 10 seconds.
Anders was able to see that anything resolving under `.by` or `.ua` would simply get no response for 4.5 minutes (apparently some default timeout in Unbound before it drops the query).
So he switched the Unbound server to another IP that is **not** a Tor exit IP. The situation improved, with roughly 1% to 1.5% timeouts over the last few days.
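The mean/median gap Anders saw is exactly what a small fraction of hung queries produces. A synthetic illustration (the numbers below are made up to mimic the shape of his data, not taken from it):

```python
import statistics

# 95% of lookups answer in ~70 ms; 5% hang until a ~4.5-minute timeout.
samples = [0.07] * 950 + [270.0] * 50

mean = statistics.mean(samples)
median = statistics.median(samples)

# The few hung queries dominate the mean while the median stays tiny:
# the same "seconds-long average, sub-second median" shape reported above.
assert median == 0.07
assert mean > 10
print(f"mean={mean:.2f}s median={median:.2f}s")
```

This is why the median (or percentiles) is the right summary for resolver health here, and why a handful of censored zones can make the averages look catastrophic.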
## toralf
As for toralf, he saw roughly the same problem: DNS timeouts going up to 7%, with the same setup, i.e. Unbound in front. One other slightly weird thing: he experimented with emptying his Exit policy to allow no ports, so that DNS queries would stop. Then he would open these:
```
ExitPolicy accept *:8074 # Gadu-Gadu
ExitPolicy accept *:11371 # OpenPGP hkp (http keyserver protocol)
ExitPolicy accept *:64738 # Mumble
```
And an hour later (likely the consensus propagating the new Exit policy), a flood of DNS requests would arrive for various domains entirely unrelated to these ports, like "facebook.com" or "mail.gmail.com".
Intriguing that such requests would end up on those ports and in such numbers so quickly.
## Observations
1. It appears that Tor IPs are being censored at various DNS levels, which was confirmed for some ccTLDs.
2. Our 1% threshold is likely too low, so we should bump it, but picking the new number seems complicated: with the current rule, "if 20% of your queries go to `.by` in that 10-minute window, you are overloaded".
3. One possible solution is to advise operators to put their Unbound resolver on an IP that is **not** a Tor exit IP. This can be difficult, as IPv4 addresses are getting scarce and thus expensive...
https://community.torproject.org/relay/setup/exit/
In my opinion, we need to assess the DNS situation on our side, and likely on a systematic level. In other words, I think we have to run Exit(s) here and conduct experiments and measurements along with Unbound.
We should also likely have scanners in place that query various domains and ccTLDs in order to learn the state of DNS censorship for Tor users.
Finally, the "overload" DNS timeout threshold should likely be raised, but to what value is still unclear to me.

Assignee: David Goulet (dgoulet@torproject.org)

# When an onion service lookup has failed at the first k HSDirs we tried, what are the chances it will still succeed?
https://gitlab.torproject.org/tpo/network-health/team/-/issues/87 · Roger Dingledine · updated 2022-03-04

Right now onion services publish to 8 HSDirs every 1-2 hours (see upload_descriptor_to_all()), and clients fetch from any of the core 6 of those 8.
Right now a double-digit percentage of the onion service lookups in the network result in failure, i.e. no onion descriptor found. (See upcoming FOCI 2021 paper for data and graph.)
So the question is: if a client has gotten a "404 never heard of it" from five of the hsdirs, does asking the sixth ever help?
If it turns out that it doesn't, we should save time for the user, and save load for the network, and save privacy for the user (fewer circuits, less surface area) by not bothering with that sixth circuit.
More generally, is there a cutoff of request attempts after which it very likely won't help so we shouldn't bother?
(Even if there is a clear cutoff today, it could change tomorrow, so if we add this feature we'd want to have a consensus param, and continue measuring to know if it should change.)
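One way to frame the question: under a naive independence assumption (each responsible HSDir holds the descriptor with the same probability p), k failures tell us nothing about the remaining dirs, so a further attempt always "helps". Measured behavior like a third attempt never helping is therefore evidence that real failures are correlated (e.g. the descriptor was never published at all). A sketch of that toy model (my own, not from the FOCI paper):

```python
def residual_success(p, k, n=6):
    """P(at least one of the remaining n-k HSDirs has the descriptor),
    assuming each dir independently holds it with probability p.
    Under independence, k earlier 404s don't change the remaining dirs."""
    return 1 - (1 - p) ** (n - k)

# With well-behaved HSDirs (p = 0.9) the independent model says the sixth
# attempt almost always helps; if measurements say otherwise, failures
# must be correlated, which is what would justify an early cutoff.
assert abs(residual_success(0.9, 5) - 0.9) < 1e-12
assert abs(residual_success(0.9, 0) - (1 - 0.1 ** 6)) < 1e-12
```

Comparing curves like this against the measured per-attempt success rates would tell us where the real cutoff sits, and continued measurement (via the consensus param) would tell us if it drifts.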
Useful building blocks for this ticket, which started on something similar in the past but got closed before we got there:
* https://gitlab.torproject.org/tpo/network-health/metrics/analysis/-/issues/13209
* https://gitlab.torproject.org/tpo/core/tor/-/issues/13208
In those tickets, @dgoulet found that 3% of the time it helps to try a second HSDir, but it never helps to try a third. I'm not sure, though, whether his experiment at the time was broad enough to conclude that we should change Tor's behavior to try only two HSDirs and then give up.

# Catch common bash errors in sbws scripts
https://gitlab.torproject.org/tpo/network-health/onbasca/-/issues/74 · teor · updated 2022-03-04

We're going to gradually update the tor, chutney, and fallback-scripts bash scripts to catch more errors. sbws might also want to make similar changes.
I'm not sure if sbws uses shellcheck already. Shellcheck helps catch errors while writing scripts.
To catch more runtime failures, set these options at the start of each script:
```
set -e
set -u
set -o pipefail
```
You might also want to set:
```
IFS=$'\n\t'
```
But it can change how lists are processed.
These settings help catch common errors in bash scripts at runtime:
http://redsymbol.net/articles/unofficial-bash-strict-mode/
But they can cause scripts to fail, so you should have good unit tests and CI for all your scripts, before making these changes.
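A minimal sketch of what `pipefail` buys at runtime (my own example, not taken from the sbws scripts):

```shell
#!/usr/bin/env bash
set -e
set -u
set -o pipefail

# Without pipefail this pipeline "succeeds" because the exit status of the
# final sort masks the earlier failure; with it, the failure propagates
# and the if-condition sees it.
if false | sort >/dev/null; then
    status="masked"
else
    status="propagated"
fi
echo "$status"
```

Running this prints `propagated`; dropping `set -o pipefail` flips it to `masked`, which is exactly the class of silent error these settings are meant to surface.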
Follow-up to legacy/trac#33451.

Milestone: onbasca 1.1

# Nightly build-repro job is failing because of ed25519 compatibility
https://gitlab.torproject.org/tpo/core/arti/-/issues/376 · Nick Mathewson · updated 2022-03-04

The [`build-repro`](https://gitlab.torproject.org/tpo/core/arti/-/jobs/107047) job is failing:
```
Downloaded ed25519 v1.4.0
error: failed to parse manifest at `/usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/ed25519-1.4.0/Cargo.toml`
Caused by:
feature `edition2021` is required
```
As near as I can tell, this is happening because our Cargo.lock now includes `ed25519` version 1.4.0, which uses edition2021, which wasn't supported until Rust 1.56. Our reproducible build tools, on the other hand, use Rust 1.54.
I propose that we upgrade our reproducible-build tools to the latest Rust (1.59).
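Concretely, the reproducible-build image's toolchain could be pinned with a `rust-toolchain.toml` like the following (a sketch of the proposal above, not the actual arti configuration):

```toml
# Hypothetical rust-toolchain.toml for the reproducible-build image,
# pinning the proposed newer toolchain so builds stay deterministic.
[toolchain]
channel = "1.59.0"
profile = "minimal"
```

Alternatively, the lockfile could be held back with `cargo update -p ed25519 --precise <version>`, picking the last `ed25519` release that still builds on Rust 1.54; which release that is would need checking.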
As a side note: do we care if our Cargo.lock file requires a newer Rust than our MSRV? I say "no", but we should open another ticket to discuss it if so.

Assignee: Nick Mathewson

# Figure out how different bw auth positions actually affect the measured bw values
https://gitlab.torproject.org/tpo/network-health/team/-/issues/185 · Georg Koppen · updated 2022-03-04

We believe that bw auths are concentrating too much bandwidth in one area (see #179). However, it's not clear how exactly different bw auth positions affect the measured bw values. It would be useful to have some means to investigate that question and then (in a follow-up ticket or so) come up with a better bw measurement plan tuned for our network.
/cc @juga, @mikeperry

# Increasing disk space on tb-build-03
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40546 · boklm · updated 2022-03-04

The disk on `tb-build-03` is often full. If there is some disk space available on the machine where `tb-build-03` is running, then it would be useful to add more space for `tb-build-03`, for example 100GB.

Assignee: Jérôme Charaoui (lavamind@torproject.org)

# Alternative DirProvider API can force provider to lie about errors
https://gitlab.torproject.org/tpo/core/arti/-/issues/370 · Ian Jackson (iwj@torproject.org) · updated 2022-03-04

In !347 we just added an API to allow an application to provide an alternative to our `dirmgr`, via the new `DirProvider` trait. There are callbacks made into the application's code, and some can return errors: the builder requires the hook to provide an `arti_client::Error`, and the `bootstrap` method on the provider requires the hook to provide a `tor_dirmgr::Error`.
Currently, these types cannot contain errors we have not foreseen. So if the alternative directory provider needs to report some other kind of error, they must, basically, lie, by providing an enum variant containing, perhaps, nonsense. I find this situation unconscionable, even for an extension API for expert users.
We can try to fix this problem by asking the API provider to return some other error type. But of course, we want the API to be the one that dirmgr also provides.
Also, whatever we do, the error from the directory provider is going to end up in a type that `impl HasKind`. This means that the directory provider must choose an `ErrorKind`. But perhaps none of our `ErrorKind`s are right. Again, the directory provider might be forced, by our API, to lie.
We do not know what the provider's own error type(s) may be like. We don't want to make everything monomorphisedly-generic over the provider's error type. So I think we need to store the caller's actual error as `Box<dyn ...>` or `Arc<dyn ...>`.
As for the kind, we need to provide kinds that can be used for situations we haven't foreseen. I think we still need it to imply a location - which rules out a single kind like `ErrorKind::FromExternalDirProvider`.
I therefore propose the following:
* Add a variant to `tor_dirmgr::Error`, `FromExternalDirProvider`. It will contain one of:
- `Arc<dyn tor_error::ErrorWithKind> `, a new trait which `: HasKind + StdError ...`.
- `Arc<dyn StdError + Send + Sync + 'static>` + a separate `ErrorKind`
- `Arc<dyn HasKind + Send + Sync + 'static>` and we make `HasKind: std::error::Error`
* Add the following `ErrorKind`s, for use **only by code outside `arti.git`**:
`OtherLocal` `OtherTorAccess` `OtherTorNetwork` `OtherRemote` `OtherIndeterminate`
or maybe some other word than `Other`.
We could have `ExternalDirProviderLocal` etc. but that seems overly pernickety.
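A compilable sketch of one of the options above (all names are placeholders mirroring the proposal, not the real arti types):

```rust
use std::error::Error as StdError;
use std::fmt;
use std::sync::Arc;

// Placeholder kinds, including the proposed catch-all variants for code
// outside arti.git (names taken from the proposal above).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorKind {
    OtherLocal,
    OtherRemote,
}

// The third bullet's shape: HasKind requires std::error::Error, so a
// single trait object carries both a kind and the usual error machinery.
trait HasKind: StdError {
    fn kind(&self) -> ErrorKind;
}

// A dirmgr-side error with the proposed variant wrapping the caller's
// actual error as a type-erased Arc.
#[derive(Debug)]
enum DirMgrError {
    FromExternalDirProvider(Arc<dyn HasKind + Send + Sync + 'static>),
}

// An external provider's own error type, unknown to arti.
#[derive(Debug)]
struct ProviderError(String);

impl fmt::Display for ProviderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "provider failure: {}", self.0)
    }
}
impl StdError for ProviderError {}
impl HasKind for ProviderError {
    fn kind(&self) -> ErrorKind {
        ErrorKind::OtherRemote
    }
}

fn main() {
    let err =
        DirMgrError::FromExternalDirProvider(Arc::new(ProviderError("dns".into())));
    let DirMgrError::FromExternalDirProvider(inner) = &err;
    // The kind survives type erasure, so a HasKind impl on DirMgrError
    // could simply delegate to the wrapped error.
    assert_eq!(inner.kind(), ErrorKind::OtherRemote);
    let _ = ErrorKind::OtherLocal; // keep the unused placeholder variant quiet
}
```

The point of the sketch: the provider keeps its own concrete error type, never has to invent a nonsense variant, and still tells us where the failure sits via the `Other*` kinds.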
Earlier discussion: https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/347#note_2782765
CC @eta, since I know she especially objected to `Arc<dyn ...>`. I'm suggesting it for this situation because I can't see a reasonable alternative.

Milestone: Arti 1.0.0: Ready for production use · assignee: Ian Jackson (iwj@torproject.org)