The current arti-client error types are design for implementor convenience, not user convenience: they are explicit about what kind of error they're wrapping, but not about what kind of problem they actually indicate, and what you're supposed to do about it.
We should redesign the errors exposed (directly or indirectly) by arti-client, and try to make them more generally useful. This will be an API break, so we should do this on the soon-ish side, while we're still breaking our API all the time.
We should also make sure that all of our errors are Send and Clone.
Edited
Designs
Child items ...
Show closed items
Linked items 0
Link issues together to show that they're related.
Learn more.
@nickm and @Diziet had a conversation about error handling.
We came up with the following proposal:
There should be a single portmanteau error type in all the tor crates. It should exist in some base crate.
The portmanteau error type should have a kind value a la io::ErrorKind. The kinds should be in the base crate, too.
The portmanteau error type should contain (or, maybe, optionally contain) a Arc<dyn std::error::Error + Send + Sync + 'static> (the Arc<> is so we can contain an io::Error as-is).
Most top-level public APIs (eg, those presented by arti_client) should expose the portmanteau error type.
Individual bits of code, eg modules or maybe crates, can have their own error types. These should be converted to the portmanteau error type when they are passed to code that doesn't need to know the details, usually by a suitable From impl.
Crate- or code-specific error enums should no longer be nested in other similar error enums, at least, not in public APIs. IOW public APIs should avoid exposing (in the type system) nesting chains or error cause structures.
When we need to handle errors differently depending on the precise context or details of what went wrong (eg, which circuit this error occurred on) and the context needs to be part of the error type, we need to use a bespoke error type instead, and avoid converting it to the portmanteau until we no longer need to think about it in such detail.
Consequence
Must functions will return an error with 3 words: kind, heap pointer,
vtable. For a small (<= 2 words) Ok type the Result is also 3-word
due to nich optimisation.
Most error paths cons once during error propagation. On hot paths we may want to retain concrete specific error types.
Error filtering and error-specific handling can be done by matching on
the kind.
In places where we want to inspect the inner error in more detail, not
just by looking at the Tor Kind, we'll have to use dyn downcasting.
Sometimes we will need to smash strange errors to strings to make them be Send + Sync + 'static.
Rejected alternative
Instead of a type-erasing the portmanteau error type's payload, the portmanteau error type is an enum, each variant of which has its own payload of a concrete type (which might not even be an error, and might be Box<something> to keep the portanteau small).
This makes the rare case of inspecting the inner error in detail simpler.
But it exposes more complicated innards: it makes it more clearly an API-breaking change to change how and where a particular error is generated, or what additiona information is provided. And it precludes generating the same error kind in multiple places with different payloads.
Individual bits of code, eg modules or maybe crates, can have their own error types.
As a suggestion for an initial dividing point:
I'd suggest that probably everything at level tor-proto/tor-netdoc or higher should use our Error type. But I'm less sure that very low level crates e.g. tor-llcrypto and tor-bytes and tor-cell should use that error type or whether it makes sense for them to continue to have their own errors.
I'd suggest that probably everything at level tor-proto/tor-netdoc or higher
should use our Error type. But I'm less sure that very low level crates e.g.
tor-llcrypto and tor-bytes and tor-cell should use that error type or whether
it makes sense for them to continue to have their own errors.
My rule of thumb would be just to not nest errors inside other errors and instead use the new tor::Error instead. So I think that means that crates (or functions) that need to report errors from other places, ought to use tor::Error, and other places can probably use a custom enum which probably impl Into<tor::Error>.
Do we want to make any general distinctions about errors? For example:
Is anyone going to want to ask, programmatically, the question "which Tor layer/component did this error come from?". I think the answer is no.
Do we need to distinguish permanent from retriable errors? And what does that mean precisely? In the DNS there is such a distinction, so in principle resolve could return distinct errors, for example. But for the web this is all much less clear. In some sense ECONNREFUSED from the target is a permanent error: retrying it won't work unless the target changes something - but (a) such changes are routine and not really something the user wants to pay attention to (b) you could get ECONNREFUSED due to bad network conditions at your exit node.
Are there other category-like properties of errors that we need to care about? Eg "changing your underlying transport network environment (switching your phone/laptop from wifi to mobile data)" may fix this". "This problem is likely due to an attack on Tor in your environment / an attack on your peer / ???".
We don't want to add more payload data to the new tor_error::Error, so every distinction is going to have to be implied by the Kind. And we probably ought to provide an Other kind for the caller's use (like io::Error does) but then we need to either fix the answers for all these properties/categories, or provide multiple Other kinds.
Is anyone going to want to ask, programmatically, the question "which Tor layer/component did this error come from?". I think the answer is no.
I agree, though we will want to be able to tell diagnostically where it came from.
Do we need to distinguish permanent from retriable errors?
I think that if we want such a distinction, it'll probably make more sense as a question of: "If I wait and try again, might this get better on its own"? But I'm not sure that makes sense.
Are there other category-like properties of errors that we need to care about?
Here are a few, though I think this might be better off as Kinds:
"This is always a bug in your code (or in ours)."
"You should never see this; it's an error in our code."
"This is caused because of another party implementing the Tor protocol incorrectly."
We don't want to add more payload data to the new tor_error::Error, so every distinction is going to have to be implied by the Kind.
Well, that might not be the case. We've allocated a whole word for the Kind, but I bet we will never have more than 2^16 kinds. So if we repr(u16) for the Kind, then we can grab another few bits from that word for a set of bitflags.
through
And so long as the Error type is opaque, we can make this change down the road as well.
When it comes to enumerating all the possible kinds and flags here, I think it might make sense to feel them out by listing all (or a bunch) of the error cases that we already have, and then trying to put them into kinds. Sadly, many of the current errors are FooError::Bar(BarError::Baz(BazError::Quux(...))) kinds of things, where we'll need to chase down a bit deeply to see how they originate.
I'm happy to go annotate such a list (or help make one) if it'll be useful.
Today I have been working on some implementation of the plan from !257 (merged). It's not really ready to share right now. I'm hoping to share it when I have at least one error path from an error generation site to the toplevel, which is in the "new way".
One thing I am noticing is that I do think we are going to want a custom derive macro since there's a tendency to boilerplate. But for now I will just mark that up in my RFC branch.
@eta ^ fyi (since you don't seem to be tagged in this issue already)