Failure to bootstrap due to cache directory problems results in a bad end-user experience
Summary
While working on onionmasq, I encountered a problem with the Arti client failing to bootstrap. After some investigation, this appeared to be related to an issue with the cache. Deleting the cache entirely and restarting fixed things up, but the end-user experience here was lacking, especially in the context of integrating Arti as a library.
Steps to reproduce:
- Something weird happens to the cache directory
- The Arti client no longer works
What is the current bug behavior?
I got the following error message on calling bootstrap(), which was resoundingly unhelpful:
onion_tunnel: Failed to bootstrap: tor: cache access problem: Unable to bootstrap a working directory
I then realised (given insider knowledge) that I had to format out the causes, and got this:
onion_tunnel: Caused by: Unable to bootstrap a working directory
onion_tunnel: Caused by: Bad permissions in cache directory
onion_tunnel: Caused by: File or directory /data/user/0/org.torproject.artitoyvpn/cache/arti-cache/dir_blobs/con_microdesc_sha3-256-339e2da5279de87281cfd59f0b3d0f324a15045a8540e9a71a396d68c3e74d54 not found
There are a number of issues here (see below).
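For reference, "formatting out the causes" means walking the chain exposed by std::error::Error::source. The following is a minimal sketch of that, not code from Arti or onionmasq: the helper name format_error_chain and the Layer type are hypothetical and exist only to make the example self-contained.

```rust
use std::error::Error;
use std::fmt;

/// Render an error followed by every cause reachable through `Error::source`.
/// (Hypothetical helper, not part of arti-client.)
fn format_error_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut cause = err.source();
    while let Some(inner) = cause {
        out.push_str("\nCaused by: ");
        out.push_str(&inner.to_string());
        cause = inner.source();
    }
    out
}

/// Hypothetical stand-in for a layered library error, for illustration only.
#[derive(Debug)]
struct Layer {
    msg: &'static str,
    source: Option<Box<Layer>>,
}

impl fmt::Display for Layer {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only this layer's message is shown; causes are hidden behind `source`.
        f.write_str(self.msg)
    }
}

impl Error for Layer {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}
```

An embedder who only logs the error with its Display implementation sees the first line and nothing else, which is the situation described above.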
What is the expected behavior?
- Bootstrapping should be able to gracefully deal with cache directory problems, without user intervention
  - When embedding Arti, this just looks like the entire thing is irreparably broken, when in fact it's a transient issue that could be easily resolved by clearing the cache
    - Users can't always do this themselves if Arti is embedded, nor would they necessarily receive indication that they need to.
    - The embedding library can't do it without relying on API-unstable error munging tactics, since no machine-readable API is exposed to ascertain that the cache dir is at fault
- The error should be correctly categorized; this doesn't look like a permissions issue
- The top-level error could possibly be renamed, too: I initially read 'working directory' as 'some place to put scratch files' (i.e. the Git 'working directory'), which was confusing
  - (perhaps "Unable to download a Tor network directory" or similar)
- (more opinionated, but) We should strongly consider rethinking the Error::source idea
  - A 'normal' embedder (who didn't work on the Arti team for a year) would have no idea how to get more information out of the error, and would assume the nondescript "Unable to bootstrap a working directory" line was it
  - There's no standard way to print an error's causes (std::error::Report is still unstable)
  - Even if there was, the user might be transforming the error into an io::Error or something, or passing it across an FFI boundary; having to thread the error-cause thing throughout the end-user codebase is just a pointless form of make-work
  - This might make sense once the rest of the ecosystem has caught up, but right now it's a really obscure user experience
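To illustrate the "error munging" workaround mentioned above, here is a rough sketch of what an embedder would currently have to do: match on error text, clear the cache, and retry. Everything in it is an assumption made for illustration. The names chain_mentions_cache and bootstrap_with_cache_reset are hypothetical, the closure stands in for whatever call actually performs the bootstrap, and matching on display strings is exactly the unstable tactic this report argues should not be necessary.

```rust
use std::error::Error;
use std::fs;
use std::io;
use std::path::Path;

/// Heuristic: does any error in the source chain mention the cache?
/// This inspects display text, which is not a stable interface.
fn chain_mentions_cache(err: &(dyn Error + 'static)) -> bool {
    let mut current = Some(err);
    while let Some(e) = current {
        if e.to_string().to_lowercase().contains("cache") {
            return true;
        }
        current = e.source();
    }
    false
}

/// Run `bootstrap`; if it fails with what looks like a cache problem,
/// delete `cache_dir` and try exactly once more.
/// (Hypothetical wrapper; `bootstrap` stands in for the real bootstrap call.)
fn bootstrap_with_cache_reset<T, E, F>(
    cache_dir: &Path,
    mut bootstrap: F,
) -> Result<T, Box<dyn Error>>
where
    E: Error + 'static,
    F: FnMut() -> Result<T, E>,
{
    match bootstrap() {
        Ok(value) => Ok(value),
        Err(first) if chain_mentions_cache(&first) => {
            match fs::remove_dir_all(cache_dir) {
                Ok(()) => {}
                // A cache directory that is already gone is fine.
                Err(e) if e.kind() == io::ErrorKind::NotFound => {}
                Err(e) => return Err(Box::<dyn Error>::from(e)),
            }
            bootstrap().map_err(|e| Box::<dyn Error>::from(e))
        }
        Err(other) => Err(Box::<dyn Error>::from(other)),
    }
}
```

A machine-readable error kind for cache faults would let the string check in chain_mentions_cache be replaced by a stable comparison, or let the library do the reset itself.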
Environment
- Version: arti-client 0.8.1 from crates.io, embedded inside onionmasq
- Operating system: Android 13