Non-fatal failures to bootstrap don't provide embedding code with any details
Another weird bootstrapping failure in the wild on Android/onionmasq, this time after uncleanly shutting down an emulator and bringing it back up again:
06-26 16:49:11.361 2896 3087 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:49:11.592 2896 3090 I onionmasq: tor_proto::circuit::streammap: Actually got an end cell on a half-closed stream!
06-26 16:49:13.810 2896 3087 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:49:13.871 2896 3090 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:49:13.873 2896 3090 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 1s.
06-26 16:49:14.877 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 1: Looking for a consensus.
06-26 16:49:15.112 2896 3088 D onionmasq: onionmasq_mobile::scaffolding: AndroidScaffolding::protect() for fd 110
06-26 16:49:15.898 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:49:18.133 2896 3089 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:49:18.206 2896 3087 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:49:18.208 2896 3087 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 1.686s.
06-26 16:49:19.903 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 1: Looking for a consensus.
06-26 16:49:20.952 2896 3089 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:49:23.163 2896 3089 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:49:23.318 2896 3088 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:49:23.320 2896 3088 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 4.452s.
06-26 16:49:27.784 2896 3090 I onionmasq: tor_dirmgr::bootstrap: 1: Looking for a consensus.
06-26 16:49:28.870 2896 3090 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:49:30.992 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:49:31.031 2896 3089 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:49:31.031 2896 3089 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 9.12s.
06-26 16:49:40.171 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 1: Looking for a consensus.
06-26 16:49:41.212 2896 3090 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:49:43.037 2896 3090 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:49:43.070 2896 3090 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:49:43.071 2896 3090 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 23.464s.
06-26 16:50:06.541 2896 3087 I onionmasq: tor_dirmgr::bootstrap: 1: Looking for a consensus.
06-26 16:50:07.578 2896 3089 I onionmasq: tor_dirmgr::bootstrap: 2: Looking for a consensus.
06-26 16:50:10.096 2896 3088 I onionmasq: tor_dirmgr::bootstrap: 3: Looking for a consensus.
06-26 16:50:10.142 2896 3088 W onionmasq: tor_dirmgr::bootstrap: Unable to advance downloading state n_attempts=3 state=Looking for a consensus.
06-26 16:50:10.143 2896 3088 W onionmasq: tor_dirmgr: Unable to download a usable directory: error: Unable to finish bootstrapping a directory. We will restart in 50.084s.
The principal complaint here is that, because the error is treated as non-fatal, the calling code (onionmasq) has no way to find out what it is; the bootstrapping code simply loops forever, retrying with an increasing delay, without ever returning an error. In this case the cause was evidently a state directory left corrupted by the unclean shutdown, since clearing the cache instantly fixed it (the corruption would be a bug in its own right, if I had any actual details of the problem).
Would it be possible to provide some mechanism that gives embedding code more control over, or insight into, the bootstrap retry process? This is really quite a problem in the Tor VPN use case, since this failure just manifests itself to the user as an indefinite failure to make progress (it does at least set the 'stuck' flag, but we really would prefer the actual error object).
(Yes, I know I can configure it to give up after n attempts, etc. -- but then the only error received is an overly generic CantAdvanceState.)
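
To make the request a bit more concrete, here is a minimal sketch of the kind of hook that would help: a retry loop that hands each failed attempt's error to the embedder and lets it decide whether to keep going. Everything in it (`bootstrap_with_observer`, `FailedAttempt`, `RetryDecision`, `AttemptError`) is hypothetical and invented for illustration; none of it is existing Arti API. A real version would presumably be async and would actually sleep between attempts.

```rust
use std::fmt;
use std::time::Duration;

/// Hypothetical stand-in for a detailed bootstrap error (not a real Arti type).
#[derive(Debug, Clone, PartialEq)]
enum AttemptError {
    CorruptCache(String),
    Network(String),
}

impl fmt::Display for AttemptError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AttemptError::CorruptCache(msg) => write!(f, "corrupt cache: {msg}"),
            AttemptError::Network(msg) => write!(f, "network error: {msg}"),
        }
    }
}

/// What the embedder wants the retry loop to do after a failure.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RetryDecision {
    Retry,
    GiveUp,
}

/// Hypothetical details passed to the embedder after each failed attempt.
#[derive(Debug)]
struct FailedAttempt<'a> {
    /// 1-based index of the attempt that just failed.
    attempt: u32,
    /// The actual error, rather than a generic "can't advance" summary.
    error: &'a AttemptError,
    /// How long the loop would wait before the next attempt.
    next_delay: Duration,
}

/// Call `attempt_fn` until it succeeds or `observer` returns `GiveUp`.
/// On success, returns the number of the attempt that succeeded;
/// on give-up, returns the error from the last failed attempt.
fn bootstrap_with_observer<F, O>(
    mut attempt_fn: F,
    mut observer: O,
    initial_delay: Duration,
) -> Result<u32, AttemptError>
where
    F: FnMut(u32) -> Result<(), AttemptError>,
    O: FnMut(&FailedAttempt<'_>) -> RetryDecision,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match attempt_fn(attempt) {
            Ok(()) => return Ok(attempt),
            Err(error) => {
                let info = FailedAttempt { attempt, error: &error, next_delay: delay };
                if observer(&info) == RetryDecision::GiveUp {
                    return Err(error);
                }
                // The sketch only tracks the delay (doubling it each time);
                // a real loop would sleep for `delay` before retrying.
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

With something along these lines, onionmasq could pass an observer that logs `info.error`, surfaces it in the UI, and returns `GiveUp` (or clears the cache and returns `Retry`) when it sees an error it considers unrecoverable, instead of only ever seeing a 'stuck' flag.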