Tor cache breaking over time and causing AllGuardsDown
Summary
Steps to reproduce:
We are using arti-client in our app, and we run into a state where all of our Tor requests time out or fail. I'm still trying to narrow this down and haven't found the exact cause yet.
Deleting the arti data directory does fix the issue. This leads me to believe it might be an issue with arti itself or how it handles its internal cache.
What is the current bug behavior?
Connections time out or fail with error messages such as these:
Client(Error { detail: ObtainHsCircuit { hsid: HsId([…]oqd.onion), cause: DescriptorDownload(RetryError { doing: "retrieve hidden service descriptor", errors: [(Single(1), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(2), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(3), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(4), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(5), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(6), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { 
n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) }))], n_errors: 6 }) } })) }))) }))
2025-10-21T12:09:59.493667Z DEBUG list_sellers{list_sellers=ListSellersArgs { rendezvous_points: [/dns4/eigen.center/tcp/8888/p2p/12D3KooWS5RaYJt4ANKMH4zczGVhNcw5W214e2DDYXnjs5Mx5zAT] } method="list_sellers"}:Swarm::poll: libp2p_swarm: /Users/_/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/libp2p-swarm-0.44.2/src/lib.rs:843: Connection attempt to peer failed with Transport([(/onion3/rsemlyzwsz3akg6mzns4idijywou4of3iudclauahytpkibdamcbeoqd:9939/p2p/12D3KooWHGeZYdciw85kUDowhAFdjx1RCPc2piXawaq3dNi2LiZv, Other(Custom { kind: Other, error: Other(Right(Custom { kind: Other, error: Other(Left(Left(Custom { kind: Other, error: Left(Client(Error { detail: ObtainHsCircuit { hsid: HsId([…]oqd.onion), cause: DescriptorDownload(RetryError { doing: "retrieve hidden service descriptor", errors: [(Single(1), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(2), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(3), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(4), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: 
FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(5), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) })), (Single(6), Report(DescriptorError { hsdir: [scrubbed], error: Circuit(Guard(AllGuardsDown { retry_at: None, running: FilterCount { n_accepted: 0, n_rejected: 60 }, pending: FilterCount { n_accepted: 0, n_rejected: 0 }, suitable: FilterCount { n_accepted: 0, n_rejected: 0 }, filtered: FilterCount { n_accepted: 0, n_rejected: 0 } })) }))], n_errors: 6 }) } })) }))) })) }))]). peer=12D3KooWHGeZYdciw85kUDowhAFdjx1RCPc2piXawaq3dNi2LiZv
The errors come from our libp2p-tor crate, which uses arti internally to establish peer-to-peer connections. libp2p-tor surfaces the arti errors to the caller.
What is the expected behavior?
Connections should be successful without having to periodically delete the arti data directory.
Environment
We are using arti-client 1.5.0, specifically pinned to this commit. See the exact versions here.
Relevant logs and/or screenshots
More logs can be found here, but they also include output that is unrelated to arti itself. I can provide more detailed logging if required.
I have zipped my arti data directory and uploaded it here: https://file.kiwi/df6e13ff#hpK0cqmUwQHwy1_VRLzLzQ
Possible fixes
Deleting the data directory seems to temporarily fix the issue.
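For anyone who needs the same stopgap, here is a minimal sketch of how the reset could be automated while keeping the broken directory around for debugging. It uses only the Rust standard library and does not call any arti API. The function name `reset_data_dir` and the `.broken-<timestamp>` suffix are made up for illustration, and it assumes no running Tor client is still using the directory when it is called.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: moves `data_dir` aside (if it exists) and recreates
/// it empty, so the next client start begins with fresh state.
/// Returns the path of the preserved copy, if one was made.
fn reset_data_dir(data_dir: &Path) -> io::Result<Option<PathBuf>> {
    let backup = if data_dir.exists() {
        // Seconds since the Unix epoch, used only to make the backup name unique.
        let stamp = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_secs())
            .unwrap_or(0);

        // Build "<dir name>.broken-<stamp>" next to the original directory.
        let mut name = data_dir
            .file_name()
            .map(|n| n.to_os_string())
            .unwrap_or_default();
        name.push(format!(".broken-{stamp}"));
        let backup = data_dir.with_file_name(name);

        // Rename instead of deleting so the state can still be inspected.
        fs::rename(data_dir, &backup)?;
        Some(backup)
    } else {
        None
    };

    // Recreate an empty directory at the original location.
    fs::create_dir_all(data_dir)?;
    Ok(backup)
}
```

In our case this would have to run before the client is constructed, for example after a run in which every connection attempt ended in AllGuardsDown. It only works around the symptom and does not explain why the stored state ends up unusable.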