Every time that a relay is measured, it is first checked whether destination is usable, and then run connect_to_destination_over_circuit, which checks again if destination is "usable".
It implies two locks, and a destination can fail in a row due to the relay being unstable.
It'd be better to determine that a destination is not usable, either by trying to make a connection not over Tor, or after a X number of failures in a row.
To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information
Child items
0
Show closed items
No child items are currently assigned. Use child items to break down this issue into smaller parts.
Linked items
0
Link issues together to show that they're related.
Learn more.
I think this is what is making sbws stalls.
A backtrace [0] in the moment it's stalled shows two threads trying to get the next destination.
It locks https://github.com/torproject/sbws/blob/ee64d76df54ceb3a3c9e1e2a797fd70d68bb0035/sbws/lib/destination.py#L248, then while True, then if it enters in _perform_usability_test, will lock again, then call _is_usable, which call connect_destination_over_circuit which locks again. If it fails then it sleep.
Many times connect_to_destination_over_circuit fails several times in a row, cause it's not reliable to do it through Tor and random relays.
If _usable_dests is not overwritten every time that a destination is chosen, then a lock is only needed in connect_destination_over_circuit.
It would be better to refactor all this code to trace it and debug it easier.
As a temporal solution, the "usability" can be tracked out of the class and without locks.
For now i'd just disable checking for usability, and prioritize the refactoring ticket as soon as 1.0 is done.
I thought that there's no need of a big refactor to check if a destination is "usable".
Every time that a destination is used (in connect_destination_over_circuit) and it fails, it can be recorded since there's already a lock.
Putting this in revision to implement this before review.
Reminder: when merging into master, add additional commit or fixup to change circuit_id = cb.build_circuit(circuit_path) into circuit_id, _ = cb.build_circuit(circuit_path) [0], as #28736 (moved) changed build_circuit to return a tuple.
Reminder: when merging into master, add additional commit or fixup to change circuit_id = cb.build_circuit(circuit_path) into circuit_id, _ = cb.build_circuit(circuit_path) [0], as #28736 (moved) changed build_circuit to return a tuple.