Improving HS reachability for Orbot/Android users

Orbot received a pull request (https://github.com/n8fr8/orbot/pull/83) proposing changes to improve the reachability of a hidden service (HS) running on the device. We asked Michael Rogers from the Briar Project to review the proposal, since his team has done the most work with hidden services on mobile. The PR details and his comments are included below.

While these improvements seem worthwhile from a usability perspective, there are, as always, concerns about their impact on anonymity.

The current problem is twofold. First, when the device sleeps for a long period, the HS can become unreachable: on waking, Tor detects a clock jump longer than `NUM_JUMPED_SECONDS_BEFORE_WARN` (100 seconds) and closes all circuits. Second, if the device is woken by incoming network traffic, it stays awake for only about a second before going back to sleep. That isn't enough time for Tor to rebuild its introduction circuits, so the HS remains unreachable until Tor gets a chance to rebuild them.
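The following minimal C sketch shows the kind of clock-jump check described above. It is not Tor's code. `looks_like_clock_jump()` and its parameters are hypothetical, and only the threshold name comes from the text. It assumes, for illustration, that the check runs from a periodic callback. While the device is asleep that callback does not run, so the whole sleep shows up as one large gap on waking.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Threshold named in the text: 100 seconds by default; the PR proposes 600. */
#define NUM_JUMPED_SECONDS_BEFORE_WARN 100

/* Hypothetical check, assumed to run from a periodic callback.
 * Returns true if the time since the callback last ran is at least
 * `threshold` seconds in either direction, i.e. it looks like a clock jump.
 * A device that slept through many callbacks produces one big forward gap. */
static bool looks_like_clock_jump(time_t last_run, time_t now, long threshold)
{
    assert(threshold > 0);
    long elapsed = (long)difftime(now, last_run);
    return elapsed >= threshold || elapsed <= -threshold;
}
```

With the 100-second threshold, a 300-second sleep counts as a jump. With the 600-second value proposed below it does not, but a 900-second sleep still does. Raising the threshold therefore only helps for sleeps shorter than the new value.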

I attempt to improve this situation in two ways:

1. Increase `NUM_JUMPED_SECONDS_BEFORE_WARN` from 100 seconds to 600 seconds to avoid triggering the clock-jumped-close-all-circuits code every time the device wakes up from sleep.
2. Add a new command (`MARKCONNFORWAKELOCK`) and event (`WAKELOCK`) to the control port so that Tor can synchronously signal Orbot to hold a wake lock on its behalf (since it isn't possible to hold a wake lock from native code). A wake lock is acquired at the start of an event callback and released when libevent returns from its event loop because there are no active events. This prevents the device from sleeping while Tor still has work to do.
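The wake-lock change amounts to a small state machine on the Tor side. The C sketch below is illustrative only. `struct wakelock_state`, `on_event_callback_start()`, `on_event_loop_idle()` and the counting notifier are hypothetical names, not code from the PR, and a plain function pointer stands in for the `WAKELOCK` control-port event. The sketch shows that Tor only has to report transitions. The first callback after an idle period asks the controller to acquire the lock, and the event loop running out of active events asks it to release the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook standing in for the WAKELOCK control-port event:
 * hold == true asks the controller (Orbot) to acquire its wake lock,
 * hold == false asks it to release the lock. */
typedef void (*wakelock_notify_fn)(bool hold, void *ctx);

struct wakelock_state {
    bool held;                 /* have we asked the controller to hold the lock? */
    wakelock_notify_fn notify; /* how requests reach the controller */
    void *ctx;                 /* opaque data passed back to notify */
};

/* Call at the start of every event callback. Only the first callback after
 * an idle period sends a request; later ones find the lock already held. */
static void on_event_callback_start(struct wakelock_state *st)
{
    assert(st != NULL && st->notify != NULL);
    if (!st->held) {
        st->held = true;
        st->notify(true, st->ctx);
    }
}

/* Call when the event loop returns because no events are active. */
static void on_event_loop_idle(struct wakelock_state *st)
{
    assert(st != NULL && st->notify != NULL);
    if (st->held) {
        st->held = false;
        st->notify(false, st->ctx);
    }
}

/* Example notifier that only counts requests, e.g. for testing. */
struct notify_counts {
    int acquires;
    int releases;
};

static void count_notifications(bool hold, void *ctx)
{
    struct notify_counts *c = ctx;
    if (hold)
        c->acquires++;
    else
        c->releases++;
}
```

In this model, a real controller would acquire or release an Android wake lock in response to each request. Deduplicating on the Tor side limits control-port traffic to two messages per wake-up rather than one per callback.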

Comments from Michael@BriarProject:

Thanks for passing this on. I've also been looking into this problem lately. Comments inline below.

On 11/08/17 12:00, Nathan of Guardian wrote:

> 1. Increase `NUM_JUMPED_SECONDS_BEFORE_WARN` from 100 seconds to 600
> seconds to avoid triggering the clock-jumped-close-all-circuits code
> every time the device wakes up from sleep.

Is there something that makes 600 seconds qualitatively better than 100 seconds, or is this just a workaround for short sleeps?

> 2. Add a new command (`MARKCONNFORWAKELOCK`) and event (`WAKELOCK`) to
> the control port to allow Tor to synchronously signal Orbot to hold a wake
> lock on behalf of Tor (since it isn't possible to hold a wake lock from
> native code). A wake lock is acquired at the start of an event callback,
> then released when libevent returns from its event loop when there are
> no active events. This prevents the device from sleeping when Tor still
> has work to do.

I like the underlying idea here, but this way of implementing it seems risky.

The problem is that Tor is driven by two kinds of events: incoming network traffic and libevent timers. When the device is asleep, incoming traffic will briefly wake it, so you can grab a wake lock until Tor finishes its work. But if a libevent timer expires during sleep, the device won't be woken. The timer will be handled next time the device wakes for some other reason.
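A small simulation makes the burst concrete. This hypothetical C sketch does not model Tor's scheduler. Timers are just deadlines in seconds, and the device sleeps over the interval `[sleep_start, wake_time)`. Any timer due in that interval can only run at `wake_time`. The function counts how many timers fire together at that moment and how late they are in total.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: deadlines[i] is when timer i was scheduled to fire.
 * While the device sleeps over [sleep_start, wake_time), no timer can run,
 * so every timer due in that window runs late, all at wake_time.
 * Returns the size of that single burst and, if total_delay is not NULL,
 * stores the summed lateness (in seconds) of the deferred timers. */
static size_t timers_fired_on_wake(const long *deadlines, size_t n,
                                   long sleep_start, long wake_time,
                                   long *total_delay)
{
    assert(deadlines != NULL || n == 0);
    assert(sleep_start <= wake_time);
    size_t burst = 0;
    long delay = 0;
    for (size_t i = 0; i < n; i++) {
        if (deadlines[i] >= sleep_start && deadlines[i] < wake_time) {
            burst++;
            delay += wake_time - deadlines[i];
        }
    }
    if (total_delay != NULL)
        *total_delay = delay;
    return burst;
}
```

For example, with deadlines at 10, 120, 250, 400 and 900 seconds and a sleep from 60 to 600 seconds, three timers fire together at t = 600. Because `wake_time` is whenever the next packet arrives, whoever sends that packet also chooses when the burst happens.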

I think this is a potential risk to anonymity, because it will result in externally visible behaviour, such as circuit teardowns, happening in correlated bursts when the device wakes up. And those bursts can be triggered by sending traffic to the device.

Tor expects timers to run at the scheduled time. That's why it panics and tears everything down if the clock jumps by 100 seconds. Suppressing that panic response seems like a bad idea, and more generally, so does ignoring the assumption behind it.

What would be a better idea?

Briar holds a wake lock whenever Tor is connected to the network, but that kills the battery, so we have to find another way.

If we could put Tor into some kind of "idle mode", where it would shut down all circuit building and other timer-driven behaviour, then it might be safe to let the device sleep until it was woken by incoming traffic. We could ask the guard for keepalives, say once every five minutes, to ensure that periodic tasks like fetching the consensus and uploading HS descriptors would have a chance to run even if there was no incoming traffic. But those tasks would still happen in bursts, so it seems to me that there would still be a risk to anonymity.
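Under the idle-mode idea, a periodic task that comes due while the device sleeps would wait for the next guard keepalive to wake it. The C sketch below computes that delay. It assumes, hypothetically, that keepalives arrive exactly every `keepalive_interval` seconds starting at time 0. `idle_mode_run_time()` is an illustrative name, not a Tor function.

```c
#include <assert.h>

/* Five-minute keepalive interval, as suggested in the text. */
#define KEEPALIVE_INTERVAL_SECONDS (5 * 60)

/* Hypothetical model of idle mode: the device only wakes when a keepalive
 * arrives, at times 0, interval, 2*interval, ... A task due at `due`
 * therefore runs at the first keepalive at or after `due`. */
static long idle_mode_run_time(long due, long keepalive_interval)
{
    assert(due >= 0 && keepalive_interval > 0);
    long periods = (due + keepalive_interval - 1) / keepalive_interval;
    return periods * keepalive_interval;
}
```

A consensus fetch due at t = 301 s and a descriptor upload due at t = 590 s would both run at t = 600 s. This is the residual burstiness the paragraph above describes.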

We might be able to reduce the burstiness if Tor could tell the controller the time of the next consensus fetch or descriptor upload, so that the controller could use an alarm to wake the device at that time regardless of network traffic. Doze mode adds some restrictions here: whitelisted apps can set an alarm at most once every nine minutes, which might be enough. The alarm could also be used to check the time of the last keepalive, to detect dead guard connections.
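The alarm-based variant reduces to two small decisions on the controller side. In this hypothetical C sketch, `next_alarm_time()` and `guard_connection_looks_dead()` are illustrative names, and the nine-minute spacing is the Doze limit mentioned above. The controller wakes the device at the time Tor reports for its next task, pushed back only as far as needed to keep alarms nine minutes apart. When an alarm fires, the controller treats the guard connection as dead if no keepalive has arrived within a grace period. The grace period value is an assumption, for example two keepalive intervals.

```c
#include <assert.h>
#include <stdbool.h>

/* Doze limit from the text: at most one alarm every nine minutes. */
#define MIN_ALARM_SPACING_SECONDS (9 * 60)

/* Hypothetical: when to wake the device for a task Tor says is due at
 * `next_task_due`, given the time the previous alarm fired. The alarm is
 * delayed only as far as needed to respect the nine-minute spacing. */
static long next_alarm_time(long next_task_due, long last_alarm)
{
    long earliest = last_alarm + MIN_ALARM_SPACING_SECONDS;
    return next_task_due > earliest ? next_task_due : earliest;
}

/* Hypothetical liveness check run when an alarm fires: the guard connection
 * is treated as dead if no keepalive has been seen for more than `grace`
 * seconds (the grace value is an assumption, e.g. two keepalive intervals). */
static bool guard_connection_looks_dead(long now, long last_keepalive, long grace)
{
    assert(grace > 0 && now >= last_keepalive);
    return now - last_keepalive > grace;
}
```

Here a task runs at its scheduled time unless that would violate the spacing limit. Its timing therefore follows Tor's own schedule rather than the arrival of network traffic.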

I think doing this right is going to require significant input from the Tor devs, first of all to see what we can safely get away with in terms of sleep, and then perhaps to implement an idle mode and supporting controller commands if it looks like a good idea.
