Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues
Updated: 2020-06-13T15:48:04Z

https://gitlab.torproject.org/legacy/trac/-/issues/32438
Inconsistent failure-then-success bootstrap behavior with clock set 24h in the past
Updated: 2020-06-13T15:48:04Z
Author: intrigeri

Context: I'm investigating which part of Tails' crazy clock fixing dance we can remove thanks to https://trac.torproject.org/projects/tor/ticket/24661. Corresponding Tails ticket: https://redmine.tails.boum.org/code/issues/16471.
Environment: Tor Browser 9.0.1 x86_64 on Debian unstable, clock set 24h before UTC. Tested both with direct connection to the Tor network and with bridges.
The first time I click "Connect" in Tor Launcher, I see a bootstrapping error (see attached screenshot and Tor logs):
```
Tor failed to establish a Tor network connection.
Loading authority certificates failed (Clock skew -81944 in microdesc flavor consensus from CONSENSUS - ?).
```
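To put the reported skew in perspective, the number in the message can be converted to hours. This is a minimal sketch that assumes the value is a signed offset in seconds; `format_skew` is a made-up helper for illustration, not part of tor.

```python
def format_skew(seconds: int) -> str:
    """Render a signed clock skew, given in seconds, as hours/minutes/seconds."""
    sign = "-" if seconds < 0 else "+"
    hours, rest = divmod(abs(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{sign}{hours}h {minutes:02d}m {secs:02d}s"


# The value from the error message above (assumed to be in seconds).
print(format_skew(-81944))  # -22h 45m 44s
```

Under that assumption, the reported skew is a little under 23 hours, slightly less than the 24 hours by which the clock was set back.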
I guess that's kind of expected despite the improvements implemented as part of https://trac.torproject.org/projects/tor/ticket/24661.
But then, if I click "Reconfigure" and then "Connect", tor bootstraps successfully. This surprises me and feels inconsistent: the way I see it, either the clock is skewed enough for tor to fail bootstrapping, and then it should not succeed on the second try; or tor can somehow deal with a skewed clock, and then it should succeed on the first try.

Assignee: Kathleen Brade

https://gitlab.torproject.org/legacy/trac/-/issues/31870
Do an informal usability study on the "get bridges" process
Updated: 2020-06-13T18:36:23Z
Author: Philipp Winter (phw@torproject.org)

See [this mailing list post](https://lists.torproject.org/pipermail/ux/2019-May/000448.html) from May 2019. We would like to:
1. Give a user a device with a censored Tor Browser / Tor Browser Android
2. Ask the user to figure out how to connect to Tor
3. Observe what issues the user runs into
We may be able to do another iteration of this experiment at the OTF summit in Taipei.

https://gitlab.torproject.org/legacy/trac/-/issues/30190
Do not warn about compatible OpenSSL upgrades
Updated: 2020-06-13T15:40:47Z
Author: teor

From https://github.com/torproject/tor/pull/951
When releasing OpenSSL patch-level maintenance updates,
we do not want to rebuild binaries using it.
And since they guarantee ABI stability, we do not have to.
Without this patch, warning messages were produced
that confused users:
https://bugzilla.opensuse.org/show_bug.cgi?id=1129411

Milestone: Tor: 0.3.5.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/29744
Streams sometimes stall for up to 1 hour without making any progress
Updated: 2020-06-13T18:03:57Z
Author: Karsten Loesing

We're measuring Tor performance using our OnionPerf tool by regularly downloading 5 MiB files over Tor. Some of these measurements run longer than 1 hour, after which a timeout in OnionPerf aborts them, or run for up to 30 minutes until they complete. (For comparison, 99% of successful runs complete within roughly two minutes.)
I noticed one particular source of slowness which I think is the reason for the application timeouts after 1 hour and for some of the 1% slowest successful runs: streams stall for seconds or minutes and would even stall for hours if we let them, without making any progress; and suddenly they make progress until they complete or stall again.
I'm attaching four graphs showing this problem. All these graphs show download progress over time with time on x and progress on y. Each gray bar is one measurement. The black line starts at the bottom of its gray bar and goes up to the top of that bar as more data is received. The number on the right is the stream ID.
The first two graphs show application timeouts, the last two show the slowest 1% of successful runs. First and third show downloads from a public server, second and fourth from an onion server.
Note that not all runs have this problem of stalling as described above. Some of the more obvious cases are:
- Page 3, stream ID 436971: that stream basically does nothing for over half an hour and then completes within seconds.
- Page 3, stream ID 436986: same as before, just with a shorter stalling period.
Other cases have different issues. For example, stream ID 34117 on page 3 is rather slow for most of the time and then suddenly gets faster at the end. However, it does not stall.
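The distinction between "stalled" and "merely slow" can be made mechanical. The sketch below is not OnionPerf code; it assumes hypothetical `(timestamp_seconds, cumulative_bytes)` samples per stream and flags every interval of at least `min_gap` seconds in which the byte count did not grow.

```python
from typing import List, Tuple


def find_stalls(samples: List[Tuple[float, int]],
                min_gap: float) -> List[Tuple[float, float]]:
    """Return (start, end) intervals of at least min_gap seconds
    during which the cumulative byte count did not increase."""
    stalls: List[Tuple[float, float]] = []
    if not samples:
        return stalls
    last_progress_time, last_bytes = samples[0]
    for t, received in samples[1:]:
        if received > last_bytes:
            # Progress resumed: record the gap if it was long enough.
            if t - last_progress_time >= min_gap:
                stalls.append((last_progress_time, t))
            last_progress_time, last_bytes = t, received
    # A stall that never ended (e.g. the run was aborted by a timeout).
    final_time = samples[-1][0]
    if final_time - last_progress_time >= min_gap:
        stalls.append((last_progress_time, final_time))
    return stalls


# Made-up samples: a stream that idles for about half an hour, then finishes.
samples = [(0, 0), (1, 1000), (2, 2000), (1900, 2000), (1901, 5242880)]
print(find_stalls(samples, min_gap=60))  # [(2, 1901)]
```

A slow but steadily progressing stream, like the last example above, produces no stall intervals with this definition.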
I do have tor logs and tor controller event logs for these cases. Here's a log containing many relevant STREAM and STREAM_BW events: https://people.torproject.org/~karsten/volatile/streams-2019-02-18.log.xz (61.1K)
These measurements have been made using tor versions 0.2.9.11-dev and 0.3.0.7-dev.
I can provide more data. But rather than uploading everything, please let me know what data would be most useful, and I'll provide just that.

Milestone: Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/29743
Long-running tor instances fail to keep up-to-date directory information
Updated: 2020-06-13T18:03:57Z
Author: Karsten Loesing

We have a small number of long-running tor instances as part of our OnionPerf setups that are running 24/7. In the past, some of these tor instances got into a state where their directory information was no longer up-to-date enough to build circuits. In some cases they recovered after hours, days, or even weeks, but in some cases we had to restart the tor processes.
I'm attaching a graph that shows the number of open circuits as reported in heartbeat log messages. That number is relatively stable most of the time, depending on whether we're using the tor instance for making requests or for providing an onion service. But in some cases the number drops to zero, which coincides with the log message:
```
[notice] Our directory information is no longer up-to-date enough to build circuits: [...]
```
The graph also shows that sometimes the number magically goes up again. Those times coincide with the following log message:
```
[notice] We now have enough directory information to build circuits.
```
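The two messages above are enough to reconstruct outage intervals from a log. The following sketch assumes notice-level lines shaped like `Nov 05 15:29:55.000 [notice] ...`; the regular expression, function name, and sample lines are illustrative assumptions, not taken from the actual logs.

```python
import re
from typing import Iterable, List, Optional, Tuple

LOST = "Our directory information is no longer up-to-date enough to build circuits"
REGAINED = "We now have enough directory information to build circuits"

# Assumed line layout: "Mon DD HH:MM:SS.mmm [notice] message"
LINE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[notice\] (?P<msg>.*)$"
)


def outage_intervals(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each 'no longer up-to-date' notice with the next 'enough
    directory information' notice. An outage still open at the end of
    the log is reported with None as its end timestamp."""
    intervals: List[Tuple[str, Optional[str]]] = []
    lost_at: Optional[str] = None
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue
        msg = match["msg"]
        if msg.startswith(LOST) and lost_at is None:
            lost_at = match["ts"]
        elif msg.startswith(REGAINED) and lost_at is not None:
            intervals.append((lost_at, match["ts"]))
            lost_at = None
    if lost_at is not None:
        intervals.append((lost_at, None))
    return intervals


# Made-up log excerpt for illustration.
log = [
    "Feb 18 03:00:00.000 [notice] " + LOST + ": [...]",
    "Feb 18 09:30:00.000 [notice] " + REGAINED + ".",
    "Feb 19 01:00:00.000 [notice] " + LOST + ": [...]",
]
for start, end in outage_intervals(log):
    print(start, "->", end or "still down")
```

Timestamps are kept as strings because the assumed log lines carry no year.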
The purple dashed lines show when we restarted tor processes manually. Some of these restarts are unrelated to the number of open circuits. But some restarts happened explicitly because the tor instance was not working anymore for our measurements.
By the way, the op-nl instance shown in the middle was running 0.2.9.11-dev, whereas the op-us and op-hk instances were running 0.3.0.7-dev. It may be coincidence, but the older op-nl did not run out of up-to-date directory information, whereas the newer op-us and op-hk did. Was this issue maybe introduced in 0.3.0.x?
I have tor logs available for all these tor instances. I can easily provide them, either as a big tarball or for specific days and instances as a smaller tarball. Just let me know.

Milestone: Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/28925
distinguish PT vs proxy for real in bootstrap tracker
Updated: 2020-06-13T15:39:42Z
Author: Taylor Yu

The bootstrap tracker work in #27167 adds distinctions (in the form of additional bootstrap phases) between connecting directly to a relay vs connecting through a proxy. It also tries to distinguish proxies from PTs.
The changes that distinguish PTs from proxies don't work, because `conn->proxy_type` gets set to the protocol type of the underlying protocol that tor uses to talk to the PT locally.

Milestone: Tor: 0.4.0.x-final
Assignee: Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/28654
Allow relays to serve future consensuses
Updated: 2020-06-13T15:34:50Z
Author: teor

Like #28591 for clients, we should allow relays to serve future consensuses.

Milestone: Tor: 0.4.0.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/28591
Accept a future consensus for bootstrap
Updated: 2020-06-13T16:06:33Z
Author: teor

#24661 allows tor to bootstrap when the client's clock is ahead of the network by up to 1 day.
But clients can't bootstrap when the client's clock is behind the network by more than a few hours:
https://trac.torproject.org/projects/tor/ticket/24661#comment:18

Milestone: Tor: 0.3.5.x-final
Assignee: teor

https://gitlab.torproject.org/legacy/trac/-/issues/28351
Test clock skewed clients using chutney
Updated: 2020-06-13T13:30:41Z
Author: teor

To test #23605, we could use chutney to run clients with skewed clocks.
Alternately, we could launch the clients from a chutney network on a separate, skewed machine; or launch the clients and onion services on the public network (but we'd need exits?).

https://gitlab.torproject.org/legacy/trac/-/issues/28319
accept a reasonably live consensus for path selection
Updated: 2020-06-13T15:33:46Z
Author: teor

When I fixed guard selection in #24661, tor said:
```
Nov 05 15:29:55.000 [notice] I learned some more directory information, but not enough to build a circuit: We have no recent usable consensus.
```
Maybe this is a logging issue, maybe it's another constraint we need to fix.
See the full log in:
https://trac.torproject.org/projects/tor/ticket/24661#comment:13

Milestone: Tor: 0.4.0.x-final
Assignee: teor

https://gitlab.torproject.org/legacy/trac/-/issues/28255
verify guard selection consensus expiry constraints
Updated: 2020-06-13T15:33:32Z
Author: Taylor Yu

The hypothesis in #23605 is that bootstrapping can get stuck at #23605 if there is enough clock skew for the consensus to be expired but still "reasonably live". Let's verify this and try to record more details.

Milestone: Tor: 0.4.0.x-final
Assignee: teor

https://gitlab.torproject.org/legacy/trac/-/issues/27691
reset bootstrap progress when enough things change
Updated: 2020-06-13T15:31:22Z
Author: Taylor Yu

Right now, setting DisableNetwork=1 doesn't reset the bootstrap progress indicator. It probably should, because all network connections to bridges or relays will close. This will improve the user experience once we have #27103 in place, because then the earlier progress shown will be the initial network connection that everything else depends on.
We probably also want to reset the bootstrap progress when a configuration change causes us to disconnect from all our guards.

Milestone: Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27308
report bootstrap phase when we actually start, not just unblock something
Updated: 2020-06-13T15:30:07Z
Author: Taylor Yu

Right now many bootstrap events get reported when the preceding task has completed. This makes it somewhat harder to tell what has gone wrong if bootstrap progress stalls.
[edit: The following isn't necessarily the best way to fix this. It might be better to figure out how to report starting something when actually starting it.]
We should add completion milestones to bootstrap reporting. This makes bootstrap reporting more future-proof. If in the future we add a time-consuming task (with no bootstrap reporting) between two existing bootstrap tasks, it will be a little more obvious what's going on.
For example, say we have task X followed by task Z, but then we add a lengthy task Y without adding bootstrap reporting to it. In the old scheme without completion milestones, if Y stalls, the user sees:
* starting X
* starting Z
* [hang]
The user thinks Z has already started when no such thing has happened because Y is still in progress. If we add completion milestones, the user will see:
* starting X
* finished X
* starting Z
* finished Z
in a normal bootstrap. If something gets stuck in task Y, the user will see:
* starting X
* finished X
* [hang]
This will make it more clear that something got stuck in between tasks.
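The X/Y/Z scenario can be reproduced with a small simulation. This is only an illustrative sketch, not tor's bootstrap code: the `Hang` exception stands in for a task that never returns, and all names are invented for the example.

```python
from typing import Callable, List, Tuple


class Hang(Exception):
    """Stands in for a task that never returns."""


# (task name, task body, whether the task has bootstrap reporting)
Task = Tuple[str, Callable[[], None], bool]


def bootstrap(tasks: List[Task], completion_milestones: bool) -> List[str]:
    """Run tasks in order and return the events a user would see."""
    events: List[str] = []
    reporting = [name for name, _, reports in tasks if reports]
    try:
        if not completion_milestones and reporting:
            events.append(f"starting {reporting[0]}")
        for name, run, reports in tasks:
            if reports and completion_milestones:
                events.append(f"starting {name}")
            run()
            if not reports:
                continue
            if completion_milestones:
                events.append(f"finished {name}")
            else:
                # Old scheme: finishing a task announces the next reported task,
                # even if an unreported task still has to run first.
                nxt = reporting.index(name) + 1
                if nxt < len(reporting):
                    events.append(f"starting {reporting[nxt]}")
    except Hang:
        events.append("[hang]")
    return events


def ok() -> None:
    pass


def stuck() -> None:
    raise Hang()


tasks = [("X", ok, True), ("Y", stuck, False), ("Z", ok, True)]
print(bootstrap(tasks, completion_milestones=False))
# ['starting X', 'starting Z', '[hang]']
print(bootstrap(tasks, completion_milestones=True))
# ['starting X', 'finished X', '[hang]']
```

With completion milestones, the last event before the hang is "finished X", so the user can tell that nothing reported has started since.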
In a one-line display like Tor Launcher, the completion milestones will normally flash by quickly and not be very visible to users. Completion milestones might make the NOTICE logs a bit more verbose.

Milestone: Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27239
TB team feedback on jump-to-80% work
Updated: 2020-06-16T00:49:21Z
Author: Isabela Fernandes

Hello TB team,
we would like your feedback on this work; please let us know if there is anything we need to know about it on the Tor Browser side.

Milestone: Tor: 0.4.0.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/27169
monitor bootstrap directory info progress separately
Updated: 2020-06-13T15:34:23Z
Author: Taylor Yu

Abstract out the current monitoring of bootstrap directory information progress, so we can track its state more independently. This allows us to defer reporting that we have sufficient directory information until we know that we can actually connect to a relay or bridge at all.
This also allows us to eliminate or simplify special case logic in `control_event_bootstrap()` that handles incremental progress during descriptor downloads.

Milestone: Tor: 0.3.5.x-final

https://gitlab.torproject.org/legacy/trac/-/issues/27167
track "first" OR_CONN
Updated: 2020-06-13T17:44:18Z
Author: Taylor Yu

Right now the first stages of the "first" OR_CONN get reported as `BOOTSTRAP_STATUS_CONN_DIR` and `BOOTSTRAP_STATUS_HANDSHAKE` (the latter is a special bootstrap phase that gets translated into `BOOTSTRAP_STATUS_HANDSHAKE_DIR` or `BOOTSTRAP_STATUS_HANDSHAKE_OR` depending on how much progress was previously reported). The logic in functions that report these events should be moved up to a new abstraction so lower level code has to track less high-level state.
This also eliminates some logic in `control_event_bootstrap()` that tries to figure out whether a given handshake attempt corresponds to a directory connection or an application circuit connection.

Milestone: Tor: 0.4.0.x-final
Assignee: Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/27104
report intermediate status when building application circuits
Updated: 2020-06-13T15:29:19Z
Author: Taylor Yu

During bootstrap, some minimum number of application circuits must be established before bootstrapping will complete. Right now, the user will receive no feedback of intermediate progress as a bootstrap circuit is being built. We should make this more granular, probably with intermediate progress at each EXTEND, to make visible when Tor is being slow to build circuits.

Milestone: Tor: unspecified

https://gitlab.torproject.org/legacy/trac/-/issues/27103
report initial OR_CONN as the earliest bootstrap phases
Updated: 2020-06-13T15:31:22Z
Author: Taylor Yu

We should always make the earliest bootstrap phases be our first connection to any OR, regardless of whether we already have enough directory info to start building circuits.
When starting to bootstrap with existing directory info, there might not be a need to make an initial connection to a bridge or fallback directory server to download directory info. This means that the initial OR_CONN to a bridge or guard displays on a progress bar as 80%, when in fact a fairly "early" dependency (the initial connection to any OR) could be failing.
Intuitively, starting Tor Browser and seeing the progress bar hang at 80% for a very long time is frustrating and misleading. A user who sees the progress bar hang at 5% or 10% has a much better idea of what's going on.
Existing directory info can be reflected in the progress bar as a rapid jump after the initial OR_CONN succeeds. This seems less likely to frustrate users.

Milestone: Tor: 0.4.0.x-final
Assignee: Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/27102
gather feedback re decoupling bootstrap progress numbers from BOOTSTRAP_STATUS enum values
Updated: 2020-06-13T15:29:17Z
Author: Taylor Yu

If we start reporting intermediate bootstrap phases, for example when reporting PT status when connecting to the Tor network through a PT bridge (#25502), there aren't many numbers remaining to insert between some existing phases (if we stick to integers).
We should decouple these so we don't have to cram everything into a tiny portion of the progress bar. It also doesn't make sense to report progress phases that we will never need to execute.
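One way to picture the decoupling: phases become plain identifiers, and the percentage is computed from the list of phases that a given bootstrap will actually go through. The sketch below is purely illustrative; the phase names are invented and do not correspond to tor's `BOOTSTRAP_STATUS` values.

```python
from enum import Enum, auto
from typing import Dict, List


class Phase(Enum):
    """Hypothetical phase identifiers with no numeric progress meaning."""
    CONN = auto()
    CONN_PT = auto()
    HANDSHAKE = auto()
    LOAD_DIR = auto()
    CIRCUIT = auto()
    DONE = auto()


def progress_table(planned: List[Phase]) -> Dict[Phase, int]:
    """Spread only the phases that will run evenly over the progress bar."""
    total = len(planned)
    return {phase: round(100 * (i + 1) / total)
            for i, phase in enumerate(planned)}


# A direct connection never executes the PT phase, so it gets no share.
direct = [Phase.CONN, Phase.HANDSHAKE, Phase.LOAD_DIR, Phase.CIRCUIT, Phase.DONE]
via_pt = [Phase.CONN_PT] + direct

print(progress_table(direct)[Phase.HANDSHAKE])  # 40
print(progress_table(via_pt)[Phase.HANDSHAKE])  # 50
```

An even split is just the simplest choice here; weights per phase would work the same way.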
Alternatively, renumber the enums to give us more space toward the beginning of the progress bar.

Milestone: Tor: 0.4.0.x-final
Assignee: Taylor Yu

https://gitlab.torproject.org/legacy/trac/-/issues/23605
expired consensus causes guard selection to stall at BOOTSTRAP PROGRESS=80
Updated: 2020-06-13T15:33:33Z
Author: Taylor Yu

Tor can report `BOOTSTRAP_STATUS_CONN_OR` (PROGRESS=80, "Connecting to the Tor network") when it actually can do no such thing. In some situations (e.g., clock skew) this causes progress to get stuck at 80% indefinitely, resulting in very poor user experience.
Right now `update_router_have_minimum_dir_info()` reports the `BOOTSTRAP_STATUS_CONN_OR` event if there's a "reasonably live" consensus and enough descriptors downloaded. A client with a clock skewed several hours into the future can get stalled here indefinitely due to inability to select a guard: if the client's clock is skewed, it will never have a live consensus. (Guard selection seems to require a non-expired consensus, rather than a reasonably live consensus at least during bootstrap.)
We should either relax the guard selection consensus liveness requirement, or avoid reporting `BOOTSTRAP_STATUS_CONN_OR` when we have no reasonable chance of actually connecting to a guard for building application circuits.
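The gap between the two liveness notions can be shown with a few lines. This is a sketch under stated assumptions, not tor's implementation: times are epoch seconds, the 24-hour tolerance is an assumed value, and the function names are invented. It illustrates the second option, where the 80% phase is only reported if guard selection could also proceed.

```python
HOUR = 3600
# Assumption for illustration: how long after expiry a consensus
# still counts as "reasonably live".
REASONABLY_LIVE_TOLERANCE = 24 * HOUR


def is_live(now: int, valid_after: int, valid_until: int) -> bool:
    """Non-expired consensus, as guard selection seems to require."""
    return valid_after <= now <= valid_until


def is_reasonably_live(now: int, valid_until: int,
                       tolerance: int = REASONABLY_LIVE_TOLERANCE) -> bool:
    """Expired consensuses are accepted for up to `tolerance` seconds."""
    return now <= valid_until + tolerance


def may_report_conn_or(now: int, valid_after: int, valid_until: int) -> bool:
    """Announce the 80% phase only if a guard could actually be selected."""
    return (is_reasonably_live(now, valid_until)
            and is_live(now, valid_after, valid_until))


# Consensus valid for 3 hours; the client's clock runs 6 hours fast.
valid_after, valid_until = 0, 3 * HOUR
skewed_now = 1 * HOUR + 6 * HOUR

print(is_reasonably_live(skewed_now, valid_until))            # True
print(is_live(skewed_now, valid_after, valid_until))          # False
print(may_report_conn_or(skewed_now, valid_after, valid_until))  # False
```

In this example the consensus passes the looser check but fails the stricter one, which is the window in which progress would otherwise be reported as 80% and then stall.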
Arguably we shouldn't start downloading descriptors until we have a non-expired consensus either, because that gets represented as a considerable chunk of the progress bar (40%->80%) in a way that could be misleading to a user. Making that change without additional work would cause bootstrap to get stuck at 40% instead of 80%, which might be an improvement. This can already happen if the client's clock is skewed several hours in the past.

Milestone: Tor: 0.4.0.x-final
Assignee: Taylor Yu