Tor issueshttps://gitlab.torproject.org/tpo/core/tor/-/issues2023-06-08T17:51:54Zhttps://gitlab.torproject.org/tpo/core/tor/-/issues/40169Circuit Build Timeout code needs cleanup2023-06-08T17:51:54ZMike PerryCircuit Build Timeout code needs cleanupThere's two places where we time out circuits: `circuit_expire_building` and `circuit_build_times_handle_completed_hop()`. `circuit_expire_building` is filled with 19 years of cruft and complexity, and only operates on the *second* resol...There's two places where we time out circuits: `circuit_expire_building` and `circuit_build_times_handle_completed_hop()`. `circuit_expire_building` is filled with 19 years of cruft and complexity, and only operates on the *second* resolution, instead of milliseconds.
These probably only affect timeout in rare cases -- https://gitlab.torproject.org/tpo/core/tor/-/issues/40157 seems to show that with fixes from https://gitlab.torproject.org/tpo/core/tor/-/issues/40168, then we get very close to the target 20% timeout. But there's so much old cruft here that we should clean it up anyway. It might affect UX very poorly in some edge cases.
This is especially true for onion services, which rely primarily on `circuit_expire_building()`. There's likely many bad performance consequences of this, for them.Sponsor 61 - Making the Tor network faster & more reliable for users in Internet-repressive placesMike PerryMike Perryhttps://gitlab.torproject.org/tpo/core/tor/-/issues/40166Stop using APIs deprecated in OpenSSL 3.0.02022-09-01T21:42:49ZNick MathewsonStop using APIs deprecated in OpenSSL 3.0.0In #34139, we note that these APIs are deprecated in OpenSSL 3.0.0:
* DH_compute_key
* DH_generate_key
* DH_size
* ECDH_compute_key
* EC_GFp_mont_method
* EC_GFp_nist_method
* EC_GFp_simple_method
* EC_GROUP_method_of
* ENGINE_by_id
* E...In #34139, we note that these APIs are deprecated in OpenSSL 3.0.0:
* DH_compute_key
* DH_generate_key
* DH_size
* ECDH_compute_key
* EC_GFp_mont_method
* EC_GFp_nist_method
* EC_GFp_simple_method
* EC_GROUP_method_of
* ENGINE_by_id
* ENGINE_ctrl_cmd_string
* ENGINE_free
* ENGINE_get_cipher_engine
* ENGINE_get_default_DH
* ENGINE_get_default_EC
* ENGINE_get_default_RAND
* ENGINE_get_default_RSA
* ENGINE_get_digest_engine
* ENGINE_get_id
* ENGINE_get_name
* ENGINE_load_builtin_engines
* ENGINE_register_all_complete
* ENGINE_set_default
* ERR_func_error_string
* HMAC
* RSA_check_key
* RSA_generate_key_ex
* RSA_private_decrypt
* RSA_private_encrypt
* RSA_public_decrypt
* RSA_public_encrypt
* RSA_size
We should stop using them.https://gitlab.torproject.org/tpo/core/tor/-/issues/32816Move the relay options into the correct man page sections2021-09-16T14:11:41ZteorMove the relay options into the correct man page sectionsSome of the relay options aren't under Server Options or Directory Server Options in the man page. We should move them, so it's clear which options are disabled when relay mode is disabled.
Here is the full list of options that we shoul...Some of the relay options aren't under Server Options or Directory Server Options in the man page. We should move them, so it's clear which options are disabled when relay mode is disabled.
Here is the full list of options that we should check:
* DirPort, DirCache, ORPort
* --list-fingerprint, RelayBandwidth*, MaxAdvertisedBandwidth, PerConnBW*, and ServerTransportPluginhttps://gitlab.torproject.org/tpo/core/tor/-/issues/32815Move all directory authority options into the dirauth man page section2021-09-16T14:11:41ZteorMove all directory authority options into the dirauth man page sectionSome of the *AuthoritativeDir* and MinUptimeHidServDirectoryV2 options are not under Directory Authority Server Options in the man page. We should move them under the section, so it's clear which options are disabled in when directory au...Some of the *AuthoritativeDir* and MinUptimeHidServDirectoryV2 options are not under Directory Authority Server Options in the man page. We should move them under the section, so it's clear which options are disabled in when directory authority mode is disabled.https://gitlab.torproject.org/tpo/core/tor/-/issues/32388sched: When OR connection opens, always indicate the channel is ready for cells2022-09-01T21:42:49ZDavid Gouletdgoulet@torproject.orgsched: When OR connection opens, always indicate the channel is ready for cells#### Problem
In `channel_tls_handle_state_change_on_orconn()`, when called when the OR connection becomes open, we have this snippet of code for when the new state is OPEN:
```
channel_change_state_open(base_chan);
if (connecti...#### Problem
In `channel_tls_handle_state_change_on_orconn()`, when called when the OR connection becomes open, we have this snippet of code for when the new state is OPEN:
```
channel_change_state_open(base_chan);
if (connection_or_num_cells_writeable(conn) > 0) {
scheduler_channel_wants_writes(base_chan);
}
```
So basically, the above means we only indicate the scheduler that the channel wants to write if we already have cells on the `outbuf`.
It basically means that if the channel *scheduler* state is `IDLE` (initial opening state), it then ends up in state `SCHED_CHAN_WAITING_FOR_CELLS` which then means the scheduler will process the channel when a cell is queued on it. But ONLY if the channel had cells when it was opened.
This on its own, as a design, is problematic because then what can make the channel transition into a state that allows the scheduler to recognize the channel as ready to be processed for cells? (`SCHED_CHAN_WAITING_FOR_CELLS`).
Tor currently has 2 callsites that tells the scheduler that a channel "wants to write" data on the wire (`scheduler_channel_wants_writes()`, remember that function transition the scheduler state of the channel to "waiting for cells"):
1. It is mentioned above that is when the channel becomes `OPEN`.
2. The other one is when data is _flushed_ from the outbuf, in `connection_or_flushed_some()`.
So once the channel is opened, we only become in the "wants to write" state if we previously had cells in the outbuf, which I can assure you is not always the case. And the other way is when we just wrote bytes on the network but then how can we do that in the first place?
#### What Is Saving Us
One question is then: Maybe tor code flow makes it that we always have a cell in the outbuf when the channel opens?
I conducted an experiment which made an Entry node of a client to only send 1 cell at a time per mainloop round. This made it that the `VERSIONs`, `CERTS` and `NETINFO` cell were sent in 3 different mainloop round and thus the client received them with a "delay".
That was enough for the client's channel to have _nothing_ in the outbuf when the channel became `OPEN` that is when the `NETINFO` cell is received from the relay which skipped the scheduler state change and thus the client channel was stuck there in scheduler `IDLE` mode even though the channel was in `OPEN` state.
What is saving us is because the very first thing we write on the wire, `VERSIONS` cell, calls the (2) callsite that tells the scheduler to go in "waiting for cells" state. And from there, the channel stays in that state.
For now, this seems to be "fine" but any future changes like for instance where we wanted to have everything that writes on the `outbuf` to go through the scheduler so we can have proper KIST prioritization, will fail due to this design.
#### Short-Term Solution
When the channel opens, we should simply put it in `wants to write` state all the time. So even bouncing from `MAINT` to `OPEN` state and vice versa will never make some cells stuck in the channel until something is explicitly written on the outbuf.
Furthermore, it should be done in `channel_change_state_()` since this affects `channel_t`, it is NOT specific to `channeltls_t`. So in a world where we end up with multiple channel type, this whole situation explodes.https://gitlab.torproject.org/tpo/core/tor/-/issues/32103Subsystem "thread_cleanup" is never called2021-11-06T13:03:43ZoparaSubsystem "thread_cleanup" is never calledSubsystems implement the interface of `struct subsys_fns_t`, with one of the optional function pointers being `void (*thread_cleanup)(void)`. This `thread_cleanup` function is called for all subsystems by the subsystem manager function `...Subsystems implement the interface of `struct subsys_fns_t`, with one of the optional function pointers being `void (*thread_cleanup)(void)`. This `thread_cleanup` function is called for all subsystems by the subsystem manager function `void subsystems_thread_cleanup(void)`, but the `subsystems_thread_cleanup` function is never called anywhere in the code.
At the moment, the only subsystem to implement the `thread_cleanup` interface is the crypto subsystem, which uses `thread_cleanup` for freeing the threadlocal `crypto_fast_rng_t`, as well as freeing the threadlocal error queue on old versions of OpenSSL. As far as I can tell, this is never run.
I think that the `subsystems_thread_cleanup` function should be run somewhere in the code, but it's not clear to me how this `subsystems_thread_cleanup` is expected to be used. It seems like there should also be `subsystems_thread_init` and `thread_init` functions as well for initializing threadlocal variables. Right now the crypto subsystem does an ["initialize on first use" singleton pattern](https://github.com/torproject/tor/blob/dfe7f004df46edaab0b23e260218d3c6d422d5fe/src/lib/crypt_ops/crypto_rand_fast.c#L382), but it might be useful to add this initialization interface function so that subsystems have the option of initializing all of their threadlocals in one place.https://gitlab.torproject.org/tpo/core/tor/-/issues/32071Remove perl scripts from development process?2022-09-01T21:42:49ZNick MathewsonRemove perl scripts from development process?There are a few perl5 scripts in our development process:
```
[1099]$ git grep /usr/bin/perl
[...]
contrib/or-tools/checksocks.pl:#!/usr/bin/perl -w
scripts/coccinelle/test-operator-cleanup:#!/usr/bin/perl -w -p -i
scripts/codegen/gen_...There are a few perl5 scripts in our development process:
```
[1099]$ git grep /usr/bin/perl
[...]
contrib/or-tools/checksocks.pl:#!/usr/bin/perl -w
scripts/coccinelle/test-operator-cleanup:#!/usr/bin/perl -w -p -i
scripts/codegen/gen_linux_syscalls.pl:#!/usr/bin/perl -w
scripts/maint/checkLogs.pl:#!/usr/bin/perl -w
scripts/maint/checkOptionDocs.pl.in:#!/usr/bin/perl -w
scripts/maint/checkSpace.pl:#!/usr/bin/perl
scripts/maint/check_config_macros.pl:#!/usr/bin/perl -w
scripts/maint/findMergedChanges.pl:#!/usr/bin/perl
scripts/maint/updateCopyright.pl:#!/usr/bin/perl -i -w -p
scripts/test/cov-exclude:#!/usr/bin/perl -p -i
```
We could do any of the following:
* Declare that perl5 is lovely and we don't care.
* Declare that perl5 is ugly but not really a problem.
* Declare that we will no longer require perl, and migrate all of these scripts to python and/or bash+sed+awk+etc
* Declare that we will only allow perl as a "better sed/awk", and migrate all of the more complex scripts to python and/or bash+sed+awk, but leaves the others asd perl.
I am excluding these possibilities, but the imp of the perverse demands that I mention them:
* Migrate to perl6 `^W` rakuhttps://gitlab.torproject.org/tpo/core/tor/-/issues/31908Unify meaning of routerset NULL and routerset "" to avoid recurrence of #314952022-09-01T21:42:49ZNick MathewsonUnify meaning of routerset NULL and routerset "" to avoid recurrence of #31495From legacy/trac#31611:
>Avoid configuration types where NULL and "" mean different things. We would have to add a new routerset type meaning "all routers", (maybe, "*"?) and have that be the default for EntryNodes. [We probably could n...From legacy/trac#31611:
>Avoid configuration types where NULL and "" mean different things. We would have to add a new routerset type meaning "all routers", (maybe, "*"?) and have that be the default for EntryNodes. [We probably could not make NULL mean the same as "" for all cases, since there are lots of string-valued configuration types.]https://gitlab.torproject.org/tpo/core/tor/-/issues/31483token_bucket_ctr_adjust() does not convert rate to step2021-09-16T14:13:16Zteortoken_bucket_ctr_adjust() does not convert rate to stepIn legacy/trac#30687, we created a single-counter token bucket token_bucket_ctr.
token_bucket_rw_adjust() calls rate_per_sec_to_rate_per_step(rate), but token_bucket_ctr_adjust() does not.
I suggest we fix this bug by moving rate_per_s...In legacy/trac#30687, we created a single-counter token bucket token_bucket_ctr.
token_bucket_rw_adjust() calls rate_per_sec_to_rate_per_step(rate), but token_bucket_ctr_adjust() does not.
I suggest we fix this bug by moving rate_per_sec_to_rate_per_step(rate) into token_bucket_cfg_init(). And we should add some documentation that explains the difference between rate and burst.
Gaba, this is sponsor 31-must, because it is a bug fix on sponsor 31 code that has already been merged.https://gitlab.torproject.org/tpo/core/tor/-/issues/30578The circuitpadding_circuitsetup_machine test: Re-enable, remove, re-document,...2022-06-24T14:58:36ZNick MathewsonThe circuitpadding_circuitsetup_machine test: Re-enable, remove, re-document, or ___?Our code in `test_circuitpadding.c` says:
```
/** Disabled unstable test until #29298 is implemented (see #29122) */
// TEST_CIRCUITPADDING(circuitpadding_circuitsetup_machine, TT_FORK),
```
But both legacy/trac#29298 and legacy/tr...Our code in `test_circuitpadding.c` says:
```
/** Disabled unstable test until #29298 is implemented (see #29122) */
// TEST_CIRCUITPADDING(circuitpadding_circuitsetup_machine, TT_FORK),
```
But both legacy/trac#29298 and legacy/trac#29122 are closed now.
If this test will work now, let's enable it. If it is no longer useful, let's remove it. If it is disabled for some reason other than the one that's described in the comment, let's adjust the comment.https://gitlab.torproject.org/tpo/core/tor/-/issues/30526Tests should not load system geoip files.2022-09-01T21:39:46ZNick MathewsonTests should not load system geoip files.See legacy/trac#29702 for background. In that ticket we made sure that we stopped looking at torrc files, but we still sometimes look at geoipfiles. They are less risky, though, so this is a lower priority.See legacy/trac#29702 for background. In that ticket we made sure that we stopped looking at torrc files, but we still sometimes look at geoipfiles. They are less risky, though, so this is a lower priority.https://gitlab.torproject.org/tpo/core/tor/-/issues/29930Warning: can't unlink unverified-consensus on Windows2021-09-16T14:13:16ZteorWarning: can't unlink unverified-consensus on WindowsIn legacy/trac#28614, Vort says:
BTW, clean start of 0.4.0.3-alpha brings another problem:
`Mar 28 08:26:00.000 [warn] Failed to unlink C:\Users\Vort\AppData\Roaming\tor\unverified-microdesc-consensus: Permission denied`
Does this issu...In legacy/trac#28614, Vort says:
BTW, clean start of 0.4.0.3-alpha brings another problem:
`Mar 28 08:26:00.000 [warn] Failed to unlink C:\Users\Vort\AppData\Roaming\tor\unverified-microdesc-consensus: Permission denied`
Does this issue happen with an empty data directory?
Or does it only happen when unverified-microdesc-consensus exists?
I'm assigning this ticket to ahf, because he has a working Windows box with Tor.https://gitlab.torproject.org/tpo/core/tor/-/issues/29802Document the v3 onion service key files in the tor man page2022-09-01T21:29:27ZteorDocument the v3 onion service key files in the tor man pageThe tor man page is missing the names of the key files for v3 onion services.The tor man page is missing the names of the key files for v3 onion services.https://gitlab.torproject.org/tpo/core/tor/-/issues/27299hsv3: Clarify timing sources around hsv3 code2022-02-07T19:38:03ZGeorge Kadianakishsv3: Clarify timing sources around hsv3 codeA big source of bugs and confusions (e.g. legacy/trac#26980, legacy/trac#26930) in the HSv3 code stem from the fact that it uses various timing sources to compute time periods, SRV, etc. Some parts of the code use `time(NULL)`, others us...A big source of bugs and confusions (e.g. legacy/trac#26980, legacy/trac#26930) in the HSv3 code stem from the fact that it uses various timing sources to compute time periods, SRV, etc. Some parts of the code use `time(NULL)`, others use the current consensus valid-after, and others use the voting-schedule.
The code is currently not clear in which timing source is used in each case. As an example, some functions take as input `now` but they only use it to get a live consensus to use its valid-after, but that may confuse a reader that the `now` is used as the time source (e.g. `should_rotate_descriptors()` that caused the legacy/trac#26930 confusion).
We should try to clarify and improve the function signatures around the HSv3 codebase on this regard.https://gitlab.torproject.org/tpo/core/tor/-/issues/26713relays should explicitly dirty their descriptors on entering/exiting hibernat...2022-09-01T21:29:26ZRoger Dingledinerelays should explicitly dirty their descriptors on entering/exiting hibernation, rather than implicitly doing it by triggering a bandwidth changeMotivated by the legacy/trac#24104 fixes:
Currently Tor notices that it's gone into hibernation, and should publish a new descriptor saying so, when it calls check_descriptor_bandwidth_changed() and it decides "cur" is 0 and that's suff...Motivated by the legacy/trac#24104 fixes:
Currently Tor notices that it's gone into hibernation, and should publish a new descriptor saying so, when it calls check_descriptor_bandwidth_changed() and it decides "cur" is 0 and that's sufficiently different than the last bandwidth, so it dirties its descriptor with reason "bandwidth has changed".
Similarly, in that same function, when we stop hibernating we notice because cur is not zero but the last published bandwidth number was 0.
That "implicitly notice if we've hibernated" logic makes it really easy to have surprises here. For example, in legacy/trac#24104, we almost changed it so you don't notice the start of hibernation if your uptime is more than 24 hours.
We should back out the new hack in legacy/trac#24104 that made an exception when we're hibernating, and then add new explicit mark-my-descriptor-dirty lines when we enter hibernation and when we exit it.https://gitlab.torproject.org/tpo/core/tor/-/issues/24906Make channelpadding_update_padding_for_channel() use !channel_is_client()2022-09-01T21:31:34ZteorMake channelpadding_update_padding_for_channel() use !channel_is_client()Because we need to include all relays, even if they are flapping in the consensus:
```
// We should not allow malicious relays to disable or reduce padding for
// us as clients. In fact, we should only accept this cell at all if w...Because we need to include all relays, even if they are flapping in the consensus:
```
// We should not allow malicious relays to disable or reduce padding for
// us as clients. In fact, we should only accept this cell at all if we're
// operating as a relay. Bridges should not accept it from relays, either
// (only from their clients).
if ((get_options()->BridgeRelay &&
connection_or_digest_is_known_relay(chan->identity_digest)) ||
!get_options()->ORPort_set) {
```
We should also use !server_mode() rather than !ORPort_set.https://gitlab.torproject.org/tpo/core/tor/-/issues/24731Stop checking routerinfos for addresses when we use microdescs for circuits2022-09-01T21:31:34ZteorStop checking routerinfos for addresses when we use microdescs for circuitsDirectory mirrors and clients that FetchUselessDescriptors check for IPv4 and IPv6 addresses in the following order:
* routerinfos (descriptors)
* routerstatus (consensus)
* microdescriptors
But they should check using the following ord...Directory mirrors and clients that FetchUselessDescriptors check for IPv4 and IPv6 addresses in the following order:
* routerinfos (descriptors)
* routerstatus (consensus)
* microdescriptors
But they should check using the following order:
* bridge routerinfos (descriptors)
* routerstatus (consensus)
If using microdescriptors for circuits:
* microdescriptors
Otherwise:
* routerinfos (descriptors)
There is code that implements this algorithm in commits decb0636e2, 1d1c927b9a, and 4979ec3c17 of my bug23975_tree branch.
But this adds overhead to every address lookup when building circuits.
Maybe we can make it faster by:
* not parsing routerinfos or microdescs if we aren't using them for circuits, or
* putting a canonical address in node_t, updating it whenever ri, rs, or md change, and always using ithttps://gitlab.torproject.org/tpo/core/tor/-/issues/23744sched: Notification events have different meaning for each scheduler2022-09-01T21:31:34ZDavid Gouletdgoulet@torproject.orgsched: Notification events have different meaning for each schedulerAs an example, KIST controls how much and when channel data is pushed on the network which means this `wants to write` event used by the Vanilla scheduler as "queued cell but no space on the outbuf" is not something that is coherent with...As an example, KIST controls how much and when channel data is pushed on the network which means this `wants to write` event used by the Vanilla scheduler as "queued cell but no space on the outbuf" is not something that is coherent with KIST.
A channel is scheduled in when it has cells in the queue, then they are flushed one by one by the KIST scheduler until the kernel says "no more space". Then, that channel is put back in the channel pending list and will get handled at the next tick of KIST.
So, we really don't need KIST to be notified of this event from `connection_flushed_some()` which is used by Vanilla to try to flush more cells in the outbuf because the outbuf has room for it.
Where it is useful is the second callsite of that even in `channel_tls_handle_state_change_on_orconn()` which notifies the scheduler that it might be in need of flushing some stuff. In the case of a brand new channel just opening, its state is "IDLE" and that even will then put it in "waiting for cells" after.
That being said, what needs to happened:
1. Make the notification event a per-scheduler thing because KIST and Vanilla have different semantic for those events really. We should of course avoid as much as we can of code duplication and thus some "generic event handler" do make sense if they share the same semantic.
2. Add a "channel state is open" notification event instead of "wants to write" which is really only true in very specific cases in `channel_tls_handle_state_change_on_orconn()`. That way, the scheduler can take a decision on the status of the channel instead of blind guessing it is waiting for cells.
3. Nullify the "wants to write" event for KIST considering (2) so it stops possibly scheduling channels that do not need at all to be scheduled.https://gitlab.torproject.org/tpo/core/tor/-/issues/22717Rename channelpadding.c's CHANNEL_IS_CLIENT to avoid confusion2022-09-01T21:31:33ZteorRename channelpadding.c's CHANNEL_IS_CLIENT to avoid confusionThere is already a channel_is_client() function in channel.[ch], with different arguments and semantics.
And while we're doing that, we should make the newly renamed CHANNEL_IS_CLIENT() call channel.h's channel_is_client(), because it's...There is already a channel_is_client() function in channel.[ch], with different arguments and semantics.
And while we're doing that, we should make the newly renamed CHANNEL_IS_CLIENT() call channel.h's channel_is_client(), because it's the right abstraction to use.
We could survive releasing 0.3.1 without this fix, because it's not a bug (yet).https://gitlab.torproject.org/tpo/core/tor/-/issues/22449Remove timestamp_dirty kludge from mark_circuit_unusable_for_new_conns()2021-09-16T14:18:43ZNick MathewsonRemove timestamp_dirty kludge from mark_circuit_unusable_for_new_conns()In mark_circuit_unsusable_for_new_conns(), we set the unusable_for_new_conns flag... but we also carry around some old code that messes around with timestamp_dirty.
The root problem here is that the old code makes timestamp_dirty into...In mark_circuit_unsusable_for_new_conns(), we set the unusable_for_new_conns flag... but we also carry around some old code that messes around with timestamp_dirty.
The root problem here is that the old code makes timestamp_dirty into a lie; "unusable" and "dirty" are not the same concept.
We should carefully audit timestamp_dirty and its users to make sure that it's safe to remove this old kludge, and then remove it or replace it with something more accurate.