Some web caches (such as Farahavar's previous cache), pass on the X-Your-IP-Address-Is header from one directory document to multiple clients. This causes the clients to guess the wrong IP address as their address.
I think we should add one or more of the following headers to every directory response:
Pragma: no-cache tells HTTP 1.0 compliant caches to disable caching entirely. (This will also disable caching for HTTP 1.1 caches unless we provide a more generous Cache-Control header, like the one below.)
Connection: close X-Your-IP-Address-Is tells HTTP 1.1 caches to never send out the X-Your-IP-Address-Is header, even to the first client requesting the document.
Cache-Control: no-cache="X-Your-IP-Address-Is" tells HTTP 1.1 caches to not cache the header at all. Alternately, if the cache doesn't support the no-cache="" feature, it tells the cache not to cache the entire document. (This also causes the cache to attempt to revalidate the header, which might not be what we want, as Tor doesn't support cache revalidation.)
I don't know enough about how caches typically behave to recommend which ones.
IMO cacheing should probably happen on URLs that are cacheable (ie, consensuses).
But when a cache stores a consensus, it also stores the X-Your-IP-Address header from either the cache's machine, or the original client that asked for the consensus.
Neither of these values of X-Your-IP-Address-Is are valid for any other clients, but caches will typically send them out with the cached response.
What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?
The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
I think this is an excellent idea. As the HTTP headers of a naked dirport response are unauthenticated, they can be modified in transit, and we can't know either way.
The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.
We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
With the introduction of fallback directory mirrors in 0.2.8 (legacy/trac#15775 (moved)), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.
After bootstrap, with the introduction of "dir servers for all" (legacy/trac#12538 (moved)) in 0.2.8, it will be shared among almost all relays.
So I think we can do begindirs for all directory fetches. We might want to fix legacy/trac#17848 (moved) at the same time, otherwise clients and relays won't know if they have an existing connection to a directory server, and load balancing will suffer.
Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?
It has the relay's IPv4 address.
(Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding a HTTP header, rather than changing the format of a NETINFO cell. See legacy/trac#5940 (moved).)
The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
It is impossible that we will fix all 226 currently open 028 tickets before 028 releases. Time to move some out. This is my second pass through the "new" and tickets, looking for things to move to 0.2.9.
Trac: Milestone: Tor: 0.2.8.x-final to Tor: 0.2.9.x-final
What if we went a step further and didn't include the header at all in unencrypted connections? That is, we include it in the begin_dir response but not in the naked dirport responses.
I think this is an excellent idea. As the HTTP headers of a naked dirport response are unauthenticated, they can be modified in transit, and we can't know either way.
The main effect would be that relays, who use the naked dirport, would no longer be able to learn their IP address from their directory authority interactions.
A relay believes any directory mirror, not just the authorities. But if it doesn't know its IP address, it will only connect to authorities.
We could work around that by finally moving all dir traffic to begin_dir (which still makes me uncomfortable because of the extra scaling and load, but maybe this is a good additional kick for why we should do it anyway), or by having relays who don't know their address launch a begin_dir connection just for finding it out.
With the introduction of fallback directory mirrors in 0.2.8 (legacy/trac#15775 (moved)), the extra load for bootstrap begindirs will be shared among 100-250 high-uptime directory mirrors, rather than just the ~9 authorities.
After bootstrap, with the introduction of "dir servers for all" (legacy/trac#12538 (moved)) in 0.2.8, it will be shared among almost all relays.
So I think we can do begindirs for all directory fetches.
We might want to fix legacy/trac#17848 (moved) at the same time, otherwise clients and relays won't know if they have an existing connection to a directory server, and load balancing will suffer.
Actually, wait a minute, don't netinfo cells have your address in them now too? Does that mean x-your-address-is on naked dirport answers is redundant? And thus we should try to phase it out in favor of the encrypted, authenticated mechanism that we built?
It has the relay's IPv4 address.
(Although it's somewhat orthogonal, we'd like to have some way for relays to learn their IPv6 addresses, too. This would be somewhat easier to do by adding a HTTP header, rather than changing the format of a NETINFO cell. See legacy/trac#5940 (moved).)
Actually, the NETINFO format supports IPv6. If it doesn't work when you connect to a relay's IPv6 ORPort, that's a bug.
The reason I want to get rid of the caching situation is because this is an information leak, from one user to another. Now, it's mostly just relays who suffer, since they're the ones who use naked dirport requests. But this is still an uncomfortable state of affairs to leave in place.
Correcting summary to match current spelling of the header in code: X-Your-Address-Is.
Trac: Username: jryans Summary: Tell caches to remove X-Your-IP-Address-Is from Tor Directory documents to Tell caches to remove X-Your-Address-Is from Tor Directory documents Cc: N/Atojryans@gmail.com
I've used local testing with Chutney to verify that it's working correctly.
We do appear to receive both IPv4 and IPv6 addresses as expected via NETINFO cells. We should also update the control spec to note a new method for IP address changes, but I'll wait until we discuss the approach here first.
Trac: Username: jryans Status: assigned to needs_review
This patch looks correct to me: it removes support for giving or taking the X-Your-Address-Is hint, and replaces it with looking at the netinfo hint instead.
I have two questions about this patch -- arma or teor, could you give some feedback?
Should it ignore IPv6 addresses, because we can't publish them as a relay's primary address? (I think "yes".)
Should it continue to give the X-Your-Address-Is answer for now, in case some relays actually need it? (Here I'm not sure.)
This patch looks correct to me: it removes support for giving or taking the X-Your-Address-Is hint, and replaces it with looking at the netinfo hint instead.
I have two questions about this patch -- arma or teor, could you give some feedback?
Should it ignore IPv6 addresses, because we can't publish them as a relay's primary address? (I think "yes".)
At some point, we would like relays to be able to learn their IPv6 addresses (legacy/trac#5940 (moved)). But since this is a new feature, with its own ticket, there's no need to implement it in this ticket.
Should it continue to give the X-Your-Address-Is answer for now, in case some relays actually need it? (Here I'm not sure.)
Yes, but only over an encrypted begindir connection.
Or, we can cut out all the X-Your-Address-Is code, and just rely on NETINFO cells.
Here's what I'd like to do:
A) directory caches only send X-Your-Address-Is on begindir connections
modify if (!is_local_addr(&conn->base_.addr)) { in write_http_response_header_impl() to check for begindir connections
B) relays only believe X-Your-Address-Is from begindir connections
modify if (conn->dirconn_direct) { in connection_dir_client_reached_eof() to check for begindir connections
C) relays which don't know their own IP address make a begindir connection to an authority to discover that IP address
modify if (!directory_must_use_begindir(options)) { in directory_command_should_use_begindir() to also check if we know our own address using router_pick_published_address(options, &addr, 1) == 0
What will likely happen is:
the relay initiates a begindir connection and receives its own IP address in the NETINFO cell
If this fails, then:
the relay believes the X-Your-IP-Address-Is header on the encrypted begindir request
We can't just implement A) as a transitional measure for older relays talking to newer caches, because those older relays will never make begindir connections - in directory_command_should_use_begindir(), relays never make begindir connections (well, until legacy/trac#20711 (moved), and then only in the case when the mirror only has an ORPort).
If you stop sending x-your-address-is headers, then all of the relays that haven't upgraded yet will be sol. So we shouldn't stop sending them for a good while, right?
Teor suggests "relays which don't know their own IP address make a begindir connection to an authority to discover that IP address", but one of the goals here is to detect if the IP address has changed, not just to learn it the first time. So if relays only do begindir when they don't know any IP address for themselves, and they stop believing the naked http header, then we lose the functionality to learn when the address changed.
...Unless there is some periodic relay handshake that the relay does, which would let it learn about a new address from the netinfo cell. And I've got just the one -- the periodic reachability tests by the directory authorities -- except, if the IP address changes, the relays will stop being reachable anymore, so they're going to have to notice on their own that something changed, in order to generate a new descriptor with the new address in it, and only then will the authorities try reaching them on the new address. Bummer.
Ok. To summarize:
Directory servers shouldn't stop giving out the header yet, or it'll break existing relays.
We can teach new relays to listen to the address they find in the netinfo cell. Probably we should only believe it when we're interacting with a directory authority. But that change by itself won't be enough, because we also need to make relays do periodic outbound connection handshakes with directory authorities, or they won't reliably get the netinfo cells they need. That step probably requires at least some design, and makes an 030 target less likely.
Right now if anything caches directory objects at the middlebox, I fear they cache the http headers too. So I think Nick's statement that "cacheing should probably happen on URLs that are cacheable (ie, consensuses)" is not true yet, because every url will have the x-your-address-is header in it, and we shouldn't take that header out yet. Does that mean we should add the no-cache / no-store lines for now, while we still serve the x-your-address-is header?
If you stop sending x-your-address-is headers, then all of the relays that haven't upgraded yet will be sol. So we shouldn't stop sending them for a good while, right?
Yes. Even on unencrypted connections, sadly.
Teor suggests "relays which don't know their own IP address make a begindir connection to an authority to discover that IP address", but one of the goals here is to detect if the IP address has changed, not just to learn it the first time. So if relays only do begindir when they don't know any IP address for themselves, and they stop believing the naked http header, then we lose the functionality to learn when the address changed.
...Unless there is some periodic relay handshake that the relay does, which would let it learn about a new address from the netinfo cell. And I've got just the one -- the periodic reachability tests by the directory authorities -- except, if the IP address changes, the relays will stop being reachable anymore, so they're going to have to notice on their own that something changed, in order to generate a new descriptor with the new address in it, and only then will the authorities try reaching them on the new address. Bummer.
The preemptive (client) circuits that each relay makes elicit netinfo cells in response, as do bandwidth self-testing and ORPort and DirPort reachability checks.
Ok. To summarize:
Directory servers shouldn't stop giving out the header yet, or it'll break existing relays.
We can teach new relays to listen to the address they find in the netinfo cell. Probably we should only believe it when we're interacting with a directory authority. But that change by itself won't be enough, because we also need to make relays do periodic outbound connection handshakes with directory authorities, or they won't reliably get the netinfo cells they need. That step probably requires at least some design, and makes an 030 target less likely.
How about a design where the relay uses the netinfo cells from any circuit to detect if the address might have changed, and then contacts a directory authority to confirm?
Right now if anything caches directory objects at the middlebox, I fear they cache the http headers too. So I think Nick's statement that "cacheing should probably happen on URLs that are cacheable (ie, consensuses)" is not true yet, because every url will have the x-your-address-is header in it, and we shouldn't take that header out yet. Does that mean we should add the no-cache / no-store lines for now, while we still serve the x-your-address-is header?
Yes, we should add the cache lines, there's no way of escaping that.
(Well, 0.2.8 and later clients don't need them, because they all use begindir.)
Maybe somebody wants to put together a short branch that adds those headers to directory responses when they come over an unencrypted channel? It seems everybody here agrees with that component (or is that 'nobody disagrees').
These needs_revision, tickets, tagged with 034-removed-*, are no longer in-scope for 0.3.4. We can reconsider any of them, if somebody does the necessary revision.
Trac: Milestone: Tor: 0.3.4.x-final to Tor: unspecified
We may fix this issue in legacy/trac#5940 (moved) by requiring BEGINDIR before trusting X-Your-Address-Is, and making relays use the ORPort if they don't know their address.
Trac: Parent: N/Atolegacy/trac#5940 (moved) Summary: Tell caches to remove X-Your-Address-Is from Tor Directory documents to Stop caches storing or modifying X-Your-Address-Is from Tor Directory documents
The problematic caches are HTTP caches, not directory caches.
Trac: Summary: Stop caches storing or modifying X-Your-Address-Is from Tor Directory documents to Stop HTTP caches storing or modifying X-Your-Address-Is from Tor Directory documents