Trac issues
https://gitlab.torproject.org/legacy/trac/-/issues

https://gitlab.torproject.org/legacy/trac/-/issues/17592
Clean up connection timeout logic (2020-06-13, Mike Perry)

In #6799, it was decided to keep TLS connections open for a random amount of time after they are idle, to defend against an attack that used internal Tor network connectivity information to determine Guard nodes (slides: https://www.cryptolux.org/images/8/85/ESORICS-2012-Presentation-2.pdf; paper: https://eprint.iacr.org/2012/432.pdf).
Unfortunately, this logic (in connection_or_set_canonical()) is kind of a mess. Relays and clients are treated the same, and client connections are also kept alive for an additional hour by predictive circuit building even when otherwise idle, whereas relay connections are not.
If we treat relays and clients separately for this timeout, we could reduce total client keep-alive time significantly (down to 30 minutes or so), and significantly increase relay connection lifetime. This should result in reduced total connection counts on relays, with better defenses against Torscan.
This is also needed in order to put reasonable bounds on padding overhead in #16861 for mobile clients. Furthermore, aside from the wieners running middle relays behind junky home routers who will whine about connection overflow, having a more well-connected Tor network is a good idea for many reasons (not just Torscan).

Milestone: Tor: 0.3.1.x-final (Mike Perry)

https://gitlab.torproject.org/legacy/trac/-/issues/17604
Try to use only one canonical connection (2020-06-13, Mike Perry)

For #16861, I would like to experiment with padding between relays, as well as generally keep relay-to-relay orconns open much longer. This will be beneficial against attacks like Torscan (https://eprint.iacr.org/2012/432.pdf), as well as related netflow-based attacks that attempt to determine the guard node of a connection by using netflow data to accomplish the same thing as the Torscan attack.
Unfortunately, the logic for preferring orconns (is_canonical and channel_is_better()) is a mind-bender, but it appears that we may have situations where multiple orconns are opened between relays, with each side disagreeing over which connection is canonical. This can happen because, for incoming connections, the source port won't match the orport in the descriptor of the remote relay. It will also be the case for nodes that make outgoing connections from a different IP address than the one in their descriptor.
I asked on #tor-dev, and two relay operators reported cases of such multiple connections to relays.
I think the following changes will improve the situation:
1. Mark all authenticated connections as canonical (everything with link proto v3 and higher will authenticate, yes?)
2. Alter channel_is_better() to prefer older orconns in the case of multiple canonical connections, and use the orconn with more circuits on it in case of age ties.
I think age is more important than number of circuits because we want to avoid giving an attacker the ability to switch an orconn between relays by creating a new one and opening a bunch of circuits on it. channel_is_better() is doing the opposite of this behavior right now, on both fronts.

Milestone: Tor: 0.3.1.x-final (Mike Perry)

https://gitlab.torproject.org/legacy/trac/-/issues/19958
Implement proposal 264 (protocol versioning) (2020-06-13, Nick Mathewson)

This is not 100% strictly necessary for #15055 to work, but every time we change a protocol, we will wish that we had included this feature.

Mostly implemented in a fit of C while I was on vacation.

Milestone: Tor: 0.2.9.x-final (Nick Mathewson)

https://gitlab.torproject.org/legacy/trac/-/issues/20117
Update PathsNeededToBuildCircuits man page entry with actual default (2020-06-13, teor)

"if the directory authorities do not choose a value, Tor will use 0.6."

Milestone: Tor: 0.2.9.x-final (teor)

https://gitlab.torproject.org/legacy/trac/-/issues/20081
potential memory corruption in or/buffers.c (not exploitable) (2020-06-13, George Kadianakis)

Bug reported by _Guido Vranken_ through hackerone.
We believe it's not an exploitable issue, but it's still a bug worth fixing.

Here follows the report:

----

In `or/buffers.c`:
```
/** Return the allocation size we'd like to use to hold <b>target</b>
 * bytes. */
static inline size_t
preferred_chunk_size(size_t target)
{
  size_t sz = MIN_CHUNK_ALLOC;
  while (CHUNK_SIZE_WITH_ALLOC(sz) < target) {
    sz <<= 1;
  }
  return sz;
}
```
```
#define MIN_CHUNK_ALLOC 256
#define CHUNK_SIZE_WITH_ALLOC(memlen) ((memlen) - CHUNK_HEADER_LEN)
```
CHUNK_HEADER_LEN is usually around 30 bytes or so.
The problem with `preferred_chunk_size` is that for a large `size_t target`, the function will return 0.
If you compile this program with `-m32`:
```
#include <stdio.h>
#include <stdint.h>
#define FLEXIBLE_ARRAY_MEMBER /**/
#define DEBUG_CHUNK_ALLOC
/** A single chunk on a buffer. */
typedef struct chunk_t {
  struct chunk_t *next; /**< The next chunk on the buffer. */
  size_t datalen; /**< The number of bytes stored in this chunk */
  size_t memlen; /**< The number of usable bytes of storage in <b>mem</b>. */
#ifdef DEBUG_CHUNK_ALLOC
  size_t DBG_alloc;
#endif
  char *data; /**< A pointer to the first byte of data stored in <b>mem</b>. */
  uint32_t inserted_time; /**< Timestamp in truncated ms since epoch
                           *   when this chunk was inserted. */
  char mem[FLEXIBLE_ARRAY_MEMBER]; /**< The actual memory used for storage in
                                    *   this chunk. */
} chunk_t;
#if defined(__GNUC__) && __GNUC__ > 3
#define STRUCT_OFFSET(tp, member) __builtin_offsetof(tp, member)
#else
#define STRUCT_OFFSET(tp, member) \
((off_t) (((char*)&((tp*)0)->member)-(char*)0))
#endif
#define MIN_CHUNK_ALLOC 256
#define CHUNK_HEADER_LEN STRUCT_OFFSET(chunk_t, mem[0])
#define CHUNK_SIZE_WITH_ALLOC(memlen) ((memlen) - CHUNK_HEADER_LEN)
static inline size_t
preferred_chunk_size(size_t target)
{
  size_t sz = MIN_CHUNK_ALLOC;
  while (CHUNK_SIZE_WITH_ALLOC(sz) < target) {
    sz <<= 1;
  }
  return sz;
}
int main(void)
{
  size_t i = 1024;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  printf("i is %08X, preferred_chunk_size is %08X\n", i, preferred_chunk_size(i)); i <<= 1;
  return 0;
}
```
the output is:
```
i is 00000400, preferred_chunk_size is 00000800
i is 00000800, preferred_chunk_size is 00001000
i is 00001000, preferred_chunk_size is 00002000
i is 00002000, preferred_chunk_size is 00004000
i is 00004000, preferred_chunk_size is 00008000
i is 00008000, preferred_chunk_size is 00010000
i is 00010000, preferred_chunk_size is 00020000
i is 00020000, preferred_chunk_size is 00040000
i is 00040000, preferred_chunk_size is 00080000
i is 00080000, preferred_chunk_size is 00100000
i is 00100000, preferred_chunk_size is 00200000
i is 00200000, preferred_chunk_size is 00400000
i is 00400000, preferred_chunk_size is 00800000
i is 00800000, preferred_chunk_size is 01000000
i is 01000000, preferred_chunk_size is 02000000
i is 02000000, preferred_chunk_size is 04000000
i is 04000000, preferred_chunk_size is 08000000
i is 08000000, preferred_chunk_size is 10000000
i is 10000000, preferred_chunk_size is 20000000
i is 20000000, preferred_chunk_size is 40000000
i is 40000000, preferred_chunk_size is 80000000
i is 80000000, preferred_chunk_size is 00000000
```
The danger is that the return value of `preferred_chunk_size` is always used as a parameter to `tor_malloc` or `tor_realloc`. It is called at these places:
In `buf_pullup`:
```
210 newsize = CHUNK_SIZE_WITH_ALLOC(preferred_chunk_size(capacity));
211 newhead = chunk_grow(buf->head, newsize);
```
In `buf_new_with_capacity`:
```
283 /** Create and return a new buf with default chunk capacity <b>size</b>.
284 */
285 buf_t *
286 buf_new_with_capacity(size_t size)
287 {
288 buf_t *b = buf_new();
289 b->default_chunk_size = preferred_chunk_size(size);
290 return b;
291 }
```
In `buf_add_chunk_with_capacity`:
```
401 /** Append a new chunk with enough capacity to hold <b>capacity</b> bytes to
402 * the tail of <b>buf</b>. If <b>capped</b>, don't allocate a chunk bigger
403 * than MAX_CHUNK_ALLOC. */
404 static chunk_t *
405 buf_add_chunk_with_capacity(buf_t *buf, size_t capacity, int capped)
406 {
407 chunk_t *chunk;
408
409 if (CHUNK_ALLOC_SIZE(capacity) < buf->default_chunk_size) {
410 chunk = chunk_new_with_alloc_size(buf->default_chunk_size);
411 } else if (capped && CHUNK_ALLOC_SIZE(capacity) > MAX_CHUNK_ALLOC) {
412 chunk = chunk_new_with_alloc_size(MAX_CHUNK_ALLOC);
413 } else {
414 chunk = chunk_new_with_alloc_size(preferred_chunk_size(capacity));
415 }
```
`buf_new_with_capacity` is currently called nowhere except from tests.
`buf_add_chunk_with_capacity` is called at various places, but currently never with the `capped` parameter set to 0.
However, `buf_pullup` is called at various places and the call to `preferred_chunk_size` is reachable. Whether it is reachable with a parameter large enough that it will return 0, I'm not sure.
```
int
tor_main(int argc, char *argv[])
{
  buf_t *buf;
  char *string;
  size_t string_len;
  size_t i;
  buf = buf_new();
  string_len = 0x00001000;
  string = tor_malloc(string_len);
  for (i = 0; i < 507904; i++)
  {
    write_to_buf(string, string_len, buf);
  }
  write_to_buf(string, 0x3FFFFFA, buf);
  free(string);
  buf_pullup(buf, 0x90000000);
}
```
What will happen is that `buf_pullup` will call `chunk_grow`:
```
140 static inline chunk_t *
141 chunk_grow(chunk_t *chunk, size_t sz)
142 {
143 off_t offset;
144 size_t memlen_orig = chunk->memlen;
145 tor_assert(sz > chunk->memlen);
146 offset = chunk->data - chunk->mem;
147 chunk = tor_realloc(chunk, CHUNK_ALLOC_SIZE(sz));
148 chunk->memlen = sz;
149 chunk->data = chunk->mem + offset;
```
`tor_realloc` will in effect be called with a size parameter of 0. Whether and how much legitimate heap memory `realloc` will allocate is implementation-dependent. The point is that the following lines might overwrite heap memory:
```
148 chunk->memlen = sz;
149 chunk->data = chunk->mem + offset;
```
since `chunk` is a memory area that has just been allocated with 0 (or 1, after correction) bytes.
The whole scenario is not very likely considering Tor's frugal memory consumption, but it is nonetheless a programming fault in the buffers "API" that could lead to stability issues. Especially if you ever expand the use of `buf_pullup`, `buf_new_with_capacity`, and/or uncapped `buf_add_chunk_with_capacity`, it'll be wise to hard-limit the number of left-shifts in `preferred_chunk_size` (a single unintended negative integer -> size_t conversion can be conducive to establishing an exploitation path).

Milestone: Tor: 0.2.9.x-final (Nick Mathewson)

https://gitlab.torproject.org/legacy/trac/-/issues/16861
Pad Tor connections to collapse netflow records (2020-06-13, Mike Perry)

The collection of traffic statistics from routers is quite common. Recently, there was a minor scandal when a University network administrator upstream of UtahStateExits (and UtahStateMeekBridge) posted to boingboing that they had collected over 360G of netflow records:
https://lists.torproject.org/pipermail/tor-relays/2015-August/007575.html
Unfortunately, the comment has since disappeared, but the tor-relays archives preserve it.
This interested me, so I asked some questions about the defaults and record resolution, and did some additional searching. It turns out that Cisco IOS routers have an "inactive flow timeout" that defaults to 15 seconds and can't be set lower than 10 seconds. This timeout causes the router to emit a new netflow "record" for a connection that has been idle that long, even if the connection stays open. Several other routers have similar settings: the Fortinet default is also 15 seconds, and the Juniper default is 30 seconds (though Juniper routers can set it as low as 4 seconds).
With this information, I decided to write a patch that sends padding on a client's Tor connection bidirectionally at a random interval that we can control from the consensus, with a default of 4s-14s. It only sends padding if the connection is idle. It does not pad connections that are used only for tunneled directory traffic.
It also gives us the ability to control how long we keep said connections open. Since the default netflow settings for Cisco also generate a record for active flows after 30 minutes, it doesn't make a whole lot of sense to pad beyond that point.
This should mean that the total overhead for this defense is very low, especially since we have recently moved to only one guard. Well under 50 bytes/second for at most 30 minutes.
I still have a few questions, though, which is why I put so many people in Cc on this ticket. I will put my questions in the first comment.

Milestone: Tor: 0.3.1.x-final (Mike Perry)