Exhausting our bandwidth write limit stops the connection from reading
In commit 488e2b00bf881b97bcc8e4bbe304845ff1d79a03, we refactored the logic that blocks a connection on bandwidth, and a typo slipped in, probably from a bad copy-paste:
```c
void
connection_write_bw_exhausted(connection_t *conn, bool is_global_bw)
{
  (void)is_global_bw;
  conn->write_blocked_on_bw = 1;
  connection_stop_reading(conn);
  reenable_blocked_connection_schedule();
}
```
Notice the connection_stop_reading() call, which should be connection_stop_writing(). This has the really bad side effect of making tor stop reading from the socket once the write limit is reached. Because read_blocked_on_bw is never set to 1, reading is never re-enabled by our mainloop callback.
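To make the reasoning concrete, here is a minimal, self-contained C sketch. Only the body of connection_write_bw_exhausted() comes from the issue; everything else is a hypothetical simplification, not tor's actual code. connection_t is reduced to the two *_blocked_on_bw flags plus two booleans (`reading`, `writing`) standing in for the socket's read/write events. The start/stop helpers just flip those booleans. reenable_blocked_connection() is an assumed model of the mainloop callback that restarts only the direction whose *_blocked_on_bw flag is set.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for tor's connection_t. */
typedef struct connection_t {
  unsigned int read_blocked_on_bw : 1;
  unsigned int write_blocked_on_bw : 1;
  bool reading; /* hypothetical: is the read event enabled? */
  bool writing; /* hypothetical: is the write event enabled? */
} connection_t;

/* Hypothetical helpers: they only toggle the booleans above. */
void connection_stop_reading(connection_t *conn) { conn->reading = false; }
void connection_stop_writing(connection_t *conn) { conn->writing = false; }
void connection_start_reading(connection_t *conn) { conn->reading = true; }
void connection_start_writing(connection_t *conn) { conn->writing = true; }

/* Stub: the real function schedules a mainloop callback. */
void reenable_blocked_connection_schedule(void) {}

/* The buggy version from the commit: flags writing, but stops reading. */
void
connection_write_bw_exhausted_buggy(connection_t *conn, bool is_global_bw)
{
  (void)is_global_bw;
  conn->write_blocked_on_bw = 1;
  connection_stop_reading(conn);
  reenable_blocked_connection_schedule();
}

/* The corrected version: flags writing and stops writing. */
void
connection_write_bw_exhausted(connection_t *conn, bool is_global_bw)
{
  (void)is_global_bw;
  conn->write_blocked_on_bw = 1;
  connection_stop_writing(conn);
  reenable_blocked_connection_schedule();
}

/* Simplified model of the re-enable callback: a direction is restarted
 * only if its *_blocked_on_bw flag was set when it was stopped. */
void
reenable_blocked_connection(connection_t *conn)
{
  if (conn->read_blocked_on_bw) {
    conn->read_blocked_on_bw = 0;
    connection_start_reading(conn);
  }
  if (conn->write_blocked_on_bw) {
    conn->write_blocked_on_bw = 0;
    connection_start_writing(conn);
  }
}
```

Running the buggy version followed by reenable_blocked_connection() leaves `reading` false for good: only write_blocked_on_bw was set, so the callback never restarts reading. With the corrected version, writing is stopped and later restarted, and reading is never touched.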
This fix is critical: otherwise bytes accumulate in the kernel TCP buffers, which can lead to OOM and also to loss of connectivity with >= 0.3.4.x relays. One way they accumulate is through the keepalive cell, which bypasses the KIST scheduler, so tor sends it regardless of whether the kernel thinks it is OK. That is a separate problem, and I'll open a ticket for it.
This most likely fixes #27813.
Appeared in 0.3.4.1-alpha.