Feb 12 18:54:55 tornode2 Tor[6362]: We're low on memory (cell queues total alloc: 1602579792 buffer total alloc: 1388544, tor compress total alloc: 1586784 rendezvous cache total alloc: 489909). Killing circuits with over-long queues. (This behavior is controlled by MaxMemInQueues.)
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Notice the ~1.5 GB of cells held by a single circuit? Somehow, there is an issue in tor that makes it possible to fill up a circuit's cell queue while the scheduler never empties it.
This is not trivial so I'll try to explain what I can see:
In append_cell_to_circuit_queue(), there is a check on the cell queue for a maximum size; if it is reached, the circuit stops reading on the connection:
/* If we have too many cells on the circuit, we should stop reading from
 * the edge streams for a while. */
if (!streams_blocked && queue->n >= CELL_QUEUE_HIGHWATER_SIZE)
  set_streams_blocked_on_circ(circ, chan, 1, 0); /* block streams */
In set_streams_blocked_on_circ(), the very first thing it does is this undocumented, non-trivial if:
if (circ->n_chan == chan) {
  circ->streams_blocked_on_n_chan = block;
  if (CIRCUIT_IS_ORIGIN(circ))
    edge = TO_ORIGIN_CIRCUIT(circ)->p_streams;
} else {
  circ->streams_blocked_on_p_chan = block;
  tor_assert(!CIRCUIT_IS_ORIGIN(circ));
  edge = TO_OR_CIRCUIT(circ)->n_streams;
}
[stop reading on all the "edge" streams]
Let's use the example where we are a Guard and we have an or_circuit_t, with "p_chan" being the client connection and "n_chan" being the connection to the middle node.
If we set the block on n_chan, we only stop the read() if the circuit is an origin circuit, because p_streams is only set on a tor client.
Does this mean that if we try to deliver a cell forward (toward n_chan), even though we've reached the high-water limit, we'll still queue it, because there is never a point where we check whether we are blocked on n_chan at the relay level?
Per a discussion with armadev on IRC, this situation is unfortunately possible and the only defense we have in place is the OOM handler.
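To make the asymmetry concrete, here is a rough sketch of the relaying path at that Guard (my own simplified annotation, not actual tor code), using the functions quoted above:

/* A relayed cell heading outbound at a relay is queued on the circuit's
 * n_chan queue. */
append_cell_to_circuit_queue(circ, circ->n_chan, cell,
                             CELL_DIRECTION_OUT, 0);

/* Once that queue passes CELL_QUEUE_HIGHWATER_SIZE, the check quoted
 * earlier calls set_streams_blocked_on_circ(circ, circ->n_chan, 1, 0).
 * That takes the first branch (circ->n_chan == chan) and sets
 * streams_blocked_on_n_chan = 1, but finds no edge streams to stop
 * reading from, because p_streams only exists on origin circuits.
 * Nothing on this relaying path consults streams_blocked_on_n_chan
 * before queueing the next cell, so the queue keeps growing until the
 * OOM handler steps in. */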
We should think about having an upper limit on any given circuit queue; reaching it would indicate that we queued cells we weren't supposed to if the tor client was acting normally.
Just an FYI: the frequency seems to be increasing, which I think has a negative effect on the entire network. In the times leading up to this, the CPU usage doubles (possibly more, but it hits the VM's maximum), the node becomes sluggish, and so does network performance for this host.
So possibly it's just the wording, but the status and the previous comment seem to indicate this won't be fixed? Should the operator then tune MaxMemInQueues to the number of expected circuits times X KiB to ensure this doesn't have such an impact?
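For a rough sense of what "X KiB" might be (my own back-of-the-envelope numbers, assuming CELL_QUEUE_HIGHWATER_SIZE is 256 cells and a cell is about 514 bytes on the wire): a well-behaved circuit's queue should top out around 256 * 514 bytes ≈ 128 KiB, so ~40,000 circuits would only reach roughly 5 GB in the pathological case where every queue sits at the high-water mark. The catch is that this bug lets a single circuit ignore the high-water mark entirely, so tuning MaxMemInQueues only changes when the OOM handler fires; it doesn't stop one circuit from consuming the whole budget.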
Feb 12 18:54:56 tornode2 Tor[6362]: Removed 1599323088 bytes by killing 1 circuits; 39546 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 18:46:29 tornode2 Tor[13637]: Removed 519278496 bytes by killing 1 circuits; 122786 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 19:04:21 tornode2 Tor[13637]: Removed 785149728 bytes by killing 1 circuits; 111745 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 14 19:46:44 tornode2 Tor[13637]: Removed 1510173456 bytes by killing 1 circuits; 98098 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 15 14:14:18 tornode1 Tor[23044]: Removed 1360147536 bytes by killing 1 circuits; 58922 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 01:58:55 tornode1 Tor[23044]: Removed 509234880 bytes by killing 4 circuits; 57780 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 02:07:22 tornode1 Tor[23044]: Removed 699879840 bytes by killing 1 circuits; 53007 circuits remain alive. Also killed 0 non-linked directory connections.
Feb 16 03:05:34 tornode1 Tor[23044]: Removed 1593724176 bytes by killing 2 circuits; 52509 circuits remain alive. Also killed 0 non-linked directory connections.
An upper limit on the circuit queue, tied to the SENDME logic, would look like this: bug25226_033_01
The gist is that if the queue size goes above the circuit window start maximum limit (1000), the circuit is closed with the TORPROTOCOL reason. If we ever reach that limit, it means something is wrong in the path and the edge connection keeps sending even though it shouldn't.
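For readers without the branch handy, a minimal sketch of the idea (not the actual bug25226_033_01 code), placed at the point in append_cell_to_circuit_queue() where the cell has just been queued, and reusing the existing CIRCWINDOW_START_MAX (1000) constant:

/* Hard cap: if a single circuit queue ever grows past the SENDME-derived
 * window maximum, something in the path is misbehaving, so close the
 * circuit as a protocol violation. */
if (queue->n > CIRCWINDOW_START_MAX) {
  log_fn(LOG_PROTOCOL_WARN, LD_PROTOCOL,
         "Circuit cell queue grew to %d cells. Closing circuit.",
         queue->n);
  circuit_mark_for_close(circ, END_CIRC_REASON_TORPROTOCOL);
  return;
}

Since the circuit-level SENDME window starts at 1000 cells, a conforming endpoint should never have more than about that many RELAY_DATA cells outstanding on one circuit, which is why going past it can be treated as a protocol violation.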
Note from IRC: I like the idea here as a band-aid. I think that the maximum should be configured via a consensus param, though, and default to something a bit higher than 1000?
Also I really want to know what Roger thinks of that idea.
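If we went the consensus-param route, the cap could be looked up with networkstatus_get_param(); a sketch, with a hypothetical parameter name and a default chosen per the note above:

/* Fetch the per-circuit queue cap from the consensus ("circ_queue_cap" is
 * a made-up name for illustration), defaulting to a bit above
 * CIRCWINDOW_START_MAX and clamped to a sane range. */
int32_t max_queue_cells =
  networkstatus_get_param(NULL, "circ_queue_cap",
                          2 * CIRCWINDOW_START_MAX, /* default */
                          CIRCWINDOW_START_MAX,     /* min */
                          INT32_MAX);               /* max */

The check in the previous sketch would then compare queue->n against max_queue_cells instead of the hard-coded constant.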
In the long term, we're not going to get real fixes for the performance issues caused by running low on memory until we can better identify memory pressure: either by setting MaxMemInQueues more aggressively, working with the OS/allocator somehow, asking the user to configure MaxMemInQueues to a lower value, or something like that.
An upper limit on the circuit queue, tied to the SENDME logic, would look like this: bug25226_033_01
The gist is that if the queue size goes above the circuit window start maximum limit (1000), the circuit is closed with the TORPROTOCOL reason. If we ever reach that limit, it means something is wrong in the path and the edge connection keeps sending even though it shouldn't.
tor-spec.txt:
To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many RELAY_DATA cells it is allowed to originate (package for transmission), and how many RELAY_DATA cells it is willing to consume (receive for local streams). These limits do not apply to cells that the OR receives from one host and relays to another.
tor-design.pdf:
Leaky-pipe circuit topology: Through in-band signaling within the circuit, Tor initiators can direct traffic to nodes partway down the circuit. This novel approach allows traffic to exit the circuit from the middle — possibly frustrating traffic shape and volume attacks based on observing the end of the circuit. (It also allows for long-range padding if future research shows this to be worthwhile.)
tor-spec.txt:
To control a circuit's bandwidth usage, each OR keeps track of two 'windows', consisting of how many RELAY_DATA cells it is allowed to originate (package for transmission), and how many RELAY_DATA cells it is willing to consume (receive for local streams). These limits do not apply to cells that the OR receives from one host and relays to another.
We can change this part of the spec.
tor-design.pdf:
Leaky-pipe circuit topology: Through in-band signaling within the circuit, Tor initiators can direct traffic to nodes partway down the circuit. This novel approach allows traffic to exit the circuit from the middle — possibly frustrating traffic shape and volume attacks based on observing the end of the circuit. (It also allows for long-range padding if future research shows this to be worthwhile.)
Standard Tor clients do not use this feature, but they get close:
connection netflow padding is at link level, not circuit level
client introduce circuits send a single INTRODUCE1 cell, then EXTEND if the introduction fails