Didn't recognize a cell, but circ stops here! Closing circuit
During the ongoing DDoS, this protocol warning log line showed up more often than usual, likely because of network congestion:
log_fn(LOG_PROTOCOL_WARN, LD_PROTOCOL,
"Didn't recognize a cell, but circ stops here! Closing circuit. "
"It was created %ld seconds ago.", (long)seconds_open);
}
We looked into this to understand why it happens, and whether it is a specific signal that would help us identify the root cause(s) of the DDoS.
What we believe is that, after a circuit has been destroyed, the node emitting that log line is receiving "inflight cells" that the client sent before it received the TRUNCATED. We see a lot more of those right now because heavy congestion makes longer inflight times more likely.
Cause
Let's use this circuit setup as an example: C -> G -> M -> E. The node emitting the log line is M.
It means that E sent a DESTROY on the circuit. The current C-tor behavior at M is to transform the DESTROY into a TRUNCATED cell (because it is coming from ahead) and send that cell down to C. As a reminder, TRUNCATED is a relay command, so it travels down the circuit under the onion encryption layers and only C can understand it.
Then, C sends a DESTROY on the circuit, which is forwarded up to M and stops there because the M->E link has already been removed for that circuit.
So the log line appears when M receives a non-DESTROY cell after sending down the TRUNCATED. Hitting that log line also means that the circuit gets closed with a "tor protocol" error reason, so there is a bit of a flurry of DESTROY cells going downward and upward on that circuit: we sent a TRUNCATED but then decided the circuit was no good because of a cell we can't relay forward.
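The sequence above can be sketched as a small toy model of M. This is only an illustration under stated assumptions: every type and function name below (toy_circ, toy_destroy_from_next_hop, toy_cell_from_client) is hypothetical and does not correspond to C-tor's real structures, and relay cells coming from C are assumed to be addressed to E, so M never recognizes them itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types; these are NOT tor's real structures. */
enum toy_cell { TOY_CELL_RELAY, TOY_CELL_DESTROY };
enum toy_result { TOY_FORWARDED, TOY_CIRCUIT_FREED, TOY_PROTOCOL_WARN };

struct toy_circ {
  bool has_next_hop; /* true while the M->E link exists for this circuit */
  bool open;         /* false once M has closed the circuit */
};

/* E sent a DESTROY: M drops the M->E link but keeps the circuit toward C
 * (this is the point where C-tor sends a TRUNCATED toward C). */
void toy_destroy_from_next_hop(struct toy_circ *circ)
{
  assert(circ->open);
  circ->has_next_hop = false;
}

/* A cell arrives at M from C's side. Relay cells are assumed to be
 * addressed to E, so M cannot recognize them. */
enum toy_result toy_cell_from_client(struct toy_circ *circ, enum toy_cell cell)
{
  assert(circ->open);
  if (cell == TOY_CELL_DESTROY) {
    circ->open = false;
    return TOY_CIRCUIT_FREED;
  }
  if (circ->has_next_hop)
    return TOY_FORWARDED;
  /* Unrecognized cell with nowhere to go: the case that gets logged,
   * after which the circuit is closed with a protocol error. */
  circ->open = false;
  return TOY_PROTOCOL_WARN;
}
```

In this model, a relay cell that C sent before learning about the TRUNCATED reaches M after toy_destroy_from_next_hop() has run, so toy_cell_from_client() returns TOY_PROTOCOL_WARN and marks the circuit closed.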
Solution
It is a bit interesting that C-tor does that, because tor-spec.txt stipulates:
Upon receiving an outgoing DESTROY cell, an OR frees resources associated with the corresponding circuit. If it's not the end of the circuit, it sends a DESTROY cell for that circuit to the next OR in the circuit. If the node is the end of the circuit, then it tears down any associated edge connections (see section 6.1).
... and says this about TRUNCATED:
To tear down part of a circuit, the OP may send a RELAY_TRUNCATE cell signaling a given OR (Stream ID zero). That OR sends a DESTROY cell to the next node in the circuit, and replies to the OP with a RELAY_TRUNCATED cell.
But we are not in that situation. The spec mentions an "outgoing DESTROY cell", and it is unclear what that means (from p_chan or n_chan?). Maybe the right behavior is that, regardless of the direction, we just forward a DESTROY either way?
After talking to @nickm about this, it seems that we have two possibly good solutions:
- M sends a DESTROY down the circuit instead, which would get propagated down to C. Any inflight cells would then just get ignored by M, since the circuit would not exist anymore.
- Flag the circuit as being in a "truncated" state, leading to ignoring any non-DESTROY cells and thus avoiding that protocol warning.
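As a rough illustration of the second option, here is a minimal sketch of how a "truncated" flag could change what M does with late cells. It is not a patch against C-tor: sketch_circ, its truncated field, and both functions are hypothetical names made up for this example, and relay cells from C are again assumed to be unrecognizable at M.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not C-tor's API. */
enum sketch_cell { SKETCH_CELL_RELAY, SKETCH_CELL_DESTROY };
enum sketch_result {
  SKETCH_FORWARDED,
  SKETCH_IGNORED,
  SKETCH_CIRCUIT_FREED,
  SKETCH_PROTOCOL_WARN
};

struct sketch_circ {
  bool has_next_hop; /* true while the M->E link exists */
  bool truncated;    /* assumed flag: set when M sends a TRUNCATED to C */
};

/* E sent a DESTROY: drop the next hop and remember we truncated. */
void sketch_on_destroy_from_next_hop(struct sketch_circ *circ)
{
  assert(circ);
  circ->has_next_hop = false;
  circ->truncated = true;
}

/* Decide what M does with a cell arriving from C's side. */
enum sketch_result sketch_on_cell_from_client(const struct sketch_circ *circ,
                                              enum sketch_cell cell)
{
  assert(circ);
  if (cell == SKETCH_CELL_DESTROY)
    return SKETCH_CIRCUIT_FREED;  /* normal teardown from the client */
  if (circ->truncated)
    return SKETCH_IGNORED;        /* late inflight cell: drop it silently */
  if (circ->has_next_hop)
    return SKETCH_FORWARDED;
  return SKETCH_PROTOCOL_WARN;    /* still a real protocol violation */
}
```

With this shape, the warning stays in place for circuits that were never truncated, while inflight cells that race with the TRUNCATED are dropped until the DESTROY from C arrives.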