Circuit Padding attack scenario through circpad_global_max_padding_pct
Hello *,
I am from RadicallyOpenSecurity and am currently performing a short review of the Padding Machines for Tor NGI Zero PET project.
During a discussion with the padding machine author Tobias Pulls (@pulls), I noticed a general attack scenario that may be relevant to the padding framework implementation.
The general idea of the attack is that a malicious Tor client can, under some circumstances, repeatedly force the circuit padding logic of middle relays to reply with more padding bytes than non-padding bytes. After a sustained attack, the overall padding overhead percentage of the target middle relay exceeds the network-wide `circpad_global_max_padding_pct` limit (see section 3.5 of the documentation), at which point the relay stops sending padding on any of its circuits.
Once the attack stops, the relay's ratio of sent padding to sent non-padding returns to normal over time as it relays non-padded cells for legitimate clients. The relay therefore automatically reverts to a safe state in which its padding machines work again.
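To make the shared-state problem concrete, here is a minimal C sketch of a process-wide percentage limit. It assumes a single pair of global padding/non-padding counters and a check modeled on `circpad_global_max_padding_pct`; the type and function names are hypothetical and this is not Tor's actual implementation. It also omits any warm-up threshold a real implementation may apply before enforcing the percentage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical process-wide counters: every circuit on the relay
 * adds to the same two totals. */
typedef struct {
  uint64_t padding_sent;
  uint64_t nonpadding_sent;
} global_padding_counters_t;

/* True if the process-wide padding share exceeds max_pct
 * (0 disables the check). Modeled on circpad_global_max_padding_pct. */
static bool
global_padding_limit_reached(const global_padding_counters_t *g,
                             uint64_t max_pct)
{
  assert(max_pct <= 100);
  uint64_t total = g->padding_sent + g->nonpadding_sent;
  if (max_pct == 0 || total == 0)
    return false;
  return (100 * g->padding_sent) / total > max_pct;
}

/* Record one non-padding (data) cell sent on any circuit. */
static void
count_nonpadding(global_padding_counters_t *g)
{
  g->nonpadding_sent++;
}

/* Send one padding cell on any circuit, unless the shared limit is
 * reached, in which case nothing is sent and false is returned. */
static bool
try_send_padding(global_padding_counters_t *g, uint64_t max_pct)
{
  if (global_padding_limit_reached(g, max_pct))
    return false;
  g->padding_sent++;
  return true;
}
```

Because the counters are shared, once attacker-driven circuits push the ratio over the limit, `try_send_padding` refuses padding for every circuit. Padding only resumes after enough legitimate non-padding cells dilute the ratio again, which matches the recovery behavior described above.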
However, there are several notable details:
- To our knowledge, the currently deployed padding machines cannot be used to trigger this problem. This is likely only a problem for future high-volume machines that defend against website fingerprinting attacks.
- Due to the event-based nature of client padding machines, suppressing padding generation at the middle relay means that the client never takes its `CIRCPAD_EVENT_PADDING_RECV` transition. As a result, the client sends no padding at all, or performs only a very limited subset of its regular padding behavior (depending on the machine definition). This amplifies the practical impact of the missing padding from the relay.
- Since the relay behavior change affects all of its circuits at once and may happen several times over a short time span, the resulting traffic anomalies may give adversaries performing traffic analysis on the network a fingerprinting opportunity.
- According to @pulls, the relay would log the consequences of this attack only at `info` level, so they may go unnoticed by relay operators.
- There is a threshold mechanism at the level of the individual circuit padding machine through the `relay->max_padding_percent` setting. At first glance, this appears to mitigate the described attack scenario. However, this limit is not applied to the first `relay->allowed_padding_count` cells, so an attacker can circumvent it by repeatedly triggering new padding machines. Since this initial freedom to pad appears to be important for the machines' effectiveness against fingerprinting, restrictive machine limits are unlikely to mitigate the attack without also reducing the usefulness of the machines themselves.
- The attack can also be performed in the other direction, by a malicious middle relay against clients. In this case, it would disable or degrade padding on all of the client's circuits. However, we regard this variant as less impactful than client-to-relay attacks, in part because it is harder to scale.
Some initial recommendations:
- Log at a higher severity when `circpad_global_max_padding_pct` is exceeded.
- The consensus-provided global limits could be enforced per circuit instead of truly globally, since the root of the problem is state shared across all circuits. Enforced per circuit, these limits would remain useful, because in the future multiple machines might be active over the lifetime of a single circuit.
- Based on the information from @pulls, the `CircuitPaddingDisabled` switch (see issue 28693) might be sufficient as an emergency fail-safe in case of network-wide padding problems.
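The per-circuit recommendation could look roughly like the following minimal C sketch. It assumes the consensus percentage is checked against counters summed over all machines that have run on one circuit; the names are hypothetical, and the sketch ignores how such counters would be stored on real circuit objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-circuit totals, intended to be summed over every
 * padding machine that has run on this circuit, so restarting a
 * machine does not reset them. */
typedef struct {
  uint64_t padding_sent;
  uint64_t nonpadding_sent;
} circuit_padding_totals_t;

/* Same percentage test as a global limit, but scoped to one circuit
 * (0 disables the check). */
static bool
circuit_limit_reached(const circuit_padding_totals_t *c, uint64_t max_pct)
{
  assert(max_pct <= 100);
  uint64_t total = c->padding_sent + c->nonpadding_sent;
  if (max_pct == 0 || total == 0)
    return false;
  return (100 * c->padding_sent) / total > max_pct;
}

/* Send one padding cell on this circuit unless its own limit is hit. */
static bool
circuit_try_send_padding(circuit_padding_totals_t *c, uint64_t max_pct)
{
  if (circuit_limit_reached(c, max_pct))
    return false;
  c->padding_sent++;
  return true;
}
```

In this model, an attacker who saturates its own circuit only loses padding on that circuit, while an honest circuit with a normal ratio keeps padding. Opening additional circuits would still let the attacker generate padding, but each circuit could only disable its own.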
I would like to thank @pulls for walking me through the various aspects of related network behavior and providing other helpful suggestions.