[security] O(N²) protover parse via unrecognized tokens (consensus-parse CPU amplification) TROVE-2026-027
- **Severity:** MEDIUM (pre-signature CPU/availability amplification)
- **Class:** algorithmic-complexity DoS (no crash, no spec violation)
- **Crate / site:** `tor-protover` — `ProtocolsInner::add`, reached through
`MdConsensus::parse`
- **Audited commit:** `voltrevo/arti` `eb2ab2f88` (upstream Arti code — see
"Upstream note")
- **Attacker position:** any directory cache/guard the client fetches a
consensus from
- **Status:** unfixed at the audited commit
- **PoC:** `pocs/protover-quadratic-cpu/`
## Summary
The `required-client-protocols` line of a microdesc consensus is parsed via
`tor_protover::Protocols::from_str` while `tor-dirmgr` parses the fetched
consensus — **before** any signature check. For each subprotocol token,
`ProtocolsInner::add` does a linear scan over all previously seen
unrecognized entries with a full string compare before pushing. For `N`
distinct unrecognized tokens this is `N(N−1)/2` comparisons — O(N²) — with
no per-line or per-token cap. A single crafted (unsigned) consensus pegs one
core parsing attacker bytes, stalling consensus install / bootstrap.
This is a CPU/latency stall, not a crash or OOM; it is reported as a
robustness/availability issue rather than a memory-safety bug.
## Root cause
`ProtocolsInner::add` (`tor-protover/src/lib.rs:487-513`) stores unrecognized
subprotocols in a `Vec<SubprotocolEntry>`. For each new token it runs
`self.unrecognized.iter().any(|ent| ent.proto.is_unrecognized(s))`
(`:500-503`) — a linear scan with a full string compare (`s2 == s`,
`:208`) — and only then pushes (`:508`). `Protocols::from_str` (`:642`)
splits the line on spaces with no count limit and calls `add` per token.
Distinct token names never hit the early `Duplicate` return, so each
insertion `k` scans `k−1` non-matching entries: Σ(k−1) = N(N−1)/2.
There is no per-line/per-token cap: the header rule is
`REQUIRED_CLIENT_PROTOCOLS.rule().args(1..)` (unbounded) and the tokenizer
has no line-length limit. The only ceiling is the 16 MiB−1 consensus body
cap, which permits N ≈ 2.1M distinct ~8-byte tokens.
## Verified call chain
```
add_from_download (tor-dirmgr/src/state.rs:310)
-> add_consensus_text (state.rs:353)
-> MdConsensus::parse (tor-netdoc each_flavor.rs:108) [pre-signature]
-> parse_from_reader -> Preamble::from_section
-> ProtoStatus::from_section (netstatus.rs:1438)
-> item.args_as_str().parse::<Protocols>() (netstatus.rs:1446)
-> Protocols::from_str (tor-protover lib.rs:639)
-> ProtocolsInner::add (lib.rs:499-510)
-> self.unrecognized.iter().any(..) <-- O(N) scan per token
```
`MdConsensus::parse` returns an *unchecked* consensus; `check_signature`
runs only later (`state.rs:493/526`). The CPU is spent on untrusted bytes
before authentication.
## Measurements (production `MdConsensus::parse`, `--release`)
```
N document bytes parse time x vs prev
0 (base) 5,108 0.00012 s —
10,000 85,108 0.1200 s —
20,000 165,108 0.4752 s 3.96
40,000 325,108 1.9098 s 4.02
80,000 645,108 7.6354 s 4.00
100,291 807,436 12.073 s —
```
Doubling N quadruples parse time (textbook O(N²)). **Measured:** a single
807,436-byte consensus (well under the 16 MiB cap) parses in **12.07 s** of
pegged single-core CPU — a ~100,000x slowdown over the 0.12 ms baseline,
all pre-signature.
**Extrapolations** from the clean N² fit (inference, not measured): ~160k
tokens (~1.3 MB) → ~30 s; a full 16 MiB−1 body (N ≈ 2.1M) →
N(N−1)/2 ≈ 2.2e12 comparisons ≈ ~87 min of pegged CPU for one document. A
malicious guard re-serving it on every refresh can hold the client's
directory view stale for as long as the client keeps that source.
## Why existing defenses don't prevent it
The cost is incurred during parse, before any `Err` or signature check;
`tor-memquota` tracks memory, not CPU; there is no per-line/per-token parse
cap. The eventual signature rejection happens only after the stall.
## Suggested mitigation
Replace the linear `unrecognized` scan with an O(1) membership check (e.g. a
hash set of seen names), and/or cap the number of subprotocol tokens parsed
per line.
## Upstream note
`tor-protover` is not in the `voltrevo/arti` fork's priority-diff set; the
quadratic scan and the parse-before-signature ordering trace to general Arti
development, so this is very likely present upstream — worth reporting
upstream. Labelled inference; no `torproject-main` checkout was diffed here.
## Reproduction
With an `arti` checkout at the audited commit placed at the repo root
(see top-level `README.md`):
```
cd pocs/protover-quadratic-cpu
cargo test --release -- --nocapture --test-threads=1
```
`poc_protover_quadratic_consensus_parse` prints the scaling table and
asserts a single sub-16-MiB document parses in >10 s;
`supporting_protover_from_str_scaling` isolates the same 4x-per-doubling
curve to the production `tor_protover::Protocols::from_str`.
issue