[security] O(N²) protover parse via unrecognized tokens (consensus-parse CPU amplification) TROVE-2026-027
- **Severity:** MEDIUM (pre-signature CPU/availability amplification) - **Class:** algorithmic-complexity DoS (no crash, no spec violation) - **Crate / site:** `tor-protover` — `ProtocolsInner::add`, reached through `MdConsensus::parse` - **Audited commit:** `voltrevo/arti` `eb2ab2f88` (upstream Arti code — see "Upstream note") - **Attacker position:** any directory cache/guard the client fetches a consensus from - **Status:** unfixed at the audited commit - **PoC:** `pocs/protover-quadratic-cpu/` ## Summary The `required-client-protocols` line of a microdesc consensus is parsed via `tor_protover::Protocols::from_str` while `tor-dirmgr` parses the fetched consensus — **before** any signature check. For each subprotocol token, `ProtocolsInner::add` does a linear scan over all previously seen unrecognized entries with a full string compare before pushing. For `N` distinct unrecognized tokens this is `N(N−1)/2` comparisons — O(N²) — with no per-line or per-token cap. A single crafted (unsigned) consensus pegs one core parsing attacker bytes, stalling consensus install / bootstrap. This is a CPU/latency stall, not a crash or OOM; it is reported as a robustness/availability issue rather than a memory-safety bug. ## Root cause `ProtocolsInner::add` (`tor-protover/src/lib.rs:487-513`) stores unrecognized subprotocols in a `Vec<SubprotocolEntry>`. For each new token it runs `self.unrecognized.iter().any(|ent| ent.proto.is_unrecognized(s))` (`:500-503`) — a linear scan with a full string compare (`s2 == s`, `:208`) — and only then pushes (`:508`). `Protocols::from_str` (`:642`) splits the line on spaces with no count limit and calls `add` per token. Distinct token names never hit the early `Duplicate` return, so each insertion `k` scans `k−1` non-matching entries: Σ(k−1) = N(N−1)/2. There is no per-line/per-token cap: the header rule is `REQUIRED_CLIENT_PROTOCOLS.rule().args(1..)` (unbounded) and the tokenizer has no line-length limit. The only ceiling is the 16 MiB−1 consensus body cap, which permits N ≈ 2.1M distinct ~8-byte tokens. ## Verified call chain ``` add_from_download (tor-dirmgr/src/state.rs:310) -> add_consensus_text (state.rs:353) -> MdConsensus::parse (tor-netdoc each_flavor.rs:108) [pre-signature] -> parse_from_reader -> Preamble::from_section -> ProtoStatus::from_section (netstatus.rs:1438) -> item.args_as_str().parse::<Protocols>() (netstatus.rs:1446) -> Protocols::from_str (tor-protover lib.rs:639) -> ProtocolsInner::add (lib.rs:499-510) -> self.unrecognized.iter().any(..) <-- O(N) scan per token ``` `MdConsensus::parse` returns an *unchecked* consensus; `check_signature` runs only later (`state.rs:493/526`). The CPU is spent on untrusted bytes before authentication. ## Measurements (production `MdConsensus::parse`, `--release`) ``` N document bytes parse time x vs prev 0 (base) 5,108 0.00012 s — 10,000 85,108 0.1200 s — 20,000 165,108 0.4752 s 3.96 40,000 325,108 1.9098 s 4.02 80,000 645,108 7.6354 s 4.00 100,291 807,436 12.073 s — ``` Doubling N quadruples parse time (textbook O(N²)). **Measured:** a single 807,436-byte consensus (well under the 16 MiB cap) parses in **12.07 s** of pegged single-core CPU — a ~100,000x slowdown over the 0.12 ms baseline, all pre-signature. **Extrapolations** from the clean N² fit (inference, not measured): ~160k tokens (~1.3 MB) → ~30 s; a full 16 MiB−1 body (N ≈ 2.1M) → N(N−1)/2 ≈ 2.2e12 comparisons ≈ ~87 min of pegged CPU for one document. A malicious guard re-serving it on every refresh can hold the client's directory view stale for as long as the client keeps that source. ## Why existing defenses don't prevent it The cost is incurred during parse, before any `Err` or signature check; `tor-memquota` tracks memory, not CPU; there is no per-line/per-token parse cap. The eventual signature rejection happens only after the stall. ## Suggested mitigation Replace the linear `unrecognized` scan with an O(1) membership check (e.g. a hash set of seen names), and/or cap the number of subprotocol tokens parsed per line. ## Upstream note `tor-protover` is not in the `voltrevo/arti` fork's priority-diff set; the quadratic scan and the parse-before-signature ordering trace to general Arti development, so this is very likely present upstream — worth reporting upstream. Labelled inference; no `torproject-main` checkout was diffed here. ## Reproduction With an `arti` checkout at the audited commit placed at the repo root (see top-level `README.md`): ``` cd pocs/protover-quadratic-cpu cargo test --release -- --nocapture --test-threads=1 ``` `poc_protover_quadratic_consensus_parse` prints the scaling table and asserts a single sub-16-MiB document parses in >10 s; `supporting_protover_from_str_scaling` isolates the same 4x-per-doubling curve to the production `tor_protover::Protocols::from_str`.
issue