Rasmus Dahlberg · Rasmus Dahlberg · fba3cdfe
--- a/doc/tlog-algorithms.md 0 → 100644

+ 407

− 0
+++ b/doc/tlog-algorithms.md 0 → 100644

+ 407

− 0
+# Transparency log verification algorithms
+
+The package in [pkg/merkle](../pkg/merkle) uses two verification algorithms from
+[RFC 9162][].  This document provides some insight with regards to how they
+work, as well as an extension that improves performance with _compact ranges_.
+
+[RFC 9162]: https://datatracker.ietf.org/doc/html/rfc9162
+
+**Preliminaries.**
+The reader is assumed to know how inclusion and consistency proofs work, e.g.,
+in sufficient detail to draw examples on a whiteboard.  If needed, see
+[transparency log preliminaries](./tlog-preliminaries.md) for an introduction.
+
+## Inclusion proofs
+
+Recall that an inclusion proof is composed of the minimum number of hashes that
+are needed to recompute a root hash from a leaf hash, leaf index, and tree size.
+
+The proof is generated by traversing the log's Merkle tree from the root and
+down towards the leaf in question, selecting every sibling hash along the way.
+The first hash in the proof corresponds to the leaf's sibling hash.
+
+Below is a walk-through of the verification algorithm in RFC 9162, [Section
+2.1.3.2][].  The goal is to provide an intuition of how to read and visualize
+it.  Note that it is a traversal from a given leaf and up towards the root.
+
+[Section 2.1.3.2]: https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.3.2
+
+**Input:**
+
+  - `hash`: computed as `H(0x00 || leaf_cert)` to get a leaf hash.  The `0x00`
+    constant is a domain separator to distinguish leaf nodes from interior
+    nodes.  Interior nodes are computed as `H(0x01 || left || right)`.
+  - `leaf_index`: zero-based leaf index in the log's Merkle tree
+  - `tree_size`: number of leaves in the log's Merkle tree
+  - `root_hash`: expected root hash for the specified tree size
+  - `inclusion_path`: list of hashes that proves inclusion
+
+**Output:**
+
+  - Pass or fail
+
+**Algorithm:**
+
+    1.  Compare leaf_index from the inclusion_proof_v2 structure against
+        tree_size.  If leaf_index is greater than or equal to tree_size,
+        then fail the proof verification.
+
+        COMMENT: the specified index is out of range.
+
+    2.  Set fn to leaf_index and sn to tree_size - 1.
+
+        COMMENT: think of fn as your first pointer finger.  Place it on the leaf
+          that you are proving inclusion for.
+
+        COMMENT: think of sn as your second pointer finger.  Place it on the
+          last leaf in the tree.
+
+        COMMENT: for visualization, it is helpful to draw all leaf nodes at the
+          tree's bottom.  "Drag out the edges" where subtree nodes are missing.
+
+               o                            o
+             /   \                         / \
+            o     \      rather than      o   o
+           / \     \                     / \  2
+          o   o     o                   o   o
+          0   1     2                   0   1
+
+         A right-shift of sn can be thought of as "shortening the egde by one
+         level if it is dragged out to reach a parent, else go up to the
+         parent".  A right-shift of fn always means "go up to the parent node".
+
+         Example: after the first bitshift of fn and sn for a tree of size 3
+         with fn=1 and sn=2 initially, what happens visually is the following:
+
+               o                            o
+             /   \                         / \
+            o     \      ---------->      o   o
+           / \     \                     fn   sn
+          o   o     o
+             fn    sn
+
+         The bit pattern of fn shifted from 01 -> 0
+         The bit pattern of sn shifted from 10 -> 1
+
+         Also note that if fn has a sibling at the same tree level, the least
+         significant bit indicates whether it is a left (0) or right child (1).
+
+         Okay, so we have our first (fn) and second (sn) pointer fingers.  From
+         this we can visualize exactly how the log's Merkle tree structure is.
+         After consuming a proof hash, we move up the tree with right shifts.
+
+    3.  Set r to hash.
+
+        COMMENT: intialize r to the value that our first finger points at.
+
+    4.  For each value p in the inclusion_path array:
+
+        a.  If sn is 0, then stop the iteration and fail the proof
+            verification.
+
+            COMMENT: sn=0 means that both fingers point at the root.  So,
+              there shouldn't be any hashes left in our inclusion proof.
+
+        b.  If LSB(fn) is set, or if fn is equal to sn, then:
+
+            COMMENT: your second finger ALWAYS points at a right child unless it
+              is the root.  We already know it doesn't point at the root.  In
+              other words, if both fingers point at the same node ("fn is equal
+              to sn") we can be sure the first finger points at a right child.
+
+            COMMENT: otherwise, you first finger is guaranteed to have a sibling
+              at the same level (you have at least one more node to the right;
+              your second finger points at it).  This means that having the
+              least significant bit of fn set tells us it is a right child.
+
+            COMMENT: so, now we know that the hash from the proof should be
+              applied as a left node.  We hash accordingly in step 4b(i).
+
+                               left node from proof
+                                       |    +---- right node from first finger
+                                       |    |
+                                       v    v
+            i.   Set r to HASH(0x01 || p || r).
+
+            ii.  If LSB(fn) is not set, then right-shift both fn and sn
+                 equally until either LSB(fn) is set or fn is 0.
+
+                 COMMENT: if we only triggered 4b due to our fingers pointing at
+                   the same node, we just procuded a hash with a left node that
+                   is at a higher tree level (i.e., "our edge is dragged out").
+                   Shorten the edge until we reach the same level.  The check
+                   fn=0 ensures that the loop stops in case we reach the root.
+
+            Otherwise:
+
+                               left node from first finger
+                                      |    +---- right node from proof
+                                      |    |
+                                      v    v
+            i.  Set r to HASH(0x01 || r || p).
+
+        c.  Finally, right-shift both fn and sn one time.
+
+            COMMENT: now we hashed the node that our first finger points at with
+              a sibling hash that was provided in the proof.  The result is
+              stored in r.  All that's left to do is move up one tree level; fn
+              will then (again) be referring to the value stored in r.
+
+              (If we still have proof hashes left we repeat the loop, otherwise
+              we continue to the final step of checking that the result is OK.)
+
+    5.  Compare sn to 0.  Compare r against the root_hash.  If sn is
+        equal to 0 and r and the root_hash are equal, then the log has
+        proven the inclusion of hash.  Otherwise, fail the proof
+        verification.
+
+        COMMENT: checking that sn=0 implies fn=0, i.e., both fingers reached the
+          root at the same time.  Too few proof hashes would result in sn != 0,
+          and we already aborted if we had too many proof hashes in 4a.  We used
+          the intermediate values of fn and sn to determine the tree structure,
+          i.e., in which order should proof hashes be applied to our leaf hash.
+          All is good if the recomputed root hash matches our reference root.
+
+          Bonus: due to the domain separation between leaves and interior nodes,
+          it is also not possible to be tricked into believing that, e.g.,
+
+                root
+               /    \
+              a      b
+              ^      ^
+              |      |
+            "foo"  "bar"
+
+          is a size-1 tree with the item "a||b", as this would then be computed
+          by the verifier as H(0x00 || a || b); not H(0x01 || a || b).
+
+## Consistency proofs
+
+Recall that a consistency proof is composed of the minimum number of hashes that
+are needed to recompute a root hash in an append-only tree with `n` leaves given
+an earlier root hash in a tree of size `m` for `0 < m < n`.
+
+The proof is generated by traversing the log's newer tree from the root and down
+towards the last leaf in the old tree, selecting every sibling hash along the
+way and stopping at the first subtree that does not cover any of the appended
+leaves.  If the old tree size is not perfect, the hash of the final subtree is
+also added to the proof.  (Otherwise that subtree hash is the same as the old
+root hash.  Therefore, it would be redundant to add as the verifier knows it.)
+
+Below is a walk-through of the verification algorithm in RFC 9162, [Section
+2.1.4.2][].  The goal is to provide an intuition of how to read and visualize
+it.  Note that it is a traversal from a subtree up towards both tree roots.
+
+[Section 2.1.4.2]: https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.4.2
+
+**Input:**
+
+  - `first`: number of leaves in the old tree
+  - `second`: number of leaves in the new tree
+  - `first_hash`: root hash in the old tree
+  - `second_hash`: root hash in the new tree
+  - `consistency_path`: list of hashes that proves consistency
+
+**Constraints:**
+
+  - `first` is larger than zero
+  - `first` is smaller than `second`
+
+**Output:**
+
+  - Pass or fail
+
+**Algorithm:**
+
+    1.  If consistency_path is an empty array, stop and fail the proof
+        verification.
+
+        COMMENT: at least one proof hash is required with our constraints.
+
+    2.  If first is an exact power of 2, then prepend first_hash to the
+        consistency_path array.
+
+        COMMENT: proof generation does not add the final subtree hash to the
+          proof if it is identical to the old root hash.  Here it is instead
+          added to the proof by the verifier, such that the first two hashes in
+          proof always correspond to the first left and right nodes to hash
+          together in the new tree.  This makes the algorithm more homogenus.
+
+    3.  Set fn to first - 1 and sn to second - 1.
+
+        COMMENT: Vizualize the new tree.  Place your first finger on the last
+          leaf in the old tree.  Place your second finger on the last leaf.
+
+    4.  If LSB(fn) is set, then right-shift both fn and sn equally until
+        LSB(fn) is not set.
+
+        COMMENT: our first finger points at a leaf.  What we really want to be
+          pointing at is the subtree that the prover stopped at when generating
+          the proof.  We know that our first finger can never be more to the
+          right than now (because it is at the last leaf in the old tree).  If
+          we are pointing at a right node, it is certainly not the subtree that
+          the prover stopped at while generating the proof: the parent would
+          have been reached sooner, and it too does not contain any of the
+          appended leaves which is the criteria to stop going further down.
+
+          So, what's happening here is that we repeatedly move both fingers up
+          one level in the tree until the first finger points at the left-child
+          subtree where the prover stopped generating the proof.
+
+    5.  Set both fr and sr to the first value in the consistency_path
+        array.
+
+        COMMENT: initialize fr and sr to the subtree hash our first finger
+          points at; fr will become the old root hash and sr the new root hash.
+
+    6.  For each subsequent value c in the consistency_path array:
+
+        a.  If sn is 0, then stop the iteration and fail the proof
+            verification.
+
+            COMMENT: same comment as in inclusion proof verification; both
+              fingers point at the root while there are unused proof hashes.
+
+        b.  If LSB(fn) is set, or if fn is equal to sn, then:
+
+            COMMENT: same comment as in inclusion proof verification; our first
+              finger points at a right child.
+
+            COMMENT: left-node proof hashes are required to recompute both the
+              new and the old tree head.   In contrast, only the new tree needs
+              to use the right-node proof hashes that cover appended leaves.
+
+            i.    Set fr to HASH(0x01 || c || fr).
+
+            ii.   Set sr to HASH(0x01 || c || sr).
+
+            iii.  If LSB(fn) is not set, then right-shift both fn and sn
+                  equally until either LSB(fn) is set or fn is 0.
+
+                  COMMENT: same comment as in inclusion proof verification; we
+                    just produced a hash with a left node that is at a higher
+                    level.  "Shorten the edge until we reach the same level".
+
+            Otherwise:
+
+            i.  Set sr to HASH(0x01 || sr || c).
+
+        c.  Finally, right-shift both fn and sn one time.
+
+            COMMENT: same comment as in inclusion proof verification.  We hashed
+              the node that our first finger points at with a sibling hash that
+              was provided in the proof.  Move both fingers up one tree level.
+
+    7.  After completing iterating through the consistency_path array as
+        described above, verify that the fr calculated is equal to the
+        first_hash supplied, that the sr calculated is equal to the
+        second_hash supplied, and that sn is 0.
+
+        COMMENT: we verified both tree structures due to how we moved our
+          fingers, checking that the proof has the appropriate size and applying
+          hashes in the order indicated by the intermediate values of fn and sn.
+          The old tree head only consumed left-child subtree hashes in the proof
+          and we check that the resulting root hash is correct.  When using the
+          same hashes but also adding right-node subtree hashes that only cover
+          appended leaves, we check that the resulting new root hash is correct.
+
+Note that the required constraints result in two special cases that implementers
+must treat separately before considering use of the above algorithm:
+
+  - `first == 0`: consistent with any `second > first` if `first_hash ==
+    SHA-256("")` and `second_hash != SHA-256("")`.
+  - `first == second`: consistent if `first_hash == second_hash`, and correct if
+    the two equal hashes are (non-)empty depending on tree size.
+
+## Compact range extension
+
+A typical monitoring flow is to have a current tree head of size `m`; upon
+discovery of a tree head with size `n > m`, the new leaves at indices
+`m,...,n-1` are fetched.  A possible verification pattern is to check that:
+
+  - The new tree is consistent with the old tree
+  - The appended entries are included in the new tree
+
+Fetching `n-m` entries is an inevitable linear monitoring cost.  It is not
+necessary to verify a linear number of logarithmic inclusion proofs, however.
+
+Suppose that:
+
+  - An old tree head has size `m` and root `M`.
+  - A new tree head has size `n` and root `N`.
+  - Consistency `(m,M)` to `(n,N)` has been verified.
+  - Entries `m,...,n-1` for `m-n > 1` have been downloaded.
+
+A verifier wants to check that the downloaded entries are included in the new
+tree at indices `m,...,n-1` but without verifying `n-m` inclusion proofs.
+
+The intuition of a solution is as follows.  You can request an inclusion proof
+for the entry at index `m` in the tree of size `n`.  This assures that the first
+leaf at index `m` is included in the new tree.  Now note that _all right-node
+subtree hashes_ in the requested inclusion proof covers each and every entry
+after index `m`, i.e., `m+1,...,n-1`.  As such, the verifier already knows what
+they should be and can simply _confirm them_ while verifying a proof once.
+
+Such confirmation is easy to add in the above inclusion proof verification
+algorithm, i.e., in step 4b's otherwise-clause.
+
+Below is an algorithm to generate the required subtree hashes to confirm.
+
+**Input:**
+
+  - `nodes`: a list of `k` consecutive leaf nodes.  Each node in the list has an
+    `index` and a `hash`.
+
+**Constraints:**
+
+  - `k > 0`
+  - `node[0].index > 0`
+  - `node[i].index == 1 + node[i-1].index` for `i in [1,k)`
+
+**Output:**
+
+  - The minimum number of hashes that cover the leaves in the specified range.
+    The first hash is from the lowest tree level, similar to an inclusion proof.
+
+**Algorithm**:
+
+    1.  Let hashes be an empty list.
+
+    2.  Repeat until nodes contain a single node:
+
+        a) If nodes[1].index XOR 1 is not equal to nodes[0].index, then append
+           nodes[0].hash to hashes and remove nodes[0].
+
+           COMMENT: the first node's sibling is to the left.  Therefore, we can
+             not progress further on this subtree.
+
+        b)  Set i to zero.
+
+        c)  While i is smaller than the current number of nodes:
+
+            i)  If nodes[i] is not the last node, set nodes[i].hash to H(0x01 ||
+                nodes[i].hash || nodes[i+1].hash) and then remove nodes[i+1].
+
+                COMMENT: only the first and the last node may not have a sibling
+                  on a given tree level.  This is not the last node.  We already
+                  handled the case of the first node not having a sibling in 4a.
+                  So, here an interior hash is produced for two sibling nodes.
+
+           ii)  Right-shift nodes[i].index once.
+
+                COMMENT: after taking a pass over a node (be it after computing
+                  an interior node or just noting that we're at the last node),
+                  we need to update the index so that it refers to the parent.
+                  This sets us up for taking a new pass at the next tree level.
+
+          iii)  Increment i by one.
+
+    3.  Append nodes[0].hash to hashes and output hashes.
+
+Note that the above algorithm computes a [compact range][] for a list of
+consecutive leaves when node-index zero is considered an invalid input.  The
+application of this in inclusion proof verification is a subset of the compact
+range theory.  It avoids an `O(klogn)` overhead while verifying `k` entries in
+the tree of size `n`, and with a very small diff compared to inclusion proofs.
+
+[compact range]: https://github.com/transparency-dev/merkle/blob/main/docs/compact_ranges.md