Skip to content
Snippets Groups Projects

Add Merkle tree verification

Merged Rasmus Dahlberg requested to merge merkle into main
13 files
+ 1772
0
Compare changes
  • Side-by-side
  • Inline
Files
13
+ 407
0
# Transparency log verification algorithms
The package in [pkg/merkle](../pkg/merkle) uses two verification algorithms from
[RFC 9162][]. This document provides some insight with regards to how they
work, as well as an extension that improves performance with _compact ranges_.
[RFC 9162]: https://datatracker.ietf.org/doc/html/rfc9162
**Preliminaries.**
The reader is assumed to know how inclusion and consistency proofs work, e.g.,
in sufficient detail to draw examples on a whiteboard. If needed, see
[transparency log preliminaries](./tlog-preliminaries.md) for an introduction.
## Inclusion proofs
Recall that an inclusion proof is composed of the minimum number of hashes that
are needed to recompute a root hash from a leaf hash, leaf index, and tree size.
The proof is generated by traversing the log's Merkle tree from the root and
down towards the leaf in question, selecting every sibling hash along the way.
The first hash in the proof corresponds to the leaf's sibling hash.
Below is a walk-through of the verification algorithm in RFC 9162, [Section
2.1.3.2][]. The goal is to provide an intuition of how to read and visualize
it. Note that it is a traversal from a given leaf and up towards the root.
[Section 2.1.3.2]: https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.3.2
**Input:**
- `hash`: computed as `H(0x00 || leaf_cert)` to get a leaf hash. The `0x00`
constant is a domain separator to distinguish leaf nodes from interior
nodes. Interior nodes are computed as `H(0x01 || left || right)`.
- `leaf_index`: zero-based leaf index in the log's Merkle tree
- `tree_size`: number of leaves in the log's Merkle tree
- `root_hash`: expected root hash for the specified tree size
- `inclusion_path`: list of hashes that proves inclusion
**Output:**
- Pass or fail
**Algorithm:**
1. Compare leaf_index from the inclusion_proof_v2 structure against
tree_size. If leaf_index is greater than or equal to tree_size,
then fail the proof verification.
COMMENT: the specified index is out of range.
2. Set fn to leaf_index and sn to tree_size - 1.
COMMENT: think of fn as your first pointer finger. Place it on the leaf
that you are proving inclusion for.
COMMENT: think of sn as your second pointer finger. Place it on the
last leaf in the tree.
COMMENT: for visualization, it is helpful to draw all leaf nodes at the
tree's bottom. "Drag out the edges" where subtree nodes are missing.
o o
/ \ / \
o \ rather than o o
/ \ \ / \ 2
o o o o o
0 1 2 0 1
A right-shift of sn can be thought of as "shortening the egde by one
level if it is dragged out to reach a parent, else go up to the
parent". A right-shift of fn always means "go up to the parent node".
Example: after the first bitshift of fn and sn for a tree of size 3
with fn=1 and sn=2 initially, what happens visually is the following:
o o
/ \ / \
o \ ----------> o o
/ \ \ fn sn
o o o
fn sn
The bit pattern of fn shifted from 01 -> 0
The bit pattern of sn shifted from 10 -> 1
Also note that if fn has a sibling at the same tree level, the least
significant bit indicates whether it is a left (0) or right child (1).
Okay, so we have our first (fn) and second (sn) pointer fingers. From
this we can visualize exactly how the log's Merkle tree structure is.
After consuming a proof hash, we move up the tree with right shifts.
3. Set r to hash.
COMMENT: intialize r to the value that our first finger points at.
4. For each value p in the inclusion_path array:
a. If sn is 0, then stop the iteration and fail the proof
verification.
COMMENT: sn=0 means that both fingers point at the root. So,
there shouldn't be any hashes left in our inclusion proof.
b. If LSB(fn) is set, or if fn is equal to sn, then:
COMMENT: your second finger ALWAYS points at a right child unless it
is the root. We already know it doesn't point at the root. In
other words, if both fingers point at the same node ("fn is equal
to sn") we can be sure the first finger points at a right child.
COMMENT: otherwise, you first finger is guaranteed to have a sibling
at the same level (you have at least one more node to the right;
your second finger points at it). This means that having the
least significant bit of fn set tells us it is a right child.
COMMENT: so, now we know that the hash from the proof should be
applied as a left node. We hash accordingly in step 4b(i).
left node from proof
| +---- right node from first finger
| |
v v
i. Set r to HASH(0x01 || p || r).
ii. If LSB(fn) is not set, then right-shift both fn and sn
equally until either LSB(fn) is set or fn is 0.
COMMENT: if we only triggered 4b due to our fingers pointing at
the same node, we just procuded a hash with a left node that
is at a higher tree level (i.e., "our edge is dragged out").
Shorten the edge until we reach the same level. The check
fn=0 ensures that the loop stops in case we reach the root.
Otherwise:
left node from first finger
| +---- right node from proof
| |
v v
i. Set r to HASH(0x01 || r || p).
c. Finally, right-shift both fn and sn one time.
COMMENT: now we hashed the node that our first finger points at with
a sibling hash that was provided in the proof. The result is
stored in r. All that's left to do is move up one tree level; fn
will then (again) be referring to the value stored in r.
(If we still have proof hashes left we repeat the loop, otherwise
we continue to the final step of checking that the result is OK.)
5. Compare sn to 0. Compare r against the root_hash. If sn is
equal to 0 and r and the root_hash are equal, then the log has
proven the inclusion of hash. Otherwise, fail the proof
verification.
COMMENT: checking that sn=0 implies fn=0, i.e., both fingers reached the
root at the same time. Too few proof hashes would result in sn != 0,
and we already aborted if we had too many proof hashes in 4a. We used
the intermediate values of fn and sn to determine the tree structure,
i.e., in which order should proof hashes be applied to our leaf hash.
All is good if the recomputed root hash matches our reference root.
Bonus: due to the domain separation between leaves and interior nodes,
it is also not possible to be tricked into believing that, e.g.,
root
/ \
a b
^ ^
| |
"foo" "bar"
is a size-1 tree with the item "a||b", as this would then be computed
by the verifier as H(0x00 || a || b); not H(0x01 || a || b).
## Consistency proofs
Recall that a consistency proof is composed of the minimum number of hashes that
are needed to recompute a root hash in an append-only tree with `n` leaves given
an earlier root hash in a tree of size `m` for `0 < m < n`.
The proof is generated by traversing the log's newer tree from the root and down
towards the last leaf in the old tree, selecting every sibling hash along the
way and stopping at the first subtree that does not cover any of the appended
leaves. If the old tree size is not perfect, the hash of the final subtree is
also added to the proof. (Otherwise that subtree hash is the same as the old
root hash. Therefore, it would be redundant to add as the verifier knows it.)
Below is a walk-through of the verification algorithm in RFC 9162, [Section
2.1.4.2][]. The goal is to provide an intuition of how to read and visualize
it. Note that it is a traversal from a subtree up towards both tree roots.
[Section 2.1.4.2]: https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.4.2
**Input:**
- `first`: number of leaves in the old tree
- `second`: number of leaves in the new tree
- `first_hash`: root hash in the old tree
- `second_hash`: root hash in the new tree
- `consistency_path`: list of hashes that proves consistency
**Constraints:**
- `first` is larger than zero
- `first` is smaller than `second`
**Output:**
- Pass or fail
**Algorithm:**
1. If consistency_path is an empty array, stop and fail the proof
verification.
COMMENT: at least one proof hash is required with our constraints.
2. If first is an exact power of 2, then prepend first_hash to the
consistency_path array.
COMMENT: proof generation does not add the final subtree hash to the
proof if it is identical to the old root hash. Here it is instead
added to the proof by the verifier, such that the first two hashes in
proof always correspond to the first left and right nodes to hash
together in the new tree. This makes the algorithm more homogenus.
3. Set fn to first - 1 and sn to second - 1.
COMMENT: Vizualize the new tree. Place your first finger on the last
leaf in the old tree. Place your second finger on the last leaf.
4. If LSB(fn) is set, then right-shift both fn and sn equally until
LSB(fn) is not set.
COMMENT: our first finger points at a leaf. What we really want to be
pointing at is the subtree that the prover stopped at when generating
the proof. We know that our first finger can never be more to the
right than now (because it is at the last leaf in the old tree). If
we are pointing at a right node, it is certainly not the subtree that
the prover stopped at while generating the proof: the parent would
have been reached sooner, and it too does not contain any of the
appended leaves which is the criteria to stop going further down.
So, what's happening here is that we repeatedly move both fingers up
one level in the tree until the first finger points at the left-child
subtree where the prover stopped generating the proof.
5. Set both fr and sr to the first value in the consistency_path
array.
COMMENT: initialize fr and sr to the subtree hash our first finger
points at; fr will become the old root hash and sr the new root hash.
6. For each subsequent value c in the consistency_path array:
a. If sn is 0, then stop the iteration and fail the proof
verification.
COMMENT: same comment as in inclusion proof verification; both
fingers point at the root while there are unused proof hashes.
b. If LSB(fn) is set, or if fn is equal to sn, then:
COMMENT: same comment as in inclusion proof verification; our first
finger points at a right child.
COMMENT: left-node proof hashes are required to recompute both the
new and the old tree head. In contrast, only the new tree needs
to use the right-node proof hashes that cover appended leaves.
i. Set fr to HASH(0x01 || c || fr).
ii. Set sr to HASH(0x01 || c || sr).
iii. If LSB(fn) is not set, then right-shift both fn and sn
equally until either LSB(fn) is set or fn is 0.
COMMENT: same comment as in inclusion proof verification; we
just produced a hash with a left node that is at a higher
level. "Shorten the edge until we reach the same level".
Otherwise:
i. Set sr to HASH(0x01 || sr || c).
c. Finally, right-shift both fn and sn one time.
COMMENT: same comment as in inclusion proof verification. We hashed
the node that our first finger points at with a sibling hash that
was provided in the proof. Move both fingers up one tree level.
7. After completing iterating through the consistency_path array as
described above, verify that the fr calculated is equal to the
first_hash supplied, that the sr calculated is equal to the
second_hash supplied, and that sn is 0.
COMMENT: we verified both tree structures due to how we moved our
fingers, checking that the proof has the appropriate size and applying
hashes in the order indicated by the intermediate values of fn and sn.
The old tree head only consumed left-child subtree hashes in the proof
and we check that the resulting root hash is correct. When using the
same hashes but also adding right-node subtree hashes that only cover
appended leaves, we check that the resulting new root hash is correct.
Note that the required constraints result in two special cases that implementers
must treat separately before considering use of the above algorithm:
- `first == 0`: consistent with any `second > first` if `first_hash ==
SHA-256("")` and `second_hash != SHA-256("")`.
- `first == second`: consistent if `first_hash == second_hash`, and correct if
the two equal hashes are (non-)empty depending on tree size.
## Compact range extension
A typical monitoring flow is to have a current tree head of size `m`; upon
discovery of a tree head with size `n > m`, the new leaves at indices
`m,...,n-1` are fetched. A possible verification pattern is to check that:
- The new tree is consistent with the old tree
- The appended entries are included in the new tree
Fetching `n-m` entries is an inevitable linear monitoring cost. It is not
necessary to verify a linear number of logarithmic inclusion proofs, however.
Suppose that:
- An old tree head has size `m` and root `M`.
- A new tree head has size `n` and root `N`.
- Consistency `(m,M)` to `(n,N)` has been verified.
- Entries `m,...,n-1` for `m-n > 1` have been downloaded.
A verifier wants to check that the downloaded entries are included in the new
tree at indices `m,...,n-1` but without verifying `n-m` inclusion proofs.
The intuition of a solution is as follows. You can request an inclusion proof
for the entry at index `m` in the tree of size `n`. This assures that the first
leaf at index `m` is included in the new tree. Now note that _all right-node
subtree hashes_ in the requested inclusion proof covers each and every entry
after index `m`, i.e., `m+1,...,n-1`. As such, the verifier already knows what
they should be and can simply _confirm them_ while verifying a proof once.
Such confirmation is easy to add in the above inclusion proof verification
algorithm, i.e., in step 4b's otherwise-clause.
Below is an algorithm to generate the required subtree hashes to confirm.
**Input:**
- `nodes`: a list of `k` consecutive leaf nodes. Each node in the list has an
`index` and a `hash`.
**Constraints:**
- `k > 0`
- `node[0].index > 0`
- `node[i].index == 1 + node[i-1].index` for `i in [1,k)`
**Output:**
- The minimum number of hashes that cover the leaves in the specified range.
The first hash is from the lowest tree level, similar to an inclusion proof.
**Algorithm**:
1. Let hashes be an empty list.
2. Repeat until nodes contain a single node:
a) If nodes[1].index XOR 1 is not equal to nodes[0].index, then append
nodes[0].hash to hashes and remove nodes[0].
COMMENT: the first node's sibling is to the left. Therefore, we can
not progress further on this subtree.
b) Set i to zero.
c) While i is smaller than the current number of nodes:
i) If nodes[i] is not the last node, set nodes[i].hash to H(0x01 ||
nodes[i].hash || nodes[i+1].hash) and then remove nodes[i+1].
COMMENT: only the first and the last node may not have a sibling
on a given tree level. This is not the last node. We already
handled the case of the first node not having a sibling in 4a.
So, here an interior hash is produced for two sibling nodes.
ii) Right-shift nodes[i].index once.
COMMENT: after taking a pass over a node (be it after computing
an interior node or just noting that we're at the last node),
we need to update the index so that it refers to the parent.
This sets us up for taking a new pass at the next tree level.
iii) Increment i by one.
3. Append nodes[0].hash to hashes and output hashes.
Note that the above algorithm computes a [compact range][] for a list of
consecutive leaves when node-index zero is considered an invalid input. The
application of this in inclusion proof verification is a subset of the compact
range theory. It avoids an `O(klogn)` overhead while verifying `k` entries in
the tree of size `n`, and with a very small diff compared to inclusion proofs.
[compact range]: https://github.com/transparency-dev/merkle/blob/main/docs/compact_ranges.md
Loading