Commit a315eed5 authored by Nick Mathewson

Proposal 298: canonicalize family lines

parent 3c34000c
+2 −0
@@ -218,6 +218,7 @@ Proposals by number:
295  Using ADL-GCM for relay cryptography (solving the crypto-tagging attack) [OPEN]
296  Have Directory Authorities expose raw bandwidth list files [OPEN]
297  Relaxing the protover-based shutdown rules [OPEN]
298  Putting family lines in canonical form [OPEN]


Proposals by status:
@@ -250,6 +251,7 @@ Proposals by status:
   295  Using ADL-GCM for relay cryptography (solving the crypto-tagging attack)
   296  Have Directory Authorities expose raw bandwidth list files
   297  Relaxing the protover-based shutdown rules [for 0.3.5.x]
   298  Putting family lines in canonical form [for 0.3.6.x]
 ACCEPTED:
   188  Bridge Guards and other anti-enumeration defenses
   249  Allow CREATE cells with >505 bytes of handshake data
+62 −0
Filename: 298-canonical-families.txt
Title: Putting family lines in canonical form
Author: Nick Mathewson
Created: 31-Oct-2018
Status: Open
Target: 0.3.6.x

1. Introduction

   With ticket #27359, we begin encoding microdescriptor families in
   memory in a reference-counted form, so that if 10 relays all list the
   same family, their family only needs to be stored once.  For large
   families, this has the potential to save a lot of RAM -- but only if
   the families are the same across those relays.
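
   As a rough illustration of why identical encodings matter, here is
   a minimal Python sketch of a reference-counted family table.  It is
   not the code from ticket #27359; the class name FamilyTable and its
   methods are hypothetical, chosen only for this example.

```python
class FamilyTable:
    """Hypothetical intern table: each distinct family is stored once."""

    def __init__(self):
        # Maps a family (as a tuple) to [shared tuple, reference count].
        self._families = {}

    def acquire(self, members):
        """Return the shared copy of this family and bump its refcount."""
        key = tuple(members)
        slot = self._families.setdefault(key, [key, 0])
        slot[1] += 1
        return slot[0]

    def release(self, family):
        """Drop one reference; forget the family when none remain."""
        slot = self._families[family]
        slot[1] -= 1
        if slot[1] == 0:
            del self._families[family]

    def __len__(self):
        return len(self._families)


table = FamilyTable()
first = table.acquire(["$AA", "$BB"])
second = table.acquire(["$AA", "$BB"])   # same spelling: shares storage
third = table.acquire(["$aa", "$BB"])    # different spelling: stored again
```

   In this sketch, "first" and "second" refer to one stored object,
   while "third" occupies a second slot even though it names the same
   relays.  That duplication is what a canonical form avoids.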

   Right now, family lines are often encoded in different ways, and
   placed into consensuses and microdescriptor lines in whatever format
   the relay reported.

   This proposal describes an algorithm that authorities should use
   while voting to place families into a canonical format.

   This algorithm is forward-compatible, so that new family line formats
   can be supported in the future.

2. The canonicalizing algorithm

   To make the family listed in a router descriptor canonical:

      For all entries of the form $hexid=name or $hexid~name, remove
      the =name or ~name portion.

      Remove all entries of the form $hexid, where hexid is not 40
      hexadecimal characters long.

      If an entry is a valid nickname, put it into lower case.

      If an entry is a valid $hexid, put it into upper case.

      If there are any entries, add a single $hexid entry for the relay
      in question, so that it is a member of its own family.

      Sort all entries in lexical order.

      Remove duplicate entries.

   Note that if an entry is not of the form "nickname", "$hexid",
   "$hexid=nickname" or "$hexid~nickname", then it will be unchanged:
   this is what makes the algorithm forward-compatible.
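
   The steps above can be sketched in Python as follows.  This is an
   illustration under stated assumptions, not the authorities' actual
   implementation.  It assumes that a nickname is 1 to 19 ASCII
   alphanumeric characters, that "$hexid" means "$" followed by one or
   more hexadecimal digits, that the "any entries" check happens after
   the removal step (following the order listed), and that "lexical
   order" means ordering by character code.  The function name
   canonicalize_family and its parameters are hypothetical.

```python
import re

# Assumed syntax (not defined in this proposal's text): nicknames are
# 1-19 ASCII alphanumerics; a hexid is a run of hexadecimal digits.
_NICKNAME = re.compile(r"[A-Za-z0-9]{1,19}")
_HEXID_ENTRY = re.compile(r"\$([0-9A-Fa-f]+)(?:[=~][A-Za-z0-9]{1,19})?")


def canonicalize_family(entries, own_hexid):
    """Return a canonical family list for the relay with own_hexid.

    entries: family items as listed in the router descriptor.
    own_hexid: the relay's own 40-character hex identity, without "$".
    """
    result = []
    for entry in entries:
        match = _HEXID_ENTRY.fullmatch(entry)
        if match:
            hexid = match.group(1)      # any =name or ~name is discarded
            if len(hexid) != 40:
                continue                # wrong-length $hexid: removed
            result.append("$" + hexid.upper())
        elif _NICKNAME.fullmatch(entry):
            result.append(entry.lower())
        else:
            result.append(entry)        # unrecognized form: unchanged
    if result:
        # The relay becomes a member of its own (non-empty) family.
        result.append("$" + own_hexid.upper())
    # Remove duplicates, then sort by character code.
    return sorted(set(result))
```

   Removing duplicates before sorting gives the same result as sorting
   first and then removing duplicates, so the sketch combines the last
   two steps.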

3. When to apply this algorithm

   We allocate a new consensus method number.  When building a consensus
   using this method or later, before encoding a family entry into a
   microdescriptor, the authorities should apply the algorithm above.

   Relays MAY apply this algorithm to their own families before
   publishing them.  Unlike authorities, relays SHOULD warn about
   unrecognized family items.
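
   A relay-side check for unrecognized items might look like the
   sketch below.  It assumes the same nickname and hexid syntax as the
   forms listed in section 2 (nicknames of 1 to 19 ASCII alphanumerics,
   "$" followed by hexadecimal digits).  The function name
   unrecognized_family_items is hypothetical, and how the warning is
   reported is left to the caller.

```python
import re

# One pattern covering the four known forms: nickname, $hexid,
# $hexid=nickname and $hexid~nickname (syntax assumed, see above).
_KNOWN_FORM = re.compile(
    r"[A-Za-z0-9]{1,19}"
    r"|\$[0-9A-Fa-f]+(?:[=~][A-Za-z0-9]{1,19})?"
)


def unrecognized_family_items(entries):
    """Return the family items that match none of the known forms."""
    return [entry for entry in entries if not _KNOWN_FORM.fullmatch(entry)]


# A relay could warn once per item that comes back from this call.
for item in unrecognized_family_items(["relayone", "ed25519:abc"]):
    print("warning: unrecognized family item:", item)
```

   Note that a $hexid of the wrong length still matches a known form
   here; the canonicalizing algorithm removes it rather than treating
   it as unrecognized.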