Stream directory downloads to reduce latency and save RAM
Here's a tricky one, but it has the potential to save time and memory.
Right now, the `tor-dirclient` API downloads the entire requested object to RAM, decompressing it as it goes. That's fine for objects like consensus documents, where we need the whole thing decompressed anyway, but it's less good for objects like microdescriptors, where we'd like to handle each one as soon as we receive it, and where we get a lot of them in a single document. As things stand, we keep something like 10MB of temporary string data around even though everything but the most recent 3-4KB is already parsable.
It's also bad for latency, since we can end up in a position where the information we need to become bootstrapped is sitting in a download buffer, waiting for the rest of the download to complete.
We could save intermediate memory and latency by refactoring our downloader code to (optionally) return a bytestream of downloaded information, and then writing code to convert that bytestream into a `Stream` of `Microdesc` or `AuthCert` objects.
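A hedged sketch of the shape this might take, using the `futures` crate. Every name here (`DirRequest`, `Microdesc`, `download_streaming`, `parse_microdescs`, the chunk type) is invented for illustration and is not tor-dirclient's actual API:

```rust
// Hypothetical API sketch; none of these names exist in tor-dirclient today.
use futures::stream::{self, Stream};
use std::io;

/// Stand-ins for real arti types.
struct DirRequest;
struct Microdesc;

/// Instead of buffering the whole response, the downloader could
/// (optionally) hand back the decompressed body as chunks arrive.
fn download_streaming(_req: DirRequest) -> impl Stream<Item = io::Result<Vec<u8>>> {
    stream::empty() // placeholder body
}

/// A separate layer would turn that bytestream into parsed objects,
/// yielding each `Microdesc` as soon as its text is complete.
fn parse_microdescs(
    _bytes: impl Stream<Item = io::Result<Vec<u8>>>,
) -> impl Stream<Item = Microdesc> {
    stream::empty() // placeholder body
}
```

Keeping the bytestream layer separate from the parsing layer would let the same downloader serve consensuses (buffer everything) and microdescriptors (parse incrementally) without duplicating the transfer logic.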
This would require significant refactoring in `bootstrap.rs`.
Found while doing #87
Edited to add: One caveat here. Many prefixes of a microdescriptor are themselves valid microdescriptors. Thus, when parsing a stream of microdescriptors, you can't safely parse the last one until the stream is finished.
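Here's a minimal sketch of a converter that respects this caveat, again with invented names; the naive split-on-`onion-key` logic is a stand-in for real parsing. Complete microdescriptors are yielded as soon as the start of the next one arrives, and the final buffered item is flushed only at end-of-input:

```rust
use futures::stream::{self, Stream, StreamExt};

struct Microdesc {
    text: String,
}

/// Convert decompressed byte chunks into microdescriptors. A document is
/// treated as complete only when the start of the *next* one ("onion-key"
/// at the beginning of a line) appears, or when the input ends.
fn microdescs_from_chunks(
    chunks: impl Stream<Item = Vec<u8>>,
) -> impl Stream<Item = Microdesc> {
    // Wrap each chunk in Some(..) and append a None sentinel, so the scan
    // below can tell when the input is exhausted and flush the last item.
    let chunks = chunks.map(Some).chain(stream::once(async { None::<Vec<u8>> }));
    chunks
        .scan(String::new(), |buf, chunk| {
            let mut complete = Vec::new();
            match chunk {
                Some(bytes) => {
                    buf.push_str(&String::from_utf8_lossy(&bytes));
                    // Split off every microdescriptor whose successor has
                    // started to arrive; those are safe to parse now.
                    while let Some(pos) = buf.find("\nonion-key") {
                        let rest = buf.split_off(pos + 1);
                        let text = std::mem::replace(buf, rest);
                        complete.push(Microdesc { text });
                    }
                }
                None => {
                    // End of input: only now is the last item complete.
                    if !buf.is_empty() {
                        complete.push(Microdesc {
                            text: std::mem::take(buf),
                        });
                    }
                }
            }
            futures::future::ready(Some(complete))
        })
        .flat_map(stream::iter)
}
```

The `None` sentinel is what makes the caveat tractable: without an explicit end-of-input signal, the scan has no way to know whether the bytes in its buffer are the whole last microdescriptor or merely a valid prefix of it.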
Edited to add: Another application of this approach: we have some interest in being able to reject consensus documents early if their first 1KB describes a consensus we wouldn't use.
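A hedged sketch of what that check might look like, with invented names; the only real detail assumed is that a v3 consensus begins with a `network-status-version 3` line:

```rust
use futures::stream::{Stream, StreamExt};

/// Buffer roughly the first 1KB of a consensus download and bail out
/// early if the preamble isn't something we could use. (A real version
/// would also check flavor, validity dates, and so on, and would hand
/// the buffered prefix on to the parser rather than just returning it.)
async fn reject_consensus_early(
    chunks: &mut (impl Stream<Item = Vec<u8>> + Unpin),
) -> Result<String, &'static str> {
    let mut head = String::new();
    while head.len() < 1024 {
        match chunks.next().await {
            Some(chunk) => head.push_str(&String::from_utf8_lossy(&chunk)),
            None => break,
        }
    }
    if !head.starts_with("network-status-version 3") {
        return Err("doesn't look like a v3 consensus; abandon the download");
    }
    Ok(head)
}
```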