We don't archive microdesc consensuses, so you won't find any files containing them on the metrics website. That's why they aren't listed. The same applies to microdescs, too.
... so I'm not positive where it came from, but should that @type annotation be on the page?
A fine question. Does stem add that line to microdesc consensuses that it receives from Tor? What about microdescs?
So, it seems they're already supported in Stem, just not using that specific @type annotation.
Would it be a high-ish priority for any of these other types to get support or shall we wait until there's a need?
If I had to guess in what order need will arise, that would be: bridge-pool-assignments 1.0, tordnsel 1.0, torperf 1.0, directory 1.0. But it's hard to guess when that will be. bridge-pool-assignments might be relevant soon, the others maybe not.
We don't archive microdesc consensuses, so you won't find any files containing them on the metrics website. That's why they aren't listed. The same applies to microdescs, too.
Hmmm. Would you mind expanding the @type annotations to include things not on the metrics site?
I've been getting feedback from Aaron, and last night I overhauled stem's parse_file() function to make it more user friendly...
One of the changes I made was to let users specify the descriptor_type, and I added a table of descriptor_type to class mappings.
I decided to use our @type annotations for the argument rather than making up something of my own because they provide a nice, canonical way of specifying descriptor formats. I'd rather not need to make up additional descriptor_types of my own to cover microdescriptors. :)
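To make the idea concrete, here is a rough sketch of how a table keyed on @type annotations could map to parser classes. This is not stem's actual code: the class names, the DESCRIPTOR_TYPES table, and both helper functions are made up for illustration, and only the "@type <name> <major>.<minor>" line format is taken from this discussion.

```python
import re


# Hypothetical stand-in classes; stem's real descriptor classes differ.
class ServerDescriptor:
    pass


class ExtraInfoDescriptor:
    pass


class NetworkStatusDocument:
    pass


# Illustrative table mapping an @type annotation name to a parser class.
DESCRIPTOR_TYPES = {
    "server-descriptor": ServerDescriptor,
    "extra-info": ExtraInfoDescriptor,
    "network-status-consensus-3": NetworkStatusDocument,
    "network-status-microdesc-consensus-3": NetworkStatusDocument,
}

# Matches lines such as "@type server-descriptor 1.0".
TYPE_LINE = re.compile(r"^@type (\S+) (\d+)\.(\d+)$")


def parse_type_annotation(line):
    """Split an @type line into (name, major_version, minor_version)."""
    match = TYPE_LINE.match(line.strip())
    if match is None:
        raise ValueError("not an @type annotation: %r" % line)
    name, major, minor = match.groups()
    return name, int(major), int(minor)


def class_for(descriptor_type):
    """Look up the class for a descriptor_type such as 'extra-info 1.0'."""
    name, _major, _minor = parse_type_annotation("@type " + descriptor_type)
    try:
        return DESCRIPTOR_TYPES[name]
    except KeyError:
        raise ValueError("unrecognized descriptor type: %r" % name)
```

With a table like this, supporting a new annotation such as the microdescriptor consensus is a one-line addition rather than a new made-up identifier.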
A fine question. Does stem add that line to microdesc consensuses that it receives from Tor? What about microdescs?
Nope. Pinged Ravi on irc to ask if he remembers where this came from.
If I had to guess in what order need will arise, that would be: bridge-pool-assignments 1.0, tordnsel 1.0, torperf 1.0, directory 1.0. But it's hard to guess when that will be. bridge-pool-assignments might be relevant soon, the others maybe not.
Ok. Let me know if/when a need arises.
You earlier made a list of tasks we needed to make stem the primary descriptor parsing library (ideally so we don't need to continue maintaining metrics-lib as well). Thoughts on the next step?
I'm a little hazy about the descriptor parsing work I did. I'm not sure where it came from, but Googling "network-status-microdesc-consensus-3" turns up nothing except references to this bit of code in Stem. It feels like I made it up. I can't imagine why I would do that. Maybe I was guessing? I don't know.
Hmmm. Would you mind expanding the @type annotations to include things not on the metrics site?
Not at all. Which annotations should we add?
You earlier made a list of tasks we needed to make stem the primary descriptor parsing library (ideally so we don't need to continue maintaining metrics-lib as well). Thoughts on the next step?
Parse microdescriptors - Smaller replacement for server descriptors.
Microdescriptors are not relevant for metrics.
Parse bridge pool assignments - Published by BridgeDB and sanitized by metrics.
This may become relevant once we make progress on the metrics-web replacement. This might be in the next few weeks.
Parse exit list entry - Published by DNSEL or TorBEL to indicate what IP address exit relay X had at timestamp Y.
Same as bridge pool assignments.
Parse Torperf output - Performance data measured by making periodic requests over the Tor network. We'll want to implement legacy/trac#3036 (closed) first.
This one should be postponed until after the Torperf rewrite. Hopefully, we'll have a new Torperf data format a month from now.
Port Onionoo - See legacy/trac#6452 (closed) - One of the chief users of metrics-lib, Onionoo is the data provider for Atlas. There's a design document which might be a good starting point.
Sathya and I recently agreed to postpone this and resume Onionoo development in Java. Porting Onionoo isn't as simple as it seems. We'll need somebody with two months of free time for this. We should revisit this six months from now.
Remote descriptor fetching - Ability to fetch descriptors via an authority or directory mirror's DirPort. This involves some tricky but important performance optimizations, such as making requests to directories in parallel, requesting up to 96 descriptors in a single HTTP GET, using .z compression, etc.
Port DocTor - Fetches current consensuses and votes and outputs consensus problems.
These two would be nice. This is not related to the other stuff we're currently working on. If I have to do this myself, that's going to take months. But this could be done by an interested volunteer. Happy to guide that volunteer and review code.
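For whoever picks up the remote fetching item, a minimal sketch of the three optimizations named there (batches of up to 96, .z compression, parallel requests) might look like the following. Everything here is an assumption for illustration: the function names are made up, the "/tor/server/fp/<fp>+<fp>.z" resource path is how I recall the directory protocol and should be checked against dir-spec, and the round-robin choice of directory is just one simple way to spread load.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

MAX_PER_REQUEST = 96  # limit mentioned in the task description


def batch(fingerprints, size=MAX_PER_REQUEST):
    """Split fingerprints into lists of at most `size` entries."""
    return [fingerprints[i:i + size] for i in range(0, len(fingerprints), size)]


def build_url(address, dir_port, fingerprints):
    """Build one GET URL for a group of server descriptors.

    Assumed resource path; the trailing .z asks for zlib compression.
    """
    return "http://%s:%d/tor/server/fp/%s.z" % (
        address, dir_port, "+".join(fingerprints))


def fetch(url, timeout=30):
    """Download one URL and decompress the zlib-compressed body."""
    with urlopen(url, timeout=timeout) as response:
        return zlib.decompress(response.read()).decode("utf-8", "replace")


def fetch_all(directories, fingerprints, max_workers=4):
    """Fetch descriptors in parallel.

    directories is a list of (address, dir_port) tuples; batches are
    assigned to them round-robin.
    """
    urls = []
    for i, group in enumerate(batch(fingerprints)):
        address, dir_port = directories[i % len(directories)]
        urls.append(build_url(address, dir_port, group))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```

A real implementation would also need retries, timeouts per directory, and handling for descriptors a mirror doesn't have, which is where most of the effort would go.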
Relay class - Convenience class for commonly requested relay attributes, which lazily loads a relay's server descriptor, microdescriptor, and network status entity as needed. This is more a todo item for the Controller class since it requires a control socket.
Okay, unrelated to metrics, it seems.
Port the statistics-aggregating portion of metrics-web - Provides data for graphs on the metrics website. Once we have that we only need a new metrics website before we can kill metrics-web.
I'm currently working on this together with a volunteer. Soon there will be code.
Port ExoneraTor - A website that tells you whether some IP address was a Tor relay. We could probably re-use the existing database schema with minor tweaks.
Searchable Tor descriptor and metrics data archive - Once we have that, we can turn off relay search and maybe even ExoneraTor.
I think I'd prefer a new searchable data archive that makes ExoneraTor obsolete over a simple rewrite of ExoneraTor.
Replace metrics-lib - Part of the goal of this project is to deprecate metrics-lib in favor of stem so we have one fewer service to maintain. When the above is done we should be very, very close - the last step is to double-check that we haven't missed anything. Realistically, we're talking about years here, not months.
Presently the only thing I'm spotting is microdescriptors. "@type network-status-microdesc-consensus-3 1.0" or something similar for microdescriptor network status documents would be nice. :)
This may become relevant once we make progress on the metrics-web replacement. This might be in the next few weeks.
Ok. Ping me when it does.
Sathya and I recently agreed to postpone this and resume Onionoo development in Java. Porting Onionoo isn't as simple as it seems. We'll need somebody with two months of free time for this. We should revisit this six months from now.
If only I had more time. This looks like a really fun one.
Speaking of interesting projects, we should think of some good ones for GSoC later...
Presently the only thing I'm spotting is microdescriptors. "@type network-status-microdesc-consensus-3 1.0" or something similar for microdescriptor network status documents would be nice. :)
Done.
This may become relevant once we make progress on the metrics-web replacement. This might be in the next few weeks.
Ok. Ping me when it does.
Will do, thanks!
Sathya and I recently agreed to postpone this and resume Onionoo development in Java. Porting Onionoo isn't as simple as it seems. We'll need somebody with two months of free time for this. We should revisit this six months from now.
If only I had more time. This looks like a really fun one.
Speaking of interesting projects, we should think of some good ones for GSoC later...
Onionoo is fun, but it's quite easy to underestimate the effort of rewriting it in Python. The problem is that we need a full replacement that is as reliable as the Java Onionoo, or it won't be of any use. That's different from other GSoC projects, where an 80% version of what was originally planned can still be quite useful. I don't think I'd recommend Onionoo as a GSoC project.
This is getting off-topic. Should we close the ticket and move the GSoC discussion somewhere else? :)