Skip to content

Remove inconsistency between bandwidth history graphs

Today I found an inconsistency between our various bandwidth history graphs:

The reason is that we're only matching consensuses and extra-info descriptors for the second and third graph, but not for the first. And we need to do that in order to break down totals by guards/exits.

While it may seem simpler to just skip that matching step in the first graph, it leads to inconsistent data. Consider the following data taken from bandwidth.csv:

date isexit isguard advbw bwread bwwrite dirread dirwrite
2018-03-14 f f 6757900851 2454444601 2493893288
2018-03-14 f t 15218678985 7024742679 7191640536
2018-03-14 t f 1592294787 562694042 558274048
2018-03-14 t t 6189896122 3322794675 3356316394
2018-03-14 29758770745 13367602689 13603416291 6877369 187770410

In theory, the sum of the first four rows should match the fifth row, modulo rounding errors.

This works for advertised bandwidth (which is based on server descriptor data, not extra-info descriptors). But it does not work for bandwidth histories:

6757900851 + 15218678985 + 1592294787 + 6189896122 - 29758770745 = 0
2454444601 + 7024742679 + 562694042 + 3322794675 - 13367602689 = -2926692
2493893288 + 7191640536 + 558274048 + 3356316394 - 13603416291 = -3292025

The difference comes from relays that reported bandwidth histories but that the directory authorities did not list as running.

Suggestion: we simply omit the bandwidth totals for cases where we have values by exit/guard flags:

date isexit isguard advbw bwread bwwrite dirread dirwrite
2018-03-14 f f 6757900851 2454444601 2493893288
2018-03-14 f t 15218678985 7024742679 7191640536
2018-03-14 t f 1592294787 562694042 558274048
2018-03-14 t t 6189896122 3322794675 3356316394
2018-03-14 29758770745 13367602689 13603416291 6877369 187770410

We'd remove an inconsistency by doing so, and we'd remove some code. The graphing code would have to do one more step to aggregate data from four rows, but that's not critical.

If this sounds reasonable to others, I'll prepare a patch. Setting to needs_review for the idea, not for code.

To upload designs, you'll need to enable LFS and have an admin enable hashed storage. More information