Consider scaling with the mean of the descriptors' bandwidth instead of the last value, as Torflow does
Digging into Torflow again, I realized that it stores the mean of all the descriptor bandwidth values it got for a relay, not just the last one. The code in sqlsupport.py:
    avg_desc_bw = select([func.avg(BwHistory.desc_bw)],
                         BwHistory.table.c.router_idhex
                         == RouterStats.table.c.router_idhex).as_scalar()
    [...]
    RouterStats.table.update(values=
        {RouterStats.table.c.min_rank:min_r,
         RouterStats.table.c.avg_rank:avg_r,
         RouterStats.table.c.max_rank:max_r,
         RouterStats.table.c.avg_bw:avg_bw,
         RouterStats.table.c.avg_desc_bw:avg_desc_bw}).execute()
And it's this desc_bw that gets written to the data/scanner.*/scan-data/bws-*done* files and later read by aggregate.py.
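To make the difference concrete, here is a minimal sketch in plain Python. It is not Torflow or sbws code: the function name `descriptor_bandwidth`, its `use_mean` parameter and the sample values are hypothetical, and it only assumes that the observed descriptor bandwidths for one relay are available as a list, oldest first.

```python
from statistics import mean


def descriptor_bandwidth(observations, use_mean=True):
    """Pick the descriptor bandwidth to scale with for one relay.

    observations: descriptor bandwidth values seen for the relay,
    oldest first (hypothetical input format).
    use_mean: if True, average all values (what the Torflow query above
    does with func.avg); if False, take only the most recent value.
    """
    if not observations:
        raise ValueError("no descriptor bandwidth observations")
    if use_mean:
        return mean(observations)
    return observations[-1]


# Made-up values for a single relay, oldest first.
history = [1000, 1200, 800]
mean_bw = descriptor_bandwidth(history)                   # mean of all values
last_bw = descriptor_bandwidth(history, use_mean=False)   # last value only
```

With these made-up values the two choices differ (1000 versus 800), so any scaling that multiplies by the descriptor bandwidth would give a different result depending on which one is used.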
The same happens with the consensus bandwidth, though it is not used for scaling, only to calculate the percentage of the total weight with respect to the last consensus.
I tried a patch and, with the measurements from the last month, the total weight was slightly lower.
We're not sure whether using the last descriptor is actually more accurate. Creating this issue just in case we need to get even closer to Torflow's behaviour.