Make user statistics more robust against outliers
tl;dr: From June 11 to 13, 2015, the number of bridge users briefly went up from around 20k to 140k. A closer investigation of the underlying data revealed that the aggregate statistics reported by a single bridge were responsible for this major spike. The estimation method used for user statistics should be made robust against outliers, possibly by applying the more recently developed techniques that are used to extrapolate hidden-service statistics.
Here are more details about that single bridge reporting almost unbelievably high statistics: it's the bridge with nickname "solemnizersfiaun" and hashed fingerprint 420C39C86B0E71F653E18552B28B9189DA2F1377, which reported having served up to 80k users. But judging from its bandwidth statistics, it looks like that bridge actually did answer a huge number of consensus requests during those days in June. It pushed up to 20 MB/s, which is probably rather unusual for a bridge. A closer look at the descriptor tells us that most of these bytes were used to answer directory requests. (I didn't do the math on whether such a burst over a few hours would be sufficient to write 800k compressed consensuses.) So, either the bridge is telling us the truth, or it's lying to us in a very sophisticated way.
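For a rough sense of the plausibility, here is a back-of-envelope calculation. The compressed consensus size and the burst duration below are assumptions for illustration, not numbers taken from the reported descriptors:

```python
# Back-of-envelope check: could a burst of 20 MB/s sustained for a few
# hours account for writing on the order of 800k compressed consensuses?

consensus_size_mb = 0.5   # ASSUMED size of one compressed consensus, in MB
rate_mb_per_s = 20        # peak write rate reported by the bridge
burst_hours = 6           # ASSUMED duration of the burst

total_mb_written = rate_mb_per_s * burst_hours * 3600
consensuses_served = total_mb_written / consensus_size_mb

print(f"{consensuses_served:,.0f} consensuses")
```

Under these assumed numbers the bridge could have written roughly 864k consensuses, so the reported traffic is at least in the right ballpark for the reported request counts.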
And it's not only that bridge that reported very high statistics in June. There's another bridge with nickname "Unnamed" and hashed fingerprint 82F37B9A8400A1E0C0730D8E4639150AE11AC640 that reported having served around 10k users on June 18 and 22. Similarly, that bridge reported extremely high traffic during those days. I didn't look for more bridges, but it's possible that there were more reporting unusual numbers that just didn't stand out as much as these two.
So, I'm not sure whether we'll find out what exactly happened there, but it seems very unrealistic that these directory requests were generated by actual human users. That's why I think our estimation method should remove such outliers.
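One simple way to make the aggregation robust is to drop per-bridge daily values that lie far above the bulk of the distribution before summing them up. The sketch below uses Tukey's IQR fence for this; both the fence multiplier and the idea of filtering per-bridge values before aggregating are assumptions here, not the method that was actually adopted:

```python
# Sketch: filter out per-bridge daily user estimates that exceed the
# upper Tukey fence (Q3 + k * IQR) before aggregating them.
import statistics


def remove_outliers(values, k=1.5):
    """Return values at or below the upper Tukey fence Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [v for v in values if v <= upper_fence]


# Example: most bridges report a few hundred daily users,
# while one bridge reports an implausible 80k.
daily_users = [120, 250, 310, 180, 90, 400, 80000]
print(remove_outliers(daily_users))  # the 80k value is dropped
```

A median-based variant (e.g. flagging values more than some multiple of the median absolute deviation away from the median) would work similarly and is even less sensitive to a single extreme reporter.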