Improve our handling of stochastic tests

We have cool stochastic tests for our probability distributions, but we are not handling their stochastic nature very well.

As part of legacy/trac#29693 we greatly reduced their false positive rate, but that also reduced their usefulness.

There might be other approaches we could take to make them more useful for us. For examples, see https://trac.torproject.org/projects/tor/ticket/29693#comment:4