rep_hist_format_hs_stats() should add noise, then round

In order to guarantee differential privacy, we need to:

  • sample at the scale of the noise (not unit scale)
  • add the noise to the signal
  • round the noisy signal

This is the "snapping" mitigation from "On Significance of the Least Significant Bits For Differential Privacy" by Ilya Mironov: https://pdfs.semanticscholar.org/2f2b/7a0d5000a31f7f0713a3d20919f9703c9876.pdf

Currently, rep_hist_format_hs_stats() rounds the signal to the bin size first, then adds noise that has been rounded to the nearest integer. This isn't ideal, because it makes the least significant bits of the noise meaningless.

Instead, we should:

  • round the noise to integer precision
  • add the noise to the signal
  • round the noisy signal to the bin size
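
To make the difference in ordering concrete, here is a minimal Python sketch of the two orders. It is not the Tor C code: the helper names (`sample_laplace`, `round_to_multiple`, `current_order`, `proposed_order`, `noisy_count`), the round-to-nearest bin rule, and the numeric parameters are all assumptions chosen for illustration. It also uses a plain floating-point Laplace sampler, so it only shows the order of operations and does not implement the full snapping mechanism from the paper.

```python
import random


def sample_laplace(scale, rng):
    """Draw zero-mean Laplace noise with the given scale.

    The difference of two independent exponentials with mean `scale`
    is Laplace-distributed with that scale. Illustration only: this is
    ordinary floating-point sampling, not a hardened sampler.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def round_to_multiple(value, bin_size):
    """Round an integer to the nearest multiple of bin_size (ties go up).

    Assumption: the real code may use a different rounding rule.
    """
    return ((value + bin_size // 2) // bin_size) * bin_size


def current_order(signal, bin_size, noise):
    """Order described in this ticket as the current behaviour:
    bin the signal first, then add integer-rounded noise."""
    return round_to_multiple(signal, bin_size) + round(noise)


def proposed_order(signal, bin_size, noise):
    """Proposed order: round the noise to an integer, add it to the
    signal, then round the noisy signal to the bin size."""
    return round_to_multiple(signal + round(noise), bin_size)


def noisy_count(signal, bin_size, scale, rng):
    """Sample noise at the given scale and apply the proposed order."""
    return proposed_order(signal, bin_size, sample_laplace(scale, rng))


if __name__ == "__main__":
    # Hypothetical parameters, not taken from the Tor source.
    rng = random.Random()
    print(noisy_count(5000, 1024, 100.0, rng))
```

With a fixed noise value of 3.4, a signal of 10, and a bin size of 8, `current_order` returns 11 (not a multiple of the bin size), while `proposed_order` returns 16. In the proposed order the published value is always a multiple of the bin size, so the low-order bits of the noise are not exposed in the output.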

I think this was introduced in 14e83e62.
