FWIW, though I don't generally object to belt-and-suspenders engineering, I'm not 100% sure of this approach; if we think there's any chance that future outputs of our PRNG can be predicted from our past PRNG outputs, we should probably throw away the PRNG and use a good one, right? (Or for that matter, we should revise openssl to stop exposing PRNG outputs in that case too.)
Yes, we should always use a PRNG that's unpredictable (and switch as soon as we suspect it's not). On that topic, OpenSSL still uses SHA-1 internally for its PRNG, and we're trying to phase out SHA-1. But I'm not sure if known SHA-1 vulnerabilities affect its use in the OpenSSL PRNG.
Hashing PRNG output helps protect previous random outputs, if we discover later on that our PRNG was more predictable than we thought. (It works kinda like forward secrecy for random numbers.)
On the other hand, hashing PRNG output could introduce vulnerabilities if any bits of the hash function's output are correlated with each other.
It contains the modifications to crypto_rand, and new benchmarks comparing crypto_rand (hashed PRNG output) with crypto_rand_raw (PRNG output).
While I don't think PRNG performance is on the critical path: on typical random data requests, performance improves by up to 48% at smaller sizes (4 bytes), thanks to the hash output buffer, and degrades by up to 27% at larger sizes (32 bytes).
If we do choose to go ahead with this, we should run the unit tests on both crypto_rand_raw and crypto_rand. I can update the existing unit tests to do that, and I'd like to include the unit tests in legacy/trac#17697 (moved) as well. But I'll wait until we decide we want this change.
nickm, ioerror: I don't know whether we want to go ahead with hashing PRNG output.
(Rather than, say, using a PRNG that's not vulnerable like Dual EC was.)
Are you happy for me to close this ticket, or is it a feature we need?
Trac: Cc: N/A to nickm, ioerror; Status: needs_review to needs_information
I asked a cryptographer sitting next to me and they agreed that my suggested approach is reasonable. Hash the PRNG output. Or more directly: "Never reveal pure randomness outputs."
For example, what happens when someone uses RDRAND internally? In theory, nothing in our stack does that; in practice, I guess that is incorrect. Something in the kernel, for example? Some weird OpenSSL builds or something?
Not to say that RDRAND is backdoored but that if it was, I think a hash would prevent what we currently understand to be a way to exploit a Dual-EC like backdoor.
I asked a cryptographer sitting next to me and they agreed that my suggested approach is reasonable. Hash the PRNG output. Or more directly: "Never reveal pure randomness outputs."
Could you ask the cryptographer to jabber or IRC or email me quickly or something? I don't think the reasoning makes sense in this case, for a few reasons:
We know what the PRNG is.
OpenSSL already exposes the PRNG outputs (in ClientHello and ServerHello, at least), and we can't stop it from doing that without replacing the PRNG.
The PRNG is under our control. The answer to a questionable PRNG would seem to be replacing it with a non-broken PRNG.
For example, what happens when someone uses RDRAND internally?
We detect attempts to replace the OpenSSL PRNG with the RDRAND engine, or any other engine; see legacy/trac#10402 (moved).
In theory, nothing in our stack does that; in practice, I guess that is incorrect. Something in the kernel, for example? Some weird OpenSSL builds or something?
Not to say that RDRAND is backdoored but that if it was, I think a hash would prevent what we currently understand to be a way to exploit a Dual-EC like backdoor.
It would only prevent it if openssl itself stopped exposing direct outputs, right?
I asked a cryptographer sitting next to me and they agreed that my suggested approach is reasonable. Hash the PRNG output. Or more directly: "Never reveal pure randomness outputs."
Could you ask the cryptographer to jabber or IRC or email me quickly or something? I don't think the reasoning makes sense in this case, for a few reasons:
We know what the PRNG is.
I guess "never" was the operative word.
OpenSSL already exposes the PRNG outputs (in ClientHello and ServerHello, at least), and we can't stop it from doing that without replacing the PRNG.
Yes, though the question is - do we want our protocol to make the same mistake? I think no.
The PRNG is under our control. The answer to a questionable PRNG would seem to be replacing it with a non-broken PRNG.
We hope.
For example, what happens when someone uses RDRAND internally?
We detect attempts to replace the OpenSSL PRNG with the RDRAND engine, or any other engine; see legacy/trac#10402 (moved).
If that fails, do we fail closed or open?
In theory, nothing in our stack does that; in practice, I guess that is incorrect. Something in the kernel, for example? Some weird OpenSSL builds or something?
Not to say that RDRAND is backdoored but that if it was, I think a hash would prevent what we currently understand to be a way to exploit a Dual-EC like backdoor.
It would only prevent it if openssl itself stopped exposing direct outputs, right?
If openssl did things correctly, I'd still suggest our protocol should not reveal things. An extra hash shouldn't hurt and it may mitigate issues, especially if our analysis were incorrect. Even if we properly prevent RDRAND's use with OpenSSL, if we ever get random data from the kernel, I'm really unsure that we can prevent that from using RDRAND. I guess I'd rather be safe than sorry and I'm not convinced I understand every aspect enough.
Additionally, alternative implementations of Tor will be more fragile than Tor.
Okay, I still think I'm right, but after an IRC discussion with ioerror, I'm a bit concerned about the number of smart cautious people who disagree with me.
I think our best move forward here is:
Never do anything raw with the system entropy, except feeding it into a PRNG or a hash.
Whenever the PRNG's output might be exposed, hash it first. Split crypto_rand() into crypto_rand() and crypto_rand_exposed().
Amend specifications with a note on PRNG usage, describing the above.
feature17694_strongest_027 starts this, based on the code for legacy/trac#17686 (moved). I suggest it for inclusion in 0.2.8 only. I'm going to extend it to know about other PRNGs.
feature17694_strongest_027 starts this, based on the code for legacy/trac#17686 (moved). I suggest it for inclusion in 0.2.8 only. I'm going to extend it to know about other PRNGs.
I would really like to see comments added to crypto_strongest_rand() since this is not that obvious to get at first glance. For instance:
Why is inp twice the size, with the second half randomized differently? (I assume it's to mix different entropy sources, but it would be nice to explain why we do that.)
Why do we tor_assert(0) instead of clean exit?
The rest LGTM! I'm among those in favor of this as well, since it will simplify things for prop#250, which right now explicitly hashes the return value of crypto_rand() as a precaution against leaking any raw bytes to the wire.
feature17694_strongest_027 starts this, based on the code for legacy/trac#17686 (moved). I suggest it for inclusion in 0.2.8 only. I'm going to extend it to know about other prngs.
This patch only hashes the entropy used in keys.
I don't think this achieves the overall goal: "make sure we never leak raw PRNG output to the network".
We can easily leak raw PRNG output via salts, nonces and other randomly chosen values that are sent on the wire.
Even our "random" choices of relays could leak some bits.
I don't think this achieves the overall goal: "make sure we never leak raw PRNG output to the network".
We can easily leak raw PRNG output via salts, nonces and other randomly chosen values that are sent on the wire.
Even our "random" choices of relays could leak some bits.
At some point this becomes rather silly, not to mention expensive, to the point where "we should ditch OpenSSL's CSPRNG if we don't trust it when its state gets exposed, instead of always passing output through extra hash functions" becomes compelling.
IMO that point now has been reached. Others are free to disagree with me.
I don't think this achieves the overall goal: "make sure we never leak raw PRNG output to the network".
We can easily leak raw PRNG output via salts, nonces and other randomly chosen values that are sent on the wire.
Even our "random" choices of relays could leak some bits.
At some point this becomes rather silly, not to mention expensive, to the point where "we should ditch OpenSSL's CSPRNG if we don't trust it when its state gets exposed, instead of always passing output through extra hash functions" becomes compelling.
Depending on how it's implemented, the hashing has a performance cost of about -40% (if we hash an entire buffer, then use it for multiple small random calls) to +30% on Tor's typical input sizes.
IMO that point now has been reached. Others are free to disagree with me.
To be clear, I'm agnostic on hashing PRNG output. But if we want to prevent leaking PRNG bits, we should hash all the bits that could be sent out from or be observed from outside the process.
Depending on how it's implemented, the hashing has a performance cost of about -40% (if we hash an entire buffer, then use it for multiple small random calls) to +30% on Tor's typical input sizes.
For the former case, that reads suspiciously like "re-implement libottery as a Hash_DRBG", sitting on top of another hash based CSPRNG (OpenSSL's).
IMO that point now has been reached. Others are free to disagree with me.
To be clear, I'm agnostic on hashing PRNG output. But if we want to prevent leaking PRNG bits, we should hash all the bits that could be sent out from or be observed from outside the process.
I'm mostly agnostic here. I think anything more than what nickm's branch does warrants re-evaluating why we are doing this, and whether there's a way to get what we want efficiently (since RAND_bytes() already hashes entropy from the system entropy source, I'm not at all convinced that it isn't good enough).
(After a quick look over the code, I do agree with teor that the vast majority of our PRNG usage should be regarded as "potentially visible to an observer.")
../src/common/crypto.c: In function 'crypto_strongest_rand':
../src/common/crypto.c:2530:1: error: stack protector not protecting local variables: variable length buffer [-Werror=stack-protector]
cc1: all warnings being treated as errors
It looks like using SHA512_DIGEST_LENGTH indirectly isn't working. Why wasn't it used directly? And why is SHA512_DIGEST_LENGTH used while the rest of the Tor code base uses DIGEST512_LEN?
Better now?
Jenkins seems to be happy, so yes. Thank you.
I did a review of the crypto_strongest_rand function, and I believe the out addition and out_len subtraction in the else block can be removed. (At first I thought this would cause a buffer overflow, until I saw the break statement, but it still causes an integer overflow in out_len.)