Our current memwipe implementation (see legacy/trac#7352 (moved)) was chosen for being very-likely-to-be-safe, not for its speed. We could get much faster, probably: All we really need is a memwipe, plus some way to tell the compiler that the memwipe shouldn't be eliminated.
so a while ago it was pointed out to me that I should not count on memset to zero sensitive memory because it could be optimized out, and was pointed to a Tor ticket (by NickM) that discussed this same thing in Tor and the solution suggested in that ticket was to use openssl cleanse instead
but the ticket also pointed out that this was a heavy handed solution and there should be something better thought up and today I told this to somebody else who is developing a security program , to make sure to not use memset like I was in case they were, and to use openssl cleanse instead, and they said that a better solution is to use the word volatile in the cast because then the compiler doesn't optimize it out http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/SYNTAX/volatile.html
so maybe that is better for Tor too , I dunno, just passing it on
The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted. It turns out that it's pretty slow in practice, though. Another way to use volatile is:
The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted.
If the compiler isn't broken, a plain memset is sufficient. The point of this ticket is to find a memory-erasing function that broken compilers are not yet smart enough to recognize and remove.
OS X (10.9?) supports memset_s, as documented in the manual pages as of February 21, 2012.
We'd have to #define STDC_WANT_LIB_EXT1 1 before (first) including string.h, which could be fun.
void *guaranteed_memset(void *v,int c,size_t n) { char memset_failed = 0; int i; /* do the memset in a fast, safe, platform-specific way */ for (i = 0; i < n; i++) { if (v[i] != c) { memset_failed = 1; break; } } assert(!memset_failed); return v;}
we could prove the overwrite occurred, and ensure compilers don't/haven't optimise(d) it away.
This is the safest approach - one question: should we guard it with #if PARANOIA, or do it all the time?
(I favour doing it every time we overwrite, as it is a vital security property. However, it will be slower to read all the memory again.)
"If you don't like the "volatile" solution
(I don't, because access to volatile objects may be significantly slowed,
and because aliasing volatile objects with non-volatile-qualified pointers
and accessing through such an alias invokes undefined behavior, and because
volatile seems like the sort of thing broken implementations may get wrong),
use the external-memset-wrapper one I proposed in my previous note."
I don't like undefined behaviour - it tends to make conforming, optimising compilers misbehave, and unpredictably as they add new optimisations. We don't want to do things for broken compilers that break conformant ones.
To be clear, I don't think any of my patch requires undefined behavior; do you? In most cases, it accesses non-volatile objects through a volatile pointer; not the other way around. Does the standard forbid that too? Or have I misunderstood?
Also fwiw, the "external memset wrapper" solution isn't going to work with any compiler that does whole-program optimization.
The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards.
Apologies, I wasn't clear - I was commenting on the solution proposed in the article, not your patch.
However, this quote worries me:
"volatile seems like the sort of thing broken implementations may get wrong"
But, ultimately, there is only so much we can do to work around broken compilers.
"The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards."
The assert() guarantees that there will be output if the buffer isn't cleared.
However, do you think a compiler could prove to itself that:
if it executed the code, there would never be any output from the assert()
therefore, it doesn't need to execute the code or the assert?
I guess it could. Screwy logic though.
I think asserting on the value of a volatile pointer fixes this.
How do you feel about:
#if PARANOIAstatic voidmemwipe_checker(volatile char *p, char c, size_t sz){ /* check we filled the block with the right values */ while (sz--) assert(*p++ == c);}#endifvoidmemwipe(void *mem, unsigned char byte, size_t sz){ /* ... memory wiping code ... */#if PARANOIA /* if we're paranoid, check we actually wiped the memory */ memwipe_checker(mem, byte, sz);#endif
We could also make this level of PARANOIA mandatory, at some cost to performance.
I think the buffers could all be declared volatile:
if char buf[2048] (secret) is declared volatile, this means that the compiler cannot predict the value of sum, and therefore isn't allowed to precompute it (I can imagine a series of optimisation steps: optimise memcpy to static initialisation, optimise sum based on static initialisation, notice code similarity and use same value for all three sums, avoid compiling most of the program...)
if char buf[1024] (check) is declared volatile, it is no longer "uninitialised" memory - it is externally modifiable memory, that the compiler can never predict the value of.
using volatile also avoids all caching of values from these buffers, making sure the data is really written to and read from memory, not just cached in a large L3 cache somewhere.
volatile is a way of telling the compiler: "I know what I'm doing here."
If we were really paranoid, we could test with and without volatile on char buf[2048] (secret).
Also, should we test heap allocations as well as stack allocations?
They could have very different behaviours, and therefore security characteristics.
Well, for test-memwipe.c, we want the compiler to remove the memwipe if it can. I worry that making 'secret' volatile might keep some compiler out there from using its memset optimization, and thereby make our test invalid.
Heap allocations are probably even simpler to test than stack allocations. Let me see what I can come up with.
actually, the openssl implementation is plenty fast, it turns out. The horrible kludgy implementation is only used when there isn't an asm one available, and there appear to be asm ones for most platforms.
I'm going to suggest we take the test for this in 0.2.7 though; tests are good.
Trac: Milestone: Tor: 0.2.7.x-final to Tor: 0.2.???
Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.
PASS: src/test/test-memwipe
There are multiple changes suggested in this ticket:
nickm's original branch better_memwipe attempted to avoid using OpenSSL_cleanse for performance reasons by checking if a simple memset worked. I was happy to see it merged, but we never did because there was no evidence that performance was an issue.
The attached explicit_bzero has better semantics than OpenSSL_cleanse, and is faster. Let's merge it.
As discussed on IRC, we can also use memset_s on platforms that support it (NetBSD, OS X):
So we can do the following:
if memset_s is supported:
call memset_s to wipe the memory and set the bytes
As long as we build with headers that contain SecureZeroMemory, the code will work on any Windows version:
"This function is defined as the RtlSecureZeroMemory function (see WinBase.h). The implementation of RtlSecureZeroMemory is provided inline and can be used on any version of Windows (see WinNT.h.)"
run:
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 0%: Starting
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 80%: Connecting to the Tor network
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 85%: Finishing handshake with first hop
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 90%: Establishing a Tor circuit
Jan 03 xx:xx:xx.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working.
Jan 03 xx:xx:xx.000 [notice] Bootstrapped 100%: Done
It's a straightforward patch that works on later versions of OpenBSD, NetBSD, and OS X.
(Using memset_s(..., ..., 0, ...) then memset(..., ..., byte, ...) is redundant, but I'm ok with that. It makes our choice between alternative functions clearer, which is important in security-critical code.)