Choose a faster memwipe implementation

changed milestone to %Tor: 0.2.8.x-final in legacy/trac

added component::core tor/tor in Legacy / Trac milestone::Tor: 0.2.8.x-final in Legacy / Trac priority::medium in Legacy / Trac resolution::implemented in Legacy / Trac severity::normal in Legacy / Trac status::closed in Legacy / Trac tor-relay in Legacy / Trac type::enhancement in Legacy / Trac labels

Also see http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html :

 void *guaranteed_memset(void *v,int c,size_t n)
  { volatile char *p=v; while (n--) *p++=c; return v; }

(Totally untested on modern compilers; the advice at the end of that page makes me think that the document could be a bit dated.)

Replying to nickm:

{ volatile char *p=v; while (n--) *p++=c; return v; }

I'm no expert here, but it seems to me that anything that doesn't use the changed memory after the change is just asking for trouble.

hoppy on #nottor:

so a while ago it was pointed out to me that I should not count on memset to zero sensitive memory because it could be optimized out, and was pointed to a Tor ticket (by NickM) that discussed this same thing in Tor and the solution suggested in that ticket was to use openssl cleanse instead but the ticket also pointed out that this was a heavy handed solution and there should be something better thought up and today I told this to somebody else who is developing a security program , to make sure to not use memset like I was in case they were, and to use openssl cleanse instead, and they said that a better solution is to use the word volatile in the cast because then the compiler doesn't optimize it out http://www.lix.polytechnique.fr/~liberti/public/computing/prog/c/C/SYNTAX/volatile.html so maybe that is better for Tor too , I dunno, just passing it on

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted. It turns out that it's pretty slow in practice, though. Another way to use volatile is:

    void * (* volatile memset_volatile)(void *, int, size_t) = memset; 
    ...
    memset_volatile(p, c, n);

(via a NetBSD mailing list at http://netbsd.2816.n7.nabble.com/Adding-memset-s-function-td43349.html .)

Another option is the memset_s() function added in the the C11 standard. Other than MS, though, I don't think any of the libcs we use support it yet.

Replying to nickm:

The volatile-based thing is actually pretty sure to work, assuming that the compiler isn't busted.

If the compiler isn't broken, a plain memset is sufficient. The point of this ticket is to find a memory-erasing function that broken compilers are not yet smart enough to recognize and remove.

This might actually be worth doing, since OPENSSL_cleanse is such a pile of voodoo.

See branch better_memwipe in my public repository.

Trac:
Milestone: Tor: unspecified to Tor: 0.2.6.x-final
Status: new to needs_review

Apply a nickm-patch keyword to tickets in needs_review in 0.2.6 where I wrote the patch, so I know which ones I can('t) review myself.

Trac:
Keywords: N/A deleted, nickm-patch added

OS X (10.9?) supports memset_s, as documented in the manual pages as of February 21, 2012. We'd have to #define STDC_WANT_LIB_EXT1 1 before (first) including string.h, which could be fun.

If we do something like:

void *guaranteed_memset(void *v,int c,size_t n) {

  char memset_failed = 0;
  int i;

  /* do the memset in a fast, safe, platform-specific way */

  for (i = 0; i < n; i++) {
    if (v[i] != c) {
      memset_failed = 1;
      break;
    }
  }

  assert(!memset_failed);

  return v;
}

we could prove the overwrite occurred, and ensure compilers don't/haven't optimise(d) it away.

This is the safest approach - one question: should we guard it with #if PARANOIA, or do it all the time?

(I favour doing it every time we overwrite, as it is a vital security property. However, it will be slower to read all the memory again.)

What do you think, nickm, arma, rransom?

Also, from http://www.securityfocus.com/archive/82/298061/2002-10-27/2002-11-02/0 linked from http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/protect-secrets.html which is quoted in comment 1:

"If you don't like the "volatile" solution (I don't, because access to volatile objects may be significantly slowed, and because aliasing volatile objects with non-volatile-qualified pointers and accessing through such an alias invokes undefined behavior, and because volatile seems like the sort of thing broken implementations may get wrong), use the external-memset-wrapper one I proposed in my previous note."

I don't like undefined behaviour - it tends to make conforming, optimising compilers misbehave, and unpredictably as they add new optimisations. We don't want to do things for broken compilers that break conformant ones.

To be clear, I don't think any of my patch requires undefined behavior; do you? In most cases, it accesses non-volatile objects through a volatile pointer; not the other way around. Does the standard forbid that too? Or have I misunderstood?

Also fwiw, the "external memset wrapper" solution isn't going to work with any compiler that does whole-program optimization.

The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards.

Apologies, I wasn't clear - I was commenting on the solution proposed in the article, not your patch.

However, this quote worries me: "volatile seems like the sort of thing broken implementations may get wrong"

But, ultimately, there is only so much we can do to work around broken compilers.

"The check-after-memset thing you propose might work too .. but I think that a compiler is also technically allowed to optimize that whole thing out, along with the memset, if it can prove that nothing else will look at the buffer afterwards."

The assert() guarantees that there will be output if the buffer isn't cleared.

However, do you think a compiler could prove to itself that:

if it executed the code, there would never be any output from the assert()
therefore, it doesn't need to execute the code or the assert?

I guess it could. Screwy logic though.

I think asserting on the value of a volatile pointer fixes this.

How do you feel about:

#if PARANOIA
static void
memwipe_checker(volatile char *p, char c, size_t sz)
{
  /* check we filled the block with the right values */
  while (sz--)
    assert(*p++ == c);
}
#endif

void
memwipe(void *mem, unsigned char byte, size_t sz)
{
  /* ... memory wiping code ... */
#if PARANOIA
  /* if we're paranoid, check we actually wiped the memory */
  memwipe_checker(mem, byte, sz);
#endif

We could also make this level of PARANOIA mandatory, at some cost to performance.

Re: the patch itself - test-memwipe.c

I think the buffers could all be declared volatile:

if char buf[2048] (secret) is declared volatile, this means that the compiler cannot predict the value of sum, and therefore isn't allowed to precompute it (I can imagine a series of optimisation steps: optimise memcpy to static initialisation, optimise sum based on static initialisation, notice code similarity and use same value for all three sums, avoid compiling most of the program...)
if char buf[1024] (check) is declared volatile, it is no longer "uninitialised" memory - it is externally modifiable memory, that the compiler can never predict the value of.
using volatile also avoids all caching of values from these buffers, making sure the data is really written to and read from memory, not just cached in a large L3 cache somewhere. volatile is a way of telling the compiler: "I know what I'm doing here."

If we were really paranoid, we could test with and without volatile on char buf[2048] (secret).

Also, should we test heap allocations as well as stack allocations? They could have very different behaviours, and therefore security characteristics.

Well, for test-memwipe.c, we want the compiler to remove the memwipe if it can. I worry that making 'secret' volatile might keep some compiler out there from using its memset optimization, and thereby make our test invalid.

Heap allocations are probably even simpler to test than stack allocations. Let me see what I can come up with.

I've made the tests a little more sophisticated wrt stuff above; does this seem worth merging now?

I think that memwipe.c already has the required #define for OSX; does it look okay to you?

Yes, the #define looked good when I first reviewed the code. I'll review after finishing legacy/trac#13538 (moved).

Deferring; I have no evidence that the performance issue here actually matters.

Trac:
Milestone: Tor: 0.2.6.x-final to Tor: 0.2.7.x-final

actually, the openssl implementation is plenty fast, it turns out. The horrible kludgy implementation is only used when there isn't an asm one available, and there appear to be asm ones for most platforms.

I'm going to suggest we take the test for this in 0.2.7 though; tests are good.

Trac:
Milestone: Tor: 0.2.7.x-final to Tor: 0.2.???

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

Trac:
Username: logan
Sponsor: N/A to N/A
Severity: N/A to Normal

Wrong patch sent. Wrong branch. Uploading in a few minutes.

Trac:
Username: logan

tor_explicit_bzero.diff

Trac:
Username: logan

Replying to logan:

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

Replying to logan:

Here's a patch that checks if the platform supports explicit_bzero(). Tested on OpenBSD.

PASS: src/test/test-memwipe

There are multiple changes suggested in this ticket:

nickm's original branch better_memwipe attempted to avoid using OpenSSL_cleanse for performance reasons by checking if a simple memset worked. I was happy to see it merged, but we never did because there was no evidence that performance was an issue.

The attached explicit_bzero has better semantics than OpenSSL_cleanse, and is faster. Let's merge it.

As discussed on IRC, we can also use memset_s on platforms that support it (NetBSD, OS X):

So we can do the following:

if memset_s is supported:
- call memset_s to wipe the memory and set the bytes
if explicit_bzero is supported:
- call explicit_bzero to wipe the memory
- call memset to set the bytes
if SecureZeroMemory is supported: (Win32)
- call SecureZeroMemory to wipe the memory
- call memset to set the bytes
otherwise:
- call OpenSSL_cleanse to wipe the memory
- call memset to set the bytes

Edit: add Windows SecureZeroMemory

Win32 also apparently has memset_s.

Trac:
Status: needs_review to needs_revision

Trac:
Milestone: Tor: 0.2.??? to Tor: 0.2.8.x-final

Replying to teor:

Win32 also apparently has memset_s.

Windows has SecureZeroMemory in Windows.h. https://msdn.microsoft.com/en-us/library/windows/desktop/aa366877.aspx

As long as we build with headers that contain SecureZeroMemory, the code will work on any Windows version:

"This function is defined as the RtlSecureZeroMemory function (see WinBase.h). The implementation of RtlSecureZeroMemory is provided inline and can be used on any version of Windows (see WinNT.h.)"

for the above diff [https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff] , i forgot to add a description

uname -a Darwin selven@hackers.mu.local 15.2.0 Darwin Kernel Version 15.2.0: Fri Nov 13 19:56:56 PST 2015; root:xnu-3248.20.55~2/RELEASE_X86_64 x86_64

configured with: ./configure --with-openssl-dir=/opt/local/ --disable-asciidoc make check said: PASS: src/test/test PASS: src/test/test-slow PASS: src/test/test-memwipe PASS: src/test/test_workqueue PASS: src/test/test_keygen.sh PASS: src/test/test_zero_length_keys.sh SKIP: src/test/test_switch_id.sh PASS: src/test/test_ntor.sh PASS: src/test/test_bt.sh

run: Jan 03 xx:xx:xx.000 [notice] Bootstrapped 0%: Starting Jan 03 xx:xx:xx.000 [notice] Bootstrapped 80%: Connecting to the Tor network Jan 03 xx:xx:xx.000 [notice] Bootstrapped 85%: Finishing handshake with first hop Jan 03 xx:xx:xx.000 [notice] Bootstrapped 90%: Establishing a Tor circuit Jan 03 xx:xx:xx.000 [notice] Tor has successfully opened a circuit. Looks like client functionality is working. Jan 03 xx:xx:xx.000 [notice] Bootstrapped 100%: Done

Happy new year Thanks, Selven

Trac:
Username: selven@hackers.mu

Tested Selven's patch on OpenBSD. Detects explicit_bzero, and uses it.

Trac:
Username: logan

Replying to selven@…:

for the above diff [https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff] , i forgot to add a description

Thanks for the patch and the testing!

It's a straightforward patch that works on later versions of OpenBSD, NetBSD, and OS X. (Using memset_s(..., ..., 0, ...) then memset(..., ..., byte, ...) is redundant, but I'm ok with that. It makes our choice between alternative functions clearer, which is important in security-critical code.)

Let's get this merged, then close this ticket. https://trac.torproject.org/projects/tor/attachment/ticket/7419/memset_s_explicit_bzero.diff

(I've split off legacy/trac#17986 (moved) for using SecureZeroMemory on Windows.) (Removing nickm-patch because the latest patch is not by nickm.)

Trac:
Keywords: nickm-patch deleted, N/A added
Status: needs_revision to needs_review

Awaiting a revision to the changes file.

Trac:
Status: needs_review to needs_revision

This is going to require a manual merge with legacy/trac#17986 (moved) that preserves all the different cases.

Trac:
Username: selven@hackers.mu

memset_s_explicit_bzero.diff

new diff, with changes files pointing to this ticket added.

Trac:
Username: selven@hackers.mu

Merged this.; thanks!

Trac:
Status: needs_revision to closed
Resolution: N/A to implemented

closed

mentioned in issue legacy/trac#15377 (moved)

mentioned in issue legacy/trac#17986 (moved)

moved from legacy/trac#7419 (moved)

added Feature label and removed 1 deleted label

added Relay label and removed 1 deleted label

Choose a faster memwipe implementation

Child items 0

Activity