This bug only occurs with ./configure --enable-fragile-hardening on my system. So it may be a tor/stem race condition bug. (legacy/trac#29437 (moved) is a similar bug, we may need legacy/trac#30901 (moved) to debug this kind of race condition.)
It looks like this timing issue was introduced in the legacy/trac#30984 (moved) refactor, perhaps in commit c744d23c. (At least on my machine.)
Tor doesn't guarantee control reply timing. And we're unlikely to be able to restore the old timing behaviour. So stem's tests should be adapted to work with the timing in both Tor 0.4.2 and Tor master.
I'm not sure what that commit has to do with TAKEOWNERSHIP. It seems to be about GETCONF instead. Are you suggesting that a change to the timing or formatting of GETCONF is causing a specific stem test to consistently fail?
Taylor and I have been investigating this and here is what we found:
The integ/process.py code is doing this test to see whether Tor is running:
if tor_process.poll() == 0: return # tor exited
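This calls the poll() method of a subprocess.Popen object, which returns 0 only when the process has exited with an exit code of 0. If Tor exits in any other way, poll() returns something else: a nonzero exit code, or (on POSIX) a negative number when the process was killed by a signal. While the process is still running, it returns None.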
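To make the poll() behaviour concrete, here is a minimal sketch. It is not stem's code: describe_exit is a hypothetical helper, and the child processes are throwaway Python interpreters standing in for Tor. It assumes a POSIX system, where a process killed by a signal reports a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(process):
    """Return a readable status for a subprocess.Popen object (hypothetical helper)."""
    code = process.poll()
    if code is None:
        return "still running"
    if code == 0:
        return "exited cleanly"
    if code < 0:
        # On POSIX, a negative value is the number of the signal that killed the process.
        return "killed by signal %d" % -code
    return "exited with code %d" % code


# A child that exits with status 0.
clean = subprocess.Popen([sys.executable, "-c", "pass"])
clean.wait()

# A child that exits with a nonzero status.
failed = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
failed.wait()

# A child that is terminated by a signal (SIGTERM here, purely for illustration).
killed = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
killed.terminate()
killed.wait()

for child in (clean, failed, killed):
    print(describe_exit(child))
```

A check written as poll() == 0 treats the last two cases the same as "still running", which is why a crash could go unreported.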
In this case, I found that Tor was actually exiting with a SIGPIPE, because of this chain of events:
stderr had been closed by stem.
There was a memory leak (legacy/trac#33039 (moved)), and so LeakSanitizer was trying to write to stderr.
LeakSanitizer couldn't write to stderr (because it was closed), and so it got a SIGPIPE.
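The chain of events above can be reproduced in miniature. This is only a simulation under stated assumptions: the child is a Python script standing in for Tor and LeakSanitizer, the message text is made up, and the example assumes a POSIX system. The child restores the default SIGPIPE handling because the Python interpreter ignores SIGPIPE at startup.

```python
import signal
import subprocess
import sys

# Stand-in for a process that reports a problem on stderr just before exiting.
CHILD = r"""
import os, signal, sys
signal.signal(signal.SIGPIPE, signal.SIG_DFL)  # undo Python's default of ignoring SIGPIPE
sys.stdin.readline()                           # wait until the parent has closed the pipe
os.write(2, b"simulated leak report\n")        # no reader is left, so this raises SIGPIPE
"""

child = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

child.stderr.close()        # the parent closes its end of the stderr pipe
child.stdin.write(b"go\n")  # tell the child it may now write to stderr
child.stdin.close()

returncode = child.wait()
print(returncode)           # negative: the child was killed by a signal
```

The report itself is lost, and the only trace left is the negative return code, which a poll() == 0 check does not look at.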
We didn't notice this at the time because there was nothing to tell us that the bug had actually occurred.
I've opened a pull request against stem so that it gives a more accurate message if Tor fails during these tests: https://github.com/torproject/stem/pull/54 . I hope it's in the right place. (I did not find any other cases where stem was using the poll()==0 pattern.)
We should find some way to make it so that when stem is running its tests, it does not close Tor's stderr, but rather reports stderr output as a test failure. This will make it likelier that we will notice LeakSanitizer failures in the future.
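One possible shape for this, as a rough sketch only: keep stderr open as a pipe, read it until the process exits, and treat both a nonzero exit status and any stderr output as a failure. run_and_check is a hypothetical helper and does not use stem's API; the commands below are throwaway Python interpreters standing in for Tor.

```python
import subprocess
import sys


def run_and_check(command, timeout=30):
    """Run a command and return a list of problems found (hypothetical helper)."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # keep stderr open so reports are not lost
    )
    _, stderr = process.communicate(timeout=timeout)

    problems = []
    if process.returncode != 0:
        problems.append("exit status %d" % process.returncode)
    if stderr:
        problems.append("stderr output: %s" % stderr.decode(errors="replace").strip())
    return problems


quiet = run_and_check([sys.executable, "-c", "pass"])
noisy = run_and_check(
    [sys.executable, "-c", "import sys; sys.stderr.write('leak'); sys.exit(1)"]
)
print(quiet)
print(noisy)
```

A test harness could fail the test whenever the returned list is non-empty. Whether all stderr output should count as a failure, or only sanitizer reports, is a design choice left open here.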