We need to implement is_valid() method of stem.descriptor.server_descriptor.RelayDescriptor ![1] , to do some verifications on the descriptor:
a contained fingerprint is actually a hash of the signing key and
a router signature was created using the signing key.
There's already Java code for doing this in metrics-tasks 2. However, the Java code is a standalone test, while stem's implementation is self-contained within the descriptor.
We need some ssl library to read the pem-format keys in descriptors, and M2Crypto seems to be the best choice 3. The problem with M2Crypto is that it requires SSL_v2 support from openssl, which is considered unsafe thus excluded from recent Ubuntu releases, and possibly Debian 4. I don't know how many people run Tor in Ubuntu, and whether we should let users responsible for having a complete openssl library. It seems quite hard to work this around on Ubuntu .
A fingerprint (a HASH_LEN-byte of asn1 encoded public key, encoded in
hex, with a single space after every 4 characters) for this router's
identity key. A descriptor is considered invalid (and MUST be
rejected) if the fingerprint line does not match the public key.
I didn't realize that there was a 'MUST' clause here. We should check is_valid() in the server descriptor constructor when validate is True, and raise a ValueError if it's invalid. Note that this will break a few integ tests since I've messed with some of the data in the descriptor data directory to make the tests more interesting...
We should swap out the bad test data with real instances when we come across it.
The problem with M2Crypto is that it requires SSL_v2 support from openssl, which is considered unsafe thus excluded from recent Ubuntu releases, and possibly Debian [4].
Do we need the ssl v2 support? As the post mentioned the module itself is available on Ubuntu...
atagar@morrigan:~$ lsb_release -sdUbuntu 11.04atagar@morrigan:~$ sudo apt-get install m2cryptoNote, selecting 'python-m2crypto' instead of 'm2crypto'The following NEW packages will be installed: python-m2crypto0 upgraded, 1 newly installed, 0 to remove and 108 not upgraded.Need to get 277 kB of archives....
Thanks Damian. It turns out the M2Crypto package downloaded from their website does not work with Ubuntu, but the distribution in Ubuntu's repository is good.
However, after playing with it for several hours, I found out that M2Crypto only support PEM format keys in X.509 standard but not in PKCS, and the public keys in server descriptors are encoded in PKCS. They have slightly different headers: X.509 keys starts with "-----BEGIN PUBLIC KEY-----", while PKCS keys starts with "-----BEGIN RSA PUBLIC KEY-----". The content is also represented in different ways, so simply changing the header won't work [1].
>> from M2Crypto import RSA, BIO>> bio = BIO.MemoryBuffer(descriptor.signing_key)>> rsa = RSA.load_pub_key_bio(bio)M2Crypto.RSA.RSAError: no start line
The first verification is done by using python-rsa, but I have some difficulties implementing the second part. According to the Java code, what we should do is:
1. Read the signing key from the descriptor;
2. Get the signature from the descriptor, filter out the header and footer, and
do a base64 decode to get the signature bytes;
3. Decrypt the signature bytes with the signing key and remove the PKCS1
padding to get the original message;
4. Encode the message in hex and compare it to the digest of the descriptor.
I'm done the first two parts and checked they were correct. The hard part is 3. I'm trying to do it with the verify() method in python-rsa [1]. However, I always get a
ValidationError. I'm diving into the code of verify() method and trying to decrypt the signature bytes step by step, and it seems that the decrypted message does not start with the signature marker of PKCS1 padding, which should be '\x00\x01'.
Fantastic! Should we move forward with code reviewing and merging this part? It should probably have a unit test or two.
Sure. Do you want me to write the unit tests?
I've also renamed is_valid() to validate(), and instead of returning a boolean, it raises a ValueError if the validation fails, since we MUST perform this validation before accepting a descriptor. Do you think this makes sense?
I cracked the Java crypto library code today, and printed out everything in the decryption process to find what was wrong. It seems that python-rsa uses a different way (maybe non-standard) to transform between octet strings and integers and incompatible with our signature. Things got much more subtle here, and I'm afraid I can't go on. If anyone else would like to continue this work, I'm more than happy to share what I've got with him.
Yup. Let me know if you want any help with it, ideally it'll be for the happy case and all of the edge cases you can think of (minimal descriptor content, malformed signature, etc). Really try to break your change via the tests, that's how we best get rigorous testing. :)
I've also renamed is_valid() to validate(), and instead of returning a boolean, it raises a ValueError if the validation fails, since we MUST perform this validation before accepting a descriptor. Do you think this makes sense?
I don't think that's necessary, the constructor can simply do...
if validate and not self.is_valid(): raise ValueError("yo, something bad is going on")
The constructor has a validate flag in case they want to accept malformed data so also calling this function 'validate' would be confusing. However, maybe we should come up with a more descriptive name than is_valid()? Would is_signature_valid() be better?
If anyone else would like to continue this work, I'm more than happy to share what I've got with him.
Yup, please add all of the details you think would be helpful for future people trying to figure this out to this ticket. Also, lets add that task to the dev wiki...
https://trac.torproject.org/projects/tor/wiki/doc/stem
The constructor has a validate flag in case they want to accept malformed data so also calling this function 'validate' would be confusing.
Right, I didn't think of that.
However, maybe we should come up with a more descriptive name than is_valid()? Would is_signature_valid() be better?
Well, I prefer is_valid() for its conciseness. :)
Yup, please add all of the details you think would be helpful for future people trying to figure this out to this ticket.
Nick's conversation with me contains some useful details:
https://lists.torproject.org/pipermail/tor-dev/2012-May/003544.html
The key is to find a crypto library that implements the pkcs1 signature verification process (part 3). IMHO, pyCrypto and M2Crypto don't have it, and python-rsa verify() function only works with the signature created by its sign() function.
In the test file for server descriptor, I added the correct fingerprint into RELAY_DESCRIPTOR_ATTR, and test_minimal_relay_descriptor() will now check if it has been read correctly. Another test case, test_fingerprint_malformed(), is added to make sure the construction will fail if the fingerprint does not match the signing key.
Notice that after the second part of the validation is implemented, testing would become more complicated, because the signature has to match the hash of the content, and signed with descriptor's signing key (the private one).
I think I've merged my branch with yours in the right way...but please double check since I'm still not comfortable with git merge.
I've written the unit test and pushed into my repo...
Fantastic! Don't worry about merging. When doing development what you'll generally want to do is...
switch to your master branch and make sure it's up to date
git checkout master
git pull origin
make a new branch where you implement your feature
git checkout -b my_spiffy_new_feature
when you're done again make sure your master branch its up to date
git fetch origin
this will update your feature branch to be off of stem's current master
git rebase origin/master
pushes into your repository
git push my_remote
Note that these commands will vary based on what you're calling your remotes. If you're new to git then I'd suggest...
http://progit.org/book/
Personally I found it to be very well written. There's no need to do a merge (though it doesn't hurt). I can then easily cherry-pick or merge your changes.
All this said though we have a couple issue that needs to be addressed before I can look at your branch...
+"""^M
+Parsing for Tor server descriptors, which contains the infrequently changing^M
+information about a Tor relay (contact information, exit policy, public keys,^M
...
See the ^M at the end of the lines? Those are Windows newlines. Whatever editor you're using are introducing them. This causes a couple issues...
A simple diff claims that every line is changed. I can tell diff to ignore whitespace, but it's still annoying.
Mixed newlines can confuse a lot of tools. They're very bad to introduce into code.
So we need to both strip these out and figure out a method for you to edit files without introducing them. This is a common gotcha so I'm sure google will have some good suggestions.
For my part I should make the whitespace checker look for these. Also, we need to make non-builtin dependencies optional. Otherwise users will get...
atagar@morrigan:~/Desktop/stem$ ./run_tests.py --unitTraceback (most recent call last): File "./run_tests.py", line 23, in <module> import test.unit.descriptor.server_descriptor File "/home/atagar/Desktop/stem/test/unit/descriptor/server_descriptor.py", line 9, in <module> import stem.descriptor.server_descriptor File "/home/atagar/Desktop/stem/stem/descriptor/server_descriptor.py", line 35, in <module> import rsaImportError: No module named rsa
I'm not quite sure how we should handle this, logging a warning once if they try to verify without the module or something else...
Uggg, stupid trac. Forgot that it misinterprets the carrot character. I meant...
+"""^M+Parsing for Tor server descriptors, which contains the infrequently changing^M+information about a Tor relay (contact information, exit policy, public keys,^M...
So we need to both strip these out and figure out a method for you to edit files without introducing them.
That's weird, since I think I've removed all the CRs, and in my emacs/vim there is no CR remaining in server_descriptors.py.
I'm not quite sure how we should handle this, logging a warning once if they try to verify without the module or something else...
What about in RelayDescriptor.init:
if validate: try: if not self.is_valid(): raise ValueError(self.nickname + " is not a valid server descriptor!") except ImportError: print("Unable to validate. Please install python-rsa module first.")
Making the rsa module import optional. Not only is it not a builtin, but it also isn't packaged for most platforms. Hence when we make a deb and rpm we won't be able to have a stated dependency on it, which in turn means that this won't function for most users. However, as the commit message mentions I looked around for alternatives and found them lacking so guess we'll need to go with this and hope that the python-rsa folks someday get their act together and package their library.
The fingerprint attribute is optional (in the wild descriptors may not have it), so accounted for this in the change.
That's weird, since I think I've removed all the CRs, and in my emacs/vim there is no CR remaining in server_descriptors.py.
If you run "git show d793809" then you should see the issue, where the whole file appears to be deleted and then re-added with the altered newlines. I ended up needing to run 'git diff --ignore-space-at-eol HEAD^' and then re-apply your change by hand to the current master. Not the end of the world, but it took a bit of work.
Again, thanks for the patch! Leaving this ticket open for the remaining validation implementation.
It makes the code work 'properly' but does not fix the now
broken unit tests that relied on invalid relay descriptor data.
I implemented the signature verification both with the
python-rsa library[default] and also with the python-crypto library
though I had to write some custom code in both cases as neither
library appears to do exactly what was needed.
I can look into 'fixing' the unit tests insofar as necessary.
This is an example of the kind of hard coding that no longer passes.
Traceback:File "/home/eoin/stem/test/integ/descriptor/server_descriptor.py", line 89, in test_metrics_descriptorself.assertEquals("2C7B27BEAB04B4E2459D89CA6D5CD1CC5F95A689", desc.digest())AssertionError: '2C7B27BEAB04B4E2459D89CA6D5CD1CC5F95A689' != ",{'\xbe\xab\x04\xb4\xe2E\x9d\x89\xcam\\\xd1\xcc_\x95\xa6\x89"
Hi Eoin. As you mentioned this breaks the unit tests pretty badly. You're completely right that mocking.get_relay_server_descriptor() provides invalid data according to these integrity checks, but it does so somewhat on purpose. The get_relay_server_descriptor() function aims to...
provide a minimal server descriptor by default that only has mandatory arguments
allow the caller to get a custom descriptor by providing additional entries
It would be a pita to then make sure that our content always matches our signature. Luckily it's also not necessary - we can mock is_valid() or other validation functions to always say "the descriptor is ok" in the unit tests.
I'm not entirely clear what's happening in 0d433b5 but if this lets us check descriptor validity without the python-rsa module then that would be fantastic! Is the Crypto.Util module a builtin and available in python 2.5? If so then do you think we can drop the python-rsa usage?
My original motivation here was to get rid of the python-rsa usage.. so that's certainly an option.
I think quite a few of the tests can be fixed by turning off validation, and the others can probably be fixed by replacing the hardcoded values in test functions with references to the mocking code:
e.g. to fix test/unit/tutorial.py
... and in a roundabout way they do. However, the tests look almost nothing like the examples, and the heavy mocking makes those tests pretty hard to read. If you have a better idea for testing the examples then I'd welcome it. :)
The python-crypto lib is not builtin but it is included in ubuntu & debian by default.
I've added more info on the signature verification in the code comments this time, and you can read more about it here:
http://www.ietf.org/rfc/rfc2313.txt