Implement verification of server descriptor

added component::archived/stem descriptors owner::reganeet priority::medium reporter::reganeet resolution::implemented status::closed type::enhancement labels

Hi Beck. Looks like the first step will be to come up with a counterpart for Karsten's determineKeyHash() function...

https://gitweb.torproject.org/metrics-tasks.git/blob/HEAD:/task-2768/VerifyDescriptors.java#l269

From the dir-spec...

A fingerprint (a HASH_LEN-byte of asn1 encoded public key, encoded in hex, with a single space after every 4 characters) for this router's identity key. A descriptor is considered invalid (and MUST be rejected) if the fingerprint line does not match the public key.

I didn't realize that there was a 'MUST' clause here. We should check is_valid() in the server descriptor constructor when validate is True, and raise a ValueError if it's invalid. Note that this will break a few integ tests since I've messed with some of the data in the descriptor data directory to make the tests more interesting...

https://gitweb.torproject.org/stem.git/tree/HEAD:/test/integ/descriptor/data

We should swap out the bad test data with real instances when we come across it.

The problem with M2Crypto is that it requires SSL_v2 support from openssl, which is considered unsafe thus excluded from recent Ubuntu releases, and possibly Debian [4].

Do we need the ssl v2 support? As the post mentioned the module itself is available on Ubuntu...

atagar@morrigan:~$ lsb_release -sd
Ubuntu 11.04

atagar@morrigan:~$ sudo apt-get install m2crypto
Note, selecting 'python-m2crypto' instead of 'm2crypto'
The following NEW packages will be installed:
  python-m2crypto
0 upgraded, 1 newly installed, 0 to remove and 108 not upgraded.
Need to get 277 kB of archives.
...

I don't know how many people run Tor in Ubuntu

Lots, including me. :P

Cheers! -Damian

Thanks Damian. It turns out the M2Crypto package downloaded from their website does not work with Ubuntu, but the distribution in Ubuntu's repository is good.

However, after playing with it for several hours, I found out that M2Crypto only support PEM format keys in X.509 standard but not in PKCS, and the public keys in server descriptors are encoded in PKCS. They have slightly different headers: X.509 keys starts with "-----BEGIN PUBLIC KEY-----", while PKCS keys starts with "-----BEGIN RSA PUBLIC KEY-----". The content is also represented in different ways, so simply changing the header won't work [1].

>> from M2Crypto import RSA, BIO
>> bio = BIO.MemoryBuffer(descriptor.signing_key)
>> rsa = RSA.load_pub_key_bio(bio)
M2Crypto.RSA.RSAError: no start line

I'm looking for a substitute of M2Crypto now.

[1] http://www.cryptosys.net/pki/rsakeyformats.html

Trac:
Username: reganeet

The first verification is done by using python-rsa, but I have some difficulties implementing the second part. According to the Java code, what we should do is: 1. Read the signing key from the descriptor; 2. Get the signature from the descriptor, filter out the header and footer, and do a base64 decode to get the signature bytes; 3. Decrypt the signature bytes with the signing key and remove the PKCS1 padding to get the original message; 4. Encode the message in hex and compare it to the digest of the descriptor.

I'm done the first two parts and checked they were correct. The hard part is 3. I'm trying to do it with the verify() method in python-rsa [1]. However, I always get a ValidationError. I'm diving into the code of verify() method and trying to decrypt the signature bytes step by step, and it seems that the decrypted message does not start with the signature marker of PKCS1 padding, which should be '\x00\x01'.

Going to dive deeper tomorrow...

[1] https://bitbucket.org/sybren/python-rsa/src/5d834ee3e7e5/rsa/pkcs1.py

Trac:
Username: reganeet

The first verification is done by using python-rsa

Fantastic! Should we move forward with code reviewing and merging this part? It should probably have a unit test or two.

Fantastic! Should we move forward with code reviewing and merging this part? It should probably have a unit test or two. Sure. Do you want me to write the unit tests? I've also renamed is_valid() to validate(), and instead of returning a boolean, it raises a ValueError if the validation fails, since we MUST perform this validation before accepting a descriptor. Do you think this makes sense?

I cracked the Java crypto library code today, and printed out everything in the decryption process to find what was wrong. It seems that python-rsa uses a different way (maybe non-standard) to transform between octet strings and integers and incompatible with our signature. Things got much more subtle here, and I'm afraid I can't go on. If anyone else would like to continue this work, I'm more than happy to share what I've got with him.

Trac:
Username: reganeet

Sure. Do you want me to write the unit tests?

Yup. Let me know if you want any help with it, ideally it'll be for the happy case and all of the edge cases you can think of (minimal descriptor content, malformed signature, etc). Really try to break your change via the tests, that's how we best get rigorous testing. :)

I've also renamed is_valid() to validate(), and instead of returning a boolean, it raises a ValueError if the validation fails, since we MUST perform this validation before accepting a descriptor. Do you think this makes sense?

I don't think that's necessary, the constructor can simply do...

if validate and not self.is_valid():
  raise ValueError("yo, something bad is going on")

The constructor has a validate flag in case they want to accept malformed data so also calling this function 'validate' would be confusing. However, maybe we should come up with a more descriptive name than is_valid()? Would is_signature_valid() be better?

If anyone else would like to continue this work, I'm more than happy to share what I've got with him.

Yup, please add all of the details you think would be helpful for future people trying to figure this out to this ticket. Also, lets add that task to the dev wiki... https://trac.torproject.org/projects/tor/wiki/doc/stem

Thanks! -Damian

The constructor has a validate flag in case they want to accept malformed data so also calling this function 'validate' would be confusing. Right, I didn't think of that.

However, maybe we should come up with a more descriptive name than is_valid()? Would is_signature_valid() be better? Well, I prefer is_valid() for its conciseness. :)

Yup, please add all of the details you think would be helpful for future people trying to figure this out to this ticket. Nick's conversation with me contains some useful details: https://lists.torproject.org/pipermail/tor-dev/2012-May/003544.html The key is to find a crypto library that implements the pkcs1 signature verification process (part 3). IMHO, pyCrypto and M2Crypto don't have it, and python-rsa verify() function only works with the signature created by its sign() function.

Also, lets add that task to the dev wiki... Done.

Trac:
Username: reganeet

Hi Damian,

I've written the unit test and pushed into my repo: https://github.com/beckbaibai/stem

In the test file for server descriptor, I added the correct fingerprint into RELAY_DESCRIPTOR_ATTR, and test_minimal_relay_descriptor() will now check if it has been read correctly. Another test case, test_fingerprint_malformed(), is added to make sure the construction will fail if the fingerprint does not match the signing key.

Notice that after the second part of the validation is implemented, testing would become more complicated, because the signature has to match the hash of the content, and signed with descriptor's signing key (the private one).

I think I've merged my branch with yours in the right way...but please double check since I'm still not comfortable with git merge.

Thanks! Beck

Trac:
Username: reganeet

I've written the unit test and pushed into my repo...

Fantastic! Don't worry about merging. When doing development what you'll generally want to do is...

switch to your master branch and make sure it's up to date

git checkout master git pull origin

make a new branch where you implement your feature

git checkout -b my_spiffy_new_feature

when you're done again make sure your master branch its up to date

git fetch origin

this will update your feature branch to be off of stem's current master

git rebase origin/master

pushes into your repository

git push my_remote

Note that these commands will vary based on what you're calling your remotes. If you're new to git then I'd suggest... http://progit.org/book/

Personally I found it to be very well written. There's no need to do a merge (though it doesn't hurt). I can then easily cherry-pick or merge your changes.

All this said though we have a couple issue that needs to be addressed before I can look at your branch...

+"""^M +Parsing for Tor server descriptors, which contains the infrequently changing^M +information about a Tor relay (contact information, exit policy, public keys,^M ...

See the ^M at the end of the lines? Those are Windows newlines. Whatever editor you're using are introducing them. This causes a couple issues...

A simple diff claims that every line is changed. I can tell diff to ignore whitespace, but it's still annoying.
Mixed newlines can confuse a lot of tools. They're very bad to introduce into code.

So we need to both strip these out and figure out a method for you to edit files without introducing them. This is a common gotcha so I'm sure google will have some good suggestions.

For my part I should make the whitespace checker look for these. Also, we need to make non-builtin dependencies optional. Otherwise users will get...

atagar@morrigan:~/Desktop/stem$ ./run_tests.py --unit
Traceback (most recent call last):
  File "./run_tests.py", line 23, in <module>
    import test.unit.descriptor.server_descriptor
  File "/home/atagar/Desktop/stem/test/unit/descriptor/server_descriptor.py", line 9, in <module>
    import stem.descriptor.server_descriptor
  File "/home/atagar/Desktop/stem/stem/descriptor/server_descriptor.py", line 35, in <module>
    import rsa
ImportError: No module named rsa

I'm not quite sure how we should handle this, logging a warning once if they try to verify without the module or something else...

Cheers! -Damian

Uggg, stupid trac. Forgot that it misinterprets the carrot character. I meant...

+"""^M
+Parsing for Tor server descriptors, which contains the infrequently changing^M
+information about a Tor relay (contact information, exit policy, public keys,^M
...

So we need to both strip these out and figure out a method for you to edit files without introducing them.

That's weird, since I think I've removed all the CRs, and in my emacs/vim there is no CR remaining in server_descriptors.py.

I'm not quite sure how we should handle this, logging a warning once if they try to verify without the module or something else...

What about in RelayDescriptor.init:

if validate: 
  try:
    if not self.is_valid():
      raise ValueError(self.nickname + " is not a valid server descriptor!")
  except ImportError:
    print("Unable to validate. Please install python-rsa module first.")

Trac:
Username: reganeet

Pushed an adaptation of your change - thanks for the patch! https://gitweb.torproject.org/stem.git/commitdiff/40a971105ff162d0caa9e0a1c5fa3c0f7d3c2b72

Changes included...

Making the rsa module import optional. Not only is it not a builtin, but it also isn't packaged for most platforms. Hence when we make a deb and rpm we won't be able to have a stated dependency on it, which in turn means that this won't function for most users. However, as the commit message mentions I looked around for alternatives and found them lacking so guess we'll need to go with this and hope that the python-rsa folks someday get their act together and package their library.
The fingerprint attribute is optional (in the wild descriptors may not have it), so accounted for this in the change.

That's weird, since I think I've removed all the CRs, and in my emacs/vim there is no CR remaining in server_descriptors.py.

If you run "git show d793809" then you should see the issue, where the whole file appears to be deleted and then re-added with the altered newlines. I ended up needing to run 'git diff --ignore-space-at-eol HEAD^' and then re-apply your change by hand to the current master. Not the end of the world, but it took a bit of work.

Again, thanks for the patch! Leaving this ticket open for the remaining validation implementation.

Trac:
Keywords: N/A deleted, descriptors added

I took a look at the problems in this ticket..

I've made an initial fix here. https://github.com/eoinof/stem/commits/stem-trac-5810

It makes the code work 'properly' but does not fix the now broken unit tests that relied on invalid relay descriptor data. I implemented the signature verification both with the python-rsa library[default] and also with the python-crypto library though I had to write some custom code in both cases as neither library appears to do exactly what was needed.

I can look into 'fixing' the unit tests insofar as necessary. This is an example of the kind of hard coding that no longer passes.

Traceback:
File "/home/eoin/stem/test/integ/descriptor/server_descriptor.py", line 89, in test_metrics_descriptor
self.assertEquals("2C7B27BEAB04B4E2459D89CA6D5CD1CC5F95A689", desc.digest())

AssertionError: '2C7B27BEAB04B4E2459D89CA6D5CD1CC5F95A689' != ",{'\xbe\xab\x04\xb4\xe2E\x9d\x89\xcam\\\xd1\xcc_\x95\xa6\x89"

Trac:
Username: eoinof

Hi Eoin. As you mentioned this breaks the unit tests pretty badly. You're completely right that mocking.get_relay_server_descriptor() provides invalid data according to these integrity checks, but it does so somewhat on purpose. The get_relay_server_descriptor() function aims to...

provide a minimal server descriptor by default that only has mandatory arguments
allow the caller to get a custom descriptor by providing additional entries

It would be a pita to then make sure that our content always matches our signature. Luckily it's also not necessary - we can mock is_valid() or other validation functions to always say "the descriptor is ok" in the unit tests.

I'm not entirely clear what's happening in 0d433b5 but if this lets us check descriptor validity without the python-rsa module then that would be fantastic! Is the Crypto.Util module a builtin and available in python 2.5? If so then do you think we can drop the python-rsa usage?

My original motivation here was to get rid of the python-rsa usage.. so that's certainly an option. I think quite a few of the tests can be fixed by turning off validation, and the others can probably be fixed by replacing the hardcoded values in test functions with references to the mocking code: e.g. to fix test/unit/tutorial.py

-        expected_line = "%i. speedyexit (102.13 KB/s)" % count
+        expected_line = "%i. speedyexit (%s/s)" % (count, str_tools.get_size_label(int(mocking.ACTUAL_OBSERVED_BANDWIDTH) ,2))

Trac:
Username: eoinof

I'll take a look at what exactly the regular python crypto libs provide as well.

Trac:
Username: eoinof

My original motivation here was to get rid of the python-rsa usage.. so that's certainly an option.

Fantastic! Just to be sure, we're going with a derivation of your commit 0d433b5 but dropping the other two, right?

e.g. to fix test/unit/tutorial.py

On a side note the tutorial unit tests I wrote are awful. My aim was to test the code snippets on... https://stem.readthedocs.org/en/latest/tutorial.html

... and in a roundabout way they do. However, the tests look almost nothing like the examples, and the heavy mocking makes those tests pretty hard to read. If you have a better idea for testing the examples then I'd welcome it. :)

Of the changes I proposed

This whitespace problem is a bugfix. [I've split it out from the updated descriptor] https://github.com/eoinof/stem/commit/c9663bf5f2639fee7867f418ddeadd614f4e4cdb

The python-crypto lib is not builtin but it is included in ubuntu & debian by default.

I've added more info on the signature verification in the code comments this time, and you can read more about it here: http://www.ietf.org/rfc/rfc2313.txt

https://github.com/eoinof/stem/commit/269587c646e7841d0c6fee914e2a8deabbe89fe5 I've ignored the verification failure so all unit tests will still pass.

Trac:
Username: eoinof

Implement verification of server descriptor

Child items ...

Activity

switch to your master branch and make sure it's up to date

make a new branch where you implement your feature

when you're done again make sure your master branch its up to date

this will update your feature branch to be off of stem's current master

pushes into your repository