Stem's DescriptorReader should handle relative paths in processed files when given a target with a relative path
A bugfix for DescriptorReader._handle_file() for when the descriptor directory of one of the targets is given as a relative path. The target path needs to be made absolute before it is compared with the (always absolute) paths in _processed_files. Please see the linked commit and the attached git diff.
A longer (and probably unnecessarily detailed) explanation: when stem.descriptor.reader.DescriptorReader is initialized with a relative path as a target, e.g.:
from stem.descriptor.reader import DescriptorReader
reader = DescriptorReader(['server-descriptors'], persistence_path='./used_desc')
The DescriptorReader._handle_file() method (used, among other things, when the reader is iterated over) effectively ignores the entries loaded into _processed_files. The lookup key for a given file ('target', which here is a relative path) never matches the corresponding key in the _processed_files dictionary, where paths are always absolute. The mismatch happens in stem/descriptor/reader.py, line 462, which tries to get the previously recorded 'last used' timestamp for a given target file:
last_used = self._processed_files.get(target)
In our example, 'target' would be something like:
'server-descriptors/402619c25024fb360f88992437242b8938b99e5d'
However, in _processed_files (and in the 'used_desc' file), the corresponding key would be, e.g.:
'/home/kostas/priv/tordev/data/recent/relay-descriptors/server-descriptors/402619c25024fb360f88992437242b8938b99e5d'
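The mismatch can be reproduced with a plain dictionary. This is a minimal sketch, not Stem code: `processed_files` is a hypothetical stand-in for DescriptorReader._processed_files, and the timestamp is an arbitrary example value.

```python
import os

# Hypothetical stand-in for DescriptorReader._processed_files: it maps
# absolute file paths to a timestamp (arbitrary example value here).
relative_target = os.path.join(
    "server-descriptors", "402619c25024fb360f88992437242b8938b99e5d")
processed_files = {os.path.abspath(relative_target): 1357000000}

# Looking the file up by the relative path it was given as misses the entry...
print(processed_files.get(relative_target))  # None

# ...while normalizing it to an absolute path first finds it.
print(processed_files.get(os.path.abspath(relative_target)))  # 1357000000
```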
We need 'target' to always be an absolute path, both to avoid this kind of mismatch and to make sure that 'new_processed_files' also stores absolute paths. The latter matters when the reader is iterated over again, e.g. to check whether any new descriptors have come in.
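Roughly, the idea is to normalize 'target' once, up front, and use the normalized path both for the lookup and for the entry written to new_processed_files. The sketch below is a hypothetical, simplified function written for illustration only (it is not the actual Stem code or the linked commit); it assumes a file should be re-read only if it is new or its modification time is later than the recorded one.

```python
import os


def handle_file(target, processed_files, new_processed_files):
    """Hypothetical sketch: return True if 'target' should be (re)read.

    processed_files maps absolute paths to the mtime recorded when each
    file was last read; new_processed_files is filled in the same way.
    """
    # Normalize first, so a relative target such as
    # 'server-descriptors/<digest>' matches the absolute keys.
    target = os.path.abspath(target)
    last_modified = int(os.stat(target).st_mtime)
    last_used = processed_files.get(target)

    # Record under the absolute path, so a later iteration (or the
    # persistence file) never mixes relative and absolute keys.
    new_processed_files[target] = last_modified

    # Read the file if it is new or has changed since it was last read.
    return last_used is None or last_modified > last_used
```

Called twice with the same relative path (passing the first call's new_processed_files as processed_files the second time), the second call returns False, because the lookup now hits the stored absolute key.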
Here is a link to a commit that makes sure the relevant paths are always absolute: https://github.com/wfn/stem/commit/18a92836fac436b7fdd7f5d3ab10786f55b82c99
I ran the Stem unit tests, including those for reader.py, just in case; they all pass.
I have also attached a sample script that exercises this fix by passing a relative path to DescriptorReader. (For my Stem experiments I rsync'd 'relay-descriptors' in 'recent'.) Its output is in the attached sample_output.txt.
I'm also attaching the output of git diff (git diff 1773ebaab470206653ce6d84c3ef1276f81c5d0a, the last commit in git.torproject.org/stem.git), just in case.