  • #8815
Issue created May 02, 2013 by wfn@wfn

Stem's DescriptorReader should handle relative paths in processed files when given a target with a relative path

This is a bugfix for DescriptorReader._handle_file() for the case where a target descriptor directory (or one of several targets) is given as a relative path. The target needs to be converted to an absolute path before it is compared with the paths in _processed_files, which are always absolute. Please find the linked commit and attached git diff.

A (probably unnecessarily) longer explanation: when stem.descriptor.reader.DescriptorReader is initialized with a relative path for a target, e.g.:

from stem.descriptor.reader import DescriptorReader
reader = DescriptorReader(['server-descriptors'], persistence_path='./used_desc')

The DescriptorReader._handle_file() method (which is used when the reader is accessed as an iterator, etc.) will then ignore the entries loaded into _processed_files. The lookup for a given file uses 'target', which will be a relative path, so it never matches the keys of the '_processed_files' dictionary, where the paths are always absolute. The lookup is in stem/descriptor/reader.py, line 462, which attempts to get the 'previously last used' timestamp for a given target file:

last_used = self._processed_files.get(target)

Here, 'target' would in our example be something like:

'server-descriptors/402619c25024fb360f88992437242b8938b99e5d'

However, in _processed_files (and in the 'used_desc' file), the corresponding key would be e.g.

'/home/kostas/priv/tordev/data/recent/relay-descriptors/server-descriptors/402619c25024fb360f88992437242b8938b99e5d'
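The mismatch can be reproduced with a plain dictionary. This is a minimal sketch, not Stem code: `processed_files` is a hypothetical stand-in for `_processed_files`, and the timestamp is an arbitrary made-up value.

```python
import os

# Relative target, as the reader would see it when given 'server-descriptors'.
relative_target = os.path.join(
    "server-descriptors", "402619c25024fb360f88992437242b8938b99e5d"
)

# Hypothetical stand-in for _processed_files: absolute path -> timestamp.
# The timestamp is arbitrary and only serves the illustration.
processed_files = {os.path.abspath(relative_target): 1367452800}

# Looking up the relative path misses the absolute key.
miss = processed_files.get(relative_target)

# Normalizing the path first finds the entry.
hit = processed_files.get(os.path.abspath(relative_target))

print(miss, hit)
```

Because the first lookup returns None, the file looks as if it had never been processed.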

We need to make 'target' always be an absolute path to avoid this kind of issue. This also makes sure that 'new_processed_files' stores absolute paths; it is used when the iterator is called again, e.g. when we want to re-iterate over our reader to see if anything new came up.
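A rough sketch of the idea, assuming the skip decision compares a stored timestamp with the file's modification time. `should_process` is a hypothetical helper written for illustration and is not the code in the linked commit; in the reader, the modification time would come from the file itself rather than from a parameter.

```python
import os


def should_process(target, last_modified, processed_files, new_processed_files):
    """Hypothetical helper: decide whether a descriptor file needs reading.

    target: path to the file, relative or absolute
    last_modified: the file's modification time (passed in here for simplicity)
    processed_files: absolute path -> timestamp from the previous run
    new_processed_files: mapping filled in for the next run
    """
    # Normalize before any lookup or store, so both mappings use absolute keys.
    target = os.path.abspath(target)

    # Record the file under its absolute path for the next iteration.
    new_processed_files[target] = last_modified

    # Process the file if it is unknown or has changed since it was last used.
    last_used = processed_files.get(target)
    return last_used is None or last_used < last_modified


previous = {os.path.abspath("server-descriptors/example"): 100}
current = {}
unchanged = should_process("server-descriptors/example", 100, previous, current)
modified = should_process("server-descriptors/example", 200, previous, current)
```

With the normalization in place, the relative target matches its absolute entry, so an unchanged file is skipped and a modified one is read again.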

Here is a link to a commit that makes sure the relevant paths are always absolute: https://github.com/wfn/stem/commit/18a92836fac436b7fdd7f5d3ab10786f55b82c99

I ran the Stem unit tests, including those for reader.py, just in case. All good.

Attached please also find a sample script that exercises this by supplying a relative path to DescriptorReader. (I rsync'd 'relay-descriptors' in 'recent' for my Stem experiments.) See the attached sample_output.txt.

I'm also attaching the output of git diff against 1773ebaab470206653ce6d84c3ef1276f81c5d0a (the last commit in git.torproject.org/stem.git), just in case.
