
Opened May 02, 2013 by wfn@wfn

Stem's DescriptorReader should handle relative paths in processed files when given a target with a relative path

A bugfix for DescriptorReader._handle_file() for the case where a target descriptor directory is given as a relative path. The target must be converted to an absolute path before it is compared with the (always absolute) paths in _processed_files. Please find the linked commit and attached git diff.

A (probably unnecessarily) longer explanation: suppose stem.descriptor.reader.DescriptorReader is initialized with a relative path for a target, e.g.:

from stem.descriptor.reader import DescriptorReader
reader = DescriptorReader(['server-descriptors'], persistence_path='./used_desc')

The DescriptorReader._handle_file() method (which is used when the reader is accessed as an iterator, etc.) will then effectively ignore the loaded _processed_files: the lookup key ('target', which will be a relative path) never matches the keys in the processed files dictionary ('_processed_files', where the paths are always absolute). The lookup happens in stem/descriptor/reader.py, line 462, which attempts to get the 'previously last used' timestamp for a given target file:

last_used = self._processed_files.get(target)

Here, 'target' would in our example be something like:

'server-descriptors/402619c25024fb360f88992437242b8938b99e5d'

However, in _processed_files (and in the 'used_desc' file), the corresponding key would be e.g.:

'/home/kostas/priv/tordev/data/recent/relay-descriptors/server-descriptors/402619c25024fb360f88992437242b8938b99e5d'
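The mismatch can be reproduced without Stem at all. The following is a minimal sketch, not Stem's code: 'processed_files' is a hypothetical stand-in for the reader's dictionary, and the timestamp value is made up for illustration.

```python
import os

# Relative path of the kind _handle_file() receives as 'target'.
relative_target = os.path.join(
    "server-descriptors", "402619c25024fb360f88992437242b8938b99e5d"
)

# Hypothetical stand-in for _processed_files: keys are absolute paths,
# values are timestamps (the value here is an arbitrary example).
processed_files = {os.path.abspath(relative_target): 1367452800}

# Looking up the relative path misses, even though it names the same file.
miss = processed_files.get(relative_target)

# Looking up the absolute form of the same path hits.
hit = processed_files.get(os.path.abspath(relative_target))

print(miss, hit)
```

Because the lookup with the relative path returns None, the file looks as if it had never been processed.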

We need to make 'target' always be an absolute path to avoid this kind of issue. This also ensures that 'new_processed_files' (used when the iterator is called again, e.g. when we re-iterate over our reader to see if anything new has come up) stores absolute paths as well.
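One way to sketch the idea, assuming os.path.abspath() is an acceptable normalization: the helper names 'lookup_last_used' and 'record_processed' are hypothetical, and this is not the code from the linked commit.

```python
import os


def lookup_last_used(target, processed_files):
    """Hypothetical helper: normalize 'target' before the dictionary lookup.

    Returns the absolute path together with the stored timestamp
    (or None if the file has not been seen before).
    """
    absolute_target = os.path.abspath(target)
    return absolute_target, processed_files.get(absolute_target)


def record_processed(target, timestamp, new_processed_files):
    """Hypothetical helper: store the entry under an absolute key."""
    new_processed_files[os.path.abspath(target)] = timestamp


# Usage: a relative target now matches the absolute key recorded earlier.
relative_target = os.path.join("server-descriptors", "example-descriptor")
new_processed_files = {}
record_processed(relative_target, 1367452800, new_processed_files)

absolute_target, last_used = lookup_last_used(relative_target, new_processed_files)
print(os.path.isabs(absolute_target), last_used)
```

Note that os.path.abspath() resolves against the current working directory at call time, so this sketch assumes the working directory does not change between runs that share a persistence file.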

Here is a link to a commit that makes sure the relevant paths are always absolute: https://github.com/wfn/stem/commit/18a92836fac436b7fdd7f5d3ab10786f55b82c99

I ran the Stem unit tests, including those for reader.py, just in case; all pass.

Attached please also find a sample script which exercises this functionality by supplying a relative path to DescriptorReader. (I rsync'd 'relay-descriptors' in 'recent' for my Stem experiments.) See the attached sample_output.txt.

I'm also attaching the git diff output (git diff 1773ebaab470206653ce6d84c3ef1276f81c5d0a, the last commit in git.torproject.org/stem.git) just in case.

Reference: legacy/trac#8815