1. 20 Jun, 2017 2 commits
    • Karsten Loesing's avatar
      Add new method to retrieve raw descriptor length. · a113feb2
      Karsten Loesing authored
      When we switched from storing copies of byte[] to storing byte[]
      references, offset, and length, getRawDescriptor() suddenly became
      more expensive than before.  (Before that, copying bytes in the first
      place was always expensive.)  If an application only calls
      Descriptor#getRawDescriptorBytes() to learn the array length, there's
      now a more efficient way to provide that information.
      a113feb2
    • Karsten Loesing's avatar
      Make DescriptorFile obsolete. · 1062da63
      Karsten Loesing authored
      This change simplifies the DescriptorReader interface and allows for
      shorter and more concise application code.  The result is that
      DescriptorReader returns Descriptor instances rather than
      DescriptorFile instances containing Descriptors and accepts a maximum
      queue size in Descriptors, DescriptorFile is deprecated, and
      Descriptor contains a File reference to the descriptor file.
      Implements #22141.
      
      Related to this change, this commit introduces a new
      UnparseableDescriptor to be returned by DescriptorParser and
      DescriptorReader if a descriptor cannot be parsed, as opposed to
      throwing a DescriptorParseException or skipping the entire descriptor
      file (fixes #22139), respectively.
      
      Also related to this change, DescriptorParser now returns an Iterable
      instead of a List, which prepares parsing large descriptor files
      descriptor by descriptor (will be tackled in #20395).
      1062da63
  2. 17 Jun, 2017 1 commit
  3. 16 Jun, 2017 1 commit
  4. 08 Jun, 2017 2 commits
  5. 07 Jun, 2017 1 commit
  6. 06 Jun, 2017 6 commits
    • Karsten Loesing's avatar
      Prepare for 1.8.0 release. · 086b0460
      Karsten Loesing authored
      086b0460
    • Karsten Loesing's avatar
      Store raw descriptors as byte[], offset, and length. · a4d184bf
      Karsten Loesing authored
      Prior to this commit we read raw descriptor bytes from disk, split
      them into serveral byte[] for each contained descriptor, and stored
      those copies together with descriptors.  We further copied descriptor
      parts, like signatures or status entries, and stored those copies as
      well.
      
      Overall, we temporarily required up to 3 times the size of descriptor
      files just to store raw descriptor contents: 1) the entire descriptor
      file read to memory, 2) copies of all contained descriptors, and 3)
      copies of contained descriptor parts.  After moving on to the next
      descriptor file, 1) was freed, but 2) and 3) remained in memory.  This
      was rather wasteful.
      
      With this commit we store raw descriptors as reference to the byte[]
      containing the entire descriptor file plus offset and length of the
      part containing one descriptor.  Similarly we store raw descriptor
      parts as a reference to the full descriptor plus offset and length of
      the descriptor part.  This saves a lot of memory, and it avoids
      unnecessary array copying.
      
      This change is also a step towards not storing raw descriptor contents
      in memory at all, but instead leaving contents on disk and accessing
      parts as needed.  However, this commit does not take that step yet.
      
      The original purpose of this commit was to prepare switching from the
      platform's default charset to UTF-8 for #21932.  The idea was to
      reduce access to DescriptorImpl#rawDescriptorBytes and add all methods
      working on those bytes, including converting them to a String, to
      DescriptorImpl.  This commit achieves this purpose by preparing that
      switch, yet it does not take that step, either.  Switching to UTF-8 is
      midly backward-incompatible, so it'll have to wait until 2.0.0.
      However, switching will be much easier based on the changes in this
      commit.
      
      Many of these changes in this commit are interdependent which makes it
      difficult to split up this commit with reasonable effort.  Still, in
      order to facilitate reviews, here is an explanation of changes made in
      this commit from top to bottom:
      
      Move all code for processing raw descriptor bytes from a) detecting
      the descriptor type, b) finding descriptor starts and ends, up to c)
      invoking the right DescriptorImpl subclass constructors from
      DescriptorImpl and its subclasses over to DescriptorParserImpl.
      
      Include offset and limit in the constructors of DescriptorImpl and
      most of its subclasses.
      
      Refer to directory and network status parts in RelayDirectoryImpl and
      NetworkStatusImpl and its subclasses by offset and length rather than
      passing copies of raw descriptors.
      
      Provide two overloaded methods DescriptorImpl#newScanner() that
      internally handle the byte[]-to-String conversion rather than leaving
      this task to all DescriptorImpl subclasses.
      
      In DescriptorImpl, rather than storing a copy of raw descriptor bytes
      per descriptor, store a reference to a potentially larger byte[],
      containing all descriptors read from a given file, together with
      offset and length.
      
      Provide various methods in DescriptorImpl that provide access to raw
      descriptor bytes and that internally handle issues like unified
      character encoding.
      
      Include an XXX21932 tag in all places where byte[] is currently
      converted to String using the platform's default charset.
      
      Update existing methods in DescriptorImpl to only access
      rawDescriptorBytes within offset and offset + length.
      
      In classes referenced from DescriptorImpl subclasses, like
      DirSourceEntryImpl and NetworkStatusEntryImpl, rather than storing a
      copy of raw descriptor bytes, store a reference to the parent
      DescriptorImpl instance together with offset and length.
      
      Change raw descriptor bytes in ExitListEntryImpl into a String,
      because the byte[] we stored there was never read from disk but
      generated by ourselves using String#getBytes() using the platform's
      default charset.  We also never used raw bytes in ExitListEntryImpl
      anyway.  Admittedly, we could use offset and length there, too, but
      the amount of saved memory is likely not worth the necessary code
      changes.
      
      Remove redundant zero-length checks from DescriptorImpl subclasses
      including ExitListImpl, NetworkStatusImpl, and RelayDirectoryImpl.
      These checks are redundant, because we already performed the same
      checks in DescriptorImpl#countKeys().
      
      Move commonly used helper methods for finding the first index of a
      keyword or splitting descriptory by keyword from DescriptorImpl
      subclasses, like NetworkStatusImpl and RelayDirectoryImpl, to
      DescriptorImpl.
      
      In test classes, replace the numerous invocations of DescriptorImpl
      subclass constructors with local buildSomething() methods, so that
      future changes to constructor signatures won't produce a diff as long
      as this one.
      a4d184bf
    • Karsten Loesing's avatar
    • Karsten Loesing's avatar
      Fix bug in digest computation. · 232ea426
      Karsten Loesing authored
      232ea426
    • Karsten Loesing's avatar
      Move descriptor digest computation to DescriptorImpl. · 74de0a5d
      Karsten Loesing authored
      The main intention behind this change is to reduce the number of
      places in the code where byte[] is converted to String.  But another
      reason is to reduce code duplication, which would have been sufficient
      to make this change.
      74de0a5d
    • Karsten Loesing's avatar
  7. 01 Jun, 2017 1 commit
  8. 26 May, 2017 2 commits
  9. 16 May, 2017 2 commits
  10. 12 May, 2017 1 commit
  11. 11 May, 2017 1 commit
  12. 10 May, 2017 2 commits
  13. 09 May, 2017 2 commits
  14. 08 May, 2017 1 commit
    • Karsten Loesing's avatar
      Add support for six new key-value pairs added by OnionPerf. · ee696b09
      Karsten Loesing authored
      OnionPerf adds six new key-value pairs to the .tpf format that
      Torperf/CollecTor did not produce: ENDPOINTLOCAL, ENDPOINTPROXY,
      ENDPOINTREMOTE, HOSTNAMELOCAL, HOSTNAMEREMOTE, and SOURCEADDRESS.
      
      We should add support for these keys to metrics-lib, so that we can
      start using their values.
      
      Implements #22122.
      ee696b09
  15. 07 Apr, 2017 1 commit
    • Karsten Loesing's avatar
      Don't skip unrecognized lines in certain cases. · 8f69d784
      Karsten Loesing authored
      When we started using Java 7's switch-on-String in 2b4d7732, we broke
      unrecognized line parsing in extra-info descriptors.  Namely, when we
      reached the end of a crypto block we didn't reset the list for
      collecting crypto lines.  So far so good, but any following
      unrecognized lines would be collected as crypto lines and later
      discarded, rather than being added to the unrecognized-lines list and
      later reported.
      
      This only affects relay descriptors, because sanitized bridge
      descriptors don't contain crypto blocks.  And it only affects relay
      descriptors with crypto blocks, like "identity-ed25519", whereas relay
      extra-info descriptors published by older versions were not affected.
      
      Fixes #21890.
      8f69d784
  16. 17 Feb, 2017 2 commits
  17. 14 Feb, 2017 2 commits
  18. 01 Feb, 2017 1 commit
  19. 31 Jan, 2017 1 commit
  20. 05 Jan, 2017 1 commit
  21. 20 Dec, 2016 1 commit
    • Karsten Loesing's avatar
      Add methods for loading and saving a history file. · 56840df1
      Karsten Loesing authored
      The history file implementation in `DescriptorReader` writes the
      history file passed in `setExcludeFiles()` immediately after reading
      and parsing the last descriptor and putting it into the queue,
      regardless of whether the application has finished processing those
      descriptors.
      
      If the application fails after the history file is written, it may not
      be able to process descriptors in the next execution that have still
      been in the queue at the time of failing.
      
      This commit deprecates the `setExcludeFiles()` method and replaces it
      by a `setHistoryFile()` and a `saveHistoryFile()` method.
      Applications would use `setHistoryFile()` before starting to read
      descriptors, process all descriptors, perform any cleaning up, and
      then call `saveHistoryFile()`.
      
      Implements #20521.
      56840df1
  22. 19 Oct, 2016 2 commits
  23. 09 Oct, 2016 1 commit
    • Karsten Loesing's avatar
      Avoid running into an IOException and logging it. · f7f56dd8
      Karsten Loesing authored
      When we recently switched from System.err printing to slf4j logging,
      we started logging an IOException that we shouldn't be running into
      and that we simply ignored before.  This exception gets thrown when
      DescriptorReaderImpl attempts to read a parse history file that
      doesn't exist (yet).  We should simply check whether that files exists
      before attempting to read it.
      
      Fixes #20320.
      f7f56dd8
  24. 31 Aug, 2016 2 commits
  25. 24 Aug, 2016 1 commit