


Closed (moved)
Opened Feb 21, 2018 by iwakeh

Enable metrics-lib to process large (> 2G) logfiles

Metrics-lib receives compressed logs, usually smaller than 600kB. Those can be handled in-memory; this ticket is about handling logs that deflate to much larger files (approx. 2G).

Commons Compress doesn't provide a method for determining the decompressed content size of xz data (as the command line tool xz does). Other compression types metrics-lib supports do offer this, but relying on it would also require more changes.
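One way around not knowing the decompressed size up front is to never materialize the decompressed content at all and instead count (or consume) bytes while streaming. A minimal sketch, using the JDK's gzip support for the sake of a self-contained example; an xz log would be wrapped in Commons Compress's XZCompressorInputStream in exactly the same way:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDecompress {

  /**
   * Decompresses a gzip stream without ever holding the full
   * decompressed content in memory; only a fixed-size buffer and
   * the running byte count are kept.
   */
  static long countDecompressedBytes(InputStream compressed) throws IOException {
    long total = 0;
    try (InputStream in = new GZIPInputStream(compressed)) {
      byte[] buf = new byte[8192];
      int read;
      while ((read = in.read(buf)) != -1) {
        total += read;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Build a small compressed sample in memory for the demo.
    byte[] original = new byte[1_000_000]; // highly compressible zeros
    ByteArrayOutputStream compressedOut = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(compressedOut)) {
      gz.write(original);
    }
    byte[] compressed = compressedOut.toByteArray();

    long decompressedSize =
        countDecompressedBytes(new ByteArrayInputStream(compressed));
    System.out.println("compressed=" + compressed.length
        + " decompressed=" + decompressedSize);
  }
}
```

The same pattern lets a consumer parse arbitrarily large logs line by line, which is what the interface change below enables.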

Compression can be very effective, so using a cut-off on the compressed size would be rather arbitrary. An example with xz compression: a log that deflates to 3G has a compressed input array of 589492 bytes; with extreme compression it even shrinks to 405480 bytes. On the other hand, a file that deflates to only 64M can have an input array of 509212 bytes.
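The figures above illustrate why a compressed-size cut-off cannot work: the two logs have compressed sizes within about 15% of each other, yet their decompressed sizes differ by a factor of 48. A quick check of the ratios implied by the ticket's numbers:

```java
public class CompressionRatios {
  public static void main(String[] args) {
    // Figures quoted in the ticket: two xz-compressed logs with
    // similar compressed sizes but very different decompressed sizes.
    long bigDecompressed = 3L * 1024 * 1024 * 1024;   // ~3G log
    long bigCompressed = 589_492;
    long smallDecompressed = 64L * 1024 * 1024;       // ~64M log
    long smallCompressed = 509_212;

    System.out.printf("3G log ratio:  %.0f:1%n",
        (double) bigDecompressed / bigCompressed);     // about 5464:1
    System.out.printf("64M log ratio: %.0f:1%n",
        (double) smallDecompressed / smallCompressed); // about 132:1
  }
}
```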

For handling larger log files with metrics-lib, some interface changes will be necessary. Here is a suggestion:


 public interface LogDescriptor extends Descriptor {
 
   /**
-   * Returns the decompressed raw descriptor bytes of the log.
+   * Returns the compressed raw descriptor bytes of the log.
+   *
+   * <p>For access to the log's decompressed bytes
+   * use method {@code decompressedByteStream}.</p>
+   *
    * @since 2.2.0
    */

   public byte[] getRawDescriptorBytes();
 
   /**
+   * Returns the decompressed raw descriptor bytes of the log as a stream.
+   *
+   * @since 2.2.0
+   */
+  public InputStream decompressedByteStream();
+

I think this is easiest to understand and use; and of course the implementation wouldn't need separate processing paths for large and 'normal' logs. It also avoids having to decide on a method for determining whether a file counts as large.
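A possible implementation sketch of the suggested interface, for a hypothetical gzip-backed descriptor class (the name GzipLogDescriptor is mine, not from metrics-lib; a real implementation would dispatch on the actual compression type, e.g. xz via Commons Compress's XZCompressorInputStream):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/**
 * Hypothetical implementing class: stores only the compressed bytes
 * and decompresses lazily, so a 2G+ log never needs to fit in memory.
 */
public class GzipLogDescriptor {

  private final byte[] rawDescriptorBytes; // compressed, as received

  public GzipLogDescriptor(byte[] rawDescriptorBytes) {
    this.rawDescriptorBytes = rawDescriptorBytes;
  }

  /** Returns the compressed raw descriptor bytes of the log. */
  public byte[] getRawDescriptorBytes() {
    return rawDescriptorBytes;
  }

  /** Returns the decompressed bytes of the log, decoded lazily as a stream. */
  public InputStream decompressedByteStream() throws IOException {
    return new GZIPInputStream(new ByteArrayInputStream(rawDescriptorBytes));
  }
}
```

Callers that know the log is small can still drain the stream into a byte array; callers handling large logs read it incrementally.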

Thoughts?

Reference: legacy/trac#25329