  • #25329

Closed (moved)
Created Feb 21, 2018 by iwakeh

Enable metrics-lib to process large (> 2G) logfiles

Metrics-lib receives compressed logs, usually below 600 kB in size. Since these can be processed in memory, this ticket is about handling logs that deflate to much larger files (approx. 2G and above).

Commons Compress doesn't provide methods for determining the deflated content size (as the command-line tool xz does). Other compression formats that metrics-lib supports offer this option, but using it would also require more changes.
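To illustrate the point that the deflated size only becomes known after streaming through the data, here is a minimal sketch. It uses the JDK's `GZIPInputStream` as a stand-in for Commons Compress's xz stream classes (which expose the same `InputStream` interface); the buffer size and payload are arbitrary choices for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingDecompression {

  /** Streams through the gzip-compressed bytes in fixed-size chunks and
   * counts the decompressed length, without ever buffering the whole
   * decompressed content in memory. */
  static long decompressedLength(byte[] compressed) throws IOException {
    long total = 0L;
    try (InputStream in = new GZIPInputStream(
        new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        total += read;
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Compress a sample payload first, so the example is self-contained.
    byte[] payload = new byte[100_000]; // highly compressible zero bytes
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
      gz.write(payload);
    }
    byte[] compressed = baos.toByteArray();
    System.out.println("compressed length: " + compressed.length);
    System.out.println("decompressed length: " + decompressedLength(compressed));
  }
}
```

The decompressed length is available only after the loop finishes, which is exactly why a compressed-size cut-off is the only cheap check available up front.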

Compression can be very effective, so using a cut-off on the compressed size is somewhat arbitrary. An example with xz compression: a log that deflates to 3G can have a compressed input array length of 589492; using extreme compression it even shrinks to a length of 405480; on the other hand, a file that deflates to only 64M can have an input array of 509212 length.

For handling larger log files with metrics-lib, some interface changes will be necessary. Here is a suggestion:


 public interface LogDescriptor extends Descriptor {
 
   /**
-   * Returns the decompressed raw descriptor bytes of the log.
+   * Returns the compressed raw descriptor bytes of the log.
+   *
+   * <p>For access to the log's decompressed bytes
+   * use method {@code decompressedByteStream}.</p>
+   *
    * @since 2.2.0
    */

   public byte[] getRawDescriptorBytes();
 
   /**
+   * Returns the decompressed raw descriptor bytes of the log as stream.
+   *
+   * @since 2.2.0
+   */
+  public InputStream decompressedByteStream();
+

I think this might be easiest to understand and use, and the implementation wouldn't need separate processing paths for large and 'normal' logs. It also avoids having to decide on a method for determining whether a file is large or not.
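To make the suggestion concrete, here is a hypothetical caller-side sketch of the proposed interface: the compressed bytes stay in memory while line-based processing runs over the decompressed stream. The class name, the gzip wrapping, and the `countLines` helper are assumptions for illustration only; the method names follow the suggestion above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

public class LogDescriptorSketch {

  private final byte[] compressedBytes;

  public LogDescriptorSketch(byte[] compressedBytes) {
    this.compressedBytes = compressedBytes;
  }

  /** Returns the compressed raw descriptor bytes of the log. */
  public byte[] getRawDescriptorBytes() {
    return this.compressedBytes;
  }

  /** Returns the decompressed raw descriptor bytes of the log as stream.
   * (Gzip is a stand-in here; the real logs would be xz-compressed.) */
  public InputStream decompressedByteStream() throws IOException {
    return new GZIPInputStream(new ByteArrayInputStream(this.compressedBytes));
  }

  /** Counts log lines without materializing the full log in memory. */
  public long countLines() throws IOException {
    long lines = 0L;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
        this.decompressedByteStream(), StandardCharsets.US_ASCII))) {
      while (reader.readLine() != null) {
        lines++;
      }
    }
    return lines;
  }
}
```

Callers that only need the compressed bytes keep using `getRawDescriptorBytes()`, while large-log processing switches to the stream, so both small and large logs go through the same code path.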

Thoughts?
