Issue #25523

Closed (moved)
Created Mar 16, 2018 by Karsten Loesing (@karsten)

Add support for webstats tarballs

I started creating tarballs containing .xz-compressed webstats files. When I attempt to feed them into DescriptorReader, it fails with an exception like the following:

Cannot parse descriptor file ’in/webstats-2016-01.tar’.
[non-printable binary data omitted]
        at org.torproject.descriptor.impl.DescriptorParserImpl.detectTypeAndParseDescriptors(DescriptorParserImpl.java:136)
        at org.torproject.descriptor.impl.DescriptorParserImpl.parseDescriptors(DescriptorParserImpl.java:33)
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarball(DescriptorReaderImpl.java:325)
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.readTarballs(DescriptorReaderImpl.java:276)
        at org.torproject.descriptor.impl.DescriptorReaderImpl$DescriptorReaderRunnable.run(DescriptorReaderImpl.java:162)
        at java.lang.Thread.run(Thread.java:745)

The tarballs I created contain files as follows:

$ tar tf webstats-2016-01.tar
[...]
webstats-2016-01/torproject.org/2016/01/25/torproject.org_aroides.torproject.org_access.log_20160125.xz
webstats-2016-01/torproject.org/2016/01/25/torproject.org_archeotrichon.torproject.org_access.log_20160125.xz

When I extract tarball files before reading them with DescriptorReader, this works just fine.

I think that the issue is that DescriptorParserImpl#detectTypeAndParseDescriptors() looks at descriptorFile rather than fileName to obtain the file name. The effect is that it learns the tarball file name, rather than the file name of the contained log file:

-    if (descriptorFile.getName().contains(LogDescriptorImpl.MARKER)
+    if (fileName.contains(LogDescriptorImpl.MARKER)
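
To illustrate the suspected cause, here is a minimal, self-contained sketch that does not use metrics-lib itself. The class name, the two helper methods, and the marker value "access.log" are hypothetical stand-ins; the real value of LogDescriptorImpl.MARKER is not shown in this ticket. The sketch only demonstrates why a check against the tarball's name would miss a marker that is present in the contained entry's name.

```java
import java.io.File;

public class MarkerDetectionSketch {

  // Hypothetical stand-in for LogDescriptorImpl.MARKER (assumed value).
  static final String MARKER = "access.log";

  /** Mimics the current check: inspects the name of the file passed to the reader. */
  public static boolean detectByContainer(File descriptorFile) {
    return descriptorFile.getName().contains(MARKER);
  }

  /** Mimics the proposed check: inspects the name of the entry inside the tarball. */
  public static boolean detectByEntryName(String fileName) {
    return fileName.contains(MARKER);
  }

  public static void main(String[] args) {
    File tarball = new File("in/webstats-2016-01.tar");
    String entry = "webstats-2016-01/torproject.org/2016/01/25/"
        + "torproject.org_aroides.torproject.org_access.log_20160125.xz";

    // The tarball name carries no marker, so the log type is not detected.
    System.out.println("container check: " + detectByContainer(tarball)); // prints false
    // The entry name carries the marker, so detection succeeds.
    System.out.println("entry check: " + detectByEntryName(entry));       // prints true
  }
}
```

If this is indeed the cause, it would also explain why extracting the tarball first works: the reader is then handed the log file itself, so both checks see the same name.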

The above is untested and probably insufficient. It's just supposed to start the bug hunting. Priority is medium, because we can just extract tarballs for now. But it's a bug, and it may confuse users as soon as we provide these tarballs and no working code to process them.

This is also related to #22695 (moved).

Assigning to iwakeh who said they'd like to grab it.
