Closed (moved)
Issue created Jul 19, 2019 by Karsten Loesing@karsten

Extend file objects in index.json to include descriptor types, publication times, and file digests

atagar suggested extending file objects in CollecTor's index.json to include descriptor types, publication times, and file digests.

As of now, file objects in the index.json file have the following fields:

  • "path": Relative path of the file.
  • "size": Size of the file in bytes.
  • "last_modified": Timestamp when the file was last modified, formatted as "YYYY-MM-DD HH:MM" in UTC.

The new fields could be defined as follows, though this is very much subject to discussion on this ticket:

  • "types": List of descriptor types as found in @type annotations of contained descriptors (optional).
  • "first_published": Earliest published timestamp (or similar) of contained descriptors (optional).
  • "last_published": Latest published timestamp (or similar) of contained descriptors (optional).
  • "sha256": SHA-256 digest of the file, encoded as base64 (optional).
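For the proposed "sha256" field, a minimal sketch of how such a value could be computed is shown here. The class and method names (FileDigestSketch, sha256Base64) are hypothetical and not part of CollecTor; the sketch only assumes that the digest covers the raw bytes of the file and uses standard base64 with padding, neither of which has been settled on this ticket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical helper, not an existing CollecTor class. */
final class FileDigestSketch {

  /** Returns the base64-encoded SHA-256 digest of the file's raw bytes. */
  static String sha256Base64(Path file) throws IOException {
    MessageDigest md;
    try {
      md = MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required to be available on every Java platform.
      throw new IllegalStateException(e);
    }
    // Stream the file in chunks so that large archives need not fit in memory.
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        md.update(buffer, 0, read);
      }
    }
    // Assumption: standard base64 alphabet with padding (44 characters).
    return Base64.getEncoder().encodeToString(md.digest());
  }
}
```

Streaming the file rather than reading it in one call keeps memory use flat, which matters for the larger tarballs the index lists.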

All of these new fields seem reasonable to add, and I don't see why we wouldn't want them. The index will get bigger, but that sounds acceptable. The coding effort is non-zero, which we'll have to acknowledge. All in all, though, I don't see a blocker for doing this.

Implementation note: What these new fields have in common is that they're not plain file attributes that we can easily obtain from Java's File class. We'll have to open and read files in order to obtain them, and that's very time-consuming. I could see us doing this in a background thread (or thread pool) started by CollecTor's CreateIndexJson.java, with a state file of some sort to avoid reprocessing files that haven't changed. As long as this thread (pool) hasn't finished processing a file, the index would simply omit the new fields for that file (not the file itself!), which is why the fields are defined as optional above.
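The background-processing idea could look roughly like the following sketch. Everything in it is an assumption made for illustration: DerivedFieldsCache is a hypothetical class, the in-memory map stands in for the state file, and the reader function stands in for whatever code would open a file and extract types, publication times, or the digest.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical cache of expensive per-file fields, filled in the background. */
final class DerivedFieldsCache {

  /** A computed value plus the file attributes it was computed for. */
  private static final class Entry {
    final long size;
    final long lastModified;
    final String value;

    Entry(long size, long lastModified, String value) {
      this.size = size;
      this.lastModified = lastModified;
      this.value = value;
    }
  }

  // Stand-in for the state file; a real version would persist this map.
  private final Map<String, Entry> entries = new ConcurrentHashMap<>();
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  private final Function<String, String> reader;

  /** The reader is the slow step that opens and reads a file (assumed). */
  DerivedFieldsCache(Function<String, String> reader) {
    this.reader = reader;
  }

  /**
   * Returns the cached value if it matches the file's current size and
   * modification time. Otherwise schedules a background computation and
   * returns empty, so the index writer can omit the optional fields.
   * Simplification: repeated calls before completion schedule duplicates.
   */
  Optional<String> lookup(String path, long size, long lastModified) {
    Entry e = entries.get(path);
    if (e != null && e.size == size && e.lastModified == lastModified) {
      return Optional.of(e.value);
    }
    pool.execute(() -> {
      entries.put(path, new Entry(size, lastModified, reader.apply(path)));
    });
    return Optional.empty();
  }

  /** Stops accepting new work and waits for pending computations. */
  boolean shutdownAndWait(long seconds) throws InterruptedException {
    pool.shutdown();
    return pool.awaitTermination(seconds, TimeUnit.SECONDS);
  }
}
```

Keying the cached entry on size and last-modified time means an unchanged file is never read twice, while a changed file simply drops back to "fields omitted" until the background work catches up.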

What else did I miss? atagar, please fill in any thoughts that I left out.

Once we agree on the spec here, this could be a fine little project for a volunteer.
