policy.md +1 −0

@@ -29,6 +29,7 @@ the Git repository for this wiki, run the command:

* [TPA-RFC-47: Email account retirement](policy/tpa-rfc-47-email-account-retirement)
* [TPA-RFC-66: Migrate to Gitlab Ultimate Edition](policy/tpa-rfc-66-gitlab-ultimate-program)
* [TPA-RFC-73: Tails infra merge roadmap](policy/tpa-rfc-73-tails-infra-merge-roadmap)
* [TPA-RFC-74: GitLab CI retention policy](policy/tpa-rfc-74-gitlab-ci-retention-policy)

## Proposed

policy/tpa-rfc-74-gitlab-ci-retention-policy.md 0 → 100644 +91 −0

---
title: TPA-RFC-74: GitLab CI retention policy
costs: n/a
approval: sysadmins and team leads
affected users: gitlab users
deadline: none
status: draft
discussion: https://gitlab.torproject.org/tpo/tpa/team/-/issues/41874
---

[[_TOC_]]

Summary: a proposal to limit the retention of GitLab CI data to 1 year

# Background

As more and more Tor projects moved to GitLab and embraced its
continuous integration features, managing the ensuing storage
requirements has been a challenge.

We regularly deal with incidents where filesystems on the GitLab server
come close to saturation, especially those involving CI artifact
storage, such as tpo/tpa/team#41402 and, recently, tpo/tpa/team#41861.

Previously, [TPA-RFC-14][] was implemented to reduce the default
artifact retention period from 30 to 14 days. This, together with CI
optimization in individual projects, has provided relief, but the
long-term issue has not been definitively addressed, since the retention
period doesn't apply to some artifacts such as job logs, which are kept
indefinitely by default.

[TPA-RFC-14]: tpa-rfc-14-gitlab-artifacts

# Proposal

Implement a daily GitLab maintenance task to delete CI pipelines older
than 1 year in *all* projects hosted on our instance.
This will:

* Purge old CI pipeline and job records from the GitLab database
* Delete associated CI job artifacts, even those that are "kept" either:
  * When [manually prevented from expiring][] ("Keep" button)
  * When they're the [latest successful pipeline artifact][]
* Delete old CI job log artifacts

[manually prevented from expiring]: https://gitlab.torproject.org/help/ci/jobs/job_artifacts#with-an-expiry
[latest successful pipeline artifact]: https://gitlab.torproject.org/help/ci/jobs/job_artifacts.md#keep-artifacts-from-most-recent-successful-jobs

## Goals

This is expected to significantly reduce the growth rate of CI-related
storage usage, and of the GitLab service in general.

## Affected users

All users of GitLab CI will be impacted by this change.

## Timeline

Barring the need for further discussion, this will be implemented on
Monday, December 9th.

## Cost estimates

### Hardware

This is expected to reduce future requirements in terms of storage
hardware.

### Staff

This will reduce the amount of TPA labor needed to deal with filesystem
saturation incidents.

# Alternatives considered

A "CI housekeeping" script is already in place, which scrubs job logs
daily in a hard-coded list of key projects such as c-tor packaging,
which runs an elaborate CI pipeline on a daily basis, and triage-bot,
which runs its CI pipeline on a schedule, every 15 minutes.

Although it has helped up until now, this approach cannot deal with the
increasing use of personal fork projects for development.

It's possible to define a different retention policy based on a
project's namespace. For example, projects under the `tpo` namespace
could have a longer retention period, while others (personal projects)
could have a shorter one.

This isn't part of the proposal currently, as it could violate the
principle of least surprise.
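To make the namespace-based alternative more concrete, here is a minimal
sketch of how a retention period could be picked per project and applied
to a list of pipelines. It is only an illustration, not the maintenance
task itself: the function names, the retention table and the 90-day
period for personal projects are all hypothetical, and the pipeline
records are assumed to be plain dictionaries with `id` and `created_at`
fields, modeled on what the GitLab pipelines API returns. One year is
approximated as 365 days.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention table: top-level namespace -> retention period.
# Only the 1-year period comes from this proposal.
RETENTION_BY_NAMESPACE = {
    "tpo": timedelta(days=365),
}
# Invented value for illustration: shorter retention for personal projects.
PERSONAL_RETENTION = timedelta(days=90)


def retention_for(project_path):
    """Return the retention period for a path such as 'tpo/tpa/team'."""
    top_level = project_path.split("/", 1)[0]
    return RETENTION_BY_NAMESPACE.get(top_level, PERSONAL_RETENTION)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def expired_pipelines(pipelines, project_path, now):
    """Return the IDs of pipelines created before the retention cutoff."""
    cutoff = now - retention_for(project_path)
    return [
        p["id"] for p in pipelines
        if parse_timestamp(p["created_at"]) < cutoff
    ]


if __name__ == "__main__":
    now = datetime(2024, 12, 9, tzinfo=timezone.utc)
    sample = [
        {"id": 1, "created_at": "2023-01-15T08:00:00Z"},
        {"id": 2, "created_at": "2024-11-01T08:00:00Z"},
    ]
    print(expired_pipelines(sample, "tpo/tpa/team", now))
```

The uniform policy in this proposal corresponds to the simpler case where
every namespace maps to the same 365-day period, which is what avoids the
surprise of two projects behaving differently.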
# References

* Discussion ticket: tpo/tpa/team#41874
* [Make It Ephemeral: Software Should Decay and Lose Data](https://lucumr.pocoo.org/2024/10/30/make-it-ephemeral/)