---
title: TPA-RFC-56: large file storage
---
[[_TOC_]]
Summary: TODO
# Background
We've had multiple incidents with servers running out of disk space in
the past. This RFC aims to summarize those issues and propose a
solution that should cover most of them.
These are the issues raised in the past about servers running out of
disk space:
* [#40475 (closed)](/tpo/tpa/team/-/issues/40475), [#40615
(closed)](/tpo/tpa/team/-/issues/40615): "gitlab-02 running out of
disk space". CI artifacts and non-linear growth events.
* [#40431 (closed)](/tpo/tpa/team/-/issues/40431): "`ci-runner-01`
invalid ubuntu package signatures"; [gitlab#95
(closed)](/tpo/tpa/gitlab/-/issues/95): "Occasionally clean-up
Gitlab CI storage". non-linear, possibly explosive and
unpredictable growth. cache sharing issues between
runners. somewhat under control now that we have more runners.
* [#40477 (closed)](/tpo/tpa/team/-/issues/40477) ("backup failure: disk full on
bungei"). backups, non-linear, mostly archive-01 but also
gitlab. workaround [good for ~8
months](https://gitlab.torproject.org/tpo/tpa/team/-/issues/40477#note_2756638
"backup failure: disk full on bungei") (from October 2021, so
until June 2022) hopefully.
* [#40442 (closed)](/tpo/tpa/team/-/issues/40442) ("meronense running out of disk
space"). metrics storage, linear growth. transitioning between
storage systems (see [tpo/network-health/metrics/collector#40012
(closed)](https://gitlab.torproject.org/tpo/network-health/metrics/collector/-/issues/40012
"Come up with a plan to make past descriptors etc. easier
available and queryable \(giant database\)")). workaround good for
years.
* [#40535 (closed)](/tpo/tpa/team/-/issues/40535): "colchicifolium disk full". storage is
steadily increasing, adding about 30GB per 90 days (roughly
120GB/year) according to [@hiro](/hiro "Hiro"), with `/srv` regularly
reaching 90% full and capacity being added.
TODO: fold in <https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478#note_2808917>:
> archive-01 ([#40779 (closed)](/tpo/tpa/team/-/issues/40779)) and vineale ([#40778 (closed)](/tpo/tpa/team/-/issues/40778))
> just ran out of disk space too. the strategy for the former is to
> just bump up disk space and eventually migrate to gitlab. for the
> latter, it's unclear. it seems like we're eating 2TB a year on that
> thing, or more...
>
> also, we were asked where to put large VM images (3x8GB), and we
> answered "git(lab) LFS" with the intention of moving to object
> storage if we run out of space on the main VM, see #40767 (closed)
> for the discussion.
Note that GitLab needs to be scaled up specifically as well, which
primarily involves splitting it in multiple machines, see [#40479](/tpo/tpa/team/-/issues/40479 "scale out GitLab to 2k users")
for that discussion. It's partly in scope of this discussion in the
sense that a solution chosen here must be somewhat useful to scale
GitLab out.
Design and performance issues:
* Ganeti's DRBD backend: a full reboot of all nodes in the cluster
takes hours, because all machines need to be migrated between the
nodes (which is fine) but do not migrate back to their original
layout afterwards (which is not). this might or might not be fixed by
a change in the migration algorithm, but it could also be fixed by
moving storage away from DRBD to something else.
* [tpo/network-health/metrics/collector#40012 (closed)](/tpo/network-health/metrics/collector/-/issues/40012): "Come up
with a plan to make past descriptors etc. easier available and
queryable \(giant database\)" (in onionoo/collector storage). lots
of small files, might require FS snapshots or transition to
database, see new design in that ticket, or object storage (see
also [tpo/network-health/metrics/collector#40023 (closed)](/tpo/network-health/metrics/collector/-/issues/40023),
"Move collector storage from file based to object storage")
* [#40650 (closed)](/tpo/tpa/team/-/issues/40650): "colchicifolium backups are barely
functional". backups take _days_ to complete, possible solution is
to "Move collector storage from file based to object storage"
([tpo/network-health/metrics/collector#40023 (closed)](/tpo/network-health/metrics/collector/-/issues/40023))
* [#40482 (closed)](/tpo/tpa/team/-/issues/40482): "meronense performance problems (out of
memory?)". memory usage spikes every night; not directly TPA's
responsibility, but related to the above.
Much of the above, and this RFC, comes from the brainstorm in issue
[tpo/tpa/team#40478][].
## Storage usage analysis
TODO: redo the graphs in <https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478#note_2760208>.
# Proposal
## Goals
<!-- include bugs to be fixed -->
### Must have
### Nice to have
### Non-Goals
## Scope
## Affected users
# Examples or Personas
Examples:
* ...
Counter examples:
* ...
# Alternatives considered
## Throw hardware at it
## TODO: brainstorm ideas to triage
just throwing ideas out there.
in kubernetes, assuming we might want to go there:
* <https://longhorn.io/> - k8s volumes, native-only, no legacy support?
* <https://rook.io/> - ceph operator
object storage options (see the access sketch after this list):
* [minio](https://min.io/): suggested/shipped by gitlab omnibus now?
* ceph has support for s3
* [openio](https://www.openio.io/) mentioned in one of the GitLab threads, not evaluated
* [garage](https://garagehq.deuxfleurs.fr/) is another alternative
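all of these speak the same S3 API, so from the application side (GitLab,
runner caches, collector) the choice is mostly an operational one. here is a
minimal access sketch with boto3; the endpoint, credentials and bucket name
are made-up placeholders, not an existing TPA service:

```python
# minimal sketch: talk to any S3-compatible store (minio, Ceph RadosGW,
# garage, ...) the same way. endpoint, credentials and bucket name are
# hypothetical placeholders, not an existing TPA service.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.torproject.org:9000",  # hypothetical
    aws_access_key_id="REPLACE_ME",
    aws_secret_access_key="REPLACE_ME",
)

s3.create_bucket(Bucket="ci-artifacts-test")
# upload a local file (say, a CI artifact) and list what ended up in the bucket
s3.upload_file("artifact.zip", "ci-artifacts-test", "artifact.zip")
for obj in s3.list_objects_v2(Bucket="ci-artifacts-test").get("Contents", []):
    print(obj["Key"], obj["Size"])
```

swapping one backend for another should then only be a matter of changing the
endpoint and credentials.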
in general: i think Ceph is a great option that ticks a lot of the boxes here:
* redundancy (a la DRBD)
* but also load-balancing (ie. read/write to multiple servers, i think)
* S3 backend, which checks the gitlab box.
* native ganeti integration
the only concern might be its performance and reliability. gitlab evaluated it
as an NFS replacement but decided against it. other war stories:
* <https://blog.acolyer.org/2019/11/06/ceph-evolution/>
* <https://michael-prokop.at/blog/2021/04/09/a-ceph-war-story/>
* <https://www.openstack.org/summit/vancouver-2015/summit-videos/presentation/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service>
* <https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/727> - gitlab evaluated Ceph and a move to bare metal in 2016 and decided to stick with the cloud
alternatively, we could go with a SAN, home-grown or commercial, but i would
rather avoid proprietary stuff, which means we'd have to build our own, and
i'm not sure how we would do _that_. ZFS replication maybe? and that would
only solve the Ganeti storage problems. we'd still need an S3 store, but we
could use something like minio for that specifically.
oh, and we could fix the backup problems by ditching bacula and switching to
something like borg. we'd need an offsite server to "pull" the backups,
however (because borg is push, which means a compromised server being backed
up can trash its own backups on the backup server). we could build this with
ZFS/BTRFS replication, again.
> another caveat with borg is that restores are kind of slow. bacula
> seems to be really fast at restores, at least that's my experience
> restoring websites in #40501 (closed) today; really positive
> feeling.
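to make the pull idea above a bit more concrete, here's a rough sketch
(hypothetical hostnames, pools and datasets; nothing like this exists yet) of
an offsite host pulling a ZFS snapshot of the borg repositories over SSH, so
that neither the clients nor the backup server hold credentials that could
delete the offsite copy:

```python
#!/usr/bin/env python3
# rough sketch of a pull-based offsite copy: run on the *offsite* host, which
# snapshots the (hypothetical) borg repository dataset on the backup server
# and pulls it with zfs send/recv over SSH. the backup server has no
# credentials on this host, so a compromise there cannot trash this copy.
import subprocess
from datetime import datetime, timezone

BACKUP_HOST = "backup.example.torproject.org"  # hypothetical
DATASET = "tank/borg-repos"                    # hypothetical remote dataset
LOCAL_POOL = "offsite"                         # hypothetical local pool

def run(cmd, **kwargs):
    print("+", " ".join(cmd))
    return subprocess.run(cmd, check=True, **kwargs)

# 1. take a snapshot on the backup server
snap = datetime.now(timezone.utc).strftime("offsite-%Y%m%dT%H%M%SZ")
run(["ssh", BACKUP_HOST, "zfs", "snapshot", f"{DATASET}@{snap}"])

# 2. pull it: zfs send runs remotely, zfs recv locally. this is a full send;
#    a real job would keep the previous snapshot around and use `zfs send -i`
#    for incrementals.
send = subprocess.Popen(
    ["ssh", BACKUP_HOST, "zfs", "send", f"{DATASET}@{snap}"],
    stdout=subprocess.PIPE,
)
run(["zfs", "recv", "-F", f"{LOCAL_POOL}/{DATASET}"], stdin=send.stdout)
send.stdout.close()
if send.wait() != 0:
    raise RuntimeError("zfs send failed")
```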
## TODO: triage Ceph war stories from GitLab and SO
more war stories, this time from gitlab:
* when they were saying they would move to bare metal and ceph: <https://about.gitlab.com/blog/2016/11/10/why-choose-bare-metal/>
* when they subsequently tried, failed, and switched back to the cloud (without ceph), see <https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/727> and <https://gitlab.com/gitlab-com/operations/-/issues/1>; quote from [this deployment issue](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/241#note_39509631):
> While it's true that we lean towards PostgreSQL, our usage of CephFS was not
> for the database server, but for the git repositories. In the end we
> abandoned our usage of CephFS for shared storage and reverted back to a
> sharded NFS design.
and StackOverflow's (presumably) Jeff Atwood:
* "We had disastrous experiences with Ceph and Gluster on bare metal. I think this says more about the immaturity (and difficulty) of distributed file systems than the cloud per se." <https://news.ycombinator.com/item?id=12940042> in response to the first article from GitLab.com above (which ended up being correct: the went back to the cloud)
about this, one key thing to keep in mind is that GitLab were looking
for an NFS replacement.
we don't use NFS anywhere right now (thank god) so that is not a
requirement.
the above "horror stories" might not be the same with other storage
mechanisms. indeed, there's a big difference between using Ceph as a
filesystem (ie. CephFS) and an object storage (RadosGW) or block
storage (RBD), which might be better targets for us.
In particular, we're likely to use Ceph as a block device -- for
Ganeti instance disks, which Ganeti has good support for -- or object
storage -- for GitLab's "things", which it is now also designed
for. And indeed, "NFS" (ie. real filesystem) is now (14.x?) deprecated
in GitLab, so shared data storage is expected to go through S3-like
"object storage" APIs from here on.
## TODO: triage CERN experience
oh, and also i should drop this here... CERN started with a 3PB Ceph
deployment [around 2015](https://www.openstack.org/videos/summits/vancouver-2015/ceph-at-cern-a-year-in-the-life-of-a-petabyte-scale-block-storage-service). It seems it's still in use:
* [2017](https://indico.cern.ch/event/649159/contributions/2761965/attachments/1544385/2423339/hroussea-storage-at-CERN.pdf), 65PB
* [2018](https://indico.mathrice.fr/event/143/contribution/1/material/slides/0.pdf), 300PB?
* [2019](https://www.hpcwire.com/2019/09/30/how-ceph-is-helping-to-unlock-the-secrets-of-the-universe/), 1PB/day, 115PB/year?
* [2021](https://www.concat.de/wp-content/uploads/2021/05/WP-Storage-Wars-Part-3-CEPH-for-HPC-Environments.pdf), 65PB?
... although, as you can see, it's not exactly clear to me how much data is
managed by ceph. they seem to have a good experience with Ceph in any case,
with three active committers, and they say it's a "great community", which is
certainly a plus...
## TODO: triage meeting brainstorm
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478#note_2788264
we ended up [brainstorming this in a
meeting](https://gitlab.torproject.org/tpo/tpa/team/-/wikis/meeting/2022-02-14#storage-brainstorm),
where we said:
> We considered the following technologies for the broader problem:
>
> * S3 object storage for gitlab
> * ceph block storage for ganeti
> * filesystem snapshots for gitlab / metrics servers backups
>
> We'll look at setting up a VM with minio for testing. We could first test
> the service with the CI runners image/cache storage backends, which can
> easily be rebuilt/migrated if we want to drop that test.
>
> This would disregard the block storage problem, but we could pretend this
> would be solved at the service level eventually (e.g. redesign the metrics
> storage, split up the gitlab server). Anyways, migrating away from DRBD to
> Ceph is a major undertaking that would require a lot of work. It would also
> be part of the largest "[trusted high performance
> cluster](https://gitlab.torproject.org/groups/tpo/tpa/-/milestones/2)" work
> that we recently de-prioritized.
so it looks like the next step might be to set up minio here as a prototype.
[@hiro](/hiro "Hiro") is also considering object storage for collector
([tpo/network-health/metrics/collector#40023 (closed)](/tpo/network-health/metrics/collector/-/issues/40023 "Move collector storage from file based to object storage"))
which could solve a lot of the problems we're having here.
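if we do stand up that minio prototype, a first smoke test could be as simple
as the sketch below, using the upstream `minio` Python client; the endpoint,
credentials and bucket name are placeholders for illustration only:

```python
# sketch of a smoke test against a hypothetical minio prototype: create the
# bucket a CI runner cache could use and round-trip a small object.
# endpoint, credentials and bucket name are placeholders.
from minio import Minio

client = Minio(
    "minio-01.torproject.org:9000",  # hypothetical prototype VM
    access_key="REPLACE_ME",
    secret_key="REPLACE_ME",
    secure=True,
)

bucket = "runner-cache-test"
if not client.bucket_exists(bucket):
    client.make_bucket(bucket)

client.fput_object(bucket, "smoke-test.txt", "/etc/hostname")
for obj in client.list_objects(bucket):
    print(obj.object_name, obj.size)
```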
## upstream provider
"they have terabytes of storage where we could run a VM to have a
secondary storage server for bacula."
https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478#note_2843500
## minio licensing dispute
re minio, they are involved in a [licensing
dispute](https://blocksandfiles.com/2023/03/26/we-object-minio-says-no-more-open-license-for-you-weka/)
with commercial storage providers
([Weka](https://www.weka.io/) and [Nutanix](https://www.nutanix.com/)) because
those vendors used Minio in their products without giving attribution. see also
[this hacker news discussion](https://news.ycombinator.com/item?id=32148007).
it should also be noted that they switched to the AGPL relatively recently. i
don't think this should keep us from using it, but just a note to say there's
some storm brewing there.
# Costs
# Approval
# Deadline
# Status
This proposal is currently in the `draft` state.
# References
* discussion issue: [tpo/tpa/team#40478][].
[tpo/tpa/team#40478]: https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478