TPA-RFC-84: design and implement backup strategy for MinIO buckets or the entire server
We're considering using MinIO for more and more things, mainly GitLab (artifacts storage in #41403 and Gitaly backups in #40518) but possibly others (e.g. metrics storage in tpo/network-health/metrics/collector#40023).
Right now, we don't have any backups of that server, which is probably fine: we only store container images there, which can be regenerated in case of a catastrophe. But once we start storing Gitaly backups and GitLab artifacts, that data can no longer be regenerated, so the server will need backups.
Research how backups can be performed, develop a policy and implement it.
Next steps:
* [x] research articles anarcat found on the topic (see wallabag)
* [x] discuss the idea in the network
* [x] decide if we want this per bucket or per site
* [x] write up a proposal (in progress, see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling), covering in particular:
  * [ ] backup/restore recovery
  * [x] impact on other teams
  * [x] timeline
  * [x] estimates
* [x] review this issue
* [ ] implement proposal
* [x] minio-fsn-02 setup (4TiB), consider splitting in chunks? (tpo/tpa/team#42136) see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#warm-hard-disk-storage
* [x] implement quotas (tpo/tpa/team#42155) (should resolve #42077), see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#quotas
* [x] add monitoring for bucket quota usage
* [x] sync up minio-fsn-02 and minio-01 with hot/cold storage (tpo/tpa/team#42156)
* [x] setup tiered storage
* [x] join minio-fsn-02 to cluster
* [x] test assigning a bucket to a specific tier: tie the `network-health` bucket to the `warm` tier
* [ ] add more storage capacity to the `warm` tier cluster (tpo/tpa/team#42237)
* [ ] implement storage backups for both clusters (minio-01 and minio-fsn-02), see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#minio-native-backups-with-possible-exceptions
* [ ] consider setting up a new minio-dal-03 server
* [ ] document and test backup/restore procedures
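
For the "add monitoring for bucket quota usage" item above, the check itself could look roughly like the sketch below. It is illustrative only and not what is deployed: the `BucketUsage` type, the `quota_ratio` and `over_threshold` helpers, the 80% threshold, the `gitlab-artifacts` bucket name and all the sizes are assumptions made up for the example. It also leaves out how the usage numbers would actually be collected from MinIO.

```python
from dataclasses import dataclass

TIB = 1024 ** 4  # bytes in one tebibyte


@dataclass(frozen=True)
class BucketUsage:
    """Hypothetical record: current usage and hard quota for one bucket."""
    name: str
    used_bytes: int
    quota_bytes: int


def quota_ratio(bucket: BucketUsage) -> float:
    """Return the fraction of the quota currently in use."""
    if bucket.quota_bytes <= 0:
        raise ValueError(f"bucket {bucket.name!r} has no usable quota")
    return bucket.used_bytes / bucket.quota_bytes


def over_threshold(buckets: list[BucketUsage], threshold: float = 0.8) -> list[str]:
    """Return the names of buckets at or above the alert threshold."""
    return [b.name for b in buckets if quota_ratio(b) >= threshold]


# Made-up sample data; only the `network-health` name comes from this issue.
sample = [
    BucketUsage("network-health", used_bytes=3 * TIB, quota_bytes=4 * TIB),
    BucketUsage("gitlab-artifacts", used_bytes=9 * TIB // 10, quota_bytes=TIB),
]

if __name__ == "__main__":
    for name in over_threshold(sample):
        print(f"bucket {name} is above 80% of its quota")
```

Alerting on a ratio rather than on absolute bytes means the same rule works for buckets with very different quotas.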