TPA-RFC-84: design and implement backup strategy for MinIO buckets or the entire server
We're considering using MinIO for more and more things, mainly GitLab (artifacts storage in #41403 and Gitaly backups in #40518), but possibly other services too (e.g. metrics storage in tpo/network-health/metrics/collector#40023 (closed)).
Right now, we don't have any backups of that server, which is probably fine: we only store container images there, and those can be regenerated in case of a catastrophe. But once we start storing Gitaly backups and GitLab artifacts there, that data must be preserved, so the server needs proper backups.
Research how backups can be performed, develop a policy and implement it.
Next steps:
- research articles anarcat found on the topic (see wallabag)
- discuss the idea in the network
- decide if we want this per bucket or per site
- write up a proposal (in progress, see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling), covering in particular:
  - backup/restore recovery
  - impact on other teams
  - timeline
  - estimates
  - review this issue
- implement proposal
  - minio-fsn-02 setup (4TiB), consider splitting in chunks? (#42136 (closed)) see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#warm-hard-disk-storage
  - implement quotas (#42155 (closed)) (should resolve #42077 (closed)), see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#quotas
    - add monitoring for bucket quota usage
  - sync up minio-fsn-02 and minio-01 with hot/cold storage (#42156 (closed))
    - set up tiered storage
    - join minio-fsn-02 to the cluster
    - test assigning a bucket to a specific tier: tie the network-health bucket to the "warm" tier
  - add more storage capacity to the warm tier cluster (#42237)
  - implement storage backups for both clusters (minio-01 and minio-fsn-02), see https://gitlab.torproject.org/tpo/tpa/team/-/wikis/policy/tpa-rfc-84-minio-backups-and-scaling#minio-native-backups-with-possible-exceptions
  - consider setting up a new minio-dal-03 server
- document and test backup/restore procedures
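For the "add monitoring for bucket quota usage" item, the core check is just used bytes divided by the quota. The sketch below is illustrative only: it assumes the per-bucket usage and quota figures have already been collected somehow (that collection step is not shown), and the `BucketUsage` shape, the `tpa_minio_bucket_quota_ratio` metric name and the example bucket names other than network-health are all hypothetical. It renders the ratios in the Prometheus text exposition format so they could be scraped or fed to a textfile collector.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class BucketUsage:
    """Usage figures for one bucket (hypothetical input shape)."""
    name: str
    used_bytes: int
    quota_bytes: int  # 0 means "no quota configured"


def quota_ratio(bucket: BucketUsage) -> Optional[float]:
    """Return used/quota, or None when the bucket has no quota set."""
    if bucket.quota_bytes <= 0:
        return None
    return bucket.used_bytes / bucket.quota_bytes


def over_threshold(buckets: Iterable[BucketUsage], threshold: float = 0.9) -> List[str]:
    """Names of buckets using at least `threshold` of their quota."""
    names = []
    for bucket in buckets:
        ratio = quota_ratio(bucket)
        if ratio is not None and ratio >= threshold:
            names.append(bucket.name)
    return names


def to_prometheus(buckets: Iterable[BucketUsage]) -> str:
    """Render quota ratios as a gauge in Prometheus text exposition format."""
    metric = "tpa_minio_bucket_quota_ratio"  # hypothetical metric name
    lines = [
        f"# HELP {metric} Fraction of the bucket quota currently in use.",
        f"# TYPE {metric} gauge",
    ]
    for bucket in buckets:
        ratio = quota_ratio(bucket)
        if ratio is not None:  # buckets without a quota are skipped
            lines.append(f'{metric}{{bucket="{bucket.name}"}} {ratio:.4f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    GiB = 1024 ** 3
    # Example figures are made up for illustration.
    buckets = [
        BucketUsage("gitlab-artifacts", 95 * GiB, 100 * GiB),
        BucketUsage("network-health", 10 * GiB, 500 * GiB),
        BucketUsage("gitlab-registry", 40 * GiB, 0),
    ]
    print(over_threshold(buckets))  # ['gitlab-artifacts']
    print(to_prometheus(buckets), end="")
```

Exporting a ratio rather than raw byte counts means a single alert threshold (here 90%) works for buckets of very different sizes; the actual threshold would be a policy choice.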
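For the "implement storage backups for both clusters" item, one generic way to avoid copying every object on every run is to diff the current bucket listing against a manifest saved by the previous backup. This is a minimal sketch of that manifest-diff technique, not the method chosen in the TPA-RFC-84 proposal: it assumes the listing is available as a mapping from object key to a change marker (for example the ETag or last-modified timestamp returned when listing objects), and the manifest file format and helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# Hypothetical manifest format: object key -> change marker (e.g. the ETag
# reported when listing the bucket). Not an existing TPA file format.
Manifest = Dict[str, str]


def plan_incremental_backup(current: Manifest, previous: Manifest) -> Tuple[List[str], List[str]]:
    """Compare the live listing with the manifest from the last backup run.

    Returns (to_copy, deleted): keys that are new or changed since the last
    run, and keys present in the last backup that no longer exist.
    """
    to_copy = sorted(key for key, marker in current.items() if previous.get(key) != marker)
    deleted = sorted(key for key in previous if key not in current)
    return to_copy, deleted


def load_manifest(path: Path) -> Manifest:
    """Read the previous run's manifest; an absent file means a full backup."""
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def save_manifest(path: Path, manifest: Manifest) -> None:
    """Persist the current listing so the next run can diff against it."""
    path.write_text(json.dumps(manifest, sort_keys=True, indent=2))


if __name__ == "__main__":
    previous = {"a.tar": "etag-1", "b.tar": "etag-2", "c.tar": "etag-3"}
    current = {"a.tar": "etag-1", "b.tar": "etag-2b", "d.tar": "etag-4"}
    to_copy, deleted = plan_incremental_backup(current, previous)
    print(to_copy)  # ['b.tar', 'd.tar']
    print(deleted)  # ['c.tar']
```

Whether deleted keys are pruned from the backup right away or kept for a retention period is a policy question for the proposal; the sketch only reports them.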