Verified commit d715d9ec, authored by anarcat

make a minio section (team#40478)

parent 58ab66dd
+81 −20
@@ -199,10 +199,6 @@ just throwing ideas out there.

object storage options:

 * [minio][]: suggested/shipped by gitlab omnibus now? [not packaged
   in Debian](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859207), container deployment probably the only reasonable
   solution, but watch out for network overhead. no release numbers,
   unclear support policy. golang.
 * ceph has support for s3
 * [openio][] mentioned in one of the GitLab threads, not evaluated,
   python, main website down: https://www.openio.io/
@@ -227,7 +223,7 @@ alternatively, we could go with a SAN, home-grown or commercial, but i would
rather avoid proprietary stuff, which means we'd have to build our own, and
i'm not sure how we would do _that_. ZFS replication maybe? and that would
only solve the Ganeti storage problems. we'd still need an S3 storage, but we
could use something like minio for that specifically.
could use something like MinIO for that specifically.

oh, and we could fix the backup problems by ditching bacula and switching to
something like borg. we'd need an offsite server to "pull" the backups,
@@ -239,7 +235,6 @@ trash its own backups). we could build this with ZFS/BTRFS replication, again.
> restoring websites in #40501 (closed) today, really positive
> feeling.

[minio]: https://min.io/
[openio]: https://www.openio.io/

## TODO: triage Ceph war stories from GitLab and SO
@@ -311,7 +306,7 @@ we ended up [brainstorming this in a meeting][] , where we said:
>   * ceph block storage for ganeti
>   * filesystem snapshots for gitlab / metrics servers backups
>
> We'll look at setting up a VM with minio for testing. We could first test
> We'll look at setting up a VM with MinIO for testing. We could first test
> the service with the CI runners image/cache storage backends, which can
> easily be rebuilt/migrated if we want to drop that test.
>
@@ -322,7 +317,7 @@ we ended up [brainstorming this in a meeting][] , where we said:
> lot of work. It would also be part of the largest "[trusted high
> performance cluster][]" work that we recently de-prioritized.

so it looks like the next step might be to setup minio here as a prototype.
so it looks like the next step might be to set up MinIO here as a prototype.

hiro is also considering object storage for collector
([tpo/network-health/metrics/collector#40023 (closed)][] which could
@@ -338,16 +333,63 @@ secondary storage server for bacula."

https://gitlab.torproject.org/tpo/tpa/team/-/issues/40478#note_2843500

## minio licensing dispute
## MinIO

[MinIO][] is apparently suggested or shipped by GitLab Omnibus now
(unconfirmed). It is [not packaged in Debian][], so a container
deployment is probably the only reasonable solution, but watch out for
network overhead. It has no release numbers and an unclear support
policy. Written in Go.

Features:

 * [active-active replication](https://min.io/product/active-data-replication-for-object-storage), although it requires low latency
   (<20ms) and low packet loss (< 0.01%) between sites, plus a load
   balancer for HA
 * asynchronous replication, can survive replicas going down (data
   gets cached and resynced after)
 * [bucket replication](https://min.io/docs/minio/linux/administration/bucket-replication.html) 
 * [erasure coding](https://min.io/docs/minio/linux/operations/concepts/erasure-coding.html#minio-erasure-coding)
 * [rolling upgrades](https://min.io/docs/minio/linux/operations/install-deploy-manage/upgrade-minio-deployment.html) with "a few seconds" downtime (presumably
   compensated by client-side retries)
 * object versioning, [immutability](https://min.io/product/data-immutability-for-object-storage)
 * [Prometheus and InfluxDB monitoring](https://min.io/docs/minio/linux/operations/monitoring.html), also includes [bucket event
   notifications](https://min.io/docs/minio/linux/administration/monitoring/bucket-notifications.html)
 * [audit logs](https://min.io/docs/minio/linux/operations/monitoring/minio-logging.html#minio-logging-publish-audit-logs)
 * [external identity providers](https://min.io/docs/minio/linux/operations/external-iam.html): LDAP, OIDC (Keycloak
   specifically)
 * [object server-side encryption](https://min.io/product/enterprise-object-storage-encryption) through external Key Management
   Services (e.g. Hashicorp Vault)
 * built-in [TLS support](https://min.io/docs/minio/linux/operations/network-encryption.html)
 * [recommended hardware setups](https://min.io/product/reference-hardware) although probably very expensive
 * [self-diagnostics and hardware tests](https://min.io/docs/minio/linux/operations/checklists/hardware.html#recommended-hardware-tests)
 * [lifecycle management](https://min.io/docs/minio/linux/administration/object-management/object-lifecycle-management.html#minio-lifecycle-management)
 * [FTP/SFTP/FTPS support](https://min.io/docs/minio/linux/developers/file-transfer-protocol.html)

Missing and downsides:

 * only two-node replication
 * possible licensing issues (see below)
 * upgrades and pool expansions require all servers to restart at once
 * [cannot resize existing server pools](https://min.io/docs/minio/linux/operations/concepts.html#can-i-change-the-size-of-an-existing-minio-deployment), in other words, a resize
   means building a new larger server and retiring the old one (!)
 * very high hardware requirements (4 nodes, each with 32 cores, 128GB
   RAM, 8 drives, and 25-100GbE networking, for 2-4k clients)
 * [other limitations](https://min.io/docs/minio/linux/operations/checklists/thresholds.html)
 * backups need to be done through bucket replication or site
   replication, difficult to back up using our normal backup systems
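
To get a feel for what erasure coding costs on hardware like the above, here
is a rough sketch of the generic arithmetic: a stripe of `drives` shards with
`parity` parity shards keeps `drives - parity` shards of usable data and can
lose up to `parity` shards and still be read. The function name, the 8TB
drive size, the 16-drive stripe and the 4-shard parity are all assumptions
picked for illustration, not values taken from the MinIO documentation.

```python
from fractions import Fraction


def erasure_profile(drives: int, parity: int, drive_tb: float) -> dict:
    """Capacity and failure tolerance of one erasure-coded stripe.

    Hypothetical helper: generic data+parity arithmetic only, it does
    not model how MinIO actually lays out its erasure sets.
    """
    # Assumption of this sketch: parity cannot exceed half the stripe.
    if not 0 < parity <= drives // 2:
        raise ValueError("parity must be between 1 and half the stripe width")
    data = drives - parity
    return {
        "data_shards": data,
        "parity_shards": parity,
        # Fraction of raw capacity that ends up usable.
        "efficiency": Fraction(data, drives),
        "raw_tb": drives * drive_tb,
        "usable_tb": data * drive_tb,
        # Shards that can be lost while objects remain readable.
        "tolerated_failures": parity,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 16 drives of 8TB with 4 parity shards.
    for key, value in erasure_profile(16, 4, 8.0).items():
        print(f"{key}: {value}")
```

Raising `parity` trades usable space for failure tolerance, which is worth
keeping in mind when weighing the hardware requirements against our budget.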

[MinIO]: https://min.io/
[not packaged in Debian]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859207

### Licensing dispute

re minio, they are involved in a [licensing dispute][] with commercial
storage providers ([Weka][] and [Nutanix][]) because the latter used
Minio in their products without giving attribution. see also [this
hacker news discussion][32148007].
MinIO are involved in a [licensing dispute][] with commercial storage
providers ([Weka][] and [Nutanix][]) because the latter used MinIO in
their products without giving attribution. See also [this Hacker News
discussion][32148007].

it should also be noted that they switched to the AGPL relatively recently. i
don't think this should keep us from using it, but just a note to say there's
some storm brewing there.
It should also be noted that they switched to the AGPL relatively
recently. I don't think this should keep us from using it, but just a
note to say there's some storm brewing there.

[Weka]: https://www.weka.io/
[Nutanix]: https://www.nutanix.com/
@@ -369,7 +411,7 @@ and has been [renewed for a year in May 2023][].

Features:

 * apparently [faster than Minio on higher-latency links][] (100ms+)
 * apparently [faster than MinIO on higher-latency links][] (100ms+)
 * [Prometheus monitoring][] (see [metrics list][]) and Grafana
   dashboard
 * [regular releases][] with actual release numbers, although not yet
@@ -379,6 +421,10 @@ Features:
   can come out")
 * read-after-write consistency (stronger than the eventual
   consistency Amazon S3 offered before late 2020)
 * support for asynchronous replicas (so-called "dangerous" mode that
   returns to the client as soon as the local write finishes), see the
   [replication mode][] for details
 * [static website hosting][]
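
The difference between read-after-write consistency and the asynchronous
("dangerous") mode can be sketched with the usual quorum rule from
Dynamo-style systems: a read is guaranteed to see the latest acknowledged
write when the read and write quorums overlap, that is when R + W > N. The
function name and the quorum values below are hypothetical illustrations,
not Garage's actual configuration.

```python
def read_after_write(replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """Return True when every read quorum overlaps every write quorum.

    Generic quorum rule (R + W > N), not code taken from Garage.
    """
    if not (1 <= write_quorum <= replicas and 1 <= read_quorum <= replicas):
        raise ValueError("quorums must be between 1 and the replica count")
    return read_quorum + write_quorum > replicas


# Hypothetical profiles for a three-replica cluster: (N, W, R).
PROFILES = {
    "quorum writes and reads": (3, 2, 2),
    "acknowledge after local write": (3, 1, 1),
}

if __name__ == "__main__":
    for name, (n, w, r) in PROFILES.items():
        print(f"{name}: read-after-write = {read_after_write(n, w, r)}")
```

Acknowledging after a single local write is faster on high-latency links, but
a later read served by another replica may not see that write yet.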

Missing and downsides:

@@ -388,14 +434,21 @@ Missing and downsides:
   duplication across nodes
 * designed for smaller, "home lab" distributed setups, might not be
   our target
 * [no built-in authentication system][]
 * [no built-in authentication system][], no support for [S3 policies or ACLs][]
 * [non-goals][] also include "extreme performance" and features
   above the S3 API
 * uses a CRDT and Dynamo instead of Raft, see [this discussion for
   tradeoffs][] and [the design page][]
 * no live migration, [upgrade procedure][] currently implies short
   downtimes
 * backups require live filesystem snapshots or shutdown
 * backups require live filesystem snapshots or shutdown, [example
   backup script][]
 * no [bucket versioning][]
 * no [object locking][]
 * no [server-side encryption][], they argue for client-side encryption,
   full disk encryption, and transport encryption instead in their
   [encryption section][]
 * no HTTPS support out of the box, can be easily fixed with a proxy

See also their [comparison with other software][] including MinIO. A
lot of the information in this section was gleaned from [this Hacker
@@ -404,7 +457,7 @@ News discussion][30256753] and [this other one][33853539].
[Docker image]: https://hub.docker.com/r/dxflrs/garage
[binaries]: https://garagehq.deuxfleurs.fr/download/
[NLNet grant]: https://nlnet.nl/project/Garage/
[faster than Minio on higher-latency links]: https://garagehq.deuxfleurs.fr/documentation/design/benchmarks/
[faster than MinIO on higher-latency links]: https://garagehq.deuxfleurs.fr/documentation/design/benchmarks/
[Prometheus monitoring]: https://garagehq.deuxfleurs.fr/documentation/cookbook/monitoring/
[metrics list]: https://garagehq.deuxfleurs.fr/documentation/reference-manual/monitoring/
[regular releases]: https://git.deuxfleurs.fr/Deuxfleurs/garage/releases
@@ -418,6 +471,14 @@ News discussion][30256753] and [this other one][33853539].
[30256753]: https://news.ycombinator.com/item?id=30256753
[33853539]: https://news.ycombinator.com/item?id=33853539
[upgrade procedure]: https://garagehq.deuxfleurs.fr/documentation/operations/upgrading/#major-upgarades-with-minimal-downtime
[replication mode]: https://garagehq.deuxfleurs.fr/documentation/reference-manual/configuration/#replication-mode
[static website hosting]: https://garagehq.deuxfleurs.fr/documentation/cookbook/exposing-websites/
[bucket versioning]: https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/166
[S3 policies or ACLs]: https://garagehq.deuxfleurs.fr/documentation/reference-manual/s3-compatibility/#acl-policies-endpoints
[object locking]: https://garagehq.deuxfleurs.fr/documentation/reference-manual/s3-compatibility/#locking-objects
[server-side encryption]: https://garagehq.deuxfleurs.fr/documentation/reference-manual/s3-compatibility/#server-side-encryption
[encryption section]: https://garagehq.deuxfleurs.fr/documentation/cookbook/encryption/
[example backup script]: https://git.deuxfleurs.fr/Deuxfleurs/nixcfg/src/branch/main/cluster/prod/app/backup/build/backup-garage/do-backup.sh

## SeaweedFS