Storage is becoming a little tight on gnt-dal and we should look at what our options are to expand the available storage on that cluster, since that seems to be the main bottleneck right now.
We only have 2x 2.5" empty SSD drive bays in each of our three gnt-dal nodes. We can't add additional NVMe drives in these machines.
a "sweet spot": say just adding more drives of the same size
We could add 2x 2TB SSD drives to each of the three nodes; that would give us 2TB of extra usable storage space on each node if we use them to grow the existing RAID10 array.
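For reference, growing the existing array in place by just adding drives would look roughly like this (a sketch only: /dev/md0 and the /dev/sdX names are placeholders, and we'd want to confirm first that the kernel accepts a raid10 reshape of this kind):

```
# add the two new drives to the existing array (they show up as spares)
mdadm /dev/md0 --add /dev/sdk /dev/sdl
# grow the array to use them; replace 12 with the actual new device count
mdadm --grow /dev/md0 --raid-devices=12
# once the reshape finishes, make the extra space visible to LVM (if md0 is a PV)
pvresize /dev/md0
```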
How much would that be?
a "pipe dream": new SSD (or NVMe?!) drives, as big as possible
First, we only have 2x 2.5" empty SSD drive bays in each of our three gnt-dal nodes. We can't add additional NVMe drives in these machines.
Second, Samsung makes 7.68TB versions of the drives we currently have, but they're very expensive, coming in at around $1,000 per drive. In addition, we'd probably need to investigate mdadm more closely to make sure we can actually grow our current RAID10 array with this new geometry. Otherwise, we could always integrate them as a separate RAID1 array, but that would make the setup somewhat less flexible.
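A first step in that investigation would be to just dump the current array geometry (assuming the array is /dev/md0):

```
# show the current level, layout, chunk size and member devices
mdadm --detail /dev/md0
cat /proc/mdstat
```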
Right. So that would be what, 6k$USD for the upgrade on the entire cluster? That's not completely out of this world, to be honest.
Please do investigate how it would integrate in the array.
There seem to be bigger drives, but prices become really prohibitive at 16TB (4k$/drive), so let's leave that aside for now.
a "weird setup": add really huge HDD drives for archival (we have 16 and 24TB drives now!)
As far as I know those drives only exist in 3.5" spinning rust form factor, and we don't have any HDD drive bays that could house them in the gnt-dal machines. Also, I would seriously recommend against putting any sort of database workload on HDDs.
Of course, not for databases, but we could use it to archive things in a "cold" MinIO tier, for example, freeing up some space for other things.
I did a cursory look at pcpartpicker on this, and it seems you are right that 2.5" HDDs are not available in those sizes: the biggest I found are 4TB, and their prices are not much better than SSDs anyway. e.g. there's the Seagate Barracuda 4TB 5400rpm at 280USD (0.070USD/GB).
> Please do investigate how it would integrate in the array.
I'm growing an array at home so I looked into this.
It doesn't look like we can grow RAID-10 arrays with differently-sized disks. (Interestingly, it's really hard to find authoritative data on this: there used to be a mediawiki for this which was already quite poor, but it's now archived, and the thing it points to is abysmal, md-kernel-interface-only documentation that's utterly useless for our needs here.)
ANYWAYS.
It seems like RAID-1/4/5/6 arrays can be grown, according to those docs, but according to that, RAID-10 arrays are "very difficult to reshape and currently any attempt to do so will be rejected. It could be done if someone cares to code it."
So we can't resize the existing array with larger drives (but we could add similarly-sized drives to the array. I think. I'm not sure.)
In any case, what we can almost certainly do is add those as a new RAID-1 array, and pile that onto the VG, as a new PV. I think that would work. Essentially, that would be RAID-1+0, with LVM doing the RAID-0 layer.
(Perhaps this is how we should have done this in the first place?)
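Roughly, that would look like this (a sketch only: the device names and the VG name vg_ganeti are placeholders):

```
# build a RAID-1 array out of the two new drives
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdk /dev/sdl
# turn it into an LVM physical volume and add it to the existing volume group
pvcreate /dev/md1
vgextend vg_ganeti /dev/md1
```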
So the question is whether we go with the current drive size or use bigger drives. Could you estimate the prices on both? Include spares, please. I think so far I have:
8x7.68TB drives (two spares, 23.04TB extra capacity, 8k$USD cost? 1k$/unit?)
6x2TB drives (no extra spares, 6TB extra capacity, 1.7k$USD cost, 285$/unit)
Surely there's some math wrong there, as the 23TB option looks much more attractive, even when taking the spares into account. And indeed, the Intel S4510 7.68TB drives are more like 1500USD on Newegg, which would bring the total to 12k$ for 23TB.
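Back of the envelope, per TB of extra usable capacity: the 2TB option comes to about 285$/TB (1.7k$ / 6TB), while the 7.68TB option comes to about 350$/TB at 1k$/unit (8k$ / 23.04TB), or about 520$/TB at the 1500USD Newegg price (12k$ / 23.04TB).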
> It seems like RAID-1/4/5/6 arrays can be grown, according to those docs, but according to that, RAID-10 arrays are "very difficult to reshape and currently any attempt to do so will be rejected. It could be done if someone cares to code it."
> So we can't resize the existing array with larger drives (but we could add similarly-sized drives to the array. I think. I'm not sure.)
The documentation you're referring to is ancient. I don't think we should discount our ability to do this based only on that. I do think we can grow a RAID10 volume using larger disks, and this is based on experiments I did a while ago with loop devices, although we should verify this still holds with recent kernels of course.
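For what it's worth, re-running that experiment should be cheap; a rough sketch with loop devices (sizes, paths and device names are arbitrary, and this assumes /dev/loop0-3 are free):

```
# create four small backing files and attach them to loop devices
for i in 0 1 2 3; do
    truncate -s 512M disk$i.img
    losetup /dev/loop$i disk$i.img
done
# build a test RAID-10 array and put a filesystem on it
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mkfs.ext4 /dev/md0
# then replace each member with a larger one (fail/remove/add, waiting for each
# resync to finish), and finally grow the array and the filesystem:
mdadm --grow /dev/md0 --size=max
resize2fs /dev/md0
```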
In any case, I would advise against adding new storage capacity to the cluster by creating a new, separate volume, as opposed to extending the existing one. An extra storage boundary is going to hurt by reducing our flexibility and is likely to create situations in which we'll need to move virtual disks across volumes, which is a time-consuming operation.
>> It seems like RAID-1/4/5/6 arrays can be grown, according to those docs, but according to that, RAID-10 arrays are "very difficult to reshape and currently any attempt to do so will be rejected. It could be done if someone cares to code it."
>> So we can't resize the existing array with larger drives (but we could add similarly-sized drives to the array. I think. I'm not sure.)
> The documentation you're referring to is ancient. I don't think we should discount our ability to do this based only on that. I do think we can grow a RAID10 volume using larger disks, and this is based on experiments I did a while ago with loop devices, although we should verify this still holds with recent kernels of course.
I couldn't find any newer documentation on this, and I have done quite a bit of research!
> In any case, I would advise against adding new storage capacity to the cluster by creating a new, separate volume, as opposed to extending the existing one. An extra storage boundary is going to hurt by reducing our flexibility and is likely to create situations in which we'll need to move virtual disks across volumes, which is a time-consuming operation.
What do you mean by "volume" here? Volume Group (VG)? That's not my
intention: my intention is to create a new RAID-1 Physical Volume (PV)
that would get added to the the existing VG.
I don't think such a boundary (a new PV) will hurt our flexibility, as
we won't need to move "virtual disks" (logical volumes, LV, i assume?)
across PVs: it will just get allocated as is. and unless we need to
rebuild an entire array, we'll never need to move LVs between PVs.
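For instance, we can already see (and, if needed, control) which PV backs each LV; a quick sketch, with vg_ganeti and /dev/md1 as placeholder names:

```
# show which physical volumes back each logical volume
lvs -o +devices vg_ganeti
# optionally, pin a new LV to the new PV explicitly
lvcreate -L 100G -n scratch vg_ganeti /dev/md1
```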
...
On 2025-03-17 13:56:18, Jérôme Charaoui (@lavamind) wrote:
> So the question is whether we go with the current drive size or use bigger drives. Could you estimate the prices on both?
If anyone wants to pick this up, please go ahead. I don't expect to have time this week to look into it.
I'm going to pick this up because I think we really need to prioritize buying and shipping the hardware ASAP, but I will need to pick your brain on this:
> Could you estimate the prices on both? Include spares, please. I think so far I have:
> 8x7.68TB drives (two spares, 23.04TB extra capacity, 8k$USD cost? 1k$/unit?)
> 6x2TB drives (no extra spares, 6TB extra capacity, 1.7k$USD cost, 285$/unit)
> Surely there's some math wrong there, as the 23TB option looks much more attractive, even when taking the spares into account. And indeed, the Intel S4510 7.68TB drives are more like 1500USD on Newegg, which would bring the total to 12k$ for 23TB.
Please check the attached original invoice for those 3 servers. Each server already has 10 SSDs, with no more space for 2.5" hot-swap drives. You can only add 2 x M.2 SSDs on board.
Seems like we had a massive misunderstanding there, again.
One possibility would indeed be to add 2x M.2 drives and replace the Micro 2.5" 500GB drives with 8TB drives, but that's slightly more expensive, and a more involved operation since we'd need to do a two-stage migration. Ugh.
```
root@angela:/home/anarcat/raid-10-test# resize2fs /dev/md0
resize2fs 1.47.2 (1-Jan-2025)
Filesystem at /dev/md0 is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/md0 is now 784896 (4k) blocks long.
```