Commit 1c5a5228 authored by anarcat

expand iscsi docs

parent 541b017b
+121 −3
@@ -147,6 +147,111 @@ DRBD, in the FSN cluster:
          --backend-parameters memory=1g,vcpus=1 \
          test-01.torproject.org

Do not forget to follow the [next steps](#next-steps), above.

### iSCSI integration

To create a VM with iSCSI backing, a disk must first be created on the
SAN, then adopted in a VM, which needs to be *reinstalled* on top of
that. This is typically how large disks are provisioned in the
`gnt-chi` cluster, in the [Cymru PoP](howto/new-machine-cymru).

The following instructions assume you are on a node with an [iSCSI
initiator properly set up](howto/new-machine-cymru#iscsi-initiator-setup) and the [SAN cluster management tools
set up](howto/new-machine-cymru#san-management-tools-setup). They also assume you are familiar with the `SMcli` tool; see
the [storage servers documentation](howto/new-machine-cymru#storage-servers) for an introduction.

This assumes you are creating a 500GB VM, partitioned on the Linux
host, *not* on the iSCSI volume. TODO: change those instructions to
create one volume per partition, so that those can be resized more
easily. This is how `web-chi-03` was set up.

 1. create the disk on the SAN and assign it to the host group:

        create virtualDisk physicalDiskCount=3 raidLevel=5 userLabel="web-chi-03" capacity=500GB;
        set virtualDisk ["web-chi-03"] logicalUnitNumber=4 hostGroup="gnt-chi";

    NOTE: the `logicalUnitNumber` here must be an increment from the
    previous highest LUN. See also the [disk creation instructions](howto/new-machine-cymru#creating-a-disk)
    for a discussion.

 2. detect the new device on the Linux side:

        iscsiadm -m node --rescan
        ls -altr /dev/disk/by-path/*lun-4

    TODO: is the `... --rescan` necessary?

 3. find the associated WWID:

        /lib/udev/scsi_id -g -u -d /dev/sdl

 4. configure the disk on all Ganeti nodes, in Puppet's
    `profile::ganeti::chi` class:

        iscsi::multipath::alias { 'web-chi-03':
          wwid => '36782bcb00063c6a500000d67603f7abf',
        }

 5. confirm that multipath works; it should look something like this:

        root@chi-node-01:~# multipath -ll
        web-chi-03-srv (36782bcb00063c6a500000d67603f7abf) dm-20 DELL,MD32xxi
        size=500G features='5 queue_if_no_path pg_init_retries 50 queue_mode mq' hwhandler='1 rdac' wp=rw
        |-+- policy='round-robin 0' prio=6 status=active
        | |- 11:0:0:4 sdi 8:128 active ready running
        | |- 12:0:0:4 sdj 8:144 active ready running
        | `- 9:0:0:4  sdh 8:112 active ready running
        `-+- policy='round-robin 0' prio=1 status=enabled
          |- 10:0:0:4 sdk 8:160 active ghost running
          |- 7:0:0:4  sdl 8:176 active ghost running
          `- 8:0:0:4  sdm 8:192 active ghost running
        root@chi-node-01:~#

    and the device `/dev/mapper/web-chi-03` should exist.

 5. partition the disk with `parted`:

        parted --script --align optimal /dev/mapper/web-chi-03 \
           mklabel gpt \
           mkpart primary 0% 8MB \
           set 1 bios_grub on \
           mkpart primary 8MB 10008MB \
           mkpart primary 10008MB 18008MB \
           mkpart primary 18008MB 100%

    TODO: this is one step that would be skipped if we have one iSCSI
    volume per partition, obviously.

    TODO: we probably do not need that `bios_grub` partition either.

 6. adopt the disks in Ganeti:

        gnt-instance add \
              -o debootstrap+buster \
              -t drbd --no-wait-for-sync \
              --net 0:ip=pool,network=gnt-chi-01 \
              --no-ip-check \
              --no-name-check \
              --disk 0:adopt=/dev/disk/by-id/dm-name-web-chi-03-part2 \
              --disk 1:adopt=/dev/disk/by-id/dm-name-web-chi-03-part3,name=swap \
              --disk 2:adopt=/dev/disk/by-id/dm-name-web-chi-03-part4 \
              --backend-parameters memory=8g,vcpus=2 \
              web-chi-03.torproject.org

 7. at this point, the VM probably doesn't boot because, for some
    reason, `gnt-instance-debootstrap` doesn't fire when disks are
    adopted. So you need to reinstall the machine, which involves
    stopping it first:

        gnt-instance shutdown --timeout=0 web-chi-03
        gnt-instance reinstall web-chi-03

From here on, follow the [next steps](#next-steps) above.
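
The multipath check in the procedure above can be scripted. The following is a minimal sketch, not part of the existing tooling: `count_paths` is a hypothetical helper that reads `multipath -ll` output for a single device on stdin and prints the number of `ready` and `ghost` paths. It assumes the output format shown above, where each path line ends in `active ready running` or `active ghost running`.

```shell
#!/bin/sh
# count_paths (hypothetical helper): read `multipath -ll <alias>` output
# on stdin and print "<ready> <ghost>", the number of paths in each state.
count_paths() {
    awk '
        /active ready running/ { ready++ }
        /active ghost running/ { ghost++ }
        END { printf "%d %d\n", ready, ghost }
    '
}
```

It would be used as `multipath -ll web-chi-03 | count_paths`. With the sample output above it prints `3 3`: three paths in the active group and three ghost paths in the second group. Anything lower than expected is a hint that a session or controller is missing.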

TODO: This would ideally be automated by an external storage provider,
see the [storage reference for more information](#storage).
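
Until that is automated, a small pre-flight check can catch a missing partition before `gnt-instance add` is run. This is only a sketch under assumptions: `adopt_disk_args` is a hypothetical helper, the partition layout (partitions 2, 3 and 4, with 3 as swap) is the one used for `web-chi-03` above, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# adopt_disk_args (hypothetical helper): print the --disk arguments for
# `gnt-instance add`, one per line, after checking that each partition
# device exists. Fails on the first missing device.
#   $1: multipath alias, for example web-chi-03
#   $2: device directory (defaults to /dev/disk/by-id)
adopt_disk_args() {
    alias_name="$1"
    dev_dir="${2:-/dev/disk/by-id}"
    index=0
    for part in 2 3 4; do
        dev="$dev_dir/dm-name-$alias_name-part$part"
        if [ ! -e "$dev" ]; then
            echo "missing device: $dev" >&2
            return 1
        fi
        if [ "$part" -eq 3 ]; then
            # partition 3 is the swap disk in the layout above
            echo "--disk $index:adopt=$dev,name=swap"
        else
            echo "--disk $index:adopt=$dev"
        fi
        index=$((index + 1))
    done
}
```

Running `adopt_disk_args web-chi-03` on the node should print the three `--disk` lines used in the adoption step, or fail with the path of the first device that is not there yet.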

## Modifying an instance

### CPU, memory changes
@@ -1764,9 +1869,22 @@ particular.

See also the [DRBD documentation](howto/drbd).

NOTE: the Cymru PoP has an iSCSI cluster for large filesystem
storage. See the [cymru documentation for
details](howto/new-machine-cymru#ganeti-iscsi-integration) for details.
The Cymru PoP has an iSCSI cluster for large filesystem
storage. Ideally, this would be automated inside Ganeti; some quick
links:

 * [search for iSCSI in the ganeti-devel mailing list](https://www.mail-archive.com/search?l=ganeti-devel@googlegroups.com&q=iscsi&submit.x=0&submit.y=0)
 * in particular a [discussion of integrating SANs into ganeti](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/P7JU_0YGn9s)
   seems to say "just do it manually" (paraphrasing) and [this
   discussion has an actual implementation](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/kkXFDgvg2rY), [gnt-storage-eql](https://github.com/atta/gnt-storage-eql)
 * it could be implemented as an [external storage provider](https://github.com/ganeti/ganeti/wiki/External-Storage-Providers), see
   the [documentation](http://docs.ganeti.org/ganeti/2.10/html/design-shared-storage.html)
 * the DSA docs are in two parts: [iscsi](https://dsa.debian.org/howto/iscsi/) and [export-iscsi](https://dsa.debian.org/howto/export-iscsi/)
 * someone made a [Kubernetes provisioner](https://github.com/nmaupu/dell-provisioner) for our hardware which
   could provide sample code

For now, iSCSI volumes are manually created and passed to new virtual
machines. 
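
To give an idea of what an external storage provider involves: Ganeti's ExtStorage interface calls a set of scripts (`create`, `attach`, `detach`, `remove` and so on) with the volume name in the `VOL_NAME` environment variable, and the `attach` script is expected to print the block device path on stdout. The sketch below shows only that `attach` logic, wrapped in a function. It assumes (this is not how anything is deployed today) that the volume is already mapped on the node under a multipath alias equal to `VOL_NAME`. `MAPPER_DIR` is a hypothetical override used for testing.

```shell
#!/bin/sh
# extstorage_attach: sketch of the body of an ExtStorage "attach" script.
# Assumption: the volume is already visible as a multipath device whose
# alias is exactly $VOL_NAME. MAPPER_DIR is a hypothetical test override.
extstorage_attach() {
    mapper_dir="${MAPPER_DIR:-/dev/mapper}"
    if [ -z "${VOL_NAME:-}" ]; then
        echo "VOL_NAME is not set" >&2
        return 1
    fi
    dev="$mapper_dir/$VOL_NAME"
    if [ ! -e "$dev" ]; then
        echo "no multipath device for volume $VOL_NAME" >&2
        return 1
    fi
    # Ganeti reads the device path from stdout
    echo "$dev"
}
```

A real provider would also need `create` and `remove` scripts that talk to the SAN (through `SMcli`, for example), which is the hard part and is not covered here.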

## Issues

+16 −22
@@ -713,7 +713,7 @@ that document on how to recover from RAID failures and so on.
## Storage servers

To talk to the storage servers, you'll need first to install the
`SMcli` commandline tool, see the [install instructions](#iscsi-cluster-management-tools-setup) for more
`SMcli` commandline tool, see the [install instructions](#san-management-tools-setup) for more
information on that.

In general, commands are in the form of:
@@ -861,7 +861,10 @@ Map that group to a Logical Unit Number (LUN):
    set virtualDisk ["anarcat-test"] logicalUnitNumber=3 hostGroup="gnt-chi";

Important: the LUN needs to be greater than 1, LUNs 0 and 1 are
special.
special. It should be the current highest LUN plus one.

TODO: we should figure out if the LUN can be assigned automatically,
or how to find what the highest LUN currently is.
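
As a stopgap, the highest LUN visible from a given initiator can be guessed from the `/dev/disk/by-path/*-lun-N` names. This is a rough sketch, with caveats: `next_lun` is a hypothetical helper, it only sees LUNs mapped to the node it runs on, and the SAN remains the authority on which LUNs are actually in use.

```shell
#!/bin/sh
# next_lun (hypothetical helper): print the highest LUN found in the
# by-path names plus one. LUNs 0 and 1 are special, so the result is
# never lower than 2.
#   $1: directory to scan (defaults to /dev/disk/by-path)
next_lun() {
    by_path="${1:-/dev/disk/by-path}"
    highest=1
    for dev in "$by_path"/*-lun-*; do
        [ -e "$dev" ] || continue
        lun="${dev##*-lun-}"      # keep what follows the last "-lun-"
        lun="${lun%%-part*}"      # drop a partition suffix like -part1
        case "$lun" in
            ''|*[!0-9]*) continue ;;   # skip anything that is not a number
        esac
        if [ "$lun" -gt "$highest" ]; then
            highest="$lun"
        fi
    done
    echo $((highest + 1))
}
```

The output of `next_lun` is a candidate for `logicalUnitNumber`, to be double-checked against the SAN before use.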

At this point, the device should show up on hosts in the `hostGroup`,
as multiple `/dev/sdX` (for example, `sdb`, `sdc`, ..., `sdg`, if
@@ -1306,7 +1309,7 @@ operating system installed, by Cymru, but that system can be safely
wiped and replaced by the standard install procedure established in
this page.

### iSCSI cluster specifications
### SAN cluster specifications

There are 4 Dell MD3220i iSCSI hardware RAID units. Each MD3220i has a
MD1220 expansion unit attached for a total of 48 900GB disks per unit
@@ -1359,7 +1362,7 @@ at Amazon](https://www.amazon.com/Seagate-Savvio-2-5-Inch-Internal-ST9900805SS/d
delivery" has a cost... And it's actually fairly hard to find those
old drives in other sites, so we probably pay a premium there as well.

### iSCSI cluster management tools setup
### SAN management tools setup

To access the iSCSI servers, you need to set up the (proprietary)
`SMcli` utilities from Dell. First, you need to extract the software
@@ -1452,6 +1455,7 @@ Then the device is available as a unique device in:
   distinct network switches (or at least VLANs)

[SAN]: https://en.wikipedia.org/wiki/Storage_area_network

## Network topology

The network at Cymru is split into different VLANs:
@@ -1540,12 +1544,11 @@ See also [howto/raid](howto/raid).

### Storage

We need to figure out how to use the iSCSI cluster, which provides
172TiB of storage over iSCSI in the management network. Debian used
this in the past with ganeti, but that involves creating, resizing,
and destroying volumes by hand before/after creating, and destroying
VMs. While that is not ideal, it will certainly be a first step in
getting this infrastructure used.
The iSCSI cluster provides roughly 172TiB of storage in the management
network, at least in theory. Debian used this in the past with Ganeti,
but that involves creating, resizing, and destroying volumes by hand
before and after creating and destroying VMs. While that is not ideal, it
is the first step in getting this infrastructure used.
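
For reference, here is where a figure like that can come from, assuming it was derived from the raw disk count in the cluster specifications (4 units with 48 disks of 900GB each). It counts raw capacity only: RAID overhead and hot spares are not subtracted.

```shell
#!/bin/sh
# Raw capacity of the SAN, from the hardware specifications:
# 4 units, 48 disks per unit, 900GB (decimal) per disk.
units=4
disks_per_unit=48
disk_gb=900

raw_gb=$((units * disks_per_unit * disk_gb))
# convert decimal GB to binary TiB (integer division, rounds down)
raw_tib=$((raw_gb * 1000000000 / 1099511627776))

echo "raw capacity: ${raw_gb} GB (about ${raw_tib} TiB)"
```

This prints `raw capacity: 172800 GB (about 157 TiB)`, which suggests the 172 figure is closer to decimal terabytes than to TiB, and that usable space after RAID will be lower still.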

We also use the "normal" DRBD setup with the local SAS disks available
on the servers. This is used for the primary disks for Ganeti
@@ -1627,21 +1630,12 @@ blacklist {
```

It seems that configuration is actually optional: multipath will still
work fine without it, so it's not deployed elsewhere.
work fine without it, so it's not deployed consistently across nodes
at the moment.

### Ganeti iSCSI integration

Ideally, this would be automated inside Ganeti, some quick links:

 * [search for iSCSI in the ganeti-devel mailing list](https://www.mail-archive.com/search?l=ganeti-devel@googlegroups.com&q=iscsi&submit.x=0&submit.y=0)
 * in particular a [discussion of integrating SANs into ganeti](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/P7JU_0YGn9s)
   seems to say "just do it manually" (paraphrasing) and [this
   discussion has an actual implementation](https://groups.google.com/forum/m/?_escaped_fragment_=topic/ganeti/kkXFDgvg2rY), [gnt-storage-eql](https://github.com/atta/gnt-storage-eql)
 * it could be implemented as an [external storage provider](https://github.com/ganeti/ganeti/wiki/External-Storage-Providers), see
   the [documentation](http://docs.ganeti.org/ganeti/2.10/html/design-shared-storage.html)
 * the DSA docs are in two parts: [iscsi](https://dsa.debian.org/howto/iscsi/) and [export-iscsi](https://dsa.debian.org/howto/export-iscsi/)
 * someone made a [Kubernetes provisioner](https://github.com/nmaupu/dell-provisioner) for our hardware which
   could provide sample code
See the [Ganeti storage reference](howto/ganeti#storage) and [Ganeti iSCSI integration](howto/ganeti#iscsi-integration).

### Private network access considerations