Following on from my NAS disaster recovery post, I decided to fork out for some new hard disks and move the volume I back up to from the NAS into the server doing the backups. I had been considering this for performance reasons, but it would also have aided recovery: the NAS is itself backed up off-site, so if the backup volume had not been on the NAS I would not have needed to fetch the off-site backups. The flip-side of this option is that if the server fails I lose access to the backups - so either way I have a single point of failure.

Installing the new disks

Removing one of the old disks

Currently the server has 3 disks in Linux software RAID 1: one spinning 500GB hard disk and two 500GB solid-state drives. Yes, mixing disks with such different performance is generally a bad idea in RAID, however I consciously chose redundancy over performance characteristics when I built it. As the server has 4 bays and 3 are occupied, I need to lose a disk from the OS’s RAID setup (obviously the spinning disk is my candidate of choice) so I can install 2 new large (8TB, as they seemed most cost-effective) spinning disks (also in RAID 1) for the backup volume (and, given the size, I will probably also add some local mirrors on there).

To do this, I need to identify which device is which disk - this is most easily done with lsblk. The ROTA (“rotational device”) column is set to 1 for spinning disks, so they can be identified even if you cannot tell from the model number. I also output the serial number to allow me to double-check it against the physical disk when removing it (just in case there was a 4th disk that wasn’t showing up in lsblk):

lsblk -o NAME,ROTA,VENDOR,MODEL,SERIAL

I also need the names of the raid devices, which I find from the venerable /proc/mdstat:

cat /proc/mdstat

On my system, the spinning disk turned out to be /dev/sda and I have 2 RAID devices (one for /boot and one for LVM), each using a partition on that disk. Removing a disk from each RAID volume is a two-step process - first the disk must be “failed”, to tell the RAID system to stop using it, then it can be removed from the array:

# Fail each partition in each raid volume
$ sudo mdadm /dev/md126 --fail /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md126
$ sudo mdadm /dev/md127 --fail /dev/sda2
mdadm: set /dev/sda2 faulty in /dev/md127
# Remove the disk from the raid volume
$ sudo mdadm /dev/md126 --remove /dev/sda1
mdadm: hot removed /dev/sda1 from /dev/md126
$ sudo mdadm /dev/md127 --remove /dev/sda2
mdadm: hot removed /dev/sda2 from /dev/md127

The advice online is to remove the RAID signature so that the disk will not be automatically re-added to the array; wipefs will do this:

$ sudo wipefs -a /dev/sda1
/dev/sda1: 4 bytes were erased at offset 0x00001000 (linux_raid_member): fc 4e 2b a9
/dev/sda1: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
$ sudo wipefs -a /dev/sda2
/dev/sda2: 4 bytes were erased at offset 0x00001000 (linux_raid_member): fc 4e 2b a9

And finally, resize the RAID arrays (which are now each reporting that one disk of an expected three is missing):

$ sudo mdadm --grow /dev/md126 --raid-devices=2
raid_disks for /dev/md126 set to 2
$ sudo mdadm --grow /dev/md127 --raid-devices=2
raid_disks for /dev/md127 set to 2

I then shut down the system and physically removed the spinning disk.

Setting up the new disks

Once the new disks were physically installed, I booted the system and thought about the best way to layer LUKS, LVM and RAID. I decided to create a partition on each disk to be used for RAID 1; the resulting array will be passed through to the VM running BackupPC, where encryption and, within that, LVM will be layered on top.

I used parted to create a GPT partition table and then a 4.5TB partition on each disk for the backup data. This leaves about 3.5TB per disk unallocated for future use. I named each partition (since parted asks for a name) backup-raid-<number>, where <number> is a unique number (e.g. 1 or 2) for the disk.
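For reference, the parted invocations looked roughly like the following - /dev/sdX and /dev/sdY are placeholders for the two new disks (check the lsblk output before running anything like this):

# Create a GPT partition table and a named 4.5TB partition on each new disk
parted /dev/sdX mklabel gpt
parted /dev/sdX mkpart backup-raid-1 0% 4.5TB
parted /dev/sdY mklabel gpt
parted /dev/sdY mkpart backup-raid-2 0% 4.5TB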

Using mdadm I created the RAID 1 setup:

mdadm --create /dev/md/backup-raid /dev/disk/by-partlabel/backup-raid-1 /dev/disk/by-partlabel/backup-raid-2 --level=1 --raid-devices=2

Attaching new disks to the VM

It should be possible (although I have not tested this, so use at your own risk) to do this at installation time by adding --disk path=/dev/md/backup-raid,format=raw to the virt-install command.
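As a purely hypothetical illustration (the VM name, memory, OS variant and first disk are stand-ins - only the second --disk argument is the one described above), such an install might look like:

# Hypothetical example only - everything except the second --disk is a placeholder
virt-install --name backuppc --memory 4096 --import --os-variant debian11 \
  --disk path=/var/lib/libvirt/images/backuppc.qcow2 \
  --disk path=/dev/md/backup-raid,format=raw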

Adding it to an existing VM, however, requires editing the VM’s XML definition. This can be done by firing up virsh, running edit <domain> (where <domain> is the name of the VM to edit) and adding a disk to the <devices>...</devices> section - for neatness I added it next to the existing disk:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/md/backup-raid'/>
  <target dev='vdb' bus='virtio'/>
</disk>

A full shutdown then start of the VM was required for the disk to appear.
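From the host, that meant something like this (assuming the VM’s libvirt domain is called backuppc - substitute the real domain name):

virsh shutdown backuppc
# ...wait for the VM to power off completely, then...
virsh start backuppc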

Setting up encryption and LVM

Once the disk was visible to the VM, I followed the same process I did when initially setting up encrypted LVM for my backup system. I was able to set up encryption on it:

cryptsetup luksFormat /dev/vdb

And then open it to be able to use it for LVM:

cryptsetup luksOpen /dev/vdb backuppc-pv-new

I could then create the LVM volume group and the backuppc store volume. The group will be renamed later, but for now must not clash with the existing volume group’s name. The size is kept under 4TB for the same reason as before - “only using 4TB (3725GB) of the 4.5TiB volume group, so it will fit on a 4TB external disk (as opposed to 4TiB) and we have space for snapshots”:

pvcreate /dev/mapper/backuppc-pv-new
vgcreate backuppc-new /dev/mapper/backuppc-pv-new
lvcreate -n store -L 3725G backuppc-new

Copying the existing data over

As the volumes are identical in size, and I want an exact clone of the existing BackupPC data, I simply decided to use dd to do the copy (status=progress gives progress information as it copies):

dd if=/dev/mapper/backuppc-store of=/dev/mapper/backuppc--new-store bs=4M status=progress

It copied around 6GB in the first minute (roughly 360GB/hour), so the entire 4.5TB volume should take around 12.5 hours.

Renaming the volume group

Once the copy was complete, I disabled the “old” LVM volume group and disconnected the iSCSI device (again, see my original post for directions).
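In outline, that tidy-up went something like this - a sketch only, as the old LUKS mapping name is an assumption and <target> is a placeholder for the iSCSI target name from the original post:

# Deactivate the old volume group and close its encrypted device (name assumed)
vgchange -a n backuppc
cryptsetup luksClose backuppc-pv
# Log out of the iSCSI session (<target> is a placeholder)
iscsiadm --mode node --targetname <target> --logout

Then I renamed the new, local, volume group: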

vgrename backuppc-new backuppc

I then deactivated the volume group and closed and reopened the encrypted volume under its new name:

vgchange -a n backuppc
cryptsetup luksClose backuppc-pv-new
cryptsetup luksOpen /dev/vdb backuppc-pv

Mounting the new volume

Since the LVM volume to be mounted is now named the same as before, no changes (e.g. to /etc/fstab) are required and it can be directly mounted:

mount /var/lib/backuppc

Updating scripts

Since I wrote some [scripts to simplify mounting/unmounting the backup volume](/notes/2022/07/05/removing-configuration-with-saltstack.html#mountumount-scripts), they needed updating to not use iSCSI any more.

This was simply a case of removing the sections that use iscsiadm to --login/--logout of iSCSI and replacing references to /dev/disk/by-partlabel/BackupPC with /dev/vdb (the device name explicitly configured in the VM’s definition - so we can trust it not to change unexpectedly). I also dropped -iscsi from the script names (so backuppc-mount-iscsi became simply backuppc-mount and likewise backuppc-umount-iscsi became backuppc-umount).
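The resulting mount script ends up doing roughly the following (a sketch of the steps involved, not the exact contents of the scripts from the linked post):

#!/bin/sh
# backuppc-mount - open the encrypted backup volume and mount it
set -e
cryptsetup luksOpen /dev/vdb backuppc-pv
vgchange -a y backuppc
mount /var/lib/backuppc

backuppc-umount does the same in reverse (umount, deactivate the volume group, luksClose).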

Removing iSCSI

I could now remove the iSCSI software from the VM:

apt purge open-iscsi
apt autoremove --purge # Tidy up dependencies

And finally, I deleted the iSCSI volume from the NAS (and disabled iSCSI, since it was not being used for anything else).