As part of working on my new home lab I moved my old core switch, a TP-Link T1600G-28PS, to replace the aged Cisco switches that I was using temporarily. Now that I have the same make and generation of switch in the lab as on my main network, I can start replicating my core network in the lab for testing and development. I am not sure I have ever bootstrapped a network before (bearing in mind this lab is air-gapped) - I have always started with some sort of router or other existing infrastructure (i.e. some form of DHCP server and DNS) that I have migrated away from.

In some ways this links in with my recent musings on configuration management: bootstrapping a blank canvas is where Ansible comes into its own, as it requires no infrastructure to begin configuring systems (c.f. Salt, Puppet et al., which generally need a server set up). I had a long think about how best to do this bootstrap, considering three options: manually configure DHCP and DNS services again and then set things up; replicate my existing configuration management setup and use it to redeploy from scratch; or restore from the live environment’s backup. Of these, I found the last two most appealing. Most of the setup is done via configuration management, so it should theoretically be reproducible directly by running the configuration management tool (although I anticipate some unexpected external (i.e. internet) dependencies will crop up), and doing a full restore of the environment from backup (using the “offsite” disk) allows a practical test of the DR process. I decided to do the latter (restore from backup) as I have never attempted a full DR process (and, as the adage goes, “an untested backup may be as good as no backup”).

I left my 3 Proxmox nodes turned off while I did this, so was working on previously unconfigured machines. My main reason for doing this was to avoid upsetting Ceph and any risk to the test infrastructure I had been using to date.

Initial setup

The first thing I did was to reset the old core network switch to factory defaults. I did this for 3 reasons:

  1. I cannot remember precisely how each port is configured (and since the switch is configured to DHCP on the management network, and there is currently no DHCP server in the network, getting onto it to check is challenging).
  2. The lab hardware is not plugged into the same ports as the corresponding kit on the live network.
  3. In a full DR situation, I am unlikely to start with a configured switch.

Extending the network

The next thing I did was, using existing cabling, extend the lab network to my desk: I re-patched one of the ports from the room the lab kit is in to physically isolate it from the rest of the network, then ran (bright red) cables from those ports to my desk and to the lab switch (at either end). I used red cables, bought for the purpose, because I did not already own any red cables so they are distinct from any other network cabling (in addition to being labelled as the lab network/link to office).

Configuring a system for doing the restore

I started by trying to do the restore from what will be my “desktop” machine in the lab network, on my desk; however, I hit problems getting it configured - I identified that the very old BIOS was missing some options and set about trying to update it. As you will see from the linked blog post, this became quite an ordeal and I lost a literal 2 weeks (it is now the 3rd May 2022 as I resume writing here) getting it working.

Once I had the BIOS updated, I was able to get Debian 10 (the current “old stable” release, as my backup server is yet to be upgraded to Debian 11) booted on it. My plan is to mount one of the “off-site” (DR) backup copies read-only on the host and use a writeable overlay layer to expose the backup to BackupPC, allowing recovery from the backup without risking alteration of it.

Boot live USB

I downloaded the Debian 10.12.0 GNOME live CD image and used Rufus to copy it to a USB drive.
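If preparing the drive from an existing Linux machine instead, dd can write the image directly (a sketch; the image filename and the target device /dev/sdX are placeholders - double-check the device name before writing, as this destroys whatever is on the stick):

# Write the live image straight onto the USB stick (run as root, replace /dev/sdX)
dd if=debian-live-10.12.0-amd64-gnome.iso of=/dev/sdX bs=4M status=progress conv=fsync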

I then booted the machine from it without any issues.

Getting BackupPC running

I downloaded the required packages not already installed in the Live CD environment onto a USB flash drive (fetching the individual package files from https://packages.debian.org/):

  • backuppc
  • apache2
    • apache2-data
  • apache2-utils
  • dma (for mail-transport-agent)
  • libarchive-zip-perl
  • libcgi-pm-perl
  • libio-compress-perl
    • libcompress-raw-bzip2-perl
    • libcompress-raw-zlib-perl
  • libsocket6-perl
  • libtime-parsedate-perl
  • samba-common-bin
    • python-samba
      • python-crypto
      • python-ldb
      • python-tdb
  • smbclient
    • samba-common

In total, these came to approximately 5MB of package files once downloaded (so very small by today’s standards).
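As an aside, if you have another internet-connected Debian machine to hand, apt-get download will fetch the same .deb files into the current directory without installing them (a sketch; it does not resolve dependencies, which is why everything is listed explicitly, and /media/usb is an assumed mount point for the flash drive):

# On an internet-connected Debian 10 machine, with the flash drive mounted at /media/usb
cd /media/usb
apt-get download backuppc apache2 apache2-data apache2-utils dma \
    libarchive-zip-perl libcgi-pm-perl libio-compress-perl \
    libcompress-raw-bzip2-perl libcompress-raw-zlib-perl \
    libsocket6-perl libtime-parsedate-perl \
    samba-common-bin samba-common smbclient \
    python-samba python-crypto python-ldb python-tdb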

With the USB key plugged into the booted live image, I installed the packages from the terminal by cd-ing to the directory and running dpkg -i *.deb. I accepted the default options for dma (mailname of localhost, no smarthost) and samba-common (don’t modify smb.conf), and ignored the generated password for the new backuppc install (it will not be required).
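Concretely, that amounts to something like the following (a sketch; the device name /dev/sdb1 and mount point /media/usb are assumptions - check lsblk or dmesg for the real device):

# In a root shell on the live system, mount the key and install everything in one go
mkdir -p /media/usb
mount /dev/sdb1 /media/usb
cd /media/usb
dpkg -i *.deb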

Making the disaster recovery backup available

I now have BackupPC running in a live Debian environment - the next step is to get my backups available.

  1. Open the encrypted off-site backup drive readonly:
    1. On initially plugging it in, I was automatically prompted for the encryption passphrase so I cancelled this.
    2. Manually opened it readonly with cryptsetup luksOpen --readonly /dev/disk/by-partlabel/backuppc-offsite? backuppc-pv.
  2. Make a mount point: mkdir /media/backuppc-ro
  3. Mount the backup volume readonly: mount -o ro /dev/mapper/backuppc-pv /media/backuppc-ro
  4. Stop backuppc: systemctl stop backuppc
  5. Find the uid/gid of the backuppc user on the system from which the DR backup was taken (ls -ln /media/backuppc-ro will show them).
  6. Check for (using ps) and stop any processes running as the user/group with those uid/gid values on the current system. Make a note of processes/daemons stopped.
  7. Update /etc/passwd and /etc/group to make the local backuppc user and group ids match those in the off-site backup. Make a note of three sets of values: the original local uid/gid for backuppc (old_uid/old_gid), the uid/gid used by backuppc on the DR media that it is being remapped to (new_uid/new_gid), and the new uid/gid given to any existing local user/group that clashed with new_uid/new_gid and had to be moved out of the way (new_tgt_uid/new_tgt_gid).
  8. Use find / -xdev -not -path /media/backuppc-ro \( -gid old_gid -o -gid new_gid -o -uid old_uid -o -uid new_uid \) to locate files that need permissions fixing (if using the commands below, you need to remap the local user that clashes before remapping the old backuppc ids to those clashed values - there is a consolidated sketch of the whole sequence after this list):
    1. Files of the user remapped to free the backuppc uid (new_uid - needs remapping to new_tgt_uid): find / -xdev -not -path /media/backuppc-ro -uid new_uid -print0 | xargs -0 chown new_tgt_uid
    2. Files of the group remapped to free the backuppc gid (new_gid - needs remapping to new_tgt_gid): find / -xdev -not -path /media/backuppc-ro -gid new_gid -print0 | xargs -0 chgrp new_tgt_gid
    3. Files with the backuppc user's old uid (old_uid - needs remapping to new_uid): find / -xdev -not -path /media/backuppc-ro -uid old_uid -print0 | xargs -0 chown new_uid
    4. Files with the backuppc group's old gid (old_gid - needs remapping to new_gid): find / -xdev -not -path /media/backuppc-ro -gid old_gid -print0 | xargs -0 chgrp new_gid
  9. Start any processes stopped at step 6 with the new uid/gid.
  10. Restore the BackupPC suid permissions (which chown will have removed for safety): chmod 4750 /usr/lib/backuppc/cgi-bin/index.cgi
  11. Extract the DR copy’s configuration: tar -xf /media/backuppc-ro/etc-backuppc.tgz -C /etc/backuppc --transform='flags=r;s,^etc/backuppc,.,'
  12. Create a folder to act as the read-write layer: mkdir /tmp/backuppc-rw
  13. Make it writeable to the backuppc user: chown backuppc:backuppc /tmp/backuppc-rw
  14. Make a workdir for OverlayFS: mkdir /tmp/backuppc-overlay-work
  15. Mount the read-only and read-write filesystems in a stack using: mount -t overlay overlay /var/lib/backuppc -o lowerdir=/media/backuppc-ro,upperdir=/tmp/backuppc-rw,workdir=/tmp/backuppc-overlay-work,index=on,metacopy=on (index=on is required for hardlinks to function correctly, metacopy=on enables copying only the metadata on metadata changes (e.g. chown) rather than the whole file)
  16. Start backuppc again: systemctl start backuppc
  17. Start apache: systemctl start apache2
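For reference, the remapping and overlay steps above boil down to something like this (a sketch only - the ids are illustrative examples, so substitute the values found in steps 5-7; the mount points match those created earlier):

# Example ids only - substitute the values noted in steps 5-7
old_uid=112; old_gid=117          # local backuppc uid/gid before remapping
new_uid=126; new_gid=131          # backuppc uid/gid on the DR media
new_tgt_uid=150; new_tgt_gid=150  # new ids for any local user/group that clashed

# Remap the clashing local user/group first, then move backuppc onto the DR ids
find / -xdev -not -path /media/backuppc-ro -uid $new_uid -print0 | xargs -0 chown $new_tgt_uid
find / -xdev -not -path /media/backuppc-ro -gid $new_gid -print0 | xargs -0 chgrp $new_tgt_gid
find / -xdev -not -path /media/backuppc-ro -uid $old_uid -print0 | xargs -0 chown $new_uid
find / -xdev -not -path /media/backuppc-ro -gid $old_gid -print0 | xargs -0 chgrp $new_gid

# Restore the setuid bit that chown stripped and pull in the DR configuration
chmod 4750 /usr/lib/backuppc/cgi-bin/index.cgi
tar -xf /media/backuppc-ro/etc-backuppc.tgz -C /etc/backuppc --transform='flags=r;s,^etc/backuppc,.,'

# Stack a writeable layer over the read-only backup and start the services
mkdir /tmp/backuppc-rw /tmp/backuppc-overlay-work
chown backuppc:backuppc /tmp/backuppc-rw
mount -t overlay overlay /var/lib/backuppc \
    -o lowerdir=/media/backuppc-ro,upperdir=/tmp/backuppc-rw,workdir=/tmp/backuppc-overlay-work,index=on,metacopy=on
systemctl start backuppc apache2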

I did have to modify the following settings to allow for this DR install having its URL below /backuppc/ (on the live system it’s at / on a virtual host of backuppc.home.entek.org.uk):

$Conf{ServerHost} = `hostname`;
$Conf{CgiURL} = 'http://' . $Conf{ServerHost} . '/backuppc/index.cgi';
$Conf{CgiImageDirURL} = '/backuppc/image';

Once I had the DR backup available, I proceeded to restore the router into the home-lab network to provide DNS and DHCP in the same way as my home network.