Moving Proxmox cluster to its final home

This post is about physically moving my “live” HP EliteDesk 800 G2 mini Proxmox cluster from its “temporary” home under my desk, where it has been since I bought the systems in March 2023, to its final home on my infrastructure shelves.

New hardware

Our UPS went “pop” during the recent very hot weather, so I also installed its replacement at the same time.

During the move (since I needed to unplug and move each system anyway), I also installed new 128GB NVMe drives, and plugged in a new 10-port unmanaged 2.5GbE switch (8 x 2.5GbE RJ45 plus 2 x 10GbE SFP ports) along with a USB3 to 2.5GbE adaptor for each of the Proxmox hosts in my cluster. The new drives will eventually become the OS disks, the existing 240GB SATA SSDs will then be fully used for Ceph, and the new network hardware will provide a private network for the Ceph OSDs.

The new UPS has 2 things connected to it: my APC PDU and a 4-way extension strip. The 4-way has our fibre broadband ONT, my USB disk enclosure and my new 2.5GbE switch connected, as all have moulded “UK” plugs so I cannot easily connect them to the PDU (which is fully populated anyway).

Final APC PDU setup

The 8 ports of my AP7920 PDU are connected as follows:

1. 1GbE (core) PoE Ethernet switch
2. NAS
3. Router
4. proxmox01
5. proxmox02
6. proxmox03
7. proxmox04
8. proxmox05

Configuring the PDU

I reconfigured ports 1 & 3 to power on automatically after a power failure, and ports 2 and 4-8 to remain off for manual power on (or, hopefully, scripted via an Ansible playbook - but that’s a future goal):

I connected to the PDU via serial console (since the switch and router are both off at this point).
In 3- System then 2- Identification, I set the Name, Contact and Location to suit me (Office PDU, Laurence Hurst and Office @ Home respectively).
Then I went into 1- Device Manager, 2- Outlet Management and 1- Outlet Control/Configuration. Here, I renamed each outlet to reflect its new use and set each one to either never power on (-1) or power on after a delay (0, i.e. instantly, for the switch and 60 for the router) via 2- Power On Delay(sec).
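
The outlet plan above can be written down as a small table. The Python sketch below is illustrative only: it models the power-on delays described above (-1 for never, otherwise a delay in seconds) and works out which outlets come back by themselves after a power failure. The outlet labels, the `OUTLETS` structure and the `auto_power_on_order` function are names made up for this example, and nothing here talks to the PDU.

```python
# Outlet number -> (label, power-on delay in seconds; -1 means "never", i.e. manual power on).
# Delays follow the setup described in this post; the labels and structure are hypothetical.
OUTLETS = {
    1: ("core-switch", 0),
    2: ("nas", -1),
    3: ("router", 60),
    4: ("proxmox01", -1),
    5: ("proxmox02", -1),
    6: ("proxmox03", -1),
    7: ("proxmox04", -1),
    8: ("proxmox05", -1),
}


def auto_power_on_order(outlets):
    """Return (delay, outlet, label) tuples for outlets that power on automatically, soonest first."""
    automatic = [
        (delay, number, label)
        for number, (label, delay) in outlets.items()
        if delay >= 0
    ]
    return sorted(automatic)


for delay, number, label in auto_power_on_order(OUTLETS):
    print(f"outlet {number} ({label}) powers on after {delay}s")
```

Keeping the delays in one place like this would also be a natural starting point for the scripted power-on mentioned above, although that part is not sketched here.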

Final switch setup

(Aside: bind-key -n Bspace send-keys C-h is a handy tmux command (press CTRL-B then : to enter it) when using telnet to access a switch, as it tells tmux to send CTRL-H when backspace is pressed.)

I logged into my core network switch and reconfigured the ports for the new connections to the Proxmox cluster (which was previously connected via a temporary switch), as well as the PDU port (which was also on the temporary switch). I also de-configured the ports previously used by my HP Microserver (as I removed its defunct network and power cabling while moving the hardware around) and checked that all port descriptions matched their current purpose. In line with good security practice, I disabled the ports that had nothing connected (after configuring them onto my general-access VLAN, to be doubly sure that they wouldn’t come up on anything sensitive if re-enabled).

The full list of changes I made was:

Port 25 (on the management network, previously the HP Microserver iLO port) - reconfigured onto the main network and disabled:

  core-switch(config)#interface gigabitEthernet 1/0/25
  core-switch(config-if)#switchport general allowed vlan 20 untagged
  core-switch(config-if)#switchport pvid 20
  core-switch(config-if)#no switchport general allowed vlan 10
  core-switch(config-if)#no description
  core-switch(config-if)#shutdown
  core-switch(config-if)#exit

Ports 17-20 - removed LAG 1 (was the HP Microserver), reconfigured and shut down the HP Microserver ports, and renumbered LAG 2 to LAG 1 (the NAS):

  core-switch(config)#interface port-channel 2
  core-switch(config-if)#no description
  core-switch(config-if)#exit
  core-switch(config)#interface range gigabitEthernet 1/0/17-18
  core-switch(config-if-range)#no channel-group
  core-switch(config-if-range)#no description
  core-switch(config-if-range)#switchport general allowed vlan 20 untagged
  core-switch(config-if-range)#switchport pvid 20
  core-switch(config-if-range)#no switchport general allowed vlan 1
  core-switch(config-if-range)#shutdown
  core-switch(config-if-range)#exit
  core-switch(config)#interface range gigabitEthernet 1/0/19-20
  core-switch(config-if-range)#no channel-group
  core-switch(config-if-range)#channel-group 1 mode active
  core-switch(config-if-range)#description lag-1
  core-switch(config-if-range)#exit
  core-switch(config)#no interface port-channel 2
  core-switch(config)#interface port-channel 1
  core-switch(config-if)#description readynas-214
  core-switch(config-if)#exit

Ports 5-9, 11-12, 14-16, 21-24, 26-30 & 40 - disabled. None had descriptions except port 40 (which just noted that it was reserved for the 2nd port in another room) and all were general ports in the main access VLAN, so other than shutting them down there was nothing to do. Port 10 is being used temporarily for setting up my mini router and is on a “Development” VLAN, which is why it is not in this list:
```
  core-switch(config)#interface range gigabitEthernet 1/0/5-9,1/0/11-12,1/0/14-16,1/0/21-24,1/0/26-30,1/0/40
  core-switch(config-if-range)#shutdown
  core-switch(config-if-range)#exit
```
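
Typing a long port list like the one above by hand is error-prone. As a hedged illustration, the Python sketch below collapses a list of port numbers into the comma-separated argument used with `interface range gigabitEthernet` above. The `interface_range` helper and the `1/0/` prefix default are assumptions based on the commands in this post, not a feature of the switch.

```python
def interface_range(ports, prefix="1/0/"):
    """Build the argument for `interface range gigabitEthernet` from a list of port numbers.

    Consecutive ports are collapsed into start-end runs, each prefixed with the unit/slot.
    """
    runs = []
    for port in sorted(set(ports)):
        if runs and port == runs[-1][1] + 1:
            runs[-1][1] = port          # extend the current run
        else:
            runs.append([port, port])   # start a new run
    return ",".join(
        f"{prefix}{start}" if start == end else f"{prefix}{start}-{end}"
        for start, end in runs
    )


# The unused ports listed above.
unused = [*range(5, 10), 11, 12, *range(14, 17), *range(21, 25), *range(26, 31), 40]
print(f"interface range gigabitEthernet {interface_range(unused)}")
```

The function only builds a string, so the result still needs to be pasted into (or sent to) the switch separately.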

Port 45 was on the default VLAN (1) and no other (at a guess, it has never been configured at all), so I set it to the access VLAN and shut it down:

  core-switch(config)#interface gigabitEthernet 1/0/45
  core-switch(config-if)#switchport general allowed vlan 20 untagged
  core-switch(config-if)#switchport pvid 20
  core-switch(config-if)#no switchport general allowed vlan 1
  core-switch(config-if)#shutdown
  core-switch(config-if)#exit
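
The “park and disable” pattern used for port 45 (move the port onto the access VLAN, drop its old VLAN membership, then shut it down) is mechanical enough to generate. A minimal Python sketch, assuming the same command syntax as the session above: `park_port_commands` is a made-up helper name, and VLAN 20 as the access VLAN is taken from the commands in this post. It only builds the command strings and does not connect to the switch.

```python
def park_port_commands(port, old_vlans, access_vlan=20):
    """Return the config-mode commands to move a port onto the access VLAN and shut it down."""
    # VLANs to remove: everything the port used to be in, except the access VLAN itself.
    old = ",".join(str(vlan) for vlan in old_vlans if vlan != access_vlan)
    commands = [
        f"interface gigabitEthernet 1/0/{port}",
        f"switchport general allowed vlan {access_vlan} untagged",
        f"switchport pvid {access_vlan}",
    ]
    if old:
        commands.append(f"no switchport general allowed vlan {old}")
    commands += ["shutdown", "exit"]
    return commands


# Reproduces the commands entered for port 45 above (without the prompts).
for line in park_port_commands(45, [1]):
    print(line)
```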

Shut down ports 49-52 (SFP ports) but left them on the default VLAN (VLAN 1):

  core-switch(config)#interface range gigabitEthernet 1/0/49-52
  core-switch(config-if-range)#shutdown
  core-switch(config-if-range)#exit

Corrected the descriptions on all ports where they were incorrect, and removed descriptions from unused ports (except port 40, which kept the description reserved-lobby-2 despite being unused).

Port 48 - configured for APC PDU:

  core-switch(config)#interface gigabitEthernet 1/0/48
  core-switch(config-if)#switchport general allowed vlan 10 untagged
  core-switch(config-if)#switchport pvid 10
  core-switch(config-if)#no switchport general allowed vlan 20
  core-switch(config-if)#description apc-pdu
  core-switch(config-if)#exit

Ports 31-35 - configured for Proxmox hosts:

  core-switch(config)#interface range gigabitEthernet 1/0/31-35
  core-switch(config-if-range)#switchport general allowed vlan 10 untagged
  core-switch(config-if-range)#switchport general allowed vlan 11,20,50,51 tagged
  core-switch(config-if-range)#switchport pvid 10
  core-switch(config-if-range)#exit
  core-switch(config)#interface gigabitEthernet 1/0/31
  core-switch(config-if)#description proxmox01
  core-switch(config-if)#exit
  core-switch(config)#interface gigabitEthernet 1/0/32
  core-switch(config-if)#description proxmox02
  core-switch(config-if)#exit
  core-switch(config)#interface gigabitEthernet 1/0/33
  core-switch(config-if)#description proxmox03
  core-switch(config-if)#exit
  core-switch(config)#interface gigabitEthernet 1/0/34
  core-switch(config-if)#description proxmox04
  core-switch(config-if)#exit
  core-switch(config)#interface gigabitEthernet 1/0/35
  core-switch(config-if)#description proxmox05
  core-switch(config-if)#exit
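
The per-port descriptions above follow an obvious pattern (port 31 is proxmox01, port 32 is proxmox02, and so on), so they could equally be generated. The Python sketch below is one hypothetical way to do that: `description_commands` is an invented helper, and it assumes the hosts sit on consecutive ports, as they do here.

```python
def description_commands(first_port, hostnames):
    """Return commands that label consecutive ports, starting at first_port, with the given hostnames."""
    commands = []
    for offset, hostname in enumerate(hostnames):
        commands += [
            f"interface gigabitEthernet 1/0/{first_port + offset}",
            f"description {hostname}",
            "exit",
        ]
    return commands


# proxmox01 .. proxmox05 on ports 31 .. 35, as configured above.
hosts = [f"proxmox{n:02d}" for n in range(1, 6)]
for line in description_commands(31, hosts):
    print(line)
```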

Port 47 (was connected to the temporary switch, and so configured as a trunk for all the VLANs Proxmox needed) - de-configured and shut down:

  core-switch(config)#interface gigabitEthernet 1/0/47
  core-switch(config-if)#switchport acceptable frame all
  core-switch(config-if)#switchport general allowed vlan 20 untagged
  core-switch(config-if)#switchport pvid 20
  core-switch(config-if)#no switchport general allowed vlan 1,10,11,50,51
  core-switch(config-if)#shutdown
  core-switch(config-if)#exit

Finally, saved the new configuration (from enable mode):

  copy running-config startup-config
  copy running-config backup-config

UPS

As I was swapping one APC BX series UPS for another, the only change I made was to the name and description in /etc/nut/ups.conf. I had named the UPS for its model (which, with hindsight, was an error), so I changed it to simply mainups to avoid having to reconfigure all the clients (including /etc/nut/upsmon.conf for nut-monitor on the same host) again in the future. I originally tried main_ups, which nut accepted, but my ReadyNAS 214 would not allow me to set it up as a remote UPS with that name.

I experienced communications problems with the new UPS, despite it being from the same family (BXxxxx) as my old one - searching online suggests this is a common problem with units manufactured from late 2023 or 2024 onwards. Based on a GitHub issues thread I changed the global pollfrequency to 10 seconds and set the UPS to pollonly (so the driver will ignore interrupts from the UPS) to see if that helped. If it does not solve the problem, I will try the other suggestion of running the latest release of nut in a container. (Update 2025-09-14: it has been fine with 10s poll frequency and poll only settings.)
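
One way to check whether settings like these are holding is to look at what the driver reports over time. The Python sketch below parses the `variable: value` lines printed by nut's `upsc` client and checks the `ups.status` flags. It is a minimal illustration under stated assumptions: the sample text is made up (not captured from this UPS), and `parse_upsc` and `looks_healthy` are hypothetical helper names.

```python
def parse_upsc(text):
    """Parse `upsc` output ("name: value" per line) into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            values[name.strip()] = value.strip()
    return values


def looks_healthy(values):
    """True if the UPS reports being on line power (OL) or on battery (OB)."""
    flags = values.get("ups.status", "").split()
    return "OL" in flags or "OB" in flags


# Made-up sample for illustration; real output would come from running `upsc mainups`.
sample = "battery.charge: 100\nups.status: OL\n"
print(looks_healthy(parse_upsc(sample)))
```

If the driver has lost contact with the UPS there will be no `ups.status` value to parse, so `looks_healthy` returns False in that case too.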

I continued the exercise by setting up the new 2.5Gbps networking for the cluster.