Moving Proxmox cluster to its final home
This post is about physically moving my “live” HP EliteDesk 800 G2 mini Proxmox cluster from its “temporary” home under my desk, where it has been since I bought the systems in March 2023, to its final home on my infrastructure shelves.
New hardware
Our UPS went “pop” during the very hot weather recently, so I also installed its replacement at the same time.
During the move (since I needed to unplug and move each system anyway), I also installed new 128GB NVMe drives, a new 10-port unmanaged switch (8 x 2.5GbE RJ45 plus 2 x 10GbE SFP ports) and a USB3 to 2.5GbE adaptor for each of the Proxmox hosts in my cluster. The new disks will eventually become the OS disks, the existing 240GB SATA SSDs will be fully utilised for Ceph, and the new network hardware will provide a private network for the Ceph OSDs.
The new UPS has 2 things connected to it: my APC PDU and a 4-way extension strip. The 4-way has our fibre broadband ONT, my USB disk enclosure and my new 2.5GbE switch connected, as all have moulded “UK” plugs so I cannot easily connect them to the PDU (which is fully populated anyway).
Final APC PDU setup
The 8 ports of my AP7920 PDU are connected as follows:
- 1GbE (core) PoE ethernet switch
- NAS
- Router
- proxmox01
- proxmox02
- proxmox03
- proxmox04
- proxmox05
Configuring the PDU
I reconfigured ports 1 & 3 to auto power on after power failure, ports 2 and 4-8 to remain off for manual power on (or, hopefully, scripted via an Ansible playbook - but that’s a future goal):
- I connected to the PDU via serial console (since the switch and router are both off at this point).
- In `3- System` then `2- Identification`, I set the `Name`, `Contact` and `Location` as I felt suited me (`Office PDU`, `Laurence Hurst` and `Office @ Home` respectively).
- Then I went into `1- Device Manager`, `2- Outlet Management` and `1- Outlet Control/Configuration`. Here, I renamed each outlet to reflect its new use and set them to either power on never (`-1`) or after a delay (`0`, i.e. instantly, for the switch and `60` for the router) via `2- Power On Delay(sec)`.
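To keep track of what comes back by itself after a power cut, the outlet plan above can be written down as data. This is only a minimal sketch: the outlet order and delays are the ones described in this post, but the `Outlet` class, the helper functions and the lower-case outlet names are made up for illustration and have nothing to do with the PDU's own interface.

```python
from dataclasses import dataclass

NEVER = -1  # the "never power on automatically" delay value used in this post


@dataclass(frozen=True)
class Outlet:
    number: int
    name: str            # illustrative name, not necessarily what is set on the PDU
    power_on_delay: int  # seconds after power returns; NEVER means manual power on


# Outlet order and delays as described in the post.
OUTLETS = [
    Outlet(1, "core-switch", 0),
    Outlet(2, "nas", NEVER),
    Outlet(3, "router", 60),
    Outlet(4, "proxmox01", NEVER),
    Outlet(5, "proxmox02", NEVER),
    Outlet(6, "proxmox03", NEVER),
    Outlet(7, "proxmox04", NEVER),
    Outlet(8, "proxmox05", NEVER),
]


def auto_power_on_sequence(outlets):
    """Return (delay, name) pairs for outlets that power on unaided, soonest first."""
    return sorted(
        (o.power_on_delay, o.name) for o in outlets if o.power_on_delay != NEVER
    )


def manual_outlets(outlets):
    """Return the names of outlets that stay off until switched on by hand."""
    return [o.name for o in outlets if o.power_on_delay == NEVER]


if __name__ == "__main__":
    for delay, name in auto_power_on_sequence(OUTLETS):
        print(f"{name}: on after {delay}s")
    print("manual:", ", ".join(manual_outlets(OUTLETS)))
```

A list like this could later feed the scripted power-on mentioned above, but that part is still a future goal.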
Final switch setup
(Aside: `bind-key -n Bspace send-keys C-h` is a handy tmux command (press `CTRL-B` then `:` to enter it) when using telnet to access a switch: it tells tmux to send `CTRL-H` when backspace is pressed.)
I logged into my core network switch and reconfigured the ports with the new connections to the Proxmox cluster (which was previously connected via a temporary switch), as well as the PDU port (which was also on the temporary switch). I also de-configured the ports that had been used by my HP Microserver (I removed its defunct network and power cabling while moving the hardware around) and checked that all port descriptions matched their current purpose. In line with good security practice, I disabled the ports that had nothing connected (after configuring them onto my general-access VLAN, to be doubly sure that they wouldn’t come up on anything sensitive if re-enabled).
The full list of changes I made was:
- Port 25 (on the management network - was the HP Microserver iLO port) - reconfigured onto the main network and disabled:
    core-switch(config)#interface gigabitEthernet 1/0/25
    core-switch(config-if)#switchport general allowed vlan 20 untagged
    core-switch(config-if)#switchport pvid 20
    core-switch(config-if)#no switchport general allowed vlan 10
    core-switch(config-if)#no description
    core-switch(config-if)#shutdown
    core-switch(config-if)#exit
- Ports 17-20 - removed LAG 1 (was the HP Microserver), reconfigured and shut down the HP Microserver ports, and renumbered LAG 2 to LAG 1 (the NAS):
    core-switch(config)#interface port-channel 2
    core-switch(config-if)#no description
    core-switch(config-if)#exit
    core-switch(config)#interface range gigabitEthernet 1/0/17-18
    core-switch(config-if-range)#no channel-group
    core-switch(config-if-range)#no description
    core-switch(config-if-range)#switchport general allowed vlan 20 untagged
    core-switch(config-if-range)#switchport pvid 20
    core-switch(config-if-range)#no switchport general allowed vlan 1
    core-switch(config-if-range)#shutdown
    core-switch(config-if-range)#exit
    core-switch(config)#interface range gigabitEthernet 1/0/19-20
    core-switch(config-if-range)#no channel-group
    core-switch(config-if-range)#channel-group 1 mode active
    core-switch(config-if-range)#description lag-1
    core-switch(config-if-range)#exit
    core-switch(config)#no interface port-channel 2
    core-switch(config)#interface port-channel 1
    core-switch(config-if)#description readynas-214
    core-switch(config-if)#exit
- Ports 5-9, 11-12, 14-16, 21-24, 26-30 & 40 - disabled (port 10 is being used temporarily for setting up my mini router and is on a “Development” VLAN, so it stays enabled). Except for port 40, none had descriptions (40's just noted that it was reserved for the 2nd port in another room) and all were general ports in the main access VLAN, so there was nothing to do other than shut them down:
    core-switch(config)#interface range gigabitEthernet 1/0/5-9,1/0/11-12,1/0/14-16,1/0/21-24,1/0/26-30,1/0/40
    core-switch(config-if-range)#shutdown
    core-switch(config-if-range)#exit
- Port 45 was on the default VLAN (1) and no other (at a guess, it has never been configured at all), so I set it to the access VLAN and shut it down:
    core-switch(config)#interface gigabitEthernet 1/0/45
    core-switch(config-if)#switchport general allowed vlan 20 untagged
    core-switch(config-if)#switchport pvid 20
    core-switch(config-if)#no switchport general allowed vlan 1
    core-switch(config-if)#shutdown
    core-switch(config-if)#exit
- Ports 49-52 (SFP ports) - shut down, but left on the default VLAN (VLAN 1):
    core-switch(config)#interface range gigabitEthernet 1/0/49-52
    core-switch(config-if-range)#shutdown
    core-switch(config-if-range)#exit
- Corrected the descriptions on all ports where they were incorrect and removed descriptions from unused ports (except port 40, which kept the description `reserved-lobby-2` despite being unused).
- Port 48 - configured for the APC PDU:
    core-switch(config)#interface gigabitEthernet 1/0/48
    core-switch(config-if)#switchport general allowed vlan 10 untagged
    core-switch(config-if)#switchport pvid 10
    core-switch(config-if)#no switchport general allowed vlan 20
    core-switch(config-if)#description apc-pdu
    core-switch(config-if)#exit
- Ports 31-35 - configured for the Proxmox hosts:
    core-switch(config)#interface range gigabitEthernet 1/0/31-35
    core-switch(config-if-range)#switchport general allowed vlan 10 untagged
    core-switch(config-if-range)#switchport general allowed vlan 11,20,50,51 tagged
    core-switch(config-if-range)#switchport pvid 10
    core-switch(config-if-range)#exit
    core-switch(config)#interface gigabitEthernet 1/0/31
    core-switch(config-if)#description proxmox01
    core-switch(config-if)#exit
    core-switch(config)#interface gigabitEthernet 1/0/32
    core-switch(config-if)#description proxmox02
    core-switch(config-if)#exit
    core-switch(config)#interface gigabitEthernet 1/0/33
    core-switch(config-if)#description proxmox03
    core-switch(config-if)#exit
    core-switch(config)#interface gigabitEthernet 1/0/34
    core-switch(config-if)#description proxmox04
    core-switch(config-if)#exit
    core-switch(config)#interface gigabitEthernet 1/0/35
    core-switch(config-if)#description proxmox05
    core-switch(config-if)#exit
- Port 47 (was connected to the temporary switch, so was configured as a trunk for all the VLANs Proxmox needed) - de-configured and shut down:
    core-switch(config)#interface gigabitEthernet 1/0/47
    core-switch(config-if)#switchport acceptable frame all
    core-switch(config-if)#switchport general allowed vlan 20 untagged
    core-switch(config-if)#switchport pvid 20
    core-switch(config-if)#no switchport general allowed vlan 1,10,11,50,51
    core-switch(config-if)#shutdown
    core-switch(config-if)#exit
Finally, I saved the new configuration (from `enable` mode):
    copy running-config startup-config
    copy running-config backup-config
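Typing out long interface ranges and per-port descriptions by hand is error-prone, so here is a rough sketch of how the repetitive parts could be generated instead. The command wording is copied from the session above. The helper names (`compress_ports`, `shutdown_commands`, `description_commands`) are hypothetical, and the script only prints text to paste into the switch; it does not connect to anything.

```python
def compress_ports(ports, prefix="1/0/"):
    """Collapse port numbers into the comma-separated range form used above."""
    runs = []
    for port in sorted(set(ports)):
        if runs and port == runs[-1][1] + 1:
            runs[-1][1] = port         # extend the current consecutive run
        else:
            runs.append([port, port])  # start a new run
    return ",".join(
        f"{prefix}{start}" if start == end else f"{prefix}{start}-{end}"
        for start, end in runs
    )


def shutdown_commands(ports):
    """Commands to shut down a set of ports in one interface range."""
    return [
        f"interface range gigabitEthernet {compress_ports(ports)}",
        "shutdown",
        "exit",
    ]


def description_commands(first_port, names):
    """Commands to describe consecutive ports, one name per port."""
    commands = []
    for port, name in enumerate(names, start=first_port):
        commands += [
            f"interface gigabitEthernet 1/0/{port}",
            f"description {name}",
            "exit",
        ]
    return commands


if __name__ == "__main__":
    # The unused ports listed above.
    unused = [*range(5, 10), *range(11, 13), *range(14, 17),
              *range(21, 25), *range(26, 31), 40]
    hosts = [f"proxmox{n:02d}" for n in range(1, 6)]
    for line in shutdown_commands(unused) + description_commands(31, hosts):
        print(line)
```

For the unused ports, `compress_ports` produces the same `1/0/5-9,1/0/11-12,...` string that I typed by hand.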
UPS
As I was swapping one APC BX series UPS for another, the only change I made was to the name and description in `/etc/nut/ups.conf`. I had named the UPS for its model (which, with hindsight, was an error), so I changed it to simply `mainups` to avoid having to reconfigure all clients (including `/etc/nut/upsmon.conf` for `nut-monitor` on the same host) in the future. I originally tried `main_ups`, which nut liked, but my ReadyNAS 214 would not allow me to set it up as a remote UPS with that name.
I experienced communications problems with the new UPS, despite it being from the same family (`BXxxxx`) as my old one - searching online suggests this is a common problem with units manufactured from late 2023 or 2024 onwards. Based on a GitHub issues thread, I changed the global `pollfrequency` to 10 seconds and set the UPS to `pollonly` (so the driver will ignore interrupts from the UPS) to see if that helped. If it does not solve the problem, I will try the other suggestion of running the latest release of nut in a container. (Update 2025-09-14: it has been fine with 10s poll frequency and poll only settings.)
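A quick way to check whether the UPS is still talking is to query it with nut's `upsc` client and look at `ups.status`. The following is a minimal sketch, not something I run in production: the `@localhost` part of the UPS name is an assumption, as is the idea that `upsc` exits non-zero when it cannot get data from the driver. The function names are made up for illustration.

```python
import subprocess


def parse_upsc(output):
    """Turn upsc's "name: value" lines into a dict."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def read_ups(ups="mainups@localhost"):
    """Run upsc once; return None if it is missing or the query fails."""
    try:
        result = subprocess.run(["upsc", ups], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # upsc is not installed on this host
    if result.returncode != 0:
        return None  # assumed to cover errors such as stale data
    return parse_upsc(result.stdout)


if __name__ == "__main__":
    values = read_ups()
    if values is None:
        print("no data from UPS")
    else:
        print(values.get("ups.status", "unknown"))
```

Run from cron or a loop, this would show how often communication drops with a given polling setting.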
I continued the exercise with setting up new 2.5Gbps networking for the cluster.