Building live Proxmox Virtual Environment cluster
Now I have automated installing Debian, I can finally set up the HP EliteDesk 800 G2 systems I bought back in March last year. The systems came with 8GB of RAM fitted but I upgraded them all to 32GB, as I knew I wanted to run a number of virtual machines on each. These systems have 4-core/4-thread i5-6500T 2.5GHz processors.
This post, started in January 2024, initially contained some side-tracking into infrastructure changes that were needed to support the process, but in April I separated these out into supporting posts: turning remote unlocking playbook into a role, caching mirror proxy for the secure network, migrating DHCP from SaltStack to Ansible and managing hosts files with Ansible.
System firmware settings
I updated the firmware when I got the machines in March 2023, but when I checked one I found that the Virtualization Technology (VTx) and Virtualization Technology for Directed I/O (VTd) checkboxes were unticked, so it seems I did not go through the settings on them all. I decided to reset each to defaults, then change the same settings so I know they are consistently configured.
The settings (from default) I changed are:
- Advanced
  - Boot Options
    - After Power Loss → Power On
  - Secure Boot Configuration
    - Configure Legacy Support and Secure Boot → Legacy Support Disable and Secure Boot Disable
  - System Options
    - Virtualization Technology (VTx) → ticked
    - Virtualization Technology for Directed I/O (VTd) → ticked
Network configuration
The systems are temporarily attached to an 8-port managed TP-Link switch under my desk and plugged into one of my APC remote switched PDUs.
I initially configured the network ports they are plugged into onto VLAN 50, to match my target network design. However, in the course of setting up the systems it became apparent that it was more appropriate to put the host systems in the management VLAN: as hosts for the VMs, providing all of the configuration for network access and the virtual switches, they are just as low-level and sensitive as the physical network devices (switches, WAPs etc.). I therefore reconfigured these ports onto VLAN 10.
I configured the PDU not to power up their sockets on start (by setting the power-on delay to -1). This means I must be able to remotely connect to the PDU to power them up after a power outage, something I need to bear in mind as I migrate services onto this cluster (I must make sure that connectivity to the PDU does not become dependent on something running on the hosts the PDU controls).
Finally, I added entries for the hosts to my inventory and created the proxmox_virtual_environment_hosts group:
internal:
  hosts:
    #...
    pve01:
      interfaces:
        - mac_address: 11:22:33:44:55:01
    pve02:
      interfaces:
        - mac_address: 11:22:33:44:55:02
    pve03:
      interfaces:
        - mac_address: 11:22:33:44:55:03
    pve04:
      interfaces:
        - mac_address: 11:22:33:44:55:04
    pve05:
      interfaces:
        - mac_address: 11:22:33:44:55:05
    #...
proxmox_virtual_environment_hosts:
  hosts:
    pve[01:05]:
and added static IP assignments to the networks variable at the domain group level:
192.168.10.51: pve01
192.168.10.52: pve02
192.168.10.53: pve03
192.168.10.54: pve04
192.168.10.55: pve05
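For context, these entries sit inside the larger networks structure that my DHCP and hosts-file roles consume. Very roughly, it looks like the sketch below; the enclosing keys are illustrative assumptions, only the address-to-host mapping is taken from my real configuration:
# Illustrative sketch only - the enclosing keys are assumed; the real
# variable lives in my domain-level group_vars.
networks:
  management:
    # ...other per-network settings...
    hosts:
      192.168.10.51: pve01
      192.168.10.52: pve02
      #...
      192.168.10.55: pve05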
Installing Proxmox Virtual Environment
As I want to control the partitioning and the setup of Ceph, as well as automate the install (which is not possible with Proxmox’s own installer), I am following the installing Proxmox VE on Debian instructions.
Enabling mirror access
Firstly, I added the PVE repositories to my list of mirror_proxies in the mirror_servers group_vars file:
mirror_proxies:
  #...
  - name: proxmox-no-subscription
    upstream: http://download.proxmox.com/
    description: Proxmox no-subscription repositories
And to the list of local mirrors in my domain level group_vars:
local_mirror:
  #...
  proxmox-no-subscription:
    uri: http://mirror/proxmox-no-subscription
Preparing the role - package repositories
I created a new role, proxmox-virtual-environment, to set up Proxmox Virtual Environment. I used the existing apt-source role as a dependency (in the role’s meta/main.yaml) in order to configure the repositories on the hosts:
---
dependencies:
  - role: apt-source
    vars:
      name: proxmox-virtual-environment
      uri: "{{ local_mirror['proxmox-no-subscription'].uri | default('http://download.proxmox.com') }}/debian/pve"
      gpg_key:
        url: "{{ local_mirror['proxmox-no-subscription'].uri | default('http://download.proxmox.com') }}/debian/proxmox-release-{{ ansible_facts['distribution_release'] }}.gpg"
      suite: "{{ ansible_facts['distribution_release'] }}"
      components:
        - pve-no-subscription
      src:
        no_src: yes
    when: ansible_facts['os_family'] == 'Debian'
  - role: apt-source
    vars:
      name: proxmox-ceph.reef
      uri: "{{ local_mirror['proxmox-no-subscription'].uri | default('http://download.proxmox.com') }}/debian/ceph-reef"
      gpg_key:
        url: "{{ local_mirror['proxmox-no-subscription'].uri | default('http://download.proxmox.com') }}/debian/proxmox-release-{{ ansible_facts['distribution_release'] }}.gpg"
      suite: "{{ ansible_facts['distribution_release'] }}"
      components:
        - no-subscription
      src:
        no_src: yes
    when: ansible_facts['os_family'] == 'Debian'
...
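For reference, the apt-source role comes from an earlier post; in essence it boils down to something like the following task. This is a simplified sketch of the idea, not the actual role (the real one also fetches the GPG key from gpg_key.url and notifies a cache-update handler):
# Simplified sketch of what the apt-source dependency amounts to.
# The real role also downloads gpg_key.url into a keyring and
# notifies a handler to refresh the apt cache.
- name: "{{ name }} apt source is present"
  become: true
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/share/keyrings/{{ name }}.gpg] {{ uri }} {{ suite }} {{ components | join(' ') }}"
    filename: "{{ name }}"
    state: present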
Installing the Proxmox VE kernel
Firstly, handlers are flushed to ensure that package caches have been updated if needed, then the kernel package is installed:
# Run any pending handlers, to make sure caches are updated if any
# repositories have been changed/added.
- name: Flush handlers
  meta: flush_handlers
# Ensure things are enforced in the same order as on the wiki:
# https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_12_Bookworm
# 1. Package repositories are handled in this role's dependencies
# 2. Install the Proxmox VE Kernel
- name: Proxmox kernel is installed
  become: true
  ansible.builtin.package:
    name: proxmox-default-kernel
  register: kernel_updated
Once installed, the host needs to be rebooted if the kernel was updated. According to the detailed instructions on the wiki, the new kernel must be booted into before progressing, “as some packages depend on specific kernel compile flags to be set or feature extensions (e.g., for apparmor) to be available”.
To work with my remote disk encryption unlocking, this uses a shell command to issue the reboot (as ansible.builtin.reboot will wait for the host to come up, which won’t happen until the disk is unlocked). It uses the unlock role I already created, which will wait for the host to be up:
# XXX this is very site specific... (maybe we need to take a reboot
# method/role that is an argument, defaulting to
# ansible.builtin.reboot?)
- name: Host is rebooted if kernel is updated
  become: true
  # Ansible's reboot command waits, and checks, for the host to
  # come back, which will never happen. Even with async (see below)
  # there is a race condition if the ssh connection gets closed (by
  # the shutdown process) before Ansible has disconnected so it is
  # necessary to delay the shutdown command by longer than the
  # async value, in order to avoid this problem.
  ansible.builtin.shell: 'sleep 2 && /usr/bin/systemctl reboot --message="Ansible triggered reboot for Proxmox Kernel install."'
  # Run in background, waiting 1 second before closing connection
  async: 1
  # Launch in fire-and-forget mode - with a poll of 0 Ansible skips
  # polling entirely and moves to the next task, which is precisely
  # what we need.
  poll: 0
  when: kernel_updated.changed
- name: Unlock remote host
  # Uses:
  #   unlock_crypt_port from `all` group_vars
  #   unlock_crypt_key from `all` group_vars
  ansible.builtin.include_role:
    name: unlock-crypt
  when: kernel_updated.changed
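As the XXX comment above hints, on hosts that come straight back up after a reboot (no encrypted root to unlock remotely) this whole dance could collapse to a single task using the stock module; a minimal sketch of that simpler path:
# Minimal alternative sketch for hosts without remote disk unlocking
# (not what my role does - see the shell-based reboot above).
- name: Host is rebooted if kernel is updated
  become: true
  ansible.builtin.reboot:
    msg: "Ansible triggered reboot for Proxmox Kernel install."
    reboot_timeout: 600
  when: kernel_updated.changed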
Installing the Proxmox VE packages
This is standard package installation fare:
# 3. Install the Proxmox VE packages
- name: Proxmox VE packages are installed
  become: true
  ansible.builtin.package:
    name:
      # XXX This is a lie - NTP not yet setup. Should sort that.
      # N.B. postfix (MTA) and chrony (NTP) are setup elsewhere
      - proxmox-ve
      - open-iscsi
- name: Proxmox VE provided (unmanaged by Ansible) repositories are removed
  become: true
  ansible.builtin.file:
    path: /etc/apt/sources.list.d/pve-enterprise.list
    state: absent
Remove the Debian stock kernel
Again, very straightforward package management:
# 4. Remove the Debian Kernel
- name: Debian stock kernels are removed
  become: true
  ansible.builtin.package:
    name:
      # The wiki page says remove 'linux-image-amd64' and
      # 'linux-image-6.1*', however as the proxmox kernel package
      # is called 'proxmox-kernel...', this seems safe and more
      # robust than relying on the kernel always being '6.1...'.
      - linux-image-*
    state: absent
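Having removed the stock kernels, it is worth being certain the Proxmox kernel is the one actually running. A verification task I could add (a sketch, not currently in the role) might look like this, relying on Proxmox kernels reporting a -pve suffix from uname -r:
# Verification sketch (not part of the role as written).
- name: Running kernel is a Proxmox kernel
  ansible.builtin.command: uname -r
  register: running_kernel
  changed_when: false
  failed_when: "'-pve' not in running_kernel.stdout"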
Remove the os-prober package (optional but recommended)
More package management:
# 5. Recommended: Remove the os-prober Package
- name: os-prober package is removed
  become: true
  ansible.builtin.package:
    name: os-prober
    state: absent
Post install configuration
Configure the firewall
Using my existing firewalld role, I added the service definition, then allowed it:
# Post install configuration
# Configure firewall
- name: firewalld is configured
  block:
    - ansible.builtin.include_role:
        name: firewalld
        tasks_from: add_service
      vars:
        name: proxmox-ve
        ports:
          - 8006/tcp
        description: Proxmox Virtual Environment
    - name: Handlers have flushed, so proxmox-ve firewall service becomes accessible if firewall has changed
      ansible.builtin.meta: flush_handlers
    # XXX Want to make this better (properly zoned, not just opened to everything via default zone)
    - name: Proxmox-ve service is allowed
      become: yes
      ansible.posix.firewalld:
        service: proxmox-ve
        permanent: true
        state: enabled
      notify: reload firewalld
    - name: Handlers have flushed, so Proxmox VE becomes accessible if firewall has changed
      ansible.builtin.meta: flush_handlers
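The reload firewalld handler notified above lives in my existing firewalld role; for anyone without that role, it amounts to something like this sketch (only the handler name is taken from the task above):
# Sketch of a 'reload firewalld' handler (my actual handler is part
# of the firewalld role from an earlier post).
- name: reload firewalld
  become: true
  ansible.builtin.service:
    name: firewalld
    state: reloaded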
Applying the role
Finally, I added it to the play targeting proxmox_virtual_environment_hosts in site.yaml:
# Now do the actual install
- name: Proxmox virtual environment is installed and configured
  ansible.builtin.include_role:
    name: proxmox-virtual-environment
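For completeness, that task sits inside the play in site.yaml scoped to the new group; stripped of the unrelated roles and tasks in the real play, its rough shape is:
# Rough shape of the relevant play in site.yaml (everything else my
# real play does to these hosts is omitted).
- hosts: proxmox_virtual_environment_hosts
  tasks:
    # Now do the actual install
    - name: Proxmox virtual environment is installed and configured
      ansible.builtin.include_role:
        name: proxmox-virtual-environment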
This got me 5 systems, each successfully running Proxmox Virtual Environment standalone. I decided that clustering should be a new post (i.e. “to be continued…”), since I have now been working on this post for exactly 3 months (I started it on 19th January 2024!).