In my post about physically moving my Proxmox cluster, I installed new NVMe disks for the OS (the existing SATA SSDs will be used entirely for Ceph) and 2.5 GbE USB3 network adaptors (attached to a 2.5 GbE switch with 8 2.5 GbE ports and 2 10 GbE ports). This post is about setting up the new networking, now that it is installed; the disks will follow in a later post.

Fixing my Proxmox Ansible playbook

Quite a few of my tasks rely on running Proxmox commands to find the current state. As I wanted to check what configuration would change before applying it, I was running my playbooks in “check mode” - seemingly for the first time, as quite a few tasks failed (command tasks are skipped in check mode, so anything that depended on their registered output broke). I went through and added check_mode: false # Always run, even in check mode to all the read-only commands so that I could run the whole playbook and see what would change.
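
As an illustrative sketch (the exact commands vary from task to task; pvesh get /cluster/status is just an example query), the read-only tasks now look something like this:

- name: Current cluster status is known
  become: true
  ansible.builtin.command:
    cmd: pvesh get /cluster/status --output-format json
  register: cluster_status
  changed_when: false  # Read-only query, so never report a change
  check_mode: false  # Always run, even in check mode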

Having a plan

There are a number of things I need to do, so I came up with a rough plan/sequence for the entire (network and disks) process:

  1. Set up the new network
  2. Reconfigure Ceph’s OSDs to use a separate cluster network
  3. Rebuild each node, one-by-one, onto the new disks, upgrading to Proxmox 9 in the process:

    1. Migrate all VMs away from the node
    2. Remove node from the cluster
    3. Reinstall the node using Proxmox 9 onto the new disk
    4. Configure Ceph on the new node to use the whole of the old disk
    5. Re-add the node to the cluster, this time using both the public and cluster networks as Corosync links for resilience, with the cluster network (which has more bandwidth) given the higher priority (see the corosync.conf sketch below).
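
For step 5, the link priority part will end up in corosync.conf. A rough sketch of what I expect the relevant totem section to look like (purely illustrative - with knet's default passive link mode, the connected link with the highest knet_link_priority carries the traffic):

totem {
  interface {
    # link 0: existing public network
    linknumber: 0
    knet_link_priority: 5
  }
  interface {
    # link 1: new, higher-bandwidth, Ceph cluster network
    linknumber: 1
    knet_link_priority: 10
  }
}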

Setting up the new network

As the new network is dedicated to just Ceph and will only have the Proxmox hosts connected, the interfaces will be statically configured.

New Ansible role

I created a new role to manage static interface configurations in /etc/network/interfaces.d on Debian-based systems, which I called network-ifupdown. For now it only does the minimum configuration I need for my Proxmox systems, but I intend to expand it in the future to also manage e.g. ppp interfaces, which I have configured in this way on my router.

To mark which interfaces should be managed by this role, I added managed_by: ifupdown to the relevant entries in each host’s existing interfaces list in my inventory. For example:

interfaces:
  - mac_address: aa:bb:cc:dd:ee:ff
  - name: pve01-ceph
    mac_address: aa:bb:cc:dd:ee:ff
    method: static
    managed_by: ifupdown

Following my existing approach of specifying IP address allocations as part of the network configuration, for ease of visibility and management of addresses, I added the new network to my domain configuration file (my hosts are grouped by domain, so this is a group_vars file from Ansible’s perspective):

networks:
  172.16.0.0/28:
    name: ceph-cluster
    ip4_assignments:
      172.16.0.1: pve01-ceph
      172.16.0.2: pve02-ceph
      172.16.0.3: pve03-ceph
      172.16.0.4: pve04-ceph
      172.16.0.5: pve05-ceph

Role’s argument_specs

For now, the role just needs to set a single static IP address on an interface, so I only provided support for that, although I did include an option for configuring the filename to be managed on a per-interface basis:

---
argument_specs:
  main:
    options:
      network_ifupdown_manage_global_interfaces_file:
        description: Whether to replace the global interfaces file (`/etc/network/interfaces`) with a generic one that defines `lo` and includes everything in `/etc/network/interfaces.d` or not
        type: bool
        default: true
      network_ifupdown_interfaces:
        description: Interfaces to be managed by this role
        type: list
        elements: dict
        options:
          name:
            description: Interface name
            type: str
            required: true
          method:
            description: Method by which the interface should obtain its IPv4 address.
            type: str
            required: true
            choices:
              - dhcp
              - static
          interfaces_file:
            description: Override the default interfaces file to manage (`/etc/network/interfaces.d/{{ interface_name }}`). `interface_name` is the only supported template variable.
            type: str
            default: '/etc/network/interfaces.d/{{ interface_name }}'
          ip4:
            description: IPv4 configuration information (only relevant for the static method - note that currently only one IP address per interface is supported)
            type: dict
            required: false
            options:
              address:
                description: IPv4 address
                type: str
                required: true
              mask_bits:
                description: Number of bits that represent the subnet (i.e. the number after '/' in CIDR notation)
                type: int
                required: true
...

The role’s defaults/main.yaml just provides the default for network_ifupdown_manage_global_interfaces_file:

---
network_ifupdown_manage_global_interfaces_file: true
...

Gathering information for the role

Separating out networking configuration from site.yaml

As my site.yaml has grown rather organically, it has become a bit unwieldy. I decided the networking configuration was a good place to start breaking it up, but my first attempt - putting those bits in a new playbook in a subdirectory called playbooks (playbooks/networking.yaml) - failed because Ansible looks for roles relative to the playbook’s directory (so it was looking in playbooks/roles/ instead of my top-level roles/ directory), even when that playbook is imported into another at the top level (in this case site.yaml).

A bit of research turned up a Google Groups post on this problem; one suggestion is to keep all roles in a single central location (although I am trying to move my roles out into self-contained, Ansible Galaxy installable, roles, I am not there yet), another is to include task files instead. I opted for the latter, although this did mean a bit of targeting with when conditions instead of pattern-matching hosts at the play level, to apply the right type of network management to the right hosts. However, all hosts managed by Ansible need networking, so it does make logical sense to have a single hosts: all play that applies the tasks.
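
For reference, the central-location option would amount to pointing roles_path at one directory in ansible.cfg, something like:

[defaults]
# Always look for roles in the repository's top-level roles/ directory
roles_path = ./roles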

In my site.yaml:

- name: Networking is configured
  hosts: all:!dummy
  tags: networking
  tasks:
    - name: Import networking tasks
      ansible.builtin.import_tasks: tasks/networking.yaml

Doing the information gathering

Going forward, I intend to replace the group and OS detection with variables that control the networking method to use - set at the group level where appropriate. This will simplify the logic in the new tasks file and make it easier to override, if necessary, on a per-host or per-group basis.
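
Something along these lines is what I have in mind - network_management_method is a hypothetical variable name, nothing uses it yet:

# group_vars/desktops.yaml (hypothetical)
network_management_method: network-manager

# group_vars/proxmox_virtual_environment_hosts.yaml (hypothetical)
network_management_method: ifupdown

The when conditions in the tasks file would then just test that variable instead of group membership and OS facts.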

For now, though, I moved some of my existing computed network information into the new tasks file, and prefixed all of the variables (“facts”) it sets with network_ for clarity and to avoid potential clashes. The tasks file then just includes the relevant role, based on the network configuration found:

---
#
# Computed variables are set
#
- name: Facts are initialised
  ansible.builtin.set_fact:
    network_mac_interface_map: {}
    network_name_ip_map: {}
# Build map of MAC -> interfaces
- name: MAC address to interface names is populated
  ansible.builtin.set_fact:
    network_mac_interface_map: >-
      {{
        network_mac_interface_map
        |
        combine({
          interface_details.macaddress: item
        })
      }}
  vars:
    interface_details: '{{ ansible_facts[item] }}'
  loop: '{{ ansible_facts.interfaces }}'
  # Skip VLAN subinterfaces and interfaces with no MAC (e.g. VPN and loopback)
  when: "'.' not in item and 'macaddress' in interface_details"
- name: Map of names to static IP addresses is known for this host
  ansible.builtin.set_fact:
    network_name_ip_map: >-
      {{
        network_name_ip_map
        |
        combine({
          item.value: item.key
        })
      }}
  vars:
    interface_names: "{{ vars.interfaces | selectattr('name', 'defined') | map(attribute='name') }}"
  when: item.value in interface_names
  loop: "{{ networks.values() | map(attribute='ip4_assignments') | map('dict2items') | flatten }}"
#
# Manage network interfaces
#
- name: NetworkManager is used to manage desktop systems
  ansible.builtin.include_role:
    name: network-manager
  when: "'desktops' in group_names and ansible_facts.system == 'Linux'" # Not a Windows thing
- name: Any ifupdown interfaces are configured
  block:
    - name: "`network_ifupdown_interfaces` is initialised"
      ansible.builtin.set_fact:
        network_ifupdown_interfaces: []
    - name: Find list of interfaces to manage with ifupdown
      ansible.builtin.set_fact:
        network_ifupdown_interfaces: |
          {{
            network_ifupdown_interfaces
            +
            [
              {
                'name': network_mac_interface_map[item.mac_address],
                'method': item.method
              }
              | ansible.builtin.combine(ipv4)
            ]
          }}
      when: item.managed_by | default('') == 'ifupdown'
      vars:
        ipv4: |
          {% if item.method == 'static' %}
          {{
            {
              'ip4': {
                'address': network_name_ip_map[item.name],
                'mask_bits': (networks.items() | selectattr('1.ip4_assignments', 'contains', network_name_ip_map[item.name]) | first).0 | ansible.utils.ipaddr('prefix')
              }
            }
          }}
          {% else %}
          {{ {} }}
          {% endif %}
      loop: "{{ vars.interfaces }}"
    - name: ifupdown managed interfaces are configured
      ansible.builtin.include_role:
        name: network-ifupdown
      vars:
        # `network_ifupdown_interfaces` is populated above
        # Proxmox manages `/etc/network/interfaces` for the main network on proxmox hosts
        network_ifupdown_manage_global_interfaces_file: "{{ 'proxmox_virtual_environment_hosts' not in group_names }}"
      when: network_ifupdown_interfaces | length > 0
...

The role’s tasks

The tasks are just a case of pushing out templated files to the appropriate place. I decided to install ifupdown2 as part of the role, in preference to the default ifupdown package, as it provides the very useful ifreload command; the role will trigger this, via a handler, if it makes changes.

---
- name: ifupdown2 is installed (provides `ifreload`)
  become: true
  ansible.builtin.package:
    name: ifupdown2
- name: Global interfaces file is managed, if requested
  become: true
  ansible.builtin.template:
    src: generic_interfaces.j2
    dest: /etc/network/interfaces
    owner: root
    group: root
    mode: 00444
  when: network_ifupdown_manage_global_interfaces_file == true
  notify: ifreload
- name: Interface files are correct for managed interfaces
  become: true
  ansible.builtin.template:
    src: interface.j2
    dest: "{{ interface.interfaces_file | default('/etc/network/interfaces.d/' ~ interface_name) }}"
    owner: root
    group: root
    mode: 00444
  loop: '{{ network_ifupdown_interfaces }}'
  loop_control:
    loop_var: interface
  vars:
    interface_name: '{{ interface.name }}'
  notify: ifreload
- name: Handlers are flushed (so network interfaces are (re)configured immediately)
  ansible.builtin.meta: flush_handlers
...

The handler, in the role’s handlers/main.yaml file, is very simple:

---
- name: ifreload
  become: true
  ansible.builtin.command:
    cmd: /sbin/ifreload -a
...

Templates

generic_interfaces.j2

The only reason this is a template is to include the ansible_managed templated variable:

{{ ansible_managed | comment }}

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

interface.j2

For now, this is a very simple dhcp or static configuration template - my plan (I’ve added it to my enormous ‘to-do’ list) is to migrate my router’s configuration to use this role as well, which will mean adding support for a lot of other features:

{{ ansible_managed | comment }}
auto {{ interface.name }}
iface {{ interface.name }} inet {{ interface.method }}
{% if interface.method == "static" %}
  address {{ interface.ip4.address }}/{{ interface.ip4.mask_bits }}
{% endif %}
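
As a concrete example, for pve01 - assuming the USB adaptor’s kernel name comes out as enx001122334455 (an invented name for illustration; the ansible_managed header will also vary with configuration) - the rendered file would be roughly:

#
# Ansible managed
#
auto enx001122334455
iface enx001122334455 inet static
  address 172.16.0.1/28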

Publishing the role

Finally, I added licence headers and files, a readme file and a meta/main.yaml so I could publish the role on GitHub and use it myself from there:

---
# Copyright 2025 Laurence Alexander Hurst
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <https://www.gnu.org/licenses/>.
galaxy_info:
  role_name: network-ifupdown
  author: Laurence Alexander Hurst
  description: Ansible role for managing ifupdown network interfaces
  license: GPLv3
  # Not been tested on anything prior to this - may work but YMMV
  min_ansible_version: 2.15.13
  repository: https://github.com/loz-hurst/ansible-role-network-ifupdown.git
dependencies: []
...
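
Having done that, it can be pulled in with an ansible-galaxy requirements file along these lines (the version pin is illustrative):

---
roles:
  - name: network-ifupdown
    src: https://github.com/loz-hurst/ansible-role-network-ifupdown.git
    scm: git
    version: main
...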

Configuring Ceph

As I plan to re-install Proxmox to set up the new disks, and to set up Ceph using its tools in the process, I made these changes to the existing nodes manually.

Hosts file changes

Before reconfiguring the network, I changed the code that builds the list of entries for /etc/hosts on Proxmox (I don’t put static host entries on any other hosts) to include the new private-network names:

# /etc/hosts must have the local machine's IP configured before install
# see: https://forum.proxmox.com/threads/proxmox-ve-installation-8-0-errors-were-encountered-while-processing-pve-manager-proxmox-ve.134808/
- name: Proxmox hostnames variable is initialised
  ansible.builtin.set_fact:
    pve_hostnames: '{{ groups.proxmox_virtual_environment_hosts }}'
- name: Names are known for additional interfaces for all PVE hosts
  ansible.builtin.set_fact:
    pve_hostnames: >-
      {{
        pve_hostnames
        +
        hostvars[item].interfaces | selectattr('name', 'defined') | map(attribute='name')
      }}
  loop: '{{ groups.proxmox_virtual_environment_hosts }}'
- name: Map of all PVE names to static IP addresses is known
  ansible.builtin.set_fact:
    pve_hostnames_ip_map: >-
      {{
        pve_hostnames_ip_map
        |
        combine({
          item.value: item.key
        })
      }}
  vars:
    pve_hostnames_ip_map: {}  # Fact will take precedence once set
  when: item.value in pve_hostnames
  loop: "{{ networks.values() | map(attribute='ip4_assignments') | map('dict2items') | flatten }}"
- name: Additional hostnames are added to mapping
  ansible.builtin.set_fact:
    pve_hosts_ips: >-
      {{
        pve_hosts_ips
        +
        [{
          'hostname': this_host_fqdn,
          'ipv4': {
            'address': pve_hostnames_ip_map[item]
          },
        }]
      }}
  vars:
    pve_hosts_ips: []  # Fact will take precedence once set
    this_host_fqdn: >-
      {%- if '.' in item -%}
        {{ item }}
      {%- else -%}
        {{ item }}.{{ ansible_facts.domain }}
      {%- endif -%}
  loop: '{{ pve_hostnames }}'
- name: Hosts file is configured for Proxmox VE hosts
  ansible.builtin.include_role:
    name: hosts_file
  vars:
    hosts_file_hosts: '{{ pve_hosts_ips }}'
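
The net result (with an invented domain and a made-up public address from the 192.168.10.0/24 network) is that every Proxmox node ends up with /etc/hosts entries along the lines of:

192.168.10.11 pve01.example.org
172.16.0.1 pve01-ceph.example.org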

Reconfiguring Ceph

I began making the actual change by shutting down all of the VMs, as they all sit on Ceph, and reconfiguring Ceph would upset them.

Then, in /etc/pve/ceph.conf, I added definitions for the public and cluster networks to the [global] section:

public_network = 192.168.10.0/24
cluster_network = 172.16.0.0/28

I then simply restarted the OSD services (ceph-osd) on each of the Proxmox nodes so they would pick up the new cluster network.
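
On each node, that amounted to something like the following (the systemd target covers all of that node’s OSD daemons):

# Restart all OSDs on this node so they bind to the new cluster_network
systemctl restart ceph-osd.target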