This post follows on from automating Debian install, picking up the post-install configuration from the point of having a (generic) baseline OS installed. I am using Ansible for the post-installation configuration, but any other configuration management tool could be used: I have personally used SaltStack, Puppet and cfengine to implement this approach (a simple, minimal OS install handing off to a configuration management tool for host-specific customisation) with a variety of Linux distributions. I like this approach for three reasons: all host-specific configuration and data is in one location; it is clear whether a setting comes from the install script (e.g. kickstart/preseed), the configuration management tool, or both; and, because all customisations are applied by the configuration management tool, they remain correct (and are corrected by the tool if they deviate) throughout the lifetime of the system.

Finding connection information for unconfigured hosts

In order to connect to a host to configure it, we need to link the host to the information about that host. The easiest way for me is to use the MAC address as an identifier. I will also want the MAC address when it comes to targeting PXE booting (which will probably be the subject of a future blog post about end-to-end automation of (re)installs), so I consider this approach a “technical investment” (cf. technical debt). I configured this as a variable on the host in the inventory:

all:
  hosts:
    proxmox-1:
      # Quoted so YAML cannot read an all-digit MAC (e.g. 52:54:00:12:34:56) as a base-60 integer
      mac_address: "08:00:27:96:56:50"

For this development work I used dnsmasq, but my live network uses ISC’s DHCP server, so I wrote support for both. The DHCP host on which to look up the IP address is an argument, dhcp_server_host, which I set at the all group level to the first host in the dhcp_servers group (dhcp_server_host: "{{ groups.dhcp_servers | default([None]) | first }}"). This works for my current environment and keeps the role flexible for other setups.
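
As a rough sketch, the group and the variable might sit together in a YAML inventory like this. The host name dhcp-1 is hypothetical and only there for illustration; the variable expression is the one quoted above.

```yaml
all:
  vars:
    # default([None]) supplies a one-item list when the dhcp_servers
    # group does not exist, so `first` then yields None
    dhcp_server_host: "{{ groups.dhcp_servers | default([None]) | first }}"
  children:
    dhcp_servers:
      hosts:
        dhcp-1:  # hypothetical DHCP server host
```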

The proof-of-concept code to select which method to use just tests whether a specific file exists: either the dnsmasq leases file or ISC’s leases utility. (If, for some reason, both exist then both checks will match and the later one will override the first.)

If not configured, ansible_host takes the value of the inventory host name. Testing whether ansible_host is undefined is therefore useless (it is always set to something) so, to tell if it has been explicitly set, the test is “is ansible_host different to inventory_hostname?”. There is a clear problem with this if the ansible_host variable is explicitly (if redundantly) set to inventory_hostname, as that case looks the same as it not being set at all.

Detecting DHCP server software

This ended up as a task file called detect-dhcp-server-software.yaml in a dhcp role:

---
# This originated from a playbook to lookup IPs by mac address. For
# that reason, it uses lease related artefacts to identify the
# software and "delegate_to" to run on the dhcp host (presuming that
# anything running directly against the dhcp server would not need to
# use this set of tasks). Both of these decisions may need to be
# revisited in the future.
# I did try removing the delegation and apply it at the "include_tasks"
# level, however delegate_to is not permitted on an include_tasks by
# Ansible so it has to be done this way. Delegation will not occur if
# `dhcp_server_host` is set to `None`.
- name: dnsmasq specific file is `stat`ed
  delegate_to: "{{ dhcp_server_host }}"
  ansible.builtin.stat:
    path: /var/lib/misc/dnsmasq.leases
  register: dnsmasq_leases_file
- name: isc-dhcp-server specific file is `stat`ed
  delegate_to: "{{ dhcp_server_host }}"
  ansible.builtin.stat:
    path: /usr/sbin/dhcp-lease-list
  register: isc_dhcp_leases_command
- name: Know if DHCP server software is dnsmasq
  ansible.builtin.set_fact:
    dhcp_server_software: dnsmasq
  when: dnsmasq_leases_file.stat.exists
- name: Know if DHCP server software is isc-dhcp-server
  ansible.builtin.set_fact:
    dhcp_server_software: isc-dhcp-server
  when: isc_dhcp_leases_command.stat.exists
- name: Detected software is reported
  ansible.builtin.debug:
    msg: "DHCP server is {{ dhcp_server_software }}"
...

This task file is not intended to be used directly (it is used by other tasks in the role to transparently flex to different server software), so I did not add an argument specification for it to meta/argument_specs.yaml in the role. If I had, the only argument would be the DHCP server host, dhcp_server_host.
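
For illustration only, such an entry might look something like the following. It is not part of the role; the description text is borrowed from the specifications for the other entry points and the short description is made up.

```yaml
argument_specs:
  # Hypothetical entry point specification, not present in the role
  detect-dhcp-server-software:
    short_description: >-
      Detect which DHCP server software is in use.
    options:
      dhcp_server_host:
        description: >-
          The inventory host to delegate the detection tasks to (i.e.
          the DHCP server).
        type: str
```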

Looking up the DHCP IP lease

Originally, this just found the host’s lease (from its mac_address variable) in each server software’s leases file. I later added the wait_for_lease option, to optionally wait until a lease exists if there is not one. When I did this, it became easier to retrieve a full set of leases (in a standardised format) from the server software in use, and then use a common set of tasks to work with this abstract information. This task file is called lookup-host.yaml and it recurses by including itself, if needed, until a lease exists (yes, this will loop indefinitely, or at least until the call stack overflows, if none appears).

---
- name: DHCP leases are fetched from DHCP server
  # Uses `dhcp_server_host` for where to connect
  ansible.builtin.include_tasks: lookup-dhcp-leases.yaml
- name: dhcp_lease for the requested MAC address is set
  ansible.builtin.set_fact:
    dhcp_lease:
      ip: "{{ this_lease.ip | ansible.utils.ipaddr }}"
      hostname: "{{ this_lease.hostname }}"
      expires: "{{ this_lease.expires }}"
  vars:
    # Sort sorts ascending, so `last` is the most recent lease
    this_lease: >-
      {{
        dhcp_leases
        | selectattr('mac', 'eq', mac_address)
        | sort(attribute='expires')
        | last
      }}
  when: dhcp_leases is defined and (dhcp_leases | selectattr('mac', 'eq', mac_address) | length > 0)
- name: dhcp_lease is None if no leases for the MAC found
  ansible.builtin.set_fact:
    dhcp_lease: null
  when: dhcp_leases is not defined or (dhcp_leases | selectattr('mac', 'eq', mac_address) | length == 0)
- name: Recursion delay of {{ wait_for_lease_delay }}s has happened, if required
  delegate_to: localhost  # Stop Ansible trying to connect to the host to do the wait_for
  ansible.builtin.wait_for:
    timeout: "{{ wait_for_lease_delay }}"
  when: wait_for_lease and dhcp_lease is none
- name: Recursion has happened, if required
  ansible.builtin.include_tasks: lookup-host.yaml
  when: wait_for_lease and dhcp_lease is none
...
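
One possible way to bound the recursion is to count retries and fail once a limit is reached. This is an untested sketch, not part of the role: wait_for_lease_max_retries and wait_for_lease_attempt are both hypothetical names, and the former would need a default and an argument specification entry of its own.

```yaml
# Hypothetical additions to lookup-host.yaml, placed just before the
# "Recursion delay" task. Neither variable exists in the role above.
- name: Lookup is abandoned once the retry budget is spent
  ansible.builtin.fail:
    msg: >-
      No DHCP lease for {{ mac_address }} after
      {{ wait_for_lease_attempt | default(0) }} retries
  when:
    - wait_for_lease
    - dhcp_lease is none
    - (wait_for_lease_attempt | default(0) | int) >= (wait_for_lease_max_retries | int)
- name: Retry counter is incremented
  ansible.builtin.set_fact:
    # set_fact may store the templated value as a string, hence the
    # `int` filters wherever the counter is read
    wait_for_lease_attempt: "{{ (wait_for_lease_attempt | default(0) | int) + 1 }}"
  when:
    - wait_for_lease
    - dhcp_lease is none
```

The counter is a host fact, so it would persist for the rest of the run and would need resetting before any later lookup for the same host.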

The lookup-dhcp-leases.yaml tasks file just includes the software-specific lookup-dhcp-leases-<software>.yaml file for the DHCP server software in use:

---
- name: DHCP server software is detected (if not already set)
  ansible.builtin.include_tasks: detect-dhcp-server-software.yaml
  when: dhcp_server_software is undefined
- name: Appropriate lookup tasks are included
  ansible.builtin.include_tasks: lookup-dhcp-leases-{{ dhcp_server_software }}.yaml
...

It is worth noting that dhcp_server_host can be explicitly set to null/None (in yaml/python parlance), in which case any delegate_to: {{ dhcp_server_host }} tasks will run against the current inventory host (I found this out empirically).

I found the format of dnsmasq’s leases file frustratingly hard to locate in any documentation; in the end I found a mailing list post which describes it:

Fields in order:

  1. Time of lease expiry as epoch time. Can be changed at compile time to remaining lease time (in seconds) or total lease renewal time.
  2. MAC address.
  3. IP address.
  4. Computer name, if known (always unqualified).
  5. Client-ID, if known.
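
To make the field order concrete, here is a made-up leases line and the record it would become in the standardised dhcp_leases list used by the role. The epoch time, IP address and client id are illustrative values, not taken from a real leases file.

```yaml
# Hypothetical line from /var/lib/misc/dnsmasq.leases:
#   1692115593 08:00:27:96:56:50 192.168.0.50 proxmox-1 01:08:00:27:96:56:50
# Corresponding standardised record (the client id is dropped):
dhcp_leases:
  - mac: "08:00:27:96:56:50"
    ip: "192.168.0.50"
    hostname: proxmox-1
    expires: 1692115593  # field 1, converted from string to integer
```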

For dnsmasq, the task file reads the leases file directly:

---
- name: DHCP leases file is read from the dhcp server
  delegate_to: "{{ dhcp_server_host }}"
  # On my system this file is world-readable, so no special
  # permissions required.
  # Treat this space-delimited file like a CSV for ease of parsing.
  community.general.read_csv:
    delimiter: ' '
    dialect: unix
    fieldnames:
      - expires
      - mac
      - ip
      - hostname
      - client_id
    path: /var/lib/misc/dnsmasq.leases
  register: dhcp_leases_csv
- name: CSV data is in correct data types and format
  ansible.builtin.set_fact:
    dhcp_leases: >-
      {{ dhcp_leases + [
        {
          'mac': lease.mac,
          'ip': lease.ip,
          'hostname': lease.hostname,
          'expires': lease.expires,
        }
      ] }}
  loop: "{{ dhcp_leases_csv.list }}"
  vars:
    dhcp_leases: []  # Start with empty list, fact will take precedence
    lease: "{{ item | combine({ 'expires': item.expires | int }) }}"
...

For ISC’s DHCP Server, I used the dhcp-lease-list tool. This has some limitations: chiefly, it presumes a single MAC address does not have multiple leases using different client ids. At the very least, it does not expose the client id, which makes this situation impossible to detect, and by default it only shows the latest lease for each MAC address. This is one of the reasons I chose to ignore this situation in my tasks. I could, alternatively, parse the software’s leases file, but its format is complex (using delimited blocks, rather than lines, per client) and would be hard to parse using existing Ansible modules.

---
- name: DHCP leases file is read from the dhcp server
  delegate_to: "{{ dhcp_server_host }}"
  # Despite being in /sbin, on my system the leases file is
  # world-readable, and no special permissions were required.
  ansible.builtin.command: /usr/sbin/dhcp-lease-list --parsable --all
  changed_when: false  # Read-only command
  register: dhcp_leases_output
- name: Leases results are parsed
  ansible.builtin.set_fact:
    # Example lease:
    # MAC aa:bb:cc:dd:ee:ff IP 192.168.0.1 HOSTNAME -NA- BEGIN 2023-08-15 16:06:33 END 2023-08-15 17:06:33 MANUFACTURER
    dhcp_leases_parsed: >-
      {{ dhcp_leases_parsed + [
        dict(
          ['mac', 'ip', 'hostname', 'start_time', 'end_time', 'manufacturer']
          |
          zip(
            item
            |
            regex_search(
              '^MAC (?P<mac>[0-9a-f:]+) IP (?P<ip>[0-9\\.]+) HOSTNAME (?P<hostname>[^ ]+) BEGIN (?P<start_time>[0-9-: ]+) END (?P<end_time>[0-9-: ]+) MANUFACTURER (?P<manufacturer>.*)$',
              '\g<mac>',
              '\g<ip>',
              '\g<hostname>',
              '\g<start_time>',
              '\g<end_time>',
              '\g<manufacturer>',
            )
          )
        )
      ] }}
  vars:
    dhcp_leases_parsed: []  # Start with empty list, fact will take precedence
  loop: "{{ dhcp_leases_output.stdout_lines }}"
- name: Parsed output is in correct format and data types
  ansible.builtin.set_fact:
    dhcp_leases: >-
      {{ dhcp_leases + [
        {
          'mac': lease.mac,
          'ip': lease.ip,
          'hostname': lease.hostname,
          'expires': lease.expires
        }
      ] }}
  loop: "{{ dhcp_leases_parsed }}"
  vars:
    dhcp_leases: []  # Start with empty list, fact will take precedence
    # Times in leases file are UTC but datetime will assume localtime
    # unless timezone is explicit.
    lease: >-
      {{
        item
        |
        combine({
          'expires':
            (
              (item.end_time ~ '+0000')
              |
              to_datetime(format='%Y-%m-%d %H:%M:%S%z')
            ).timestamp()
            |
            int
        })
      }}
...
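
As a worked example of the time conversion, the example lease from the comment in the tasks above would end up as the record below. The MANUFACTURER value is cut short in that example, so only the fields the role keeps are shown.

```yaml
# END 2023-08-15 17:06:33 is treated as UTC by appending '+0000'
# before parsing, then converted to seconds since the Unix epoch.
dhcp_leases:
  - mac: "aa:bb:cc:dd:ee:ff"
    ip: "192.168.0.1"
    hostname: "-NA-"
    expires: 1692119193  # 2023-08-15 17:06:33 UTC
```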

I added both lookup-dhcp-leases and lookup-host argument specifications to meta/argument_specs.yaml in the role, which Ansible uses to validate that the role’s task file is being called with the correct variables defined. This file supports using different task files as entry points:

---
argument_specs:
  lookup-host:
    short_description: >-
      Lookup the IP address of the current inventory host.
    description: >
      Lookup the current inventory host's IP address (from its
      `mac_address`) on the DHCP server specified by
      `dhcp_server_host` and set `dhcp_lease` accordingly. It
      assumes only the most recent IP lease for the MAC address is
      relevant - i.e. does not consider the case of multiple client
      ids for the same MAC.

      `dhcp_lease` will be a dictionary with the keys `ip`,
      `hostname`, `expires` which holds the IPv4 address,
      hostname and expiry (in seconds since Unix epoch) of the
      most recent lease according to the DHCP server.
    options:
      dhcp_server_host:
        description: >-
          The inventory host to delegate the lookup tasks to (i.e.
          the DHCP server).
        type: str
      mac_address:
        description: >-
          The mac address of the host to lookup the IP for (in the
          format aa:bb:cc:dd:ee:ff)
        type: str
        required: true
      wait_for_lease:
        description: >-
          If a DHCP IP lease is not found for the mac address, retry
          until one becomes available (when `true`).
        type: bool
        default: false
      wait_for_lease_delay:
        description: >-
          Time (in seconds) between retries when `wait_for_lease`
          is `true`.
        type: int
        default: 5
  lookup-dhcp-leases:
    short_description: >-
      Lookup all the leases on the DHCP server.
    description: >
      Lookup the DHCP leases from the DHCP server. Will try to detect the
      dhcp software if `dhcp_server_software` is not set.

      `dhcp_leases` will be a list of dictionaries with the keys `mac`,
      `ip`, `hostname`, `expires` which holds the MAC address,
      IPv4 address, hostname and expiry (in seconds since Unix epoch)
      of each lease. (Note, due to limitations in isc's dhcp-lease-list
      output, does not include client id).
    options:
      dhcp_server_host:
        description: >-
          The inventory host to delegate the lookup tasks to (i.e.
          the DHCP server).
        type: str
...

I added the default values I specified to the role’s defaults/main.yaml:

---
wait_for_lease: false
wait_for_lease_delay: 5
...

Using the role

The recipe to use the role for looking up the IP when the inventory host name cannot be resolved and no specific ansible_host value has been given is:

- name: Host's dhcp details are known
  # * uses host's `mac_address` variable
  # * uses default value for `dhcp_server_host` variable (which
  #   looks up from `dhcp_servers` group)
  ansible.builtin.include_role:
    name: dhcp
    tasks_from: lookup-host
  when: >-
    ansible_host == inventory_hostname
    and
    inventory_hostname is not ansible.utils.resolvable

However, the Debian installer will always use DHCP in my environment so, for the purposes of automating the installation process, a DHCP lookup will take place every time (even if inventory_hostname is resolvable).
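
A minimal sketch of what that unconditional lookup could look like follows. The second task is an assumption about how the resulting dhcp_lease might be consumed and is not taken from the role.

```yaml
- name: Host's dhcp details are known (looked up unconditionally)
  ansible.builtin.include_role:
    name: dhcp
    tasks_from: lookup-host
  vars:
    # Wait, as the installer may not have requested a lease yet
    wait_for_lease: true
# Assumption: point Ansible's connection at the leased address
- name: Leased address is used for the connection
  ansible.builtin.set_fact:
    ansible_host: "{{ dhcp_lease.ip }}"
```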

I will pick up from here in another post. I started working on this one on 22nd August and decided to split it up in order to get it published and to break up what became a very long piece…