Following on from my post on looking up DHCP client IP addresses with Ansible, I had intermittent problems with hosts changing IP, which I initially observed when the OS was shut down uncleanly. The root cause (discussed below) is that the Debian installer, the DHCP client used by the initramfs for dropbear and the OS’s DHCP client do not all use the same client id. In the situation where I first saw this, I suspected that on a clean shutdown the OS’s DHCP client releases the IP, so it becomes available (and, with very low IP churn in my development environments, it happened to be issued again when the host came back up). On an unclean shutdown, however, there was still a valid lease for the OS’s client id, so dropbear got a different IP but, post-unlock, the OS (using the same client id as the unreleased lease) recovered the “old” IP lease. I confirmed this hypothesis with empirical testing. In this post, I discuss why this is happening and use Ansible to dynamically configure the DHCP server to work around the problem.

Why is this happening?

The IP changes because the default configurations of both dnsmasq and ISC’s DHCP server adhere to the DHCP standard, RFC 2131; specifically, section 4.2 (“DHCP server administrative controls”) describes this behaviour:

A DHCP server needs to use some unique identifier to associate a client with its lease. The client MAY choose to explicitly provide the identifier through the ‘client identifier’ option. If the client supplies a ‘client identifier’, the client MUST use the same ‘client identifier’ in all subsequent messages, and the server MUST use that identifier to identify the client. If the client does not provide a ‘client identifier’ option, the server MUST use the contents of the ‘chaddr’ field to identify the client. It is crucial for a DHCP client to use an identifier unique within the subnet to which the client is attached in the ‘client identifier’ option. Use of ‘chaddr’ as the client’s unique identifier may cause unexpected results, as that identifier may be associated with a hardware interface that could be moved to a new client. Some sites may choose to use a manufacturer’s serial number as the ‘client identifier’, to avoid unexpected changes in a clients network address due to transfer of hardware interfaces among computers. Sites may also choose to use a DNS name as the ‘client identifier’, causing address leases to be associated with the DNS name rather than a specific hardware box.

What to do?

So, what to do about this situation? I am deliberately trying to keep this dynamic, so assigning a specific IP address to the MAC is not something I want to do in the general case (although some systems may have static IP assignments). Another option is to rework my playbook(s) to cope with the changing IP by somehow polling for the new IP. However, this would be tricky to code when (e.g.) waiting for the host to come up post-install, as the number and timing of the IP changes is difficult to predict: it depends both on the number of DHCP clients involved and on how many times the system requests a lease during the install process. That can be during the initial PXE boot, during iPXE (these first two only if PXE booting), during the install (only for a network install) and finally during the OS boot after the install.

I decided to temporarily configure the DHCP server to ignore the client id during the installation process and remove that setting once the system is installed. This ensures that through the installation process the system should not change IP, although as the ISC DHCP server’s configuration man page points out: “This is a violation of the DHCP protocol, but can prevent clients whose client identifiers change regularly from holding many leases at the same time.” There remains a risk that another host might hop in and acquire the dynamic address between a release and renew, but in a low-churn environment that risk is low.

I confirmed empirically (by monitoring the network traffic) that:

  • the installed Debian OS does release the IP on a clean shutdown, including via the reboot command.
  • none of these issue a DHCP release at any point during the installation process:
    • PXE firmware
    • iPXE client
    • Debian installer
  • (one for later) dropbear in the initramfs does not issue a DHCP release

The upshot of this is that, if one tells the DHCP server to ignore the client id (and so use the MAC address to link clients to existing leases), the only point at which the IP could change is after a clean shutdown of the installed OS. At this point the client does not have a lease at all, so the IP that was assigned could be handed out to another client: arguably a race condition. This will be problematic for reinstalls (the IP may change between the system being instructed to reboot and the installer starting) and in other situations I have not covered where a DHCP release may occur, e.g. a client boots into something other than the installer (locally or via PXE) and has to be manually rebooted into the installer.

Temporarily ignoring client id for DHCP

As I mentioned in my last blog post, I have two different DHCP servers in use so I wish to configure this for both. I will need to reuse the recipe, also from that post, to detect which server software is in use and I am adding to the collection of tasks to do with DHCP.

Adding tasks to ignore client id

I need two sets of tasks for this - one to add the configuration to ignore the client id and one to remove that configuration.

Add configuration

The configuration for dnsmasq is (aa:bb:cc:dd:ee:ff is the client MAC address):

dhcp-host=aa:bb:cc:dd:ee:ff,id:*

For ISC’s DHCP server, based on the documentation it would seem deny duplicates; inside the host scope, with a MAC match, would do this. However despite much testing I have been unable to get deny duplicates; to work, even setting it in group and global scopes. It does seem that I am not alone in finding this option does not work as described. However, I subsequently found out about the ignore-client-uids parameter which does seem to work and can be set at the host scope:

host my-host.my-domain.tld {
  hardware ethernet aa:bb:cc:dd:ee:ff;
  ignore-client-uids true;
}

In both cases, my intention is to add the configuration in a separate file for each server so that reversing the configuration is straightforward (deleting the file, rather than trying to unpick lines added to a bigger configuration file).

It is also necessary to remove all but the most recent lease (and to correct the latest one so that it has no client id), because any existing lease with a matching client id will be used in preference to honouring the new configuration (so the client’s IP may still change if there is an existing lease for that client id in the leases file). This behaviour makes sense on one level: presuming they have not expired, the leases with the same MAC but different client ids, and hence different IPs, are valid. On the other hand, they would never be handed out under the new configuration, so one could argue they are therefore invalid. I observed this behaviour with both dnsmasq and ISC’s DHCP server.

Following the same pattern as before, I made roles/dhcp/tasks/host-ignore-client-id.yaml and roles/dhcp/tasks/host-unignore-client-id.yaml which detect which server software is in use then include the software specific version:

---
- name: DHCP server software is detected (if not already set)
  ansible.builtin.include_tasks: detect-dhcp-server-software.yaml
  when: dhcp_server_software is undefined
- name: Appropriate ignore client id tasks are included
  ansible.builtin.include_tasks: host-ignore-client-id-{{ dhcp_server_software }}.yaml
...

and

---
- name: DHCP server software is detected (if not already set)
  ansible.builtin.include_tasks: detect-dhcp-server-software.yaml
  when: dhcp_server_software is undefined
- name: Appropriate unignore client id tasks are included
  ansible.builtin.include_tasks: host-unignore-client-id-{{ dhcp_server_software }}.yaml
...

Configuration files for dnsmasq in /etc/dnsmasq.d are automatically read by the default configuration (at least on Debian), so I just need to drop the configuration in there and restart the daemon. The procedure to remove the existing leases involves stopping the DHCP server service, modifying the leases file, then starting the service again. This naturally restarts the daemon, so no separate restart handler needs to be notified. The resultant host-ignore-client-id-dnsmasq.yaml just deploys the configuration file and triggers the handler (so as to only do the hard work of doctoring the leases file if the configuration file changes):

---
- name: Configuration file exists for this MAC address
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.copy:
    dest: /etc/dnsmasq.d/ignore-client-id-{{ mac_address.replace(':', '-') }}.conf
    content: >
      dhcp-host={{ mac_address }},id:*
    owner: root
    group: root
    mode: "0444"
  # N.B. this handler will stop then start the dnsmasq daemon, so
  # `dnsmasq is restarted` does not need to be notified.
  notify: "dhcp : mac address dhcp client id is removed"
...
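
On systems where the packaged dnsmasq does not read /etc/dnsmasq.d by default, a task along the following lines could enable it first. This is a minimal sketch rather than part of my role: it assumes the main configuration file is /etc/dnsmasq.conf and relies on dnsmasq's conf-dir option, and both the path and the *.conf filter are assumptions to adjust for the distribution in question.

```yaml
---
# Sketch only: not needed on Debian, where /etc/dnsmasq.d is already read.
# Assumes the main configuration file lives at /etc/dnsmasq.conf.
- name: dnsmasq reads *.conf files from /etc/dnsmasq.d
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /etc/dnsmasq.conf
    line: conf-dir=/etc/dnsmasq.d/,*.conf
  notify: "dhcp : dnsmasq is restarted"
...
```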

For isc-dhcp-server, I also created a separate configuration file per host but these have to be expressly included in the main configuration file, dhcpd.conf, as a modular approach is not the default:

---
- name: Host configuration directory exists
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.file:
    path: /etc/dhcp/dhcpd.d
    state: directory
    owner: root
    group: root
    mode: "0700"
- name: Configuration for this host exists
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.copy:
    force: false # Only change if destination does not exist
    dest: /etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf
    content: |
      host {{ inventory_hostname }} {
        hardware ethernet {{ mac_address }};
        ignore-client-uids true;
      }
  register: isc_dhcp_server_host_file_exists
  # N.B. this handler will stop then start the isc-dhcp-server daemon, so
  # `isc-dhcp-server is restarted` does not need to be notified.
  notify: mac address dhcp client id is removed
- name: Host configuration is included in main configuration file
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /etc/dhcp/dhcpd.conf
    line: 'include "/etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf";'
  register: isc_dhcp_server_host_file_included
  # Even if the ignore-client-uids was present, if it wasn't included still
  # need to clear out duplicate leases.
  # N.B. this handler will stop then start the isc-dhcp-server daemon, so
  # `isc-dhcp-server is restarted` does not need to be notified.
  notify: mac address dhcp client id is removed
# XXX This assumes only one host stanza _or_ last host stanza is the
# one that will apply. No accounting for multiple stanzas etc. (what
# if multiple fixed-addresses on different networks? How to deal with
# that automatically? Probably too complicated.)
- name: Host is configured to ignore client uids
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf
    line: '  ignore-client-uids true;'
    insertbefore: '}'
  register: isc_dhcp_server_host_file_deny_duplicates_configured
  # N.B. this handler will stop then start the isc-dhcp-server daemon, so
  # `isc-dhcp-server is restarted` does not need to be notified.
  notify: mac address dhcp client id is removed
...

Remove configuration

The “unignore” tasks for dnsmasq are similarly straightforward, although they only need to do a straight restart of the daemon, as the existing lease (if any) will continue to be valid:

---
- name: The configuration file for this mac address is deleted
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.file:
    path: /etc/dnsmasq.d/ignore-client-id-{{ mac_address.replace(':', '-') }}.conf
    state: absent
  notify: "dhcp : dnsmasq is restarted"
...

The isc-dhcp-server version is complicated by the need to also remove the include line from the main configuration. I made it undo any changes made by the host-ignore-client-id-isc-dhcp-server.yaml tasks using register variables. This means the “undo” only works as part of the same run as the “do” part and is therefore more fragile: the undo will not work if an error occurs and the play stops between the “do” and “undo” tasks. It also means the undo will not change any configuration that was intentionally set to ignore the client id outside of this process, and will not delete the rest of the client-specific options in the included file, if they existed. isc-dhcp-server only supports one host block per host, unless they are host blocks with different fixed IPs on different subnets.

---
# XXX This assumes only one matching line. As with adding tasks, no
# accounting for multiple stanzas etc. (what if multiple
# fixed-addresses on different networks, some with this setting? How
# to deal with that automatically? Probably too complicated.)
- name: Host is not configured to ignore uids
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf
    line: '  ignore-client-uids true;'
    state: absent
  when: isc_dhcp_server_host_file_deny_duplicates_configured.changed
  notify: "dhcp : isc-dhcp-server is restarted"
- name: Host configuration is not included in main configuration file
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /etc/dhcp/dhcpd.conf
    line: 'include "/etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf";'
    state: absent
  when: isc_dhcp_server_host_file_included.changed
  notify: "dhcp : isc-dhcp-server is restarted"
- name: Configuration for this host does not exist
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.file:
    path: /etc/dhcp/dhcpd.d/host-{{ inventory_hostname }}.conf
    state: absent
  when: isc_dhcp_server_host_file_exists.changed
  notify: "dhcp : isc-dhcp-server is restarted"
...
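
One way to narrow the window in which a failure leaves the “ignore” configuration behind is to wrap the work in a block with an always section, so the undo still runs if a task inside the block fails. The following is a minimal sketch, not what my playbooks currently do: the debug task is a placeholder for the real installation steps, and it assumes dhcp_server_host and mac_address are already set for the host. It does not help if the run is killed outright, and if the “do” tasks themselves fail before registering their results, the when conditions in the undo tasks will see undefined variables.

```yaml
---
- name: Host is installed with a stable DHCP lease
  block:
    - name: DHCP server ignores the client id for this host
      ansible.builtin.include_role:
        name: dhcp
        tasks_from: host-ignore-client-id
    # Notified handlers normally run at the end of the play; flush them
    # so the leases are cleaned up before the installation starts.
    - name: Pending handlers are run
      ansible.builtin.meta: flush_handlers
    - name: Installation happens here (placeholder)
      ansible.builtin.debug:
        msg: Replace this task with the real installation steps
  always:
    # Runs whether or not the tasks in the block succeeded.
    - name: DHCP server stops ignoring the client id for this host
      ansible.builtin.include_role:
        name: dhcp
        tasks_from: host-unignore-client-id
...
```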

Removing old leases for a host (to clear existing duplicates)

The handler to remove the client id for a MAC address uses the ansible.builtin.include_tasks module to include a task file that does this work. This works around handlers only being able to specify a single module to execute; I recommend this Medium article for background reading on some approaches to this. The handler definition goes in roles/dhcp/handlers/main.yaml:

- name: mac address dhcp client id is removed
  ansible.builtin.include_tasks: remove-mac-address-client-id.yaml

The handler tasks to remove client id based DHCP leases for a MAC address use the same software specific pattern as before (in roles/dhcp/tasks/remove-mac-address-client-id.yaml):

---
- name: DHCP server software is set
  ansible.builtin.include_tasks: detect-dhcp-server-software.yaml
  when: dhcp_server_software is undefined
- name: Appropriate remove mac address client id tasks are included
  ansible.builtin.include_tasks: remove-mac-address-client-id-{{ dhcp_server_software }}.yaml
...

For dnsmasq, in roles/dhcp/tasks/remove-mac-address-client-id-dnsmasq.yaml:

---
- name: The dnsmasq service is stopped
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.service:
    name: dnsmasq
    state: stopped
- name: DHCP leases file is read from the DHCP server
  delegate_to: "{{ dhcp_server_host }}"
  # On my system this file is world-readable, so no special
  # permissions required.
  # Treat this space-delimited file like a CSV for ease of parsing.
  community.general.read_csv:
    delimiter: ' '
    dialect: unix
    fieldnames:
      - expires
      - mac
      - ip
      - hostname
      - client_id
    path: /var/lib/misc/dnsmasq.leases
  register: dhcp_leases_csv
# This ensures dhcp_leases is always set, even if the leases file is empty.
- name: List of leases is empty to start
  ansible.builtin.set_fact:
    dhcp_leases: []
- name: CSV data is in correct data types
  ansible.builtin.set_fact:
    dhcp_leases: "{{ dhcp_leases + [lease] }}"
  loop: "{{ dhcp_leases_csv.list }}"
  vars:
    lease: "{{ item | combine({ 'expires': item.expires | int }) }}"
- name: Existing leases for this MAC address are removed
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /var/lib/misc/dnsmasq.leases
    # Match lines that begin with an expiry timestamp followed by
    # this MAC address
    regexp: "^[0-9]+ {{ mac_address }} "
    state: absent
- name: Latest lease for this MAC address, with client-id removed, is present
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.lineinfile:
    path: /var/lib/misc/dnsmasq.leases
    # N.B. deliberately replacing client id with '*'
    line: >-
      {{ latest_lease.expires }}
      {{ latest_lease.mac }}
      {{ latest_lease.ip }}
      {{ latest_lease.hostname }}
      *
  vars:
    latest_lease: >-
      {{
        dhcp_leases
        | selectattr('mac', 'eq', mac_address)
        | sort(attribute='expires')
        | last
      }}
  when: dhcp_leases | selectattr('mac', 'eq', mac_address) | list | length > 0
- name: The dnsmasq service is started
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.service:
    name: dnsmasq
    state: started
...

For isc-dhcp-server, I crafted a Perl one-liner based on a recipe I found online for deleting leases by IP address, modified to do it by MAC address. The format of the isc-dhcp-server leases file does not lend itself to easy modification with stream editing software.

---
- name: Perl is installed
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.package:
    name: perl
    state: present
- name: The isc-dhcp-server service is stopped
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.service:
    name: isc-dhcp-server
    state: stopped
- name: The mac address is not in the leases file
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  # Based on <https://stackoverflow.com/a/53462379>
  ansible.builtin.command:
    argv:
      - perl
      - -0777spi
      - -e
      - s/^lease [0-9.]+ \{.*?^\s+hardware ethernet \Q$mac\E;\n.*?^\}\n//smg
      - --
      - -mac={{ mac_address }}
      - /var/lib/dhcp/dhcpd.leases
- name: The isc-dhcp-server service is started
  delegate_to: "{{ dhcp_server_host }}"
  become: true
  ansible.builtin.service:
    name: isc-dhcp-server
    state: started
...

The restart handlers (back in handlers/main.yaml) are very simple:

- name: dnsmasq is restarted
  become: true
  delegate_to: "{{ dhcp_server_host }}"
  ansible.builtin.service:
    name: dnsmasq
    state: restarted
- name: isc-dhcp-server is restarted
  become: true
  delegate_to: "{{ dhcp_server_host }}"
  ansible.builtin.service:
    name: isc-dhcp-server
    state: restarted

dhcp role entry points

Finally, the new entry points need their arguments documenting (and enforcing) in meta/argument_specs.yaml:

host-ignore-client-id:
  short_description: >-
    Configure the DHCP server to ignore client id for a mac address.
  description: >
    Configure the DHCP server (not necessarily on the current
    targeted host) to ignore the client id from the DHCP client
    (causing it to instead use the MAC address) from the given MAC
    address.
  options:
    dhcp_server_host:
      description: >-
        The inventory host to delegate the DHCP tasks to (i.e.
        the DHCP server).
      type: str
      required: true
    mac_address:
      description: >-
        The mac address of the host to ignore the client id for
        (in the format aa:bb:cc:dd:ee:ff)
      type: str
      required: true
host-unignore-client-id:
  short_description: >-
    Configure the DHCP server to stop ignoring client id for a mac
    address.
  description: >
    Configure the DHCP server (not necessarily on the current
    targeted host) to stop ignoring the client id from the DHCP
    client from the given MAC address, when previously configured
    by host-ignore-client-id.
  options:
    dhcp_server_host:
      description: >-
        The inventory host to delegate the DHCP tasks to (i.e.
        the DHCP server).
      type: str
      required: true
    mac_address:
      description: >-
        The mac address of the host to stop ignoring the client id
        for (in the format aa:bb:cc:dd:ee:ff)
      type: str
      required: true
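
With the argument specs in place, calling an entry point looks something like the sketch below. The server name and MAC address are placeholder values, and the MAC is quoted because an all-digit address such as 52:54:00:12:34:56 can otherwise be read by the YAML parser as a base-60 number rather than a string.

```yaml
---
- name: DHCP server ignores the client id for this host
  ansible.builtin.include_role:
    name: dhcp
    tasks_from: host-ignore-client-id
  vars:
    # Placeholder values: substitute the real inventory host and MAC.
    dhcp_server_host: dhcp-server.example.org
    mac_address: "aa:bb:cc:dd:ee:ff"
...
```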

While working on this, I happened to update ansible-core to version 2.15.6 on my laptop and discovered a bug, which I reported on GitHub, that was not present in the Debian packaged version (2.14.3). My dhcp role has a number of entry points but no default entry point, i.e. no main.yaml in its tasks directory. To work around this, I added one temporarily (until the bug is fixed):

---
- name: Do not use this entry point
  ansible.builtin.fail: msg="main.yaml only exists to work around https://github.com/ansible/ansible/issues/82241 - do not use!"
...