I previously created a playbook to remotely unlock my encrypted root partitions. It works well, but as a playbook it can only be run directly or pulled in via ansible.builtin.import_playbook. To make it more flexible, for example so that it can be used as part of a reboot cycle inside another role, I decided to turn it into a role.

The first part of my existing playbook finds the host’s connection information from the DHCP server if the inventory_hostname is not resolvable and ansible_host has not already been set to a specific value. I decided not to include this, in order to keep the role focused on the task at hand (unlocking) - it accepts the target host as an argument, falling back to ansible_host and finally inventory_hostname. (Aside: I am dissatisfied with my current “look up the IP from the DHCP server” discovery method, as it has a number of problematic edge cases; revising it is on my backlog.)

The new role

The tasks/main.yaml of the new unlock-crypt role is almost a copy-and-paste of the existing playbook’s unlock sequence:

---
- name: Wait for dropbear ssh daemon to be up
  delegate_to: localhost
  ansible.builtin.wait_for:
    host: "{{ unlock_crypt_target_host }}"
    port: "{{ unlock_crypt_port }}"
- name: Unlock encrypted disk
  # No python in the initramfs. Work around ansible.builtin.raw
  # not supporting stdin (https://github.com/ansible/ansible/issues/34556)
  delegate_to: localhost
  # Accept new (but not changed) host keys - so first connection after
  # install works (provided old key has been removed)
  ansible.builtin.command:
    cmd: >
      ssh
      -o StrictHostKeyChecking=accept-new
      -p {{ unlock_crypt_port }}
      -l root
      {{ unlock_crypt_target_host }}
      cryptroot-unlock
    stdin: "{{ unlock_crypt_key }}"
    stdin_add_newline: false
- name: Wait for host to be properly up
  delegate_to: localhost
  ansible.builtin.wait_for:
    host: "{{ unlock_crypt_target_host }}"
    port: "{{ unlock_crypt_check_port }}"
...
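
The comment on the unlock task notes that accept-new only works after a reinstall if the old host key has been removed. A minimal sketch of doing that removal from Ansible follows. It is not part of the role as written. It assumes the stale key is in the controller user's default ~/.ssh/known_hosts and that dropbear listens on a non-default port, since OpenSSH records such entries as [host]:port (for port 22 the entry is the bare host name instead). The unlock_crypt_reset_host_key variable is hypothetical; it exists here so the key is only discarded when explicitly requested, because removing it on every run would make accept-new trust any key.

```yaml
---
# Sketch only: run before the unlock task, and only after a reinstall.
- name: Remove stale dropbear host key
  delegate_to: localhost
  ansible.builtin.known_hosts:
    # Entry format for a non-default port; for port 22 use the bare host
    name: "[{{ unlock_crypt_target_host }}]:{{ unlock_crypt_port }}"
    state: absent
  # Hypothetical opt-in flag, not defined by the role above
  when: unlock_crypt_reset_host_key | default(false)
...
```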

The meta/argument_specs.yaml file looked like this:

---
argument_specs:
  main:
    short_description: Unlocks the LUKS disk of a remote system
    author: Laurence Alexander Hurst
    options:
      unlock_crypt_target_host:
        description: Target host (must be accessible by `ssh`ing to this name/address)
        type: str
        default: "{{ hostvars[inventory_hostname]['ansible_host'] | default(inventory_hostname) }}"
      unlock_crypt_port:
        description: Port of the ssh server on the target host to which to issue the unlock command
        type: int
        default: 22
      unlock_crypt_key:
        description: Unlock key
        type: str
        required: true
      unlock_crypt_check_port:
        description: Port to wait to appear on the target server to confirm it is unlocked and ready
        type: int
        default: 22
...

and the defaults/main.yaml file:

---
unlock_crypt_port: 22
unlock_crypt_target_host: "{{ hostvars[inventory_hostname]['ansible_host'] | default(inventory_hostname) }}"
unlock_crypt_check_port: 22
...

Using the new role

Initially I did a straight drop-in replacement of the existing task in my unlock-crypt.yaml playbook:

- name: Unlock remote host
  ansible.builtin.include_role:
    name: unlock-crypt
  vars:
    unlock_crypt_target_host: >-
      {{
        dhcp_lease.ip
        | default(hostvars[inventory_hostname]['ansible_host'])
        | default(inventory_hostname)
      }}
    unlock_crypt_port: 2222
    unlock_crypt_key: >-
      {{
        lookup(
          'community.hashi_vault.vault_read',
          'kv/hosts/' + inventory_hostname + '/luks/passphrase'
        ).data.passphrase
      }}

However, when I wanted to use this in another role (specifically one installing Proxmox Virtual Environment, which requires a kernel change, and therefore a reboot, before installing the main packages) it made sense to move the port and key variables to a group’s variables. I originally did this via the domain-level groups, but that relied on a fact to determine the domain. If the host is not yet unlocked, that fact is only available when there are cached facts, so it may not work. In the end I created a new_world_order group, on which I also set ansible_user to the specific user I created host-specific passwords for, as this should eventually be the norm for all my hosts.

unlock_crypt_port: 2222
unlock_crypt_key: >-
  {{
    lookup(
      'community.hashi_vault.vault_read',
      'kv/hosts/' + inventory_hostname + '/luks/passphrase'
    ).data.passphrase
  }}
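
For context, a sketch of how such a group might be laid out alongside the variables above. The file names, the host name and the ansible_user value are all placeholders of mine, not the real inventory:

```yaml
---
# inventory.yaml (hypothetical file and host name)
new_world_order:
  hosts:
    pve01.example.com:
---
# group_vars/new_world_order.yaml (hypothetical path)
# Placeholder user name; the real value is site-specific
ansible_user: ansible
...
```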

The use of the IP address by preference might be specific to the installation process (I need to think this through - the changing client ID problem makes me think this is a nuanced problem), so I left this setting local to the unlock-crypt.yaml playbook for now.

With these variables defined at the group level, using the role from another role was easy (although you can see from the comment at the top that this is not the final arrangement - I have just not yet decided on the best way to proceed):

# XXX this is very site specific... (maybe we need to take a reboot
# method/role that is an argument, defaulting to
# ansible.builtin.reboot?)
- name: Host is rebooted if kernel is updated
  become: true
  # Ansible's reboot command waits, and checks, for the host to
  # come back, which will never happen. Even with async (see below)
  # there is a race condition if the ssh connection gets closed (by
  # the shutdown process) before Ansible has disconnected so it is
  # necessary to delay the shutdown command by longer than the
  # async value, in order to avoid this problem.
  ansible.builtin.shell: 'sleep 2 && /usr/bin/systemctl reboot --message="Ansible triggered reboot for LUKS unlock sanity check."'
  # Run in background, waiting 1 second before closing connection
  async: 1
  # Launch in fire-and-forget mode - with a poll of 0 Ansible skips
  # polling entirely and moves to the next task, which is precisely
  # what we need.
  poll: 0
  when: kernel_updated.changed
- name: Unlock remote host
  # Uses:
  # unlock_crypt_port from `all` group_vars
  # unlock_crypt_key from `all` group_vars
  ansible.builtin.include_role:
    name: unlock-crypt
  when: kernel_updated.changed
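
One way the idea floated in the XXX comment could look is sketched below: the calling role takes the reboot method as a variable and defaults to ansible.builtin.reboot. This is only an illustration of that idea, not something I have settled on. The variable name reboot_tasks_file and both tasks file names are hypothetical.

```yaml
---
# defaults/main.yaml of the calling role
# Hypothetical variable: tasks file (relative to the role's tasks/
# directory) that performs the reboot
reboot_tasks_file: reboot-builtin.yaml
---
# tasks/reboot-builtin.yaml (hypothetical): default method, suitable
# for hosts that come back up without intervention
- name: Host is rebooted
  become: true
  ansible.builtin.reboot:
---
# tasks/main.yaml of the calling role: dispatch to the configured method
- name: Host is rebooted if kernel is updated
  ansible.builtin.include_tasks: "{{ reboot_tasks_file }}"
  when: kernel_updated.changed
...
```

A site with LUKS-encrypted roots would then set reboot_tasks_file to a file of its own containing the fire-and-forget reboot and the unlock-crypt include shown above, keeping the site-specific part out of the generic role.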