Ansibleising iPXE configuration

This post was going to be about migrating my iPXE configuration from being managed by SaltStack to Ansible. It was supposed to be the start of migrating all of my configuration management over, to simplify bare-metal DR and managing test/development infrastructure without first deploying a master, or control, server. I started with iPXE because I needed to update its configuration following my lab experiment, moving it from duplicating the kernel and initial ramdisks on the tftp server to fetching them directly from a mirror. However, things turned out not to be so simple and I ended up restructuring the network and migrating some other configurations before getting back to iPXE.

Detecting the lab environment

Something I have been racking my brain about is how to detect whether a device is in the lab or on the live network. Embarrassingly recently, after much musing over options such as matching different MAC addresses, switch detection, the default route (or lack of one, in the air-gapped lab) or even system serial numbers, I realised that I could just use the domain name of the machine. I have given the lab a subdomain of my live one, which makes it different and therefore usable as a differentiator. To work around bootstrapping the first machine in a network, which is why I was fixated on using physical system identifiers initially, I just need to set its domain name correctly locally. This can be done during the base OS install or manually (re)configured, e.g. if using a “Live” boot disk to start recovering systems, as I did to set up this network.

After a false start with include_vars based on domain (which fails if the file to be included does not exist), I found this was easy to do using Ansible’s support for dynamic groups with ansible.builtin.group_by. In this case, I created a dynamic group named domain_ followed by the domain name, with dots replaced by underscores. I did initially try hyphens but, despite there being examples with hyphens in the documentation, I got a warning about unreplaced invalid characters so I stuck with underscores.

    - name: Group hosts by their domain, dynamically
      ansible.builtin.group_by:
        key: domain_{{ ansible_facts.domain | replace('.', '_') }}

The dynamic groups pick up files in the group_vars directory (see the Ansible documentation on organizing host and group variables for more information) without any further persuasion, so I can just tag whether the environment is production or development by setting a variable in the corresponding file. For example:

group_vars/domain_lab_my_domain_tld.yaml:

---
my_environment: dev
...

group_vars/domain_my_domain_tld.yaml:

---
my_environment: prod
...
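
To make the naming convention concrete, here is a minimal Python sketch of the same transformation the group_by key performs, and of one file name Ansible would then look for. The function names are my own, for illustration only, and the .yaml extension simply follows the examples above (Ansible also accepts other extensions).

```python
def domain_group_name(domain: str, prefix: str = "domain_") -> str:
    """Mirror the group_by key: the prefix plus the domain, dots replaced by underscores."""
    return prefix + domain.replace(".", "_")


def group_vars_file(domain: str, directory: str = "group_vars") -> str:
    """One file name that would hold variables for that dynamic group (hypothetical helper)."""
    return f"{directory}/{domain_group_name(domain)}.yaml"


if __name__ == "__main__":
    # Placeholder domains, matching the examples above
    for domain in ("my.domain.tld", "lab.my.domain.tld"):
        print(domain, "->", group_vars_file(domain))
```

Because the lab domain is a subdomain of the live one, the two group names differ and each host only ever lands in one of them.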

2 factor authentication

My next challenge was that Ansible doesn’t play nice with 2 factor authentication. I had managed to work around this when running locally by pre-authenticating to sudo (e.g. by running sudo -l and supplying my 2FA code, which then authorised my session for 5 minutes) before running ansible-playbook. This obviously doesn’t work on remote hosts and kept timing out, even after upping the sudo session time to 30 minutes, so I tried to write a little plugin that wraps the Ansible provided one in order to detect the 2FA prompt. As the TOTP tokens are only valid for 30 seconds, this would mean that playbooks need to be quick. Longer term I need to look at alternative solutions, but I do not want to compromise my current setup by going back to just using passwords (even if 2 different passwords are required to authenticate).

The become plugin just needs placing in the become_plugins directory and can be used by setting the become_method keyword, or ansible_become_method variable, to its name.

# Extending the Ansible provided sudo become plugin
import re

from ansible.module_utils.common.text.converters import to_bytes
from ansible.plugins.become.sudo import BecomeModule as BecomeBase

class BecomeModule(BecomeBase):
    name = 'sudo_2fa'

    # Modelled on the same method in the Ansible provided su become plugin
    def check_password_prompt(self, b_output):
        ''' checks if the expected password prompt exists in b_output '''

        # Prompt given by pam_google_authenticator
        prompts = ['Verification code']
        b_password_string = b"|".join((br'(\w+\'s )?' + to_bytes(p)) for p in prompts)
        # Colon or unicode fullwidth colon
        b_password_string = b_password_string + to_bytes(u' ?(:|：) ?')
        b_su_prompt_localizations_re = re.compile(b_password_string, flags=re.IGNORECASE)
        # match() only succeeds if the prompt is at the very start of b_output
        return bool(b_su_prompt_localizations_re.match(b_output))
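
The prompt-matching logic can be tried without Ansible at all. The following is a standalone sketch of the same regular expression using only the standard library; the helper name is_2fa_prompt and the sample prompt strings are my own assumptions, not captured output from a real session.

```python
import re

# Prompt text given by pam_google_authenticator, as used in the plugin above
PROMPTS = [b"Verification code"]

# Optional "someone's " prefix, the prompt text, then a colon (ASCII or fullwidth)
_PATTERN = b"|".join(br"(\w+\'s )?" + p for p in PROMPTS) + " ?(:|：) ?".encode("utf-8")
_PROMPT_RE = re.compile(_PATTERN, flags=re.IGNORECASE)


def is_2fa_prompt(b_output: bytes) -> bool:
    """Return True if b_output starts with the 2FA prompt (hypothetical helper)."""
    return bool(_PROMPT_RE.match(b_output))


if __name__ == "__main__":
    for sample in (b"Verification code: ", b"[sudo] password for alice: "):
        print(sample, is_2fa_prompt(sample))
```

Note that match() anchors at the start of the output, so a 2FA prompt that arrives after other text in the same buffer would not be detected by this check.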

Anyway, in the words of Monty Python, “get on with it!…”

Next steps

I originally wrote the title “iPXE configuration” here, thinking it would be a simple case of telling Ansible to do that but I failed to find a way to get Ansible working properly with my 2 factor sudo setup. My little plugin, above, only worked intermittently and would not work on remote hosts. Basically, this is the point I disappeared down a rabbit-hole that took many months to climb back out of - starting with coming up with a design to restructure my network to introduce a bastion/jumphost for Ansible.