I thought I had finished migrating my monitoring server configuration to Ansible; however, when I came to start the next step of setting up the bastion I noticed I had neglected to consider the existing 2-factor configuration. This is rather shocking considering that Ansible not playing nicely with it is the motivation for this work…

Investigating where this configuration originates from in my SaltStack states, I realised I had migrated the specific states applied to my monitoring server (via roles I add in its pillar data) and missed all of the generic roles applied more broadly.

Overview of generic roles

These generic roles, and what they do, are:

  • all - applied to all hosts
    • Includes all roles defined as keys in the pillar dictionary roles, where the value for that key (role) is truthy (see the sketch after this list)
  • hardware.all - not a role, strictly, but I’ve included it here as it is the only non-role applied to everything.
    • Includes all hardware states in the pillar list hardware
  • linux - applied to all Linux hosts (kernel grain is Linux)
    • Includes the unix role
    • Includes the mail.profiles state
    • Includes the linux.hostname state
  • osx - applied to all hosts running macOS (OS grain is MacOS)
    • Includes the unix role
    • Includes the osx.ssh_keys state
    • Includes the desktop role
  • debian - applied to all hosts running Debian Linux (OS grain is Debian)
    • Includes the linux.apt.repos state
    • Includes the linux.packages.debian-goodies state
    • Includes the linux.packages.reportbug state
    • Includes the linux.debian.auto-update state
    • Includes the linux.network.debian.interfaces state if networking.interfaces dictionary is defined in the pillar data
  • my-system - applied to all hosts whose name is inside my domain (minion id matches *.my.domain.tld)
    • Includes the linux.hardware.extra-mounts state
    • Includes the linux.hardware.pcspkr-blacklist state
    • Includes the linux.base-utils state
    • Includes the linux.server.ssh state
    • Includes the linux.security.google-authenticator state
    • Includes the linux.security.google-authenticator.sudo state
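
As referenced above, a minimal sketch of how the all role's pillar-driven include can look in SaltStack (the state file name and pillar layout here are assumptions based on my description, not my actual file):

# roles/all.sls (sketch) - include every role whose pillar value is truthy.
# Assumes at least one role is enabled (an empty include list is an error).
include:
{% for role, enabled in salt['pillar.get']('roles', {}).items() if enabled %}
  - roles.{{ role }}
{% endfor %}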

As can be seen, my roles are all “meta” in that they do not do anything other than include other roles and/or states. This list is a lot of names and not a lot of information about them, but I will now go through them and migrate what I need to. To begin, all and hardware.all can be disregarded as they just include things from the pillar data - however this does mean I need to revisit the pillar data for my monitoring server. When I did this, I found these additional roles (and what they do) that are not defined in the host-specific configuration (they either come from other pillar data files included in the host-specific one or from generic pillar files applied generally):

  • backuppc-client - applied to all hosts whose host-specific pillar file contains any backuppc client configuration pillar (there are different pillar files for different backup methods, e.g. local/remote, tar/rsync/ssh etc.)
    • Includes the appropriate state from backuppc.client. based on client type pillar value
  • webserver - sets up nginx on the host
    • Includes the linux.server.web.nginx state
  • webserver-ssl - configures ssl for nginx on the host
    • Includes the webserver role
    • Includes the linux.server.web.nginx.ssl state
  • ntp-client-local - configures the ntp client
    • Includes the linux.network.ntp.client.local state

These additional roles are included by other roles but not yet described:

  • unix (linux and osx roles)
    • Includes the packages.coreutils state
    • Includes the packages.git state
    • Includes the unix.users.root state
    • Includes the unix.users.my-user state
    • Includes the unix.users.user-local-config.dotfiles state
    • Includes the unix.cifs-creds state
  • desktop (osx role)
    • Includes the packages.sublime-text state
    • Includes the software.desktop.sublime-text state
    • Includes the packages.x2goclient state

The following roles and states have already been migrated, during the work done to date migrating the monitoring server:

  • webserver role
  • webserver-ssl role
  • linux.apt.repos state

The osx.ssh_keys, linux.security.google-authenticator and linux.security.google-authenticator.sudo states can also be disregarded as I currently do not have any macOS systems and the 2-factor authentication is about to be re-worked on all my hosts through setting up the bastion host. I disregarded linux.network.debian.interfaces for now, as my new Ansible setup will install and configure NetworkManager on the bastion (although I probably want to statically configure it, so it remains accessible if there are network problems - one for the “TODO” list). Ansible uses ssh to log in, and setting up ssh is already part of my base OS install, so I disregarded linux.server.ssh. Finally, I also disregarded the desktop role as I no longer use any of the software it installs (via the states it includes).

This leaves me with this list of roles to migrate:

  • linux
  • debian
  • my-system
  • backuppc-client
  • ntp-client-local
  • unix

linux role

The linux role, in its entirety, is:

# Base role for all Linux systems
include:
  - roles.unix
# All Linux systems need some sort of functioning mail service, even if only
# for local daemons to send notications locally.
  - mail.profiles
  - linux.hostname  # Have Salt maintain the hostname

I will look at roles.unix later, so for this role I just need to look at mail.profiles and linux.hostname.

mail.profiles

This state installs Postfix and configures it according to the profile attached to the system. For most systems, this is a very simple configuration to relay local mail out to my mail server. For now, I will implement only this simple relay profile and kick configuring the mail server's own Postfix install down the road until I bring that host under Ansible's control.

I created a role called mail-transfer-agent for this. I applied this role to all Linux systems except those in the dummy and mail_servers groups:

- hosts: all:!dummy:!mail_servers
  tags: common
  tasks:
  - name: Mail relaying is configured on Linux systems that aren't mail servers
    ansible.builtin.import_role:
      name: mail-transfer-agent
    vars:
      relay_host: "{{ mail_relay_host }}"
      relay_domains: "{{ mail_relay_domains }}"
      recipient_delimiter: "{{ mail_recipient_delimiter }}"
      forward_root_to: "{{ mail_forward_root_to }}"
    when: ansible_facts.system == 'Linux'

I set the variables in the relevant group file in the group_vars/ directory, e.g. for the internal host group:

---
# Send via mailserver's VPN address, since Virgin IPs are
# blacklisted by spamhaus so direct connections are rejected
# by my host.
mail_relay_host: mailserver.home.mydomain.tld
mail_forward_root_to: user@mydomain.tld
# Allow systems to relay (i.e. send email from local daemons etc.
# onwards) to these domains.
mail_relay_domains:
  - mydomain.tld
  - myotherdomain.tld
...

I also started a mail_servers group with (e.g.) recipient_delimiter set, however this is not complete (I'm leaving it for another day). Fortunately my mail server is not in the internal group, so for now I do not need to worry about the order in which these variables are resolved, or conflicts between them, but I can foresee this being a problem in the future (e.g. if I have an internal system that is also a mail server).
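
For reference, the mail_servers group file is currently just a stub along these lines (the delimiter value is an assumption - '+' being the usual convention - and not necessarily what I will settle on):

---
# group_vars/mail_servers.yaml (incomplete)
mail_recipient_delimiter: '+'
...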

The new role’s argument specification (meta/argument_specs.yaml):

---
argument_specs:
  main:
    short_description: Configure the system's mail transfer agent
    author: Laurence Alexander Hurst
    options:
      relay_host:
        description: The relay host
        type: str
        required: true
      relay_domains:
        description: Domain names to relay (i.e. send email from local daemons etc. onwards)
        type: list
        elements: str
        required: true
      forward_root_to:
        description: Email address to forward root's email to
        type: str
        required: true
      recipient_delimiter:
        description: Recipient delimiter value
        type: str
        required: false
...

I added a handler to restart postfix (e.g. for configuration changes) in the role’s handlers/main.yaml:

---
- name: Restart Postfix
  become: true
  ansible.builtin.service:
    name: postfix
    state: restarted
...

The template for the Postfix main configuration, templates/main.cf, is:

smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h

readme_directory = no

# See http://www.postfix.org/COMPATIBILITY_README.html -- default to 2 on
# fresh installs.
compatibility_level = 2

smtpd_relay_restrictions = reject_unauth_destination, permit_mynetworks
myhostname = {{ ansible_facts.fqdn }} 
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydestination = $myhostname, localhost.home.mydomain.tld, localhost
relayhost = {{ relay_host }}
mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
{% if recipient_delimiter %}
recipient_delimiter = {{ recipient_delimiter }}
{% endif %}
relay_domains = {{ relay_domains | join(', ') }}
# Only listen for local connections
inet_interfaces = loopback-only
inet_protocols = all

Finally, the tasks to install, configure and start Postfix are:

---
- name: Install postfix
  become: true
  ansible.builtin.package:
    name: postfix
    state: present
- name: Deploy postfix main configuration
  become: true
  ansible.builtin.template:
    owner: root
    group: root
    mode: 00644
    src: main.cf
    dest: /etc/postfix/main.cf
  notify: Restart Postfix
- name: Configure root user forwarding
  become: true
  # This used to be done by creating /root/.forward but 
  # /etc/aliases seemed better for some reason.
  ansible.builtin.lineinfile:
    path: /etc/aliases
    line: "root:\t{{ forward_root_to }}"
    regexp: '^root:\s'
    state: present
  register: root_forward_aliases
- name: Update aliases database
  become: true
  ansible.builtin.command:
    cmd: /usr/sbin/postalias /etc/aliases
  when: root_forward_aliases.changed
- name: Ensure postfix is running
  become: true
  ansible.builtin.service:
    name: postfix
    enabled: true
    state: started
...

linux.hostname

linux.hostname sets the hostname to the Salt id grain (which is the minion's id, set in /etc/salt/minion_id). As I migrated to Ansible, I originally thought that rather than use the inventory hostname (which allows the name in the inventory to be completely arbitrary), I could use the host's IP's reverse DNS (i.e. the IP address' PTR record). I previously set up Debian pre-seed files and my DHCP server to use DNS for setting the hostname, so it seemed sensible to tell Ansible to set the hostname via the same mechanism. However, Ansible determines the “default” interface by getting the route to either Google's DNS server or ipv6.google.com - this causes multiple problems: on multi-homed systems, the route to Google might not be via the interface whose IP address should be used for the hostname, and on networks with no route to Google (e.g. with no default route, like my air-gapped home-lab) this fails completely. Trusting reverse DNS also only makes sense for systems that are in a trusted DNS server (i.e. my internal one), so mobile clients, like my laptop, need consideration and I might also have problems if I make some things more dynamic within my network.

That all said, my playbook decides to use local as the connection method when the target’s hostname matches the inventory hostname. In the end, I decided to adopt this approach instead (relying on Ansible’s variable precedence rules):

  1. Set the hostname variable to {{ inventory_hostname }}{% if '.' not in inventory_hostname and ansible_facts.domain %}.{{ ansible_facts.domain }}{% endif %} in group_vars/all.yaml.
  2. Override the hostname variable for specific hosts (or groups) where it is desirable for the hostname not to match the inventory hostname.
  3. Create a hostname role that simply configures the system’s hostname according to the hostname variable.

The and ansible_facts.domain truthy test in {{ inventory_hostname }}{% if '.' not in inventory_hostname and ansible_facts.domain %}.{{ ansible_facts.domain }}{% endif %} was needed to cater for non-Active-Directory domain-joined Windows systems, which do not have a populated domain fact, but it is a sensible safety net in case it is not set on other systems.
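
For completeness, this is how step 1 looks as a sketch of the relevant part of group_vars/all.yaml:

---
# Default the hostname to the inventory name, qualified with the host's
# domain fact if the inventory name is not already fully qualified.
hostname: "{{ inventory_hostname }}{% if '.' not in inventory_hostname and ansible_facts.domain %}.{{ ansible_facts.domain }}{% endif %}"
...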

The hostname Ansible role

I gave it this argument spec in meta/argument_specs.yaml:

---
argument_specs:
  main:
    short_description: Set the system's hostname
    author: Laurence Alexander Hurst
    options:
      hostname:
        description: The desired FQDN for the system
        type: str
        required: true
...

The role’s tasks/main.yaml looks like this:

---
- name: Set Windows hostname
  block:
    - name: Set the new DNS suffix (takes effect on reboot)
      ansible.windows.win_regedit:
        # Can't use split filter as it does not support limiting
        # the number of splits.
        data: "{{ hostname.split('.', 1)[1] }}"
        path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
        # According to http://systemmaanger.ru/win2k_regestry.en/92767.htm
        # NV Domain is used to update Domain (at the same key) on reboot,
        # so setting this and rebooting should be all that's needed.
        name: "NV Domain"
        state: present
        type: string
      notify: Reboot Windows  # From `common` role
    - name: Set hostname
      ansible.windows.win_hostname:
        # Windows will only allow short names here
        name: "{{ hostname | split('.') | first }}"
      register: win_hostname_result
    - name: Check if reboot required and notify handlers
      ansible.builtin.debug: msg="Reboot is{% if not win_hostname_result.reboot_required %} not{% endif %} required"
      changed_when: win_hostname_result.reboot_required
      notify: Reboot Windows # From `common` role
  when: ansible_facts.system == 'Win32NT' and ansible_facts.fqdn != hostname
- name: Set Linux hostname
  block:
    # XXX what about ipv6?
    - name: Set localhost entry in /etc/hosts correctly
      become: yes
      # Ensure that 127.0.0.1 is only "localhost" and not (an old) hostname
      ansible.builtin.lineinfile:
        path: /etc/hosts
        line: "127.0.0.1\tlocalhost"
        regexp: '^127\.0\.0\.1'
        owner: root
        group: root
        mode: 00644
    - name: Set hostname entry in /etc/hosts correctly
      become: yes
      # XXX Check if 127.0.1.1 is Debian specific or is also default on EL (e.g. Rocky)?
      # Not that it matters if 127.0.0.1 is correctly set to just
      # `localhost`.
      ansible.builtin.lineinfile:
        path: /etc/hosts
        line: "127.0.1.1\t{{ hostname }}\t{{ hostname | split('.') | first }}"
        regexp: '^127\.0\.1\.1'
        owner: root
        group: root
        mode: 00644
    - name: Set hostname
      become: yes
      ansible.builtin.hostname:
        # Should only ever set the hostname to the shortname.
        # Name resolution configuration will determine FQDN
        # (which is why we configure hosts with it).
        # See `hostname` command's manual page section "THE FQDN".
        name: "{{ hostname | split('.') | first }}"
  when: ansible_facts.system == 'Linux'
...

For my Windows system, I set the hostname to the FQDN directly on the host in inventory.yaml (as its domain fact is blank) for now - longer term I might want to do something cleverer (to tie in with my domain-to-environment detection, which won't work with this static hostname):

windows:
  hostname: windows.home.my.domain.tld

Looking up the dns name

Despite abandoning the idea, I have included this information for posterity as it may be of future use to me (or someone else)…

I encountered a problem at this point, due to my previous decision to use fact caching. With caching enabled, it seems Ansible does not update the cache until the timeout expires (which is one way caches can work, another is to use the cache values only if fresh ones are not available) - with an infinite timeout this means it will never be updated.

While testing this, I observed some very odd behaviour on my laptop, when I noticed its reverse DNS (according to Ansible) did not match what dig returned - troubleshooting this, I determined that default_ipv4.address was still returning the host's IP address from when I was at home. I was in a Costa Coffee shop, between some personal appointments, working on this with my laptop connected to my mobile broadband router. Despite default_ipv4.address returning what must have been the cached value, the all_ipv4_addresses fact did contain the correct (current) IP address. This is despite that IP address not being listed in any other fact - the default_ipv4 and interface facts all had the old IP address. This meant that all_ipv4_addresses contained an IP address that did not exist according to the other facts, and the other facts contained an IP address that did not exist according to all_ipv4_addresses. I went to report this as a bug, however when I tried to reproduce it at home I was unable to.

In any case, the behaviour I want is to update the cache of all “alive” systems and use the existing cached values for everything else - the cache can be explicitly updated by running the ansible.builtin.setup module, and as Ansible will exclude any system it could not previously connect to this will not be attempted on ones that are uncontactable so their “old” cache data will remain intact.

Once I have up-to-date cache values, looking up the DNS name is a simple case of using the community.general.dig lookup - although, despite the documentation saying TYPE can be specified as either qtype=TYPE or appended as /TYPE, only the latter works to make the module internally reverse the IP address and append .in-addr.arpa. before making the DNS lookup. This difference in behaviour is not explicitly documented, although only the /PTR form is shown in the examples with a plain IP address.

- name: Ensure fact cache is up to date
  ansible.builtin.setup:
- name: Get the reverse DNS
  ansible.builtin.set_fact:
      # The lookup documentation says these are synonymous - in practice, only the '/PTR' format works for directly looking up reverse record from IPs
    #reverse_dns_name: "{{ lookup('community.general.dig', ansible_facts.default_ipv4.address, qtype='PTR') }}"
    reverse_dns_name: "{{ lookup('community.general.dig', ansible_facts.default_ipv4.address ~ '/PTR') }}"

If the IP does not have a pointer DNS record, the fact will contain the string NXDOMAIN.

debian role

This role applies these states:

  • linux.apt.repos
  • linux.packages.debian-goodies
  • linux.packages.reportbug
  • linux.debian.auto-update
  • linux.network.debian.interfaces if networking.interfaces dictionary is defined in the pillar data

linux.apt.repos configures the apt repositories, which is already done by an existing role. For now I am using NetworkManager to configure my systems' networking, so I am not migrating linux.network.debian.interfaces to Ansible.

linux.packages.debian-goodies, linux.packages.reportbug and linux.debian.auto-update install a selection of packages, some after pre-configuring them with debconf. As these are applied to all Debian systems in my SaltStack configuration, I just added them to my existing common role's tasks. I added a task to include an OS-family-specific task file (if it exists) to the Linux-specific task file:

- name: Include distribution specific tasks (if they exist)
  ansible.builtin.include_tasks: "{{ item }}"
  loop: "{{ q('ansible.builtin.first_found', [ansible_facts['os_family'] + '.yaml',], skip=true) }}"

The Debian task file (tasks/Debian.yaml in my common role) does these:

---
- name: Install Debian specific packages
  become: true
  ansible.builtin.package:
    name:
      # Provides checkrestart and other goodies
      - debian-goodies
      # Debian's bug reporting tool
      - reportbug
- name: Configure Debian auto-update
  block:
    - name: Install debconf utilities
      become: true
      ansible.builtin.package:
        name: debconf-utils
    - name: Display both news and changelogs
      become: true
      ansible.builtin.debconf:
        name: apt-listchanges
        question: apt-listchanges/which
        value: both
        vtype: select
    - name: Include headers listing name of package and names of binary packages being upgraded
      become: true
      ansible.builtin.debconf:
        name: apt-listchanges
        question: apt-listchanges/headers
        value: true
        vtype: boolean
    # Do this after debconf answers have been pre-seeded (above)
    - name: Install apt-listchanges packages
      become: true
      ansible.builtin.package:
        name: apt-listchanges
        state: present
    - name: Enable unattended upgrades
      become: true
      ansible.builtin.debconf:
        name: unattended-upgrades
        question: unattended-upgrades/enable_auto_updates
        value: true
        vtype: boolean
    # Do this after debconf answers have been pre-seeded (above)
    - name: Install unattended-upgrades packages
      become: true
      ansible.builtin.package:
        name: unattended-upgrades
        state: present
...

my-system role

This role applies the following states:

  • linux.hardware.extra-mounts
  • linux.hardware.pcspkr-blacklist
  • linux.base-utils
  • linux.server.ssh
  • linux.security.google-authenticator
  • linux.security.google-authenticator.sudo - Enforce sudo only via 2nd-factor (implicitly installs sudo via requirements)

These were separated out in SaltStack as they were the common tasks applied only to systems I own, i.e. those on which I have root access (to install software etc.). I have not yet separated out tasks for systems I only have user-level access to, so I will be merging these into existing roles instead.

linux.hardware.extra-mounts

This state added extra mount points, defined in the pillar data. Despite the name, some mounts were network mounts rather than hardware ones - for now, for the systems I am currently migrating, only the network mounts are needed. I merged this into my existing common Ansible role, which uses variables to determine which (if any) mounts to add. It looks for any variables whose name begins mount_ and expects each to be a dictionary containing everything necessary to mount the filesystem, including any credential files and helper packages. To enable me to deploy the credentials, I ended up installing HashiCorp Vault by hand following my previous Vault installation steps.

To minimise some duplication, I used YAML node anchors to reuse (e.g.) the mount options. The mount variables, for example for my desktop systems, look like this:

---
# Save some repetition
nas_cifs_creds: &nas_cifs_creds
  - path: /etc/cifs_creds/nas
    content: |
      username={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.username }}
      password={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.password }}
      domain={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.domain }}
    mode: "00400"
    owner: root
    group: root
    dir_mode: "00700"
nas_mount_options: &nas_mount_options credentials=/etc/cifs_creds/nas,file_mode=0660,dir_mode=0770,uid=my_user,gid=store,nounix,_netdev
mount_nas_documents:
  src: //nas.home.my.domain.tld/documents
  path: /media/nas/documents
  fstype: cifs
  opts: *nas_mount_options
  boot: "{% if 'mobile' in group_names %}false{% else %}true{% endif %}"
  helper_packages:
    - cifs-utils
  state: "{% if 'mobile' in group_names %}present{% else %}mounted{% endif %}"
  files: *nas_cifs_creds
mount_nas_archive:
  src: //nas.home.my.domain.tld/archive
  path: /media/nas/archive
  fstype: cifs
  opts: *nas_mount_options
  boot: "{% if 'mobile' in group_names %}false{% else %}true{% endif %}"
  helper_packages:
    - cifs-utils
  state: "{% if 'mobile' in group_names %}present{% else %}mounted{% endif %}"
  files: *nas_cifs_creds
# etc.
...

The tasks to actually set up the mounts (in my common role) are:

- name: Setup additional mounts
  block:
    - name: Collate list of mounts
      ansible.builtin.set_fact:
        additional_mounts: "{{ additional_mounts + q('ansible.builtin.vars', item) }}"
      loop: "{{ q('ansible.builtin.varnames', '^mount_.+') }}"
      vars:
        additional_mounts: []  # Fact will take precedence once defined
    - block:
        - name: Ensure required helper packages are installed
          become: true
          ansible.builtin.package:
            name: "{{ item }}"
            state: present
          loop: "{{ additional_mounts | selectattr('helper_packages', 'defined') | map(attribute='helper_packages') | unique }}"
        - name: Ensure directories for required files exist
          become: true
          ansible.builtin.file:
            path: "{{ item.path | dirname }}"
            owner: "{{ item.dir_owner | default(item.owner) }}"
            group: "{{ item.dir_group | default(item.group) }}"
            mode: "{{ item.dir_mode | default(item.mode) }}"
            state: directory
          loop: "{{ additional_mounts | selectattr('files', 'defined') | map(attribute='files') | flatten | unique }}"
        - name: Create required files (e.g. credential files)
          become: true
          ansible.builtin.copy:
            dest: "{{ item.path }}"
            owner: "{{ item.owner }}"
            group: "{{ item.group }}"
            mode: "{{ item.mode }}"
            content: "{{ item.content }}"
          loop: "{{ additional_mounts | selectattr('files', 'defined') | map(attribute='files') | flatten | unique }}"
        - name: Ensure mounts are in correct state
          become: true
          ansible.posix.mount:
            path: '{{ item.path }}'
            src: '{{ item.src }}'
            fstype: '{{ item.fstype }}'
            opts: '{{ item.opts | default("") }}'
            boot: '{{ item.boot | default(true) }}'
            state: '{{ item.state | default("mounted") }}'
          loop: '{{ additional_mounts }}'
      when: additional_mounts | default(false)

linux.hardware.pcspkr-blacklist

This state simply blacklists the pcspkr kernel module to stop the systems beeping. I added this to the hardware role in Ansible with a disable-pcspkr task:

---
- name: pcspkr module is blacklisted and unloadable by modprobe
  become: true
  ansible.builtin.template:
    dest: /etc/modprobe.d/pcspkr-blacklist.conf
    src: blacklist-module.conf
    mode: 00444
  vars:
    module_name: pcspkr
- name: pcspkr is not currently loaded
  become: true
  community.general.modprobe:
    name: pcspkr
    state: absent
...

The template file is very simple:

{{ ansible_managed | comment }}
blacklist {{ module_name }}
install {{ module_name }} /usr/bin/false

And I added disable-pcspkr to the list of choices in the argument_specs.yaml file.
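
As a sketch, the relevant part of the hardware role's meta/argument_specs.yaml now looks something like this (the other choice shown is a placeholder, not my real component list):

hardware_components:
  description: List of hardware-specific configurations to apply
  type: list
  elements: str
  default: []
  choices:
    - some-other-component  # placeholder for the pre-existing entries
    - disable-pcspkr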

I added this as a group variable on internal hosts:

hardware_components:
  - disable-pcspkr

and appended it to the host-level hardware_components lists on hosts that already have one.

linux.base-utils

The SaltStack version of this module installs:

  • unzip
  • vim
  • tmux
  • openssh-client
  • less
  • hwinfo
  • lsof
  • htop
  • unrar (which is now called unrar-free in Debian)
  • imagemagick
  • man

I already have a common role that installs a number of these packages (amongst others, like bc), so I just added the missing ones to the list (unzip, openssh-client, less, hwinfo, lsof, unrar-free).
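
The resulting package task in the common role looks something like this sketch (the pre-existing entries are abbreviated, so the exact list is partly an assumption):

- name: Base command-line utilities are installed
  become: true
  ansible.builtin.package:
    name:
      # Pre-existing entries (vim, tmux, htop, bc, ...) omitted for brevity
      - unzip
      - openssh-client
      - less
      - hwinfo
      - lsof
      - unrar-free
    state: present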

imagemagick was already part of my desktop role, so I did not need to do anything to add that.

backuppc-client

This role configures clients to be backed up with BackupPC. There are only 2 states in my SaltStack configuration, remote-ssh and local-sudo. remote-ssh places BackupPC's ssh key into the root user's authorized_keys file; local-sudo gives the backuppc user permission to run, through sudo, the two specific tar commands that BackupPC uses for local back-ups.

For the new, Ansible, version I decided to replace direct login as root with a specific user that has sudo permissions to run rsync only. It will be further locked down by specifying the ssh command in authorized_keys, restricting the user to only running that command on login - inspired by a post I found on the BackupPC-Users mailing list.

I began by adding a backuppc role - this will, eventually, also configure the server but for now I just added client configuration. My existing Windows systems are currently backed up with a packaged-up version of cygwin's rsyncd provided by the BackupPC project, although this is not currently set up by any automation/configuration-management tool and, as I noted before:

Modernising, by utilising volume shadow copies, and seeing if this can be done with native tools is on my very long to-do list but by the same token this same setup has been working excellently and unchanged for at least 14-16 years.

Global role setup

As I knew from the start what I planned to do, with the two ways of backing up (for now), I began by creating the meta/argument_specs.yaml, defaults/main.yaml and tasks/client.yaml (which just includes the relevant client configuration task file).

meta/argument_specs.yaml looks like:

---
argument_specs:
  client:
    description: Configures the host as a client to be backed up
    options:
      backuppc_method:
        description: Method backuppc will use to back up this host. Note many methods require information from ansible_facts.
        type: str
        default: remote_ssh_sudo_rsync
        choices:
        # XXX add this one (and should it use winrm or Window's native ssh? CIFS would be native, but is it the best choice?)
        #  - remote_winrm_vss_cifs
          - remote_ssh_sudo_rsync
          - local_sudo
      backuppc_user:
        description: User backuppc will login to this host as
        type: str
        default: backuppc
...

defaults/main.yaml:

---
backuppc_user: backuppc
backuppc_method: remote_ssh_sudo_rsync
...

and tasks/client.yaml:

---
- name: The appropriate client method desired state is included.
  ansible.builtin.include_tasks: client_{{ backuppc_method }}.yaml 
...

I also created a new group in the inventory, called backuppc_clients and added a play in my site.yaml to apply this role to it:

- name: Backups are configured
  hosts: backuppc_clients
  tasks:
    - name: BackupPC client configuration is done
      ansible.builtin.import_role:
        name: backuppc
        tasks_from: client.yaml

local-sudo

I began with this one as it only applies to one host (the backup server itself), making it less impactful, and it requires no reworking from the existing SaltStack setup.

For now, I just duplicated the SaltStack configuration - this does not have a state for the backuppc user; I presume I did this because the local backup method is only applied to the backup server itself, which will have the user created when the BackupPC software is installed. As backuppc was the first Salt state I ever made, it is also possible this was an oversight but, for now, I left it as-is and may revisit that decision when I migrate the server configuration management to Ansible.

The tasks/client_local_sudo.yaml just ensures that the backuppc user has the permissions it needs. While doing this, I discovered that the limitation in the community.general.sudoers Ansible module that prompted me to create my own sudoers role has been fixed in the released module, so I removed that role.

---
- name: sudo is installed
  become: true
  ansible.builtin.package:
    name: sudo
- name: BackupPC user is permitted to tar/untar via sudo
  become: yes
  community.general.sudoers:
    name: backuppc
    user: ''
    runas: root
    host: ""
    commands:
      # First command is the backup one, second is restore.
      - /usr/bin/env LC_ALL=C /bin/tar -c -v -f - --totals -C *
      - /usr/bin/env LC_ALL=C /bin/tar -x -p --numeric-owner --same-owner -v -f - -C *
...

For my backuppc_servers group, I set backuppc_method to local_sudo (by adding group_vars/backuppc_servers.yaml):

---
backuppc_method: local_sudo
...

remote-ssh

Creating the tasks

In order to allow BackupPC to ssh into the remote system, the public ssh key it uses needs to be pushed out to each client. I began by adding this to the backuppc role’s meta/argument_specs.yaml for the client (I have not migrated the server side, which will need the private key deploying, yet):

backuppc_user_ssh_public_key:
  description: SSH public key for backuppc user (only needed for ssh-based backups)
  required: false

Next, I wanted to restrict the new backup user to only being able to run the specific commands required to perform the backup function. As it needs superuser (root) level access, to be able to back up everything, I needed to ensure that these commands were the only commands it could run via Secure Shell (SSH) and sudo.

Initially I only set one command, by adding command="..." to the key options in the user's .ssh/authorized_keys file, but I discovered that my (default) settings mean different arguments are passed for full and incremental backups. I therefore modified this approach by adding a script, based on a ServerFault answer, to the role's templates folder that restricts commands to the two rsync commands used by incremental and full backups:

#!/bin/sh
{{ ansible_managed | comment }}

# Restricts ssh session to only running full or incremental rsync commands

case "$SSH_ORIGINAL_COMMAND" in
  "/usr/bin/sudo {{ backuppc_full_rsync_command }}")
    $SSH_ORIGINAL_COMMAND
    ;;
  "/usr/bin/sudo {{ backuppc_incr_rsync_command }}")
    $SSH_ORIGINAL_COMMAND
    ;;
  *)
    echo "Access denied by ssh command filter script $0." >&2
    exit 1
    ;;
esac

I determined the commands empirically, by adding ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND" to /etc/ssh/sshd_config on one of the clients and telling BackupPC to start a backup (full and then incremental), which caused the command to be logged.
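
For reference, the temporary snippet looked something like this (matching on the BackupPC server's address is an assumption - any match criterion that singles out the server will do):

# Temporary, in /etc/ssh/sshd_config on a client, to capture the exact
# rsync command lines BackupPC sends (remove once captured).
Match Address 192.0.2.10
    ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND"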

With all this in place, the client_remote_ssh_sudo_rsync.yaml tasks file became relatively straightforward; in order to configure both sudo and ssh with the same commands I used a fact:

---
- name: rsync and sudo packages are installed
  become: true
  ansible.builtin.package:
    name:
      - rsync
      - sudo
- name: BackupPC user exists
  become: yes
  ansible.builtin.user:
    name: '{{ backuppc_user }}'
    home: /var/lib/backuppc
    password_lock: true
# rsync command determined by adding this to /etc/ssh/sshd_config on
# a client, inside a host match against the backuppc server:
# ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND"
- name: rsync command line is stored in reusable fact
  ansible.builtin.set_fact:
    backuppc_full_rsync_command: >-
        /usr/bin/rsync
        --server
        --sender
        -slHogDtpAXrce.iLsfxC
    backuppc_incr_rsync_command: >-
        /usr/bin/rsync
        --server
        --sender
        -slHogDtpAXre.iLsfxC
- name: BackupPC user may run rsync command with sudo
  become: yes
  community.general.sudoers:
    name: backuppc
    user: '{{ backuppc_user }}'
    runas: root
    host: "{{ ansible_facts.hostname }}"
    commands:
      - '{{ backuppc_full_rsync_command }}'
      - '{{ backuppc_incr_rsync_command }}'
- name: BackupPC restricted to running specific rsync commands
  become: yes
  ansible.builtin.template:
    dest: /usr/local/bin/backup-restricted-ssh-commands
    src: client_remote_ssh_sudo_rsync_commands.sh
    # Do not allow backuppc user to modify this file
    owner: root
    group: root
    mode: 00555
- name: BackupPC ssh keys are configured
  become: yes
  become_user: '{{ backuppc_user }}'
  ansible.posix.authorized_key:
    user: '{{ backuppc_user }}'
    key: '{{ backuppc_user_ssh_public_key }}'
    key_options: 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/local/bin/backup-restricted-ssh-commands"'
    exclusive: true
...

Configuring ssh keys

I imported the backuppc user's existing keys into my HashiCorp Vault:

vault kv put -mount=kv /backuppc/users/backuppc/ssh_keys private_key=@id_ed25519 public_key=@id_ed25519.pub

Once imported, I simply added a lookup to a new group_vars/backuppc_clients.yaml file to make it available to hosts that need it:

---
backuppc_user_ssh_public_key: "{{ lookup('community.hashi_vault.vault_read', '/kv/backuppc/users/backuppc/ssh_keys').data.public_key }}"
...

Full roll-out

After testing this, I rolled it out to all of my existing systems (I simply made servers a child of backuppc_clients in my inventory) and changed PermitRootLogin prohibit-password to PermitRootLogin no, as well as ensuring my keys are configured and disabling password login, on all of them.
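
A sketch of how that hardening can be expressed with Ansible (the handler name is an assumption; in my case these settings live alongside my existing ssh configuration tasks):

- name: Root login and password authentication are disabled
  become: true
  ansible.builtin.lineinfile:
    path: /etc/ssh/sshd_config
    regexp: '^#?\s*{{ item.key }}\s'
    line: '{{ item.key }} {{ item.value }}'
  loop:
    - { key: PermitRootLogin, value: 'no' }
    - { key: PasswordAuthentication, value: 'no' }
  notify: Restart sshd  # assumed handler name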

On the BackupPC server, I changed the global RsyncClientPath to /usr/bin/sudo /usr/bin/rsync and RsyncSshArgs to ['-e', '$sshPath -l backuppc'] (from ... -l root) to make this the default for the rsync method. This worked particularly well as the systems I back up without root access already had the appropriate username argument overridden.
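
Whether changed via the web interface or directly in the configuration file, the resulting BackupPC settings are equivalent to these lines in config.pl:

$Conf{RsyncClientPath} = '/usr/bin/sudo /usr/bin/rsync';
$Conf{RsyncSshArgs} = ['-e', '$sshPath -l backuppc'];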

ntp-client-local

Since I started working on the tasks documented in this post (2nd June 2023), I have already added ntp client configuration to Ansible, so this one is already done.

unix

This SaltStack role installs the coreutils and git packages, which I added to my common role as I did with linux.base-utils.

unix.users.root configured the root user password, which is already done during the “bootstrapping” step of setting up a host now - so this is already taken care of.

unix.cifs-creds stores credentials for use when mounting CIFS filesystems. This has already been migrated to be part of a more generic “additional mounts” block in the common role, which supports configuring arbitrary files, so is redundant.

unix.users.my-user and unix.users.user-local-config.dotfiles

unix.users.my-user creates my own user, and unix.users.user-local-config.dotfiles deploys a load of “dotfiles” from SaltStack. I migrated these to a new interactive role and added a variable interactive_users, which contains a list of users to configure.

Creating users

I added an entry point that configures one user (tasks/configure_user.yaml), and made the default entry point (tasks/main.yaml) call that for each of the interactive_users. For the GECOS (a.k.a. “comment”) field, the adduser script that Debian uses prompts for (in this order): full name, room number, work phone, home phone, other (which I am using for the email address, per a Red Hat article on GECOS information). So I decided to support these via the role's arguments for “real” (interactive) users.
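
As an illustration, with every field populated the resulting GECOS (comment) field in /etc/passwd looks like this (values invented):

peppa:x:1000:1000:Peppa Pig,Room 101,01632 960100,01632 960200,peppa@mydomain.tld:/home/peppa:/bin/bash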

The new role’s meta/argument_specs.yaml looks like:

---
argument_specs:
  configure_user:
    description: Configures a single user expected to use an interactive login
    options: &user_config
      interactive_username:
        description: Username for the user
        type: str
        required: true
      interactive_password:
        description: Hashed password for the user
        type: str
        required: true
      interactive_groups:
        description: List of groups user should be in (if they exist)
        type: list
        elements: str
        default: []
      interactive_home:
        description: Home directory location for the user
        type: str
        default: "`'/home/' + interactive_username`"
      interactive_full_name:
        description: Full name of the user (GECOS field)
        type: str
        default: ''
      interactive_room_number:
        description: Room number of the user (GECOS field)
        type: str
        default: ''
      interactive_work_phone:
        description: Work phone number of the user (GECOS field)
        type: str
        default: ''
      interactive_home_phone:
        description: Home phone number of the user (GECOS field)
        type: str
        default: ''
      interactive_email:
        description: Email address of the user (GECOS field)
        type: str
        default: ''
  main:
    description: Configures users expected to use an interactive login
    options:
      interactive_users:
        description: A list of users (each as per arguments to configure_user) to configure
        type: list
        required: true
        elements: dict
        options: *user_config
...

The main.yaml (which will be expanded to do other things) just calls configure_user for each user in interactive_users. I chose to use include_role as it will check the arguments again in this call, protecting me from making an error when I map the item's arguments to the included tasks' arguments:

---
- name: Configure interactive users
  ansible.builtin.include_role:
    name: interactive
    tasks_from: configure_user
  vars:
    interactive_username: '{{ item.interactive_username }}'
    interactive_password: '{{ item.interactive_password }}'
    interactive_groups: '{{ item.interactive_groups }}'
    interactive_host: "{{ item.interactive_home | default('/home/' + interactive_username ) }}"
    interactive_full_name: "{{ item.interactive_full_name | default('') }}"
    interactive_room_number: "{{ item.interactive_room_number | default('') }}"
    interactive_work_phone: "{{ item.interactive_work_phone | default('') }}"
    interactive_home_phone: "{{ item.interactive_home_phone | default('') }}"
    interactive_email: "{{ item.interactive_email | default('') }}"
  loop: '{{ interactive_users }}'
...

My SaltStack state used the optional_groups feature of the user state to deal with some groups that may, or may not, exist on the target host. In my Ansible task I decided to check for, and only specify, those groups that exist, thereby making all groups (except the primary group) “optional” rather than differentiating optional/non-optional.

The original Salt state:

{% from 'jinja2/macros/user.j2' import if_deploy_my_user, my_user %}
{#

IMPORTANT
---------

This state should be entirely self-contained in order to boot-strap the user before other states that need it.

This means:
  - no includes
  - no requires/require_in/watch/watch_in/etc. involving states from other files.

#}
{# Ensure user is setup, if desired #}
{% call if_deploy_my_user() %}
{{ my_user() }}:
  user:
    - present
    - fullname: Peppa Pig
    - password: {{ salt['pillar.get']('my_user:password') }}
    - shell: /bin/bash
    - uid: 1000
    - usergroup: True
    - groups:
      - {{ my_user() }}
      - plugdev
      - cdrom
      - floppy
      - audio
      - video
      - dip
    - optional_groups:
# Fuse on laptop
      - fuse
# Dialout on laptop (for USB serial link)
      - dialout
# Groups on DS9 for storage
      - store
# sudo access on Debian
      - sudo
# Control network connections on Debian
      - netdev
# Access salt state/pillar files without becoming root on salt master (see that state for where permissions are set)
      - salt
# Adm has group read of most log files on Debian by default, so useful to be in that group
      - adm
# use virsh and control VMs, useful on starbase2
      - libvirt
    - require:
      - pkg: {{ my_user() }}
      - group: {{ my_user() }}
  group:
    - present
    - gid: 1000
  pkg:
    - installed
    - name: bash

{% endcall %}

And my Ansible tasks/configure_user.yaml file:

---
# Do this here, so even if already populated it is refreshed with
# any groups that have been created by tasks in the current run.
- name: Updated getent database is retrieved
  ansible.builtin.getent:
    database: group
- name: Final list of groups for the user is initialised
  ansible.builtin.set_fact:
    interactive_groups_final: []
- name: Final list of groups for the user is known
  ansible.builtin.set_fact:
    interactive_groups_final: >-
      {{
        interactive_groups_final
        +
        [check_group]
      }}
  loop: '{{ interactive_groups | default([]) }}'
  loop_control:
    loop_var: check_group
  when: check_group in ansible_facts.getent_group.keys()
- name: Missing groups are reported
  ansible.builtin.debug:
    msg: >-
      Groups not being added to {{ interactive_username }}
      {% if interactive_full_name is defined %}({{ interactive_full_name }}){% endif %}
      on {{ inventory_hostname }}
      {% if inventory_hostname != ansible_host %}({{ ansible_host }}){% endif %}
      because they do not exist:
      {{ interactive_groups | difference(interactive_groups_final)  }}
  when: interactive_groups | difference(interactive_groups_final) | length != 0
- name: User is correctly configured
  become: true
  ansible.builtin.user:
    name: '{{ interactive_username }}'
    password: '{{ interactive_password }}'
    # GECOS field: full name,room number,work phone,home phone,email
    comment: "{{ interactive_full_name | default('') }},\
      {{ interactive_room_number | default('') }},\
      {{ interactive_work_phone | default('') }},\
      {{ interactive_home_phone | default('') }},\
      {{ interactive_email | default('') }}"
    groups: '{{ interactive_groups_final }}'
    home: "{{ interactive_home | default('/home/' + interactive_username ) }}"
    move_home: true  # Attempt to move old home if in the wrong place
...

As suggested by the Ansible FAQ, I used mkpasswd --method=sha-512 to generate the password hash, saved it to a file and imported it into my vault:

vault kv put -mount=kv interactive_users/$username hashed_password=@tmppass

For now, I added my user (my SaltStack version only supported one; this will allow me to configure several) to a new group's variables in group_vars/interactive.yaml - my goal is to populate this list entirely from the vault so not even the usernames are hardcoded (or visible in the Ansible configuration):

---
interactive_users:
  - interactive_username: username
    interactive_full_name: full name
    interactive_password: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/laurence').data.hashed_password }}"
    interactive_groups:
      - plugdev
      - cdrom
      - floppy
      - audio
      - video
      - dip
      - fuse
      - dialout  # For USB serial device
      - store  # Group for shared storage
      - sudo  # sudo access on Debian
      - netdev  # Control network devices on Debian
      - adm  # adm has read access to most logs in Debian by default
      - libvirt # Use virsh and control VMs on starbase2
      - vboxusers # Access to use VirtualBox
...

I applied the new role to this new interactive group in my site.yaml:

- name: Interactive systems are configured
  hosts: interactive
  tags: interactive
  roles:
    - interactive

and added the group to my inventory:

interactive:
  hosts:
    server1_I_interactively_login_to:
    server2_I_interactively_login_to:
  children:
    desktops:

There is an issue with this, as the new interactive role's configure_user will only work on Linux yet I have a Windows (and one day, perhaps macOS) system in my desktops group. But I put that on the “tomorrow's problem” pile (and created an issue, titled “interactive role only works on Linux but is applied to Windows hosts too”, for it in my git frontend).

Pushing out configuration files

Over the years I have tried many ways of managing my “dotfiles” (user configuration files on *nix) in a more organised and co-ordinated way than copying them to each system I log into (which was my approach for a long time) - from holding them in a naive Git repository to using GNU Stow (based on an article I found) with custom scripts to detect whether software is installed and call Stow accordingly.

With my GNU Stow solution, I had split my bashrc up into modular files: fragments were added to a modular directory based on detected software, and included with this code in my .bashrc:

if [ -d $HOME/.bashrc.d ]
then
	for file in $HOME/.bashrc.d/*.bashrc
	do
		source $file
	done
fi

The problem with this approach was that on some systems, with shared storage, reading these small files significantly delayed login compared to having the same commands in one large .bashrc file. In the end, I replaced this approach with a templated .bashrc in SaltStack that included the modular files directly, modified to incorporate the tests from my Stow-based solution as wrappers for the commands that were previously omitted entirely if the test did not pass.

It is this approach that I will also be using initially for Ansible - a direct SaltStack to Ansible migration of the existing setup, hopefully making this straightforward.
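
As a rough sketch (the fragment names and layout are assumptions, not my actual template), the templated .bashrc pulls the fragments in at render time so the deployed file is a single monolithic .bashrc:

{{ ansible_managed | comment }}
{# Fragments are assumed to live alongside this template in the role; they
   are merged at render time rather than sourced at login. #}
{% for fragment in ['aliases', 'prompt', 'less'] %}
# --- {{ fragment }}.bashrc ---
{% include 'bashrc.d/' ~ fragment ~ '.bashrc' %}
{% endfor %}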

The SaltStack version uses some Jinja macros, with tests, to determine whether each configuration file needed to be deployed:

{%- macro user_config_file_present(name, target, test, source=None, template=None, contents_pillar=None) %}
{{ name }}-present:
  file.managed:
    - name: {{ my_user_home() }}/{{ target }}
    {% if source -%}
    - source: {{ source }}
    {%- endif %}
    {% if template -%}
    - template: {{ template }}
    {%- endif %}
    {% if contents_pillar -%}
    - contents_pillar: {{ contents_pillar }}
    {%- endif %}
    - user: {{ my_user() }}
    - group: {{ my_group() }}
    - mode: 400
    - follow_symlinks: False
    - makedirs: True
    - dir_mode: 700
    - onlyif:
      - {{ test }}
{% endmacro -%}
{%- macro user_config_dir_present(name, target, test, source) %}
{{ name }}-present:
  file.recurse:
    - name: {{ my_user_home() }}/{{ target }}
    - source: {{ source }}
    - user: {{ my_user() }}
    - group: {{ my_group() }}
    - file_mode: 400
    - dir_mode: 700
    - clean: True
    - replace: True
    - onlyif:
      - {{ test }}
{% endmacro -%}
{%- macro user_config_file_absent(name, target, test) %}
{{ name }}-absent:
  file.absent:
    - name: {{ my_user_home() }}/{{ target }}
    - unless:
      - {{ test }}
{% endmacro -%}
{%- macro user_config_file(name, target, test, source=None, template=None, contents_pillar=None) %}
{{ user_config_file_present(name, target, test, source, template, contents_pillar) }}
{{ user_config_file_absent(name, target, test) }}
{% endmacro -%}
{%- macro user_config_dir(name, target, test, source) %}
{{ user_config_dir_present(name, target, test, source) }}
{{ user_config_file_absent(name, target, test) }}
{% endmacro -%}

which were used, e.g. for Bash's configuration:

{% from 'jinja2/macros/user.j2' import user_config_file %}

{{ user_config_file('bashrc', '.bashrc', 'which bash', 'salt://unix/users/user-local-config/dotfiles/bash/bashrc.j2', 'jinja') }}

To do similar with Ansible, I created a vars/dotfiles directory in my interactive role which contains metadata about the “dotfiles” I wish to deploy - currently it supports a list of tests (all of which must be true to deploy) and a list of files to push out. For example, for bash (vars/dotfiles/bash.yaml):

---
dotfile_bash:
  dotfile_tests:
    - type: command_exists
      name: bash
  dotfile_files:
      - src: bashrc
        dest: ~/.bashrc
        type: template
      - src: bash_profile
        dest: ~/.bash_profile
        type: file
...

And for vim (vars/dotfiles/vim.yaml):

---
dotfile_vim:
  dotfile_tests:
    - type: command_exists
      name: vim
  dotfile_files:
      - src: vimrc
        dest: ~/.vimrc
        type: file
      - src: vim/
        dest: ~/.vim/
        type: file
...

In the role's tasks/main.yaml, I load these variable files and then call a new entry point, deploy_dotfiles, once for each user:

- name: Dotfile metadata is loaded
  ansible.builtin.include_vars:
    name: dotfiles_metadata
    dir: dotfiles
- name: Deploy dotfiles
  ansible.builtin.include_role:
    name: interactive
    tasks_from: deploy_dotfiles
  vars:
    dotfile_target_user: '{{ item.interactive_username }}'
    dotfiles: '{{ dotfiles_metadata.values() }}'
  loop: '{{ interactive_users }}'

The new entry in meta/argument_specs.yaml for the entry point is:

deploy_dotfiles:
  description: Deploys a set of "dotfile" configuration files to an interactive user
  options:
    dotfile_target_user:
      description: User to deploy dotfiles to
      type: str
      required: true
    dotfiles:
      description: List of dotfiles to deploy
      type: list
      required: true
      elements: dict
      options:
        dotfile_tests:
          description: List of tests, all of which must pass to deploy the dotfile(s)
          type: list
          default: []
          elements: dict
          options:
            name:
              description: Target of the test
              type: str
              required: true
            type:
              description: Type of test to apply to `name`
              type: str
              required: true
              choices:
                - command_exists
                - directory_exists
                - file_exists
        dotfile_files:
          description: List of files to deploy if tests pass (will be deleted if tests fail).
          type: list
          required: true
          elements: dict
          options:
            src:
              description: Source file to deploy
              type: str
              required: true
            dest:
              description: Destination to deploy file to
              type: str
              required: true
            mode:
              description: File mode (default umask will apply if not set)
              required: false
            type:
              description: Method to use to deploy file
              type: str
              required: true
              choices:
                - file
                - template
            no_log:
              description: Suppress logging (e.g. for files containing sensitive data)
              type: bool
              default: false

The tasks/deploy_dotfiles.yaml loops over the list of dotfiles and includes a further task file, which does the file deployment:

---
- name: Each dotfile is deployed
  ansible.builtin.include_tasks: deploy_dotfile.yaml
  vars:
    dotfile_target_file: '{{ dotfile.dotfile_target_file }}'
    dotfile_tests: '{{ dotfile.dotfile_tests | default([]) }}'
    dotfile_files: '{{ dotfile.dotfile_files }}'
  loop: '{{ dotfiles }}'
  loop_control:
    # Avoid conflict with outer `item` when nested (e.g. looping over users)
    loop_var: dotfile
...

Finally, the innermost tasks file replaced the macros I made for SaltStack:


- name: Test status is initialised to true
  ansible.builtin.set_fact:
    # Start with the test pass and try to prove it is not so
    dotfile_test_result: true
- name: If commands exist is tested
  ansible.builtin.shell: 'command -v {{ dotfile_command_test.name | quote }}'
  # Of course this might return failure, that’s the point.
  failed_when: false
  # `command -v` is always a read-only operation
  changed_when: false
  # Can be run in check mode
  check_mode: false
  register: dotfile_command_test_results
  loop: "{{ dotfile_tests | selectattr('type', 'eq', 'command_exists') }}"
  loop_control:
    # Avoid conflict with outer `item` when nested (e.g. looping over users)
    loop_var: dotfile_command_test  
- name: Test status is updated
  ansible.builtin.set_fact:
    dotfile_test_result: false
  when: dotfile_command_test_results.results | selectattr('rc', 'ne', 0) | length > 0
- name: Files to be tested are statted
  ansible.builtin.stat:
    path: '{{ item.name }}'
  loop: "{{ dotfile_tests | selectattr('type', 'in', ['file_exists', 'directory_exists']) }}"
  register: dotfile_stat_results
- name: Test status is updated (non-existent directories)
  ansible.builtin.set_fact:
    dotfile_test_result: false
  when: >-
      dotfile_stat_results.results
      | selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'directory_exists'))
      | map(attribute='stat')
      | selectattr('exists', 'eq', true)
      | selectattr('isdir', 'defined')
      | selectattr('isdir', 'eq', true)
      | length
      !=
      dotfile_stat_results.results
      | selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'directory_exists'))
      | length
- name: Test status is updated (non-existent files)
  ansible.builtin.set_fact:
    dotfile_test_result: false
  when: >-
      dotfile_stat_results.results
      | selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'file_exists'))
      | map(attribute='stat')
      | selectattr('exists', 'eq', false)
      | length > 0
      or
      dotfile_stat_results.results
      | selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'file_exists'))
      | map(attribute='stat')
      | selectattr('isdir', 'defined')
      | selectattr('isdir', 'eq', true)
      | length > 0
- name: Plain dotfiles are copied
  become: yes
  become_user: '{{ dotfile_target_user }}'
  ansible.builtin.copy:
    src: dotfiles/{{ plain_dotfile.src }}
    dest: '{{ plain_dotfile.dest }}'
    mode: '{{ plain_dotfile.mode | default(omit) }}'
  loop: "{{ dotfile_files | selectattr('type', 'eq', 'file') }}"
  loop_control:
    # Avoid conflict with outer `item` and `dotfile` when nested
    # (e.g. looping over users and dotfiles)
    loop_var: plain_dotfile
  when: dotfile_test_result
  no_log: '{{ plain_dotfile.no_log | default(false) }}'
- name: Templated dotfiles are deployed
  become: yes
  become_user: '{{ dotfile_target_user }}'
  ansible.builtin.template:
    src: dotfiles/{{ template_dotfile.src }}
    dest: '{{ template_dotfile.dest }}'
    mode: '{{ template_dotfile.mode | default(omit) }}'
  loop: "{{ dotfile_files | selectattr('type', 'eq', 'template') }}"
  loop_control:
    # Avoid conflict with outer `item` and `dotfile` when nested
    # (e.g. looping over users and dotfiles)
    loop_var: template_dotfile
  when: dotfile_test_result
  no_log: '{{ template_dotfile.no_log | default(false) }}'
- name: Redundant dotfiles are deleted
  become: true
  become_user: '{{ dotfile_target_user }}'
  ansible.builtin.file:
    name: '{{ redundant_dotfile.dest }}'
    state: absent
  loop: '{{ dotfile_files }}'
  loop_control:
    # Avoid conflict with outer `item` and `dotfile` when nested
    # (e.g. looping over users and dotfiles)
    loop_var: redundant_dotfile
  when: not dotfile_test_result
  no_log: '{{ redundant_dotfile.no_log | default(false) }}'
...

For the “dotfiles” themselves, it was just a case of putting them in the appropriate folder within the role: templates/dotfiles for template “type” files and files/dotfiles for file “type”. Directories (file “type” only, since templates have to be rendered as individual files) can be copied recursively by the ansible.builtin.copy module, so no separate support is required in the tasks for this.
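
For illustration, the relevant parts of the role tree end up looking something like this (the specific file and directory names under files/, templates/ and vars/ are only examples, not a complete listing):

roles/interactive/
├── files/
│   └── dotfiles/            # file "type" sources, copied verbatim (may be directories)
│       └── vim/
├── templates/
│   └── dotfiles/            # template "type" sources, rendered with Jinja2
│       ├── fetchmailrc
│       └── forward
├── vars/
│   └── dotfiles/            # per-dotfile metadata (tests and files to deploy)
│       └── fetchmail.yaml
└── tasks/
    ├── deploy_dotfiles.yaml
    └── deploy_dotfile.yaml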

User-specific configuration

Some configuration files, such as .fetchmailrc and .forward etc., contain user-specific settings and so a generic config cannot just be pushed out.

To start addressing this, I added the templated dotfile metadata to my role’s vars/dotfiles, e.g. vars/dotfiles/fetchmail.yaml:

---
dotfile_fetchmail:
  dotfile_tests:
    - type: command_exists
      name: fetchmail
  dotfile_files:
      - src: fetchmailrc
        dest: ~/.fetchmailrc
        type: template
...

I then needed to get the variables through to the template. To do this, I added a new option, interactive_dotfile_data, to my role’s configure_user entry point (it is later reused by main.yaml, so it has to be added here even though configure_user doesn’t currently use it). This defines the options available for all dotfiles (in this example just fetchmail, which was the first one I implemented):

interactive_dotfile_data: &dotfile_config
  description: User specific data for deploy_dotfiles' dotfile_user_data parameter
  type: dict
  default: {}
  options:
    fetchmail:
      description: Settings for user's .fetchmailrc
      type: list
      default: []
      elements: dict
      options:
        server:
          description: Hostname for server to poll
          type: str
          required: true
        protocol:
          description: Protocol with which to connect to the server
          type: str
          default: imap
          choices:
            # Taken from fetchmail's man page
            - auto
            - pop2
            - pop3
            - apop
            - rpop
            - kpop
            - sdps
            - imap
            - etrn
            - odmr
        remote_user:
          description: Username to login to the server with
          type: str
          required: true
        local_user:
          description: Local user to receive fetched mail
          type: str
          required: true
        ssl:
          description: Connect with SSL
          type: bool
          default: true

Then, using YAML anchors, I also made these same options available as dotfile_user_data to the deploy_dotfiles entry point (which already takes dotfile_target_user as an option):

dotfile_user_data: *dotfile_config

Finally, to pass the data from the user structure to the dotfiles deployment, I updated main.yaml to pass interactive_dotfile_data from interactive_users through to deploy_dotfiles (the only change is adding dotfile_user_data under vars):

- name: Deploy dotfiles
  ansible.builtin.include_role:
    name: interactive
    tasks_from: deploy_dotfiles
  vars:
    dotfile_target_user: '{{ item.interactive_username }}'
    dotfile_user_data: '{{ item.interactive_dotfile_data | default({}) }}'
    dotfiles: '{{ dotfiles_metadata.values() }}'
  loop: '{{ interactive_users }}'

I then added appropriate settings for the relevant configuration files to my interactive_users configuration for my user, e.g.:

interactive_dotfile_data:
  fetchmail:
    - server: imap.somesystem.com
      remote_user: my.login@somesystem.com
      local_user: me@local.domain.tld
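
The fetchmailrc template itself then boils down to something along these lines (a sketch built from the options above, rather than the verbatim file):

{% for account in dotfile_user_data.fetchmail %}
poll {{ account.server }} protocol {{ account.protocol | default('imap') }}
  username "{{ account.remote_user }}" is "{{ account.local_user }}" here{{ ' ssl' if account.ssl | default(true) else '' }}
{% endfor %}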

When I came to do .forward and .netrc, I added two new test types to dotfile_tests’ choices in the role’s meta/argument_specs.yaml:

- dotfile_variable_list_not_empty
- dotfile_variable_string_not_empty
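
In context, the dotfile_tests option in the argument spec then looks something like this (the descriptions and pre-existing choices are paraphrased rather than copied verbatim):

dotfile_tests:
  description: Tests that must all pass for the dotfile to be deployed
  type: list
  default: []
  elements: dict
  options:
    type:
      description: Type of test to perform
      type: str
      required: true
      choices:
        - command_exists
        - file_exists
        - directory_exists
        - dotfile_variable_list_not_empty
        - dotfile_variable_string_not_empty
    name:
      description: Command, path or variable name to test
      type: str
      required: true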

The tests themselves were relatively straightforward:

- name: Test status is updated (empty string variables)
  ansible.builtin.set_fact:
    dotfile_test_result: false
  loop: "{{ dotfile_tests | selectattr('type', 'eq', 'variable_string_not_empty') }}"
  loop_control:
    # Avoid conflict with outer `item` when nested (e.g. looping over users)
    loop_var: dotfile_variable_test
  when: dotfile_user_data[dotfile_variable_test.name] is not string or (dotfile_user_data[dotfile_variable_test.name] | length == 0)
- name: Test status is updated (empty list variables)
  ansible.builtin.set_fact:
    dotfile_test_result: false
  loop: "{{ dotfile_tests | selectattr('type', 'eq', 'dotfile_variable_list_not_empty') }}"
  loop_control:
    # Avoid conflict with outer `item` when nested (e.g. looping over users)
    loop_var: dotfile_variable_test
  # Lists are iterable but so are strings and mapping types
  # see: https://stackoverflow.com/a/39453833
  when: >-
    dotfile_variable_test.name not in dotfile_user_data
    or
    dotfile_user_data[dotfile_variable_test.name] is not iterable
    or
    dotfile_user_data[dotfile_variable_test.name] is string
    or
    dotfile_user_data[dotfile_variable_test.name] is mapping
    or
    (dotfile_user_data[dotfile_variable_test.name] | length == 0)
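
The fetchmail metadata from earlier can then gain a test of the new list type, so that .fetchmailrc is only deployed (and is otherwise removed) for users with at least one account configured; something like:

---
dotfile_fetchmail:
  dotfile_tests:
    - type: command_exists
      name: fetchmail
    - type: dotfile_variable_list_not_empty
      name: fetchmail
  dotfile_files:
    - src: fetchmailrc
      dest: ~/.fetchmailrc
      type: template
...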

The config YAML file with this dotfile’s metadata (the role’s vars/dotfiles/forward.yaml) looks like this (the template file is literally just {{ dotfile_user_data.forward }}):

---
dotfile_forward:
  dotfile_tests:
    - type: dotfile_variable_string_not_empty
      name: forward
  dotfile_files:
      - src: forward
        dest: ~/.forward
        type: template
...

Now, the content of a user’s .forward file (and whether it exists at all) is controlled by whether a value is given for the setting, e.g.:

interactive_dotfile_data:
  forward: my_user@some.domain.tld

I then repeated this pattern for my other dotfiles that contain user-specific data.

Host-specific user-specific configuration

Some settings are host-specific, for example the passwords stored in a user’s .netrc file. For these, I added variables named dotfile_<username>_<setting>. For example:

dotfile_myuser_netrc:
  - machine: outlook.office365.com
    login: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/myuser/netrc/outlook.office365.com').data.login }}"
    password: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/myuser/netrc/outlook.office365.com').data.password }}"

I modified site.yaml to update the interactive_users variable with the values from these variables before applying the interactive role. As a result, the common settings live in the group’s interactive_users variable, while host-specific additions and overrides live in these dotfile_... variables:

- name: Interactive systems are configured
  hosts: interactive
  tags: interactive
  tasks:
    - name: dotfile variable map is initialised
      ansible.builtin.set_fact:
        user_dotfile_data_map: {}
    - name: dotfile variable map is populated
      ansible.builtin.set_fact:
        user_dotfile_data_map: >-
          {{
            user_dotfile_data_map
            | combine({
              user: user_dotfile_data_map[user] | default({})
              | combine({
                variable: value
              })
            })
          }}
      vars:
        user: "{{ (item | split('_', 2))[1] }}"
        variable: "{{ (item | split('_', 2))[2] }}"
        value: "{{ lookup('ansible.builtin.vars', item) }}"
      loop: "{{ q('ansible.builtin.varnames', '^dotfile_') }}"
    - name: Convert interactive user list to username -> user info dictionary
      ansible.builtin.set_fact:
        interactive_user_dict: "{{ dict(interactive_users | map(attribute='interactive_username') | zip(interactive_users)) }}"
    - name: Update interactive user dictionary
      ansible.builtin.set_fact:
        interactive_user_dict: >-
          {{
            interactive_user_dict
            | combine({
              item: interactive_user_dict[item]
                | combine({
                  'interactive_dotfile_data':
                    interactive_user_dict[item].interactive_dotfile_data
                    | default({})
                    | combine(user_dotfile_data_map[item] | default({}))
                })
            })
          }}
      loop: "{{ interactive_users | map(attribute='interactive_username') }}"
    - name: Update interactive users list
      ansible.builtin.set_fact:
        interactive_users: '{{ interactive_user_dict.values() }}'
    - ansible.builtin.import_role:
        name: interactive

Fin

Finally (this post was started on 2nd June, so only 2 months to get it done!), I have migrated all of the roles for my monitoring server to Ansible and can (at long last) actually start building out the bastion/jumphost that started me down this rabbit hole back in 2022 (in a new post)…