Adding a bastion host - migrating additional roles to ansible
I thought I had finished migrating my monitoring server configuration to Ansible; however, when I came to start the next step of setting up the bastion, I noticed I had neglected to consider the existing 2-factor configuration. Which is rather shocking, considering that Ansible not playing nicely with it is the motivation for this work…
Investigating where this configuration originates from in my SaltStack states, I realised I had migrated the specific states applied to my monitoring server (via roles I add in its pillar data) and missed all of the generic roles applied more broadly.
Overview of generic roles
These generic roles, and what they do, are:
- all - applied to all hosts
  - Includes all roles defined as keys in the pillar dictionary roles, where the value for that key (role) is truthy
- hardware.all - not a role, strictly, but I've included it here as it is the only non-role applied to everything.
  - Includes all hardware states in the pillar list hardware
- linux - applied to all Linux hosts (kernel grain is Linux)
  - Includes the unix role
  - Includes the mail.profiles state
  - Includes the linux.hostname state
- osx - applied to all hosts running macOS (OS grain is MacOS)
  - Includes the unix role
  - Includes the osx.ssh_keys state
  - Includes the desktop role
- debian - applied to all hosts running Debian Linux (OS grain is Debian)
  - Includes the linux.apt.repos state
  - Includes the linux.packages.debian-goodies state
  - Includes the linux.packages.reportbug state
  - Includes the linux.debian.auto-update state
  - Includes the linux.network.debian.interfaces state if the networking.interfaces dictionary is defined in the pillar data
- my-system - applied to all hosts whose name is inside my domain (minion id matches *.my.domain.tld)
  - Includes the linux.hardware.extra-mounts state
  - Includes the linux.hardware.pcspkr-blacklist state
  - Includes the linux.base-utils state
  - Includes the linux.server.ssh state
  - Includes the linux.security.google-authenticator state
  - Includes the linux.security.google-authenticator.sudo state
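For example, the all role is essentially just a thin wrapper around that pillar data - a minimal sketch of such a state (illustrative; my actual SLS may differ slightly):

# roles/all.sls (sketch)
include:
{% for role, enabled in salt['pillar.get']('roles', {}).items() if enabled %}
  - roles.{{ role }}
{% endfor %}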
As can be seen, my roles are all "meta" in that they do not do anything other than include other roles and/or states. This list is a lot of names and not a lot of information about them, but I will now go through them and migrate what I need to. To begin, all and hardware.all can be disregarded as they just include things from the pillar data - however, this does mean I need to revisit the pillar data for my monitoring server. When I did this, I found these additional roles (and what they do) that are not defined in the host-specific configuration (they either come from other pillar data files included in the host-specific one or from generic pillar files applied generally):
- backuppc-client - applied to all hosts whose host-specific pillar file contains any backuppc client configuration pillar (there are different pillar files for different backup methods, e.g. local/remote, tar/rsync/ssh etc.)
  - Includes the appropriate state from backuppc.client. based on the client type pillar value
- webserver - sets up nginx on the host
  - Includes the linux.server.web.nginx state
- webserver-ssl - configures ssl for nginx on the host
  - Includes the webserver role
  - Includes the linux.server.web.nginx.ssl state
- ntp-client-local - configures the ntp client
  - Includes the linux.network.ntp.client.local state
These additional roles are included by other roles but not yet described:
- unix (linux and osx roles)
  - Includes the packages.coreutils state
  - Includes the packages.git state
  - Includes the unix.users.root state
  - Includes the unix.users.my-user state
  - Includes the unix.users.user-local-config.dotfiles state
  - Includes the unix.cifs-creds state
- desktop (osx role)
  - Includes the packages.sublime-text state
  - Includes the software.desktop.sublime-text state
  - Includes the packages.x2goclient state
The following roles and states have already been migrated, during the work done to date migrating the monitoring server:
- webserver role
- webserver-ssl role
- linux.apt.repos state
The osx.ssh_keys, linux.security.google-authenticator and linux.security.google-authenticator.sudo states can also be disregarded, as I currently do not have any macOS systems and the 2-factor authentication is about to be re-worked on all my hosts through setting up the bastion host. I disregarded linux.network.debian.interfaces for now, as my new Ansible setup will install and configure NetworkManager on the bastion (although I probably want to statically configure it, so it remains accessible if there are network problems - one for the "TODO" list). Ansible uses ssh to login, and setting up ssh is already part of my base OS install, so I disregarded linux.server.ssh. Finally, I also disregarded the desktop role, as I no longer use any of the software it installs (via the states it includes).
This leaves me with this list of roles to migrate:
all
hardware.all
linux
osx
debian
my-system
backuppc-client
webserver
webserver-ssl
ntp-client-local
unix
desktop
linux role
The linux role, in its entirety, is:
# Base role for all Linux systems
include:
- roles.unix
# All Linux systems need some sort of functioning mail service, even if only
# for local daemons to send notifications locally.
- mail.profiles
- linux.hostname # Have Salt maintain the hostname
I will look at roles.unix later, so here I just need to look at mail.profiles and linux.hostname.
mail.profiles
This role installs Postfix and configures it according to the profile attached to the system. For most systems, this is a very simple configuration that relays local mail out to my mail server. For now, I will implement only this and defer configuring the mail server's own Postfix install until I bring that under Ansible's control.
I created a role called mail-transfer-agent for this. I applied this role to all Linux systems except those in the dummy and mail_servers groups:
- hosts: all:!dummy:!mail_servers
tags: common
tasks:
- name: Mail relaying is configured on Linux systems that aren't mail servers
ansible.builtin.import_role:
name: mail-transfer-agent
vars:
relay_host: "{{ mail_relay_host }}"
relay_domains: "{{ mail_relay_domains }}"
recipient_delimiter: "{{ mail_recipient_delimiter }}"
forward_root_to: "{{ mail_forward_root_to }}"
when: ansible_facts.system == 'Linux'
I set the variables in the relevant group file in the group_vars/ directory, e.g. for the internal host group:
---
# Send via mailserver's VPN address, since Virgin IPs are
# blacklisted by spamhaus so direct connections are rejected
# by my host.
mail_relay_host: mailserver.home.mydomain.tld
mail_forward_root_to: user@mydomain.tld
# Allow systems to relay (i.e. send email from local daemons etc.
# onwards) to these domains.
mail_relay_domains:
- mydomain.tld
- myotherdomain.tld
...
I also started a mail_servers group with (e.g.) recipient_delimiter set, however this is not complete (I'm leaving it for another day). Fortunately my mail server is not in the internal group, so for now I do not need to worry about the order in which these variables are resolved, or about conflicts, but I can foresee this being a problem in the future (e.g. if I have an internal system that is also a mail server).
The new role's argument specification (meta/argument_specs.yaml):
---
argument_specs:
main:
short_description: Configure the system's mail transfer agent
author: Laurence Alexander Hurst
options:
relay_host:
description: The relay host
type: str
required: true
relay_domains:
description: Domain names to relay (i.e. send email from local daemons etc. onwards)
type: list
elements: str
required: true
forward_root_to:
description: Email address to forward root's email to
type: str
required: true
recipient_delimiter:
description: Recipient delimiter value
type: str
required: false
...
I added a handler to restart postfix (e.g. for configuration changes) in the role's handlers/main.yaml:
---
- name: Restart Postfix
become: true
ansible.builtin.service:
name: postfix
state: restarted
...
The template for the Postfix main configuration, templates/main.cf, is:
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
biff = no
# appending .domain is the MUA's job.
append_dot_mydomain = no
# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h
readme_directory = no
# See http://www.postfix.org/COMPATIBILITY_README.html -- default to 2 on
# fresh installs.
compatibility_level = 2
smtpd_relay_restrictions = reject_unauth_destination, permit_mynetworks
myhostname = {{ ansible_facts.fqdn }}
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
mydestination = $myhostname, localhost.home.entek.org.uk, localhost
relayhost = {{ relay_host }}
mynetworks = 127.0.0.0/8 [::ffff:127.0.0.0]/104 [::1]/128
mailbox_size_limit = 0
{% if recipient_delimiter %}
recipient_delimiter = {{ recipient_delimiter }}
{% endif %}
relay_domains = {{ relay_domains | join(', ') }}
# Only listen for local connections
inet_interfaces = loopback-only
inet_protocols = all
Finally, the tasks to install, configure and start Postfix are:
---
- name: Install postfix
become: true
ansible.builtin.package:
name: postfix
state: present
- name: Deploy postfix main configuration
become: true
ansible.builtin.template:
owner: root
group: root
mode: 00644
src: main.cf
dest: /etc/postfix/main.cf
notify: Restart Postfix
- name: Configure root user forwarding
become: true
# This used to be done by creating /root/.forward but
# /etc/aliases seemed better for some reason.
ansible.builtin.lineinfile:
path: /etc/aliases
line: "root:\t{{ forward_root_to }}"
regexp: '^root:\s'
state: present
register: root_forward_aliases
- name: Update aliases database
become: true
ansible.builtin.command:
cmd: /usr/sbin/postalias /etc/aliases
when: root_forward_aliases.changed
- name: Ensure postfix is running
become: true
ansible.builtin.service:
name: postfix
enabled: true
state: started
...
linux.hostname
linux.hostname sets the hostname to the Salt id grain (which is the minion's id, set in /etc/salt/minion_id). As I migrated to Ansible, I originally thought that rather than use the inventory hostname (which allows the name in the inventory to be completely arbitrary), I could use the reverse DNS of the host's IP (i.e. the IP address' PTR record). I previously set up my Debian pre-seed files and DHCP server to use DNS for setting the hostname, so it seemed sensible to tell Ansible to set the hostname via the same mechanism. However, Ansible determines the "default" interface by getting the route to either Google's DNS server or ipv6.google.com. This causes multiple problems: on multi-homed systems, the route to Google might not be via the interface whose IP address should be used for the hostname, and on networks with no route to Google (e.g. with no default route, like my air-gapped home-lab) this fails completely. Trusting reverse DNS also only makes sense for systems that are in a DNS server I trust (i.e. my internal one), so mobile clients, like my laptop, need consideration, and I might also have problems if I make some things more dynamic within my network.
That all said, my playbook decides to use local as the connection method when the target's hostname matches the inventory hostname. In the end, I decided to adopt this approach instead (relying on Ansible's variable precedence rules):
- Set the hostname variable to {{ inventory_hostname }}{% if '.' not in inventory_hostname and ansible_facts.domain %}.{{ ansible_facts.domain }}{% endif %} in group_vars/all.yaml (see the sketch after this list).
- Override the hostname variable for specific hosts (or groups) where it is desirable for the hostname not to match the inventory hostname.
- Create a hostname role that simply configures the system's hostname according to the hostname variable.
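A minimal sketch of that first step in group_vars/all.yaml (the file layout here is illustrative; the expression itself is the one described above):

---
# group_vars/all.yaml (sketch)
hostname: "{{ inventory_hostname }}{% if '.' not in inventory_hostname and ansible_facts.domain %}.{{ ansible_facts.domain }}{% endif %}"
...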
The and ansible_facts.domain truthy test in that expression was needed to cater for Windows systems that are not joined to an Active Directory domain (which do not have a populated domain fact), but it is also a sensible safety net in case the fact is not set on other systems.
The hostname Ansible role
I gave it this argument spec in meta/argument_specs.yaml:
---
argument_specs:
main:
short_description: Set the system's hostname
author: Laurence Alexander Hurst
options:
hostname:
description: The desired FQDN for the system
type: str
required: true
...
The role's tasks/main.yaml looks like this:
---
- name: Set Windows hostname
block:
- name: Set the new DNS suffix (takes effect on reboot)
ansible.windows.win_regedit:
# Can't use split filter as it does not support limiting
# the number of splits.
data: "{{ hostname.split('.', 1)[1] }}"
path: HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
# According to http://systemmaanger.ru/win2k_regestry.en/92767.htm
# NV Domain is used to update Domain (at the same key) on reboot,
# so setting this and rebooting should be all that's needed.
name: "NV Domain"
state: present
type: string
notify: Reboot Windows # From `common` role
- name: Set hostname
ansible.windows.win_hostname:
# Windows will only allow short names here
name: "{{ hostname | split('.') | first }}"
register: win_hostname_result
- name: Check if reboot required and notify handlers
ansible.builtin.debug: msg="Reboot is{% if not win_hostname_result.reboot_required %} not{% endif %} required"
changed_when: win_hostname_result.reboot_required
notify: Reboot Windows # From `common` role
when: ansible_facts.system == 'Win32NT' and ansible_facts.fqdn != hostname
- name: Set Linux hostname
block:
# XXX what about ipv6?
- name: Set localhost entry in /etc/hosts correctly
become: yes
# Ensure that 127.0.0.1 is only "localhost" and not (an old) hostname
ansible.builtin.lineinfile:
path: /etc/hosts
line: "127.0.0.1\tlocalhost"
regexp: '^127\.0\.0\.1'
owner: root
group: root
mode: 00644
- name: Set hostname entry in /etc/hosts correctly
become: yes
# XXX Check if 127.0.1.1 is Debian specific or is also default on EL (e.g. Rocky)?
# Not that it matters if 127.0.0.1 is correctly set to just
# `localhost`.
ansible.builtin.lineinfile:
path: /etc/hosts
line: "127.0.1.1\t{{ hostname }}\t{{ hostname | split('.') | first }}"
regexp: '^127\.0\.1\.1'
owner: root
group: root
mode: 00644
- name: Set hostname
become: yes
ansible.builtin.hostname:
# Should only ever set the hostname to the shortname.
# Name resolution configuration will determine FQDN
# (which is why we configure hosts with it).
# See `hostname` command's manual page section "THE FQDN".
name: "{{ hostname | split('.') | first }}"
when: ansible_facts.system == 'Linux'
...
For my Windows system, I set the hostname to the FQDN directly on the host in inventory.yaml (as domain is blank) for now - longer term I might want to do something cleverer (to tie in with my domain-to-environment detection, which won't work with this static hostname):
windows:
hostname: windows.home.my.domain.tld
Looking up the DNS name
Despite abandoning the idea, I have included this information for posterity as it may be of future use to me (or someone else)…
I encountered a problem at this point, due to my previous decision to use fact caching. With caching enabled, it seems Ansible does not update the cache until the timeout expires (which is one way caches can work, another is to use the cache values only if fresh ones are not available) - with an infinite timeout this means it will never be updated.
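For context, fact caching in my ansible.cfg is set up along these lines (a sketch - the jsonfile plugin, cache path and smart gathering policy are assumptions, as I have not shown my real configuration; the never-expiring timeout is the relevant part):

[defaults]
gathering = smart
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/fact_cache
# 0 means cached facts never expire
fact_caching_timeout = 0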
While testing this, I observed some very odd behaviour on my laptop, when I noticed its reverse DNS (according to Ansible) did not match what dig returned - troubleshooting this, I determined that default_ipv4.address was still returning the host's IP address from when I was at home. I was in a Costa Coffee shop, between some personal appointments, working on this with my laptop connected to my mobile broadband router. Despite default_ipv4.address returning what must have been the cached value, the all_ipv4_addresses fact did contain the correct (current) IP address. This is despite that IP address not being listed in any other fact - the default_ipv4 and interface facts all had the old IP address. This meant that all_ipv4_addresses contained an IP address that did not exist according to the other facts, and the other facts contained an IP address that did not exist according to all_ipv4_addresses. I went to report this as a bug, however when I tried to reproduce it at home I was unable to.
In any case, the behaviour I want is to update the cache of all "alive" systems and use the existing cached values for everything else - the cache can be explicitly updated by running the ansible.builtin.setup module, and as Ansible will exclude any system it could not previously connect to, this will not be attempted on ones that are uncontactable, so their "old" cache data will remain intact.
Once I have up-to-date cache values, looking up the DNS name is a simple case of using the community.general.dig lookup - although, despite the documentation saying the record TYPE can be specified either as qtype=TYPE or appended as /TYPE, only the latter makes the lookup internally reverse the IP address and append .in-addr.arpa. before making the DNS query. This difference in behaviour is not explicitly documented, although only the /PTR form is shown in the examples with a plain IP address.
- name: Ensure fact cache is up to date
ansible.builtin.setup:
- name: Get the reverse DNS
ansible.builtin.set_fact:
# The lookup documentation says these are synonymous - in practice, only the '/PTR' format works for directly looking up reverse record from IPs
#reverse_dns_name: "{{ lookup('community.general.dig', ansible_facts.default_ipv4.address, qtype='PTR') }}"
reverse_dns_name: "{{ lookup('community.general.dig', ansible_facts.default_ipv4.address ~ '/PTR') }}"
If the IP does not have a pointer DNS record, the fact will contain the string NXDOMAIN.
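Had I continued down this route, I would also have needed a guard so the looked-up name is only used when a record actually exists - something along these lines (a sketch, not code from my actual playbook; note the returned PTR value may also carry a trailing dot that would need stripping):

- name: Use the reverse DNS name only where a PTR record exists
  ansible.builtin.set_fact:
    hostname: "{{ reverse_dns_name }}"
  when: reverse_dns_name != 'NXDOMAIN'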
debian role
This role applies these states:
linux.apt.repos
linux.packages.debian-goodies
linux.packages.reportbug
linux.debian.auto-update
linux.network.debian.interfaces if the networking.interfaces dictionary is defined in the pillar data
linux.apt.repos configures the apt repositories, which is already handled by an existing role. For now I am using NetworkManager to configure my systems, so I am not migrating linux.network.debian.interfaces to Ansible.
linux.packages.debian-goodies, linux.packages.reportbug and linux.debian.auto-update install a selection of packages, some after pre-configuring them with debconf. As these are applied to all Debian systems in my SaltStack configuration, I just added them to my existing common role's tasks. I added a task to the Linux-specific task file that includes an OS-family-specific task file, if one exists:
- name: Include distribution specific tasks (if they exist)
ansible.builtin.include_tasks: "{{ item }}"
loop: "{{ q('ansible.builtin.first_found', [ansible_facts['os_family'] + '.yaml',], skip=true) }}"
The Debian task file (tasks/Debian.yaml in my common role) does the following:
---
- name: Install Debian specific packages
become: true
ansible.builtin.package:
name:
# Provides checkrestart and other goodies
- debian-goodies
# Debian's bug reporting tool
- reportbug
- name: Configure Debian auto-update
block:
- name: Install debconf utilities
become: true
ansible.builtin.package:
name: debconf-utils
- name: Display both news and changelogs
become: true
ansible.builtin.debconf:
name: apt-listchanges
question: apt-listchanges/which
value: both
vtype: select
- name: Include headers listing name of package and names of binary packages being upgraded
become: true
ansible.builtin.debconf:
name: apt-listchanges
question: apt-listchanges/headers
value: true
vtype: boolean
# Do this after debconf answers have been pre-seeded (above)
- name: Install apt-listchanges packages
become: true
ansible.builtin.package:
name: apt-listchanges
state: present
- name: Enable unattended upgrades
become: true
ansible.builtin.debconf:
name: unattended-upgrades
question: unattended-upgrades/enable_auto_updates
value: true
vtype: boolean
# Do this after debconf answers have been pre-seeded (above)
- name: Install unattended-upgrades packages
become: true
ansible.builtin.package:
name: unattended-upgrades
state: present
...
my-system role
This role applies the following states:
linux.hardware.extra-mounts
linux.hardware.pcspkr-blacklist
linux.base-utils
linux.server.ssh
linux.security.google-authenticator
linux.security.google-authenticator.sudo - Enforce sudo only via 2nd-factor (implicitly installs sudo via requirements)
These were separated out in SaltStack as they were the common tasks applied only to systems I own, i.e. those on which I have root access (to install software etc.). I have not yet separated out tasks for systems I only have user-level access to, so I will be merging these into existing roles instead.
linux.hardware.extra-mounts
This role added extra mount points, defined in the pillar data. Despite the name, some mounts were network mounts rather than hardware ones - for now, for the systems I am currently migrating, it is only network mounts I am doing. I merged this role into my existing common Ansible role, which uses variables to determine which (if any) mounts to add. It looks for any variables whose name begins mount_ and expects each to be a dictionary of data containing everything necessary to mount the filesystem, including any credential files and helper packages. To enable me to deploy the credentials, I ended up installing HashiCorp Vault by hand following my previous Vault installation steps.
To minimise some duplication, I used YAML node anchors to reuse (e.g.) the mount options. The mount variables, for example for my desktop systems, look like this:
---
# Save some repetition
nas_cifs_creds: &nas_cifs_creds
- path: /etc/cifs_creds/nas
content: |
username={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.username }}
password={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.password }}
domain={{ lookup('community.hashi_vault.vault_read', '/kv/nas').data.domain }}
mode: "00400"
owner: root
group: root
dir_mode: "00700"
nas_mount_options: &nas_mount_options credentials=/etc/cifs_creds/nas,file_mode=0660,dir_mode=0770,uid=my_user,gid=store,nounix,_netdev
mount_nas_documents:
src: //nas.home.my.domain.tld/documents
path: /media/nas/documents
fstype: cifs
opts: *nas_mount_options
boot: "{% if 'mobile' in group_names %}false{% else %}true{% endif %}"
helper_packages:
- cifs-utils
state: "{% if 'mobile' in group_names %}present{% else %}mounted{% endif %}"
files: *nas_cifs_creds
mount_nas_archive:
src: //nas.home.my.domain.tld/archive
path: /media/nas/archive
fstype: cifs
opts: *nas_mount_options
boot: "{% if 'mobile' in group_names %}false{% else %}true{% endif %}"
helper_packages:
- cifs-utils
state: "{% if 'mobile' in group_names %}present{% else %}mounted{% endif %}"
files: *nas_cifs_creds
# etc.
...
The tasks to actually set up the mounts (in my common role) are:
- name: Setup additional mounts
block:
- name: Collate list of mounts
ansible.builtin.set_fact:
additional_mounts: "{{ additional_mounts + q('ansible.builtin.vars', item) }}"
loop: "{{ q('ansible.builtin.varnames', '^mount_.+') }}"
vars:
additional_mounts: [] # Fact will take precedence once defined
- block:
- name: Ensure required helper packages are installed
become: true
ansible.builtin.package:
name: "{{ item }}"
state: present
loop: "{{ additional_mounts | selectattr('helper_packages', 'defined') | map(attribute='helper_packages') | unique }}"
- name: Ensure directories for required files exist
become: true
ansible.builtin.file:
path: "{{ item.path | dirname }}"
owner: "{{ item.dir_owner | default(item.owner) }}"
group: "{{ item.dir_group | default(item.group) }}"
mode: "{{ item.dir_mode | default(item.mode) }}"
state: directory
loop: "{{ additional_mounts | selectattr('files', 'defined') | map(attribute='files') | flatten | unique }}"
- name: Create required files (e.g. credential files)
become: true
ansible.builtin.copy:
dest: "{{ item.path }}"
owner: "{{ item.owner }}"
group: "{{ item.group }}"
mode: "{{ item.mode }}"
content: "{{ item.content }}"
loop: "{{ additional_mounts | selectattr('files', 'defined') | map(attribute='files') | flatten | unique }}"
- name: Ensure mounts are in correct state
become: true
ansible.posix.mount:
path: '{{ item.path }}'
src: '{{ item.src }}'
fstype: '{{ item.fstype }}'
opts: '{{ item.opts | default("") }}'
boot: '{{ item.boot | default(true) }}'
state: '{{ item.state | default("mounted") }}'
loop: '{{ additional_mounts }}'
when: additional_mounts | default(false)
linux.hardware.pcspkr-blacklist
This state simply blacklists the pcspkr kernel module to stop the systems beeping. I added this to the hardware role in Ansible with a disable-pcspkr task:
---
- name: pcspkr module is blacklisted and unloadable by modprobe
become: true
ansible.builtin.template:
dest: /etc/modprobe.d/pcspkr-blacklist.conf
src: blacklist-module.conf
mode: 00444
vars:
module_name: pcspkr
- name: pcspkr is not currently loaded
become: true
community.general.modprobe:
name: pcspkr
state: absent
...
The template file is very simple:
{{ ansible_managed | comment }}
blacklist {{ module_name }}
install {{ module_name }} /usr/bin/false
And I added disable-pcspkr to the list of choices in the argument_specs.yaml file.
I added this as a group variable on internal hosts:
hardware_components:
- disable-pcspkr
and added it to the lists on hosts that already have a host-level hardware_components list.
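On such a host, the merged result in its host_vars file ends up looking something like this (illustrative - the other component name is made up):

---
hardware_components:
  - some-existing-component
  - disable-pcspkr
...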
linux.base-utils
The SaltStack version of this module installs:
- unzip
- vim
- tmux
- openssh-client
- less
- hwinfo
- lsof
- htop
- unrar (which is now called unrar-free in Debian)
- man
I already have a common role that installs a number of these packages (amongst others, like bc), so I just added the missing ones to the list (unzip, openssh-client, less, hwinfo, lsof, unrar-free).
imagemagick was already part of my desktop role, so I did not need to do anything to add that.
backuppc-client
This role configures clients for being backed up with BackupPC. There are only 2 modules in my SaltStack configuration, remote-ssh and local-sudo. remote-ssh places the BackupPC server's ssh key into the root user's authorized_keys file; local-sudo gives the backuppc user permission to run, through sudo, the two specific tar commands that BackupPC uses for local back-ups.
For the new, Ansible, version I decided to replace direct login as root with a specific user that has sudo permissions to run rsync only. It will be further locked down by specifying the ssh command in authorized_keys, restricting the user to only running that command on login - inspired by a post I found on the BackupPC-Users mailing list.
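The end result on each client is an authorized_keys entry along these lines (the key material and comment are placeholders; the options match those set by the role further down):

no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/local/bin/backup-restricted-ssh-commands" ssh-ed25519 AAAA...example... backuppc@backup-server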
I began by adding a backuppc role - this will, eventually, also configure the server but for now I just added client configuration. My existing Windows systems are currently backed up with a packaged version of cygwin's rsyncd provided by the BackupPC project, although this is not currently set up by any automation/configuration management tool and, as I noted before:
Modernising, by utilising volume shadow copies, and seeing if this can be done with native tools is on my very long to-do list but by the same token this same setup has been working excellently and unchanged for at least 16 years.
Global role setup
As I knew from the start what I planned to do, with the two ways of backing up (for now), I began by creating the meta/argument_specs.yaml, defaults/main.yaml and tasks/client.yaml (which will just include the relevant client configuration tasks).
meta/argument_specs.yaml looks like:
---
argument_specs:
client:
description: Configures the host as a client to be backed up
options:
backuppc_method:
description: Method backuppc will use to backup this host. Note many methods require information from ansible_facts.
type: str
default: remote_ssh_sudo_rsync
choices:
# XXX add this one (and should it use winrm or Window's native ssh? CIFS would be native, but is it the best choice?)
# - remote_winrm_vss_cifs
- remote_ssh_sudo_rsync
- local_sudo
backuppc_user:
description: User backuppc will login to this host as
type: str
default: backuppc
...
defaults/main.yaml:
---
backuppc_user: backuppc
backuppc_method: remote_ssh_sudo_rsync
...
and tasks/client.yaml:
---
- name: The appropriate client method desired state is included.
ansible.builtin.include_tasks: client_{{ backuppc_method }}.yaml
...
I also created a new group in the inventory, called backuppc_clients, and added a play in my site.yaml to apply this role to it:
- name: Backups are configured
hosts: backuppc_clients
tasks:
- name: BackupPC client configuration is done
ansible.builtin.import_role:
name: backuppc
tasks_from: client.yaml
local-sudo
I began with this one as it only applies to one host (the backup server itself), making it less impactful, and requires no reworking from the existing SaltStack setup.
For now, I just duplicated the SaltStack configuration - this does not have a state for the backuppc user; I presume this is because the local backup method is only applied to the backup server itself, which will have the user created when the BackupPC software is installed. As backuppc was the first Salt state I ever made, it is also possible this was an oversight but, for now, I left it as-is and may revisit that decision when I migrate the server configuration management to Ansible.
The tasks/client_local_sudo.yaml just ensures that the backuppc user has the permissions it needs. While doing this, I discovered that the limitation in the community.general.sudoers Ansible module that prompted me to create my own sudoers role has been fixed in the released module, so I removed that role.
---
- name: sudo is installed
become: true
ansible.builtin.package:
name: sudo
- name: BackupPC user is permitted to tar/untar via sudo
become: yes
community.general.sudoers:
name: backuppc
user: '{{ backuppc_user }}'
runas: root
host: ""
commands:
# First command is the backup one, second is restore.
- /usr/bin/env LC_ALL=C /bin/tar -c -v -f - --totals -C *
- /usr/bin/env LC_ALL=C /bin/tar -x -p --numeric-owner --same-owner -v -f - -C *
...
In my backuppc_servers group, I set backuppc_method to local_sudo (by adding group_vars/backuppc_servers.yaml):
---
backuppc_method: local_sudo
...
remote-ssh
Creating the tasks
In order to allow BackupPC to ssh into the remote system, the public ssh key it uses needs to be pushed out to each client. I began by adding this to the backuppc role's meta/argument_specs.yaml for the client (I have not migrated the server side, which will need the private key deploying, yet):
backuppc_user_ssh_public_key:
description: SSH public key for backuppc user (only needed for ssh-based backups)
required: false
Next, I wanted to restrict the new backup user to only being able to run the specific commands required to perform the backup function. As it needs superuser (root) level access, to be able to back up everything, I needed to ensure that these commands were the only commands it could run via Secure Shell (SSH) and sudo.
Initially I only set one command, by adding command="..." to the key options in the user's .ssh/authorized_keys file, but I discovered that my (default) settings mean different arguments are passed for full and incremental backups. I therefore modified this approach by adding, to the role's templates folder, a script based on a ServerFault answer that restricts commands to the two rsync commands used by incremental and full backups:
#!/bin/sh
{{ ansible_managed | comment }}
# Restricts ssh session to only running full or incremental rsync commands
case "$SSH_ORIGINAL_COMMAND" in
"/usr/bin/sudo {{ backuppc_full_rsync_command }}")
$SSH_ORIGINAL_COMMAND
;;
"/usr/bin/sudo {{ backuppc_incr_rsync_command }}")
$SSH_ORIGINAL_COMMAND
;;
*)
echo "Access denied by ssh command filter script $0." >&2
exit 1
;;
esac
I determined the commands empirically, by adding ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND" to /etc/ssh/sshd_config on one of the clients and telling BackupPC to start a backup (full and then incremental), which caused the command to be logged.
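In practice that ForceCommand went inside a Match block so it only applied to connections from the backup server - roughly like this (the address is a placeholder, and the block was removed again once the commands had been captured):

# Temporary addition to a client's /etc/ssh/sshd_config (sketch)
Match Address 192.0.2.10
    ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND"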
With all this in place, the client_remote_ssh_sudo_rsync.yaml tasks file became relatively straight-forward; in order to configure both sudo and ssh with the same commands I used a fact:
---
- name: rsync and sudo packages are installed
become: true
ansible.builtin.package:
name:
- rsync
- sudo
- name: BackupPC user exists
become: yes
ansible.builtin.user:
name: '{{ backuppc_user }}'
home: /var/lib/backuppc
password_lock: true
# rsync command determined by adding this to /etc/ssh/sshd_config on
# a client, inside a host match against the backuppc server:
# ForceCommand logger -p user.notice "$SSH_ORIGINAL_COMMAND"
- name: rsync command line is stored in reusable fact
ansible.builtin.set_fact:
backuppc_full_rsync_command: >-
/usr/bin/rsync
--server
--sender
-slHogDtpAXrce.iLsfxC
backuppc_incr_rsync_command: >-
/usr/bin/rsync
--server
--sender
-slHogDtpAXre.iLsfxC
- name: BackupPC user may run rsync command with sudo
become: yes
community.general.sudoers:
name: backuppc
user: '{{ backuppc_user }}'
runas: root
host: "{{ ansible_facts.hostname }}"
commands:
- '{{ backuppc_full_rsync_command }}'
- '{{ backuppc_incr_rsync_command }}'
- name: BackupPC restricted to running specific rsync commands
become: yes
ansible.builtin.template:
dest: /usr/local/bin/backup-restricted-ssh-commands
src: client_remote_ssh_sudo_rsync_commands.sh
# Do not allow backuppc user to modify this file
owner: root
group: root
mode: 00555
- name: BackupPC ssh keys are configured
become: yes
become_user: '{{ backuppc_user }}'
ansible.posix.authorized_key:
user: '{{ backuppc_user }}'
key: '{{ backuppc_user_ssh_public_key }}'
key_options: 'no-port-forwarding,no-X11-forwarding,no-agent-forwarding,command="/usr/local/bin/backup-restricted-ssh-commands"'
exclusive: true
...
Configuring ssh keys
I imported the backuppc user's existing keys to my HashiCorp Vault:
vault kv put -mount=kv /backuppc/users/backuppc/ssh_keys private_key=@id_ed25519 public_key=@id_ed25519.pub
Once imported, I simply added a lookup to a new group_vars/backuppc_clients.yaml file to make it available to hosts that need it:
---
backuppc_user_ssh_public_key: "{{ lookup('community.hashi_vault.vault_read', '/kv/backuppc/users/backuppc/ssh_keys').data.public_key }}"
...
Full roll-out
After testing this, I rolled it out to all of my existing systems (I simply made servers a child of backuppc_clients in my inventory) and changed PermitRootLogin prohibit-password to PermitRootLogin no, as well as ensuring my keys are configured and disabling password login, on all of them.
On the BackupPC server, I changed the global RsyncClientPath to /usr/bin/sudo /usr/bin/rsync and RsyncSshArgs to ['-e', '$sshPath -l backuppc'] (from ... -l root) to make this the default for the rsync method. This worked particularly well as the systems I back up without root access already had the appropriate username argument overridden.
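For reference, expressed as settings in BackupPC's config.pl (however they are actually applied), those two changes correspond to roughly:

$Conf{RsyncClientPath} = '/usr/bin/sudo /usr/bin/rsync';
$Conf{RsyncSshArgs} = ['-e', '$sshPath -l backuppc'];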
ntp-client-local
Since I started working on the tasks documented in this post (2nd June 2023), I have already added ntp client configuration to Ansible, so this is already done.
unix
This SaltStack module installs the coreutils and git packages, which I added to my common role as I did with linux.base-utils.
unix.users.root configured the root user password, which is already done during the "bootstrapping" step of setting up a host now - so this is already taken care of.
unix.cifs-creds stores credentials for use when mounting CIFS filesystems. This has already been migrated to be part of a more generic "additional mounts" block in the common role, which supports configuring arbitrary files, so is redundant.
unix.users.my-user and unix.users.user-local-config.dotfiles
unix.users.my-user creates my own user, and unix.users.user-local-config.dotfiles deploys a load of "dotfiles" from SaltStack. I migrated these to a new interactive role and added a variable, interactive_users, which contains a list of users to configure.
Creating users
I added an entry point that configures one user (tasks/configure_user.yaml), and made the default entry point (tasks/main.yaml) call that for each of the interactive_users. For the GECOS (a.k.a. "comment") field, the adduser script that Debian uses prompts for (in this order): full name, room number, work phone, home phone, and other (which I am using for the email address, per a Red Hat article on GECOS information). So I decided to support these via the role's arguments for "real" (interactive) users.
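For illustration (not a real account), the GECOS data is the fifth, comma-separated field of /etc/passwd, so a fully-populated entry ends up along these lines:

peppa:x:1000:1000:Peppa Pig,Room 1,01234 111111,01234 222222,peppa@example.org:/home/peppa:/bin/bash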
The new role's meta/argument_specs.yaml looks like:
---
argument_specs:
configure_user:
description: Configures a single user expected to use an interactive login
options: &user_config
interactive_username:
description: Username for the user
type: str
required: true
interactive_password:
description: Hashed password for the user
type: str
required: true
interactive_groups:
description: List of groups user should be in (if they exist)
type: list
elements: str
default: []
interactive_home:
description: Home directory location for the user
type: str
default: "`'/home/' + interactive_username`"
interactive_full_name:
description: Full name of the user (GECOS field)
type: str
default: ''
interactive_room_number:
description: Room number of the user (GECOS field)
type: str
default: ''
interactive_work_phone:
description: Work phone number of the user (GECOS field)
type: str
default: ''
interactive_home_phone:
description: Home phone number of the user (GECOS field)
type: str
default: ''
interactive_email:
description: Email address of the user (GECOS field)
type: str
default: ''
main:
description: Configures users expected to use an interactive login
options:
interactive_users:
description: A list of users (each as per arguments to configure_user) to configure
type: list
required: true
elements: dict
options: *user_config
...
The main.yaml (which will be expanded to do other things) just calls configure_user for each user in interactive_users. I chose to use include_role as it will check the arguments again in this call, protecting me from making an error when I map the item's arguments to the included task's arguments:
---
- name: Configure interactive users
ansible.builtin.include_role:
name: interactive
tasks_from: configure_user
vars:
interactive_username: '{{ item.interactive_username }}'
interactive_password: '{{ item.interactive_password }}'
interactive_groups: '{{ item.interactive_groups }}'
interactive_home: "{{ item.interactive_home | default('/home/' + interactive_username ) }}"
interactive_full_name: "{{ item.interactive_full_name | default('') }}"
interactive_room_number: "{{ item.interactive_room_number | default('') }}"
interactive_work_phone: "{{ item.interactive_work_phone | default('') }}"
interactive_home_phone: "{{ item.interactive_home_phone | default('') }}"
interactive_email: "{{ item.interactive_email | default('') }}"
loop: '{{ interactive_users }}'
...
My SaltStack state used the optional_groups feature of the user state to deal with some groups that may, or may not, exist on the target host. In my Ansible task I decided to check for, and only specify, those groups that exist, thereby making all groups (except the primary group) "optional" rather than differentiating optional/non-optional.
The original Salt state:
{% from 'jinja2/macros/user.j2' import if_deploy_my_user, my_user %}
{#
IMPORTANT
---------
This state should be entirely self-contained in order to boot-strap the user before other states that need it.
This means:
- no includes
- no requires/require_in/watch/watch_in/etc. involving states from other files.
#}
{# Ensure user is setup, if desired #}
{% call if_deploy_my_user() %}
{{ my_user() }}:
user:
- present
- fullname: Peppa Pig
- password: {{ salt['pillar.get']('my_user:password') }}
- shell: /bin/bash
- uid: 1000
- usergroup: True
- groups:
- {{ my_user() }}
- plugdev
- cdrom
- floppy
- audio
- video
- dip
- optional_groups:
# Fuse on laptop
- fuse
# Dialout on laptop (for USB serial link)
- dialout
# Groups on DS9 for storage
- store
# sudo access on Debian
- sudo
# Control network connections on Debian
- netdev
# Access salt state/pillar files without becoming root on salt master (see that state for where permissions are set)
- salt
# Adm has group read of most log files on Debian by default, so useful to be in that group
- adm
# use virsh and control VMs, useful on starbase2
- libvirt
- require:
- pkg: {{ my_user() }}
- group: {{ my_user() }}
group:
- present
- gid: 1000
pkg:
- installed
- name: bash
{% endcall %}
And my Ansible tasks/configure_user.yaml file:
---
# Do this here, so even if already populated it is refreshed with
# any groups that have been created by tasks in the current run.
- name: Updated getent database is retrieved
ansible.builtin.getent:
database: group
- name: Final list of groups for the user is initialised
ansible.builtin.set_fact:
interactive_groups_final: []
- name: Final list of groups for the user is known
ansible.builtin.set_fact:
interactive_groups_final: >-
{{
interactive_groups_final
+
[check_group]
}}
loop: '{{ interactive_groups | default([]) }}'
loop_control:
loop_var: check_group
when: check_group in ansible_facts.getent_group.keys()
- name: Missing groups are reported
ansible.builtin.debug:
msg: >-
Groups not being added to {{ interactive_username }}
{% if interactive_full_name is defined %}({{ interactive_full_name }}){% endif %}
on {{ inventory_hostname }}
{% if inventory_hostname != ansible_host %}({{ ansible_host }}){% endif %}
because they do not exist:
{{ interactive_groups | difference(interactive_groups_final) }}
when: interactive_groups | difference(interactive_groups_final) | length != 0
- name: User is correctly configured
become: true
ansible.builtin.user:
name: '{{ interactive_username }}'
password: '{{ interactive_password }}'
# GECOS field: full name,room number,work phone,home phone,email
comment: "{{ interactive_full_name | default('') }},\
{{ interactive_room_number | default('') }},\
{{ interactive_work_phone | default('') }},\
{{ interactive_home_phone | default('') }},\
{{ interactive_email | default('') }}"
groups: '{{ interactive_groups_final }}'
home: "{{ interactive_home | default('/home/' + interactive_username ) }}"
move_home: true # Attempt to move old home if in the wrong place
...
As suggested by the Ansible FAQ, I used mkpasswd --method=sha-512 to generate the password hash to a file and imported it to my vault:
vault kv put -mount=kv interactive_users/$username hashed_password=@tmppass
For now, I added my user (my SaltStack version only supported one; this will allow me to configure several) to a new group's variables in group_vars/interactive.yaml - my goal is to populate this list entirely from the vault so not even the usernames are hardcoded (or visible in the Ansible configuration):
---
interactive_users:
- interactive_username: username
interactive_full_name: full name
interactive_password: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/laurence').data.hashed_password }}"
interactive_groups:
- plugdev
- cdrom
- floppy
- audio
- video
- dip
- fuse
- dialout # For USB serial device
- store # Group for shared storage
- sudo # sudo access on Debian
- netdev # Control network devices on Debian
- adm # adm has read access to most logs in Debian by default
- libvirt # Use virsh and control VMs on starbase2
- vboxusers # Access to use VirtualBox
...
I applied the new role to this new interactive group in my site.yaml:
- name: Interactive systems are configured
hosts: interactive
tags: interactive
roles:
- interactive
and added the group to my inventory:
interactive:
hosts:
server1_I_interactively_login_to:
server2_I_interactively_login_to:
children:
desktops:
There is an issue with this, as the new interactive role's configure_user will only work on Linux, yet I have a Windows (and, one day, perhaps macOS) system in my desktops group. But I put that in the "tomorrow's problem" pile (and created an issue, titled "interactive role only works on Linux but is applied to Windows hosts too", for it in my git frontend).
Pushing out configuration files
Over the years I have tried many ways of managing my "dotfiles" (user configuration files on *nix) in a more organised and co-ordinated way than copying them to each system I log into (which was my approach for a long time). These have ranged from holding them in a naive Git repository to using GNU Stow (based on an article I found) with custom scripts that detect whether software is installed and call Stow accordingly.
With my GNU Stow solution, I had split my bashrc up into modular files, some of which were added to a modular directory based on detected software and included with this code in my bashrc file:
if [ -d $HOME/.bashrc.d ]
then
for file in $HOME/.bashrc.d/*.bashrc
do
source $file
done
fi
The problem with this approach was that, on some systems with shared storage, reading these small files significantly delayed login compared to having the same commands in one large .bashrc file. In the end, I replaced this approach with a templated .bashrc in SaltStack that included the modular files directly, modified to incorporate the tests from my Stow-based solution as wrappers around the commands that were previously omitted entirely if the test did not pass.
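The net effect, illustrated with a made-up fragment rather than my real .bashrc, is that a module that used to be a separate file is now inlined by the template, wrapped in the same test the Stow scripts used to perform:

# Illustrative fragment of the templated .bashrc
if command -v git > /dev/null 2>&1
then
    # aliases that previously lived in ~/.bashrc.d/git.bashrc
    alias gst='git status'
fi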
It is this approach that I will also be using initially for Ansible - a direct SaltStack to Ansible migration of the existing setup, hopefully making this straightforward.
The SaltStack version uses some Jinja macros, with tests, to determine whether each configuration file needs to be deployed:
{%- macro user_config_file_present(name, target, test, source=None, template=None, contents_pillar=None) %}
{{ name }}-present:
file.managed:
- name: {{ my_user_home() }}/{{ target }}
{% if source -%}
- source: {{ source }}
{%- endif %}
{% if template -%}
- template: {{ template }}
{%- endif %}
{% if contents_pillar -%}
- contents_pillar: {{ contents_pillar }}
{%- endif %}
- user: {{ my_user() }}
- group: {{ my_group() }}
- mode: 400
- follow_symlinks: False
- makedirs: True
- dir_mode: 700
- onlyif:
- {{ test }}
{% endmacro -%}
{%- macro user_config_dir_present(name, target, test, source) %}
{{ name }}-present:
file.recurse:
- name: {{ my_user_home() }}/{{ target }}
- source: {{ source }}
- user: {{ my_user() }}
- group: {{ my_group() }}
- file_mode: 400
- dir_mode: 700
- clean: True
- replace: True
- onlyif:
- {{ test }}
{% endmacro -%}
{%- macro user_config_file_absent(name, target, test) %}
{{ name }}-absent:
file.absent:
- name: {{ my_user_home() }}/{{ target }}
- unless:
- {{ test }}
{% endmacro -%}
{%- macro user_config_file(name, target, test, source=None, template=None, contents_pillar=None) %}
{{ user_config_file_present(name, target, test, source, template, contents_pillar) }}
{{ user_config_file_absent(name, target, test) }}
{% endmacro -%}
{%- macro user_config_dir(name, target, test, source) %}
{{ user_config_dir_present(name, target, test, source) }}
{{ user_config_file_absent(name, target, test) }}
{% endmacro -%}
which was used, e.g. for Bash's configuration:
{% from 'jinja2/macros/user.j2' import user_config_file %}
{{ user_config_file('bashrc', '.bashrc', 'which bash', 'salt://unix/users/user-local-config/dotfiles/bash/bashrc.j2', 'jinja') }}
To do similar with Ansible, I created a vars/dotfiles directory in my interactive role which contains metadata about the "dotfiles" I wish to deploy - currently it supports a list of tests (all of which must be true to deploy) and a list of files to push out. For example, for bash (vars/dotfiles/bash.yaml):
---
dotfile_bash:
dotfile_tests:
- type: command_exists
name: bash
dotfile_files:
- src: bashrc
dest: ~/.bashrc
type: template
- src: bash_profile
dest: ~/.bash_profile
type: file
...
And for vim (vars/dotfiles/vim.yaml):
---
dotfile_vim:
dotfile_tests:
- type: command_exists
name: vim
dotfile_files:
- src: vimrc
dest: ~/.vimrc
type: file
- src: vim/
dest: ~/.vim/
type: file
...
In the role's tasks/main.yaml, I load these variable files and then call a new entry point, deploy_dotfiles, once for each user:
- name: Dotfile metadata is loaded
ansible.builtin.include_vars:
name: dotfiles_metadata
dir: dotfiles
- name: Deploy dotfiles
ansible.builtin.include_role:
name: interactive
tasks_from: deploy_dotfiles
vars:
dotfile_target_user: '{{ item.interactive_username }}'
dotfiles: '{{ dotfiles_metadata.values() }}'
loop: '{{ interactive_users }}'
The new entry in meta/argument_specs.yaml for the entry point is:
deploy_dotfiles:
description: Deploys a set of "dotfile" configuration files to an interactive user
options:
dotfile_target_user:
description: User to deploy dotfiles to
type: str
required: true
dotfiles:
description: List of dotfiles to deploy
type: list
required: true
elements: dict
options:
dotfile_tests:
description: List of tests, all of which must pass to deploy the dotfile(s)
type: list
default: []
elements: dict
options:
name:
description: Target of the test
type: str
required: true
type:
description: Type of test to apply to `name`
type: str
required: true
choices:
- command_exists
- directory_exists
- file_exists
dotfile_files:
description: List of files to deploy if tests pass (will be deleted if tests fail).
type: list
required: true
elements: dict
options:
src:
description: Source file to deploy
type: str
required: true
dest:
description: Destination to deploy file to
type: str
required: true
mode:
description: File mode (default umask will apply if not set)
required: false
type:
description: Method to use to deploy file
type: str
required: true
choices:
- file
- template
no_log:
description: Suppress logging (e.g. for files containing sensitive data)
type: bool
default: false
The tasks/deploy_dotfiles.yaml loops over the list of dotfiles and includes a further task file, which does the file deployment:
---
- name: Each dotfile is deployed
ansible.builtin.include_tasks: deploy_dotfile.yaml
vars:
dotfile_target_file: '{{ dotfile.dotfile_target_file }}'
dotfile_tests: '{{ dotfile.dotfile_tests | default([]) }}'
dotfile_files: '{{ dotfile.dotfile_files }}'
loop: '{{ dotfiles }}'
loop_control:
# Avoid conflict with outer `item` when nested (e.g. looping over users)
loop_var: dotfile
...
Finally, the innermost tasks file replaced the macros I made for SaltStack:
---
- name: Test status is initialised to true
ansible.builtin.set_fact:
# Start with the test pass and try to prove it is not so
dotfile_test_result: true
- name: If commands exist is tested
ansible.builtin.shell: 'command -v {{ dotfile_command_test.name | quote }}'
# Of course this might return failure, that's the point.
failed_when: false
# `command -v` is always a read-only operation
changed_when: false
# Can be run in check mode
check_mode: false
register: dotfile_command_test_results
loop: "{{ dotfile_tests | selectattr('type', 'eq', 'command_exists') }}"
loop_control:
# Avoid conflict with outer `item` when nested (e.g. looping over users)
loop_var: dotfile_command_test
- name: Test status is updated
ansible.builtin.set_fact:
dotfile_test_result: false
when: dotfile_command_test_results.results | selectattr('rc', 'ne', 0) | length > 0
- name: Files to be tested are statted
ansible.builtin.stat:
path: '{{ item.name }}'
loop: "{{ dotfile_tests | selectattr('type', 'in', ['file_exists', 'directory_exists']) }}"
register: dotfile_stat_results
- name: Test status is updated (non-existent directories)
ansible.builtin.set_fact:
dotfile_test_result: false
when: >-
dotfile_stat_results.results
| selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'directory_exists'))
| map(attribute='stat')
| selectattr('exists', 'eq', true)
| selectattr('isdir', 'defined')
| selectattr('isdir', 'eq', true)
| length
!=
dotfile_stat_results.results
| selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'directory_exists'))
| length
- name: Test status is updated (non-existent files)
ansible.builtin.set_fact:
dotfile_test_result: false
when: >-
dotfile_stat_results.results
| selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'file_exists'))
| map(attribute='stat')
| selectattr('exists', 'eq', false)
| length > 0
or
dotfile_stat_results.results
| selectattr('item', 'in', dotfile_tests | selectattr('type', 'eq', 'file_exists'))
| map(attribute='stat')
| selectattr('isdir', 'eq', true)
| length > 0
- name: Plain dotfiles are copied
become: yes
become_user: '{{ dotfile_target_user }}'
ansible.builtin.copy:
src: dotfiles/{{ plain_dotfile.src }}
dest: '{{ plain_dotfile.dest }}'
mode: '{{ plain_dotfile.mode | default(omit) }}'
loop: "{{ dotfile_files | selectattr('type', 'eq', 'file') }}"
loop_control:
# Avoid conflict with outer `item` and `dotfile` when nested
# (e.g. looping over users and dotfiles)
loop_var: plain_dotfile
when: dotfile_test_result
no_log: '{{ plain_dotfile.no_log | default(false) }}'
- name: Templated dotfiles are deployed
become: yes
become_user: '{{ dotfile_target_user }}'
ansible.builtin.template:
src: dotfiles/{{ template_dotfile.src }}
dest: '{{ template_dotfile.dest }}'
mode: '{{ template_dotfile.mode | default(omit) }}'
loop: "{{ dotfile_files | selectattr('type', 'eq', 'template') }}"
loop_control:
# Avoid conflict with outer `item` and `dotfile` when nested
# (e.g. looping over users and dotfiles)
loop_var: template_dotfile
when: dotfile_test_result
no_log: '{{ template_dotfile.no_log | default(false) }}'
- name: Redundant dotfiles are deleted
become: true
become_user: '{{ dotfile_target_user }}'
ansible.builtin.file:
name: '{{ redundant_dotfile.dest }}'
state: absent
loop: '{{ dotfile_files }}'
loop_control:
# Avoid conflict with outer `item` and `dotfile` when nested
# (e.g. looping over users and dotfiles)
loop_var: redundant_dotfile
when: not dotfile_test_result
no_log: '{{ redundant_dotfile.no_log | default(false) }}'
...
For the "dotfiles" themselves, it was just a case of putting them in the appropriate folder within the role - templates/dotfiles for template "type" files and files/dotfiles for file "type" files. Directories (only from files) can be copied recursively with the ansible.builtin.copy module, so no separate support is required in the tasks for this.
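Putting the pieces together, the relevant part of the role is laid out roughly like this (based on the bash and vim examples above):

roles/interactive/
  files/dotfiles/
    bash_profile
    vimrc
    vim/
  templates/dotfiles/
    bashrc
  vars/dotfiles/
    bash.yaml
    vim.yaml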
User-specific configuration
Some configuration files, such as .fetchmailrc and .forward etc., contain user-specific settings and so a generic config cannot just be pushed out.
To start addressing this, I added the templated dotfile metadata to my role's vars/dotfiles, e.g. vars/dotfiles/fetchmail.yaml:
---
dotfile_fetchmail:
dotfile_tests:
- type: command_exists
name: fetchmail
dotfile_files:
- src: fetchmailrc
dest: ~/.fetchmailrc
type: template
...
I then needed to get the variables through to the template. To do this, I added a new option, interactive_dotfile_data, to my role's configure_user entry point (later reused for main.yaml, so it has to be added here even though configure_user doesn't currently use it), which defines the options available for all dotfiles (in this example, just for fetchmail, which was the first I implemented):
interactive_dotfile_data: &dotfile_config
description: User specific data for deploy_dotfiles' dotfile_user_data parameter
type: dict
default: {}
options:
fetchmail:
description: Settings for user's .fetchmailrc
type: list
default: []
elements: dict
options:
server:
description: Hostname for server to poll
type: str
required: true
protocol:
description: Protocol with which to connect to the server
type: str
default: imap
choices:
# Taken from fetchmail's man page
- auto
- pop2
- pop3
- apop
- rpop
- kpop
- sdps
- imap
- etrn
- odmr
remote_user:
description: Username to login to the server with
type: str
required: true
local_user:
description: Local user to receive fetched mail
type: str
required: true
ssl:
description: Connect with SSL
type: bool
default: true
Then, using YAML anchors, I also made these same options available as dotfile_user_data to the deploy_dotfiles entry point (which already takes dotfile_target_user as an option):
dotfile_user_data: *dotfile_config
Finally, to pass the data from the user structure to the dotfiles deployment, I updated main.yaml to pass interactive_dotfile_data from interactive_users through to deploy_dotfiles (the only change is adding dotfile_user_data under vars):
- name: Deploy dotfiles
ansible.builtin.include_role:
name: interactive
tasks_from: deploy_dotfiles
vars:
dotfile_target_user: '{{ item.interactive_username }}'
dotfile_user_data: '{{ item.interactive_dotfile_data | default({}) }}'
dotfiles: '{{ dotfiles_metadata.values() }}'
loop: '{{ interactive_users }}'
I then added appropriate settings for the relevant configuration files to my interactive_users configuration for my user (e.g.):
interactive_dotfile_data:
fetchmail:
- server: imap.somesystem.com
remote_user: my.login@somesystem.com
local_user: me@local.domain.tld
When I came to do .forward and .netrc, I added new tests to dotfile_tests in the role's meta/argument_specs.yaml:
- dotfile_variable_list_not_empty
- dotfile_variable_string_not_empty
The tests themselves were relatively straight-forward:
- name: Test status is updated (empty string variables)
ansible.builtin.set_fact:
dotfile_test_result: false
loop: "{{ dotfile_tests | selectattr('type', 'eq', 'variable_string_not_empty') }}"
loop_control:
# Avoid conflict with outer `item` when nested (e.g. looping over users)
loop_var: dotfile_variable_test
when: dotfile_user_data[dotfile_variable_test.name] is not string or (dotfile_user_data[dotfile_variable_test.name] | length == 0)
- name: Test status is updated (empty list variables)
ansible.builtin.set_fact:
dotfile_test_result: false
loop: "{{ dotfile_tests | selectattr('type', 'eq', 'dotfile_variable_list_not_empty') }}"
loop_control:
# Avoid conflict with outer `item` when nested (e.g. looping over users)
loop_var: dotfile_variable_test
# Lists are iterable but so are strings and mapping types
# see: https://stackoverflow.com/a/39453833
when: >-
dotfile_variable_test.name not in dotfile_user_data
or
dotfile_user_data[dotfile_variable_test.name] is not iterable
or
dotfile_user_data[dotfile_variable_test.name] is string
or
dotfile_user_data[dotfile_variable_test.name] is mapping
or
(dotfile_user_data[dotfile_variable_test.name] | length == 0)
The config YAML file for this dotfile's metadata (in the role's vars/forward.yaml) looks like this (the template file is literally just {{ dotfile_user_data.forward }}):
---
dotfile_forward:
dotfile_tests:
- type: dotfile_variable_string_not_empty
name: forward
dotfile_files:
- src: forward
dest: ~/.forward
type: template
...
Now, the content of a user's .forward file (and whether it exists at all) is controlled by whether a value is given for the setting, e.g.:
interactive_dotfile_data:
forward: my_user@some.domain.tld
I then repeated this pattern for my other dotfiles that contain user-specific data.
Host-specific user-specific configuration
Some settings are host-specific, for example passwords stored in a user's .netrc file. For these, I added variables named dotfile_<username>_<setting>. For example:
dotfile_myuser_netrc:
- machine: outlook.office365.com
login: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/myuser/netrc/outlook.office365.com').data.login }}"
password: "{{ lookup('community.hashi_vault.vault_read', '/kv/interactive_users/myuser/netrc/outlook.office365.com').data.password }}"
I modified site.yaml to update the interactive_users variable with the values from these variables before applying the interactive role. As a result, the common settings are set in the group's interactive_users variable and host-specific additional and override values are in these dotfile_... variables:
- name: Interactive systems are configured
hosts: interactive
tags: interactive
tasks:
- name: dotfile variable map is initialised
ansible.builtin.set_fact:
user_dotfile_data_map: {}
- name: dotfile variable map is populated
ansible.builtin.set_fact:
user_dotfile_data_map: >-
{{
user_dotfile_data_map
| combine({
user: user_dotfile_data_map[user] | default({})
| combine({
variable: value
})
})
}}
vars:
user: "{{ (item | split('_', 2))[1] }}"
variable: "{{ (item | split('_', 2))[2] }}"
value: "{{ lookup('ansible.builtin.vars', item) }}"
loop: "{{ q('ansible.builtin.varnames', '^dotfile_') }}"
- name: Convert interactive user list to username -> user info dictionary
ansible.builtin.set_fact:
interactive_user_dict: "{{ dict(interactive_users | map(attribute='interactive_username') | zip(interactive_users)) }}"
- name: Update interactive user dictionary
ansible.builtin.set_fact:
interactive_user_dict: >-
{{
interactive_user_dict
| combine({
item: interactive_user_dict[item]
| combine({
'interactive_dotfile_data':
interactive_user_dict[item].interactive_dotfile_data
| default({})
| combine(user_dotfile_data_map[item] | default({}))
})
})
}}
loop: "{{ interactive_users | map(attribute='interactive_username') }}"
- name: Update interactive users list
ansible.builtin.set_fact:
interactive_users: '{{ interactive_user_dict.values() }}'
- ansible.builtin.import_role:
name: interactive
Fin
Finally (this post was started on 2nd June, so only 2 months to get it done!), I have migrated all of the roles for my monitoring server to Ansible and can (at long last) actually start building out the bastion/jumphost that started me down this rabbit hole back in 2022 (in a new post)…