Adding a bastion host - deploying HashiCorp Vault
Welcome to the fifth post in the rabbit-hole I disappeared down, trying to add a bastion host to my network for Ansible after trying to get started with Ansible for managing iPXE configuration in October. This post is about deploying HashiCorp Vault for secret management, beginning with the PKI certificates for Icinga monitoring.
As a recap, I am currently stalled migrating the monitoring Salt roles from my states tree to Ansible (nesting indicates where one role includes another). Completed ones are crossed through:
- server
  - remotely-accessible (installs and configures fail2ban)
  - monitoring.client (installs and configures icinga2 client)
    - monitoring.common (installs monitoring-plugins and nagios-plugins-contrib packages, installs & configures munin client and nagios' kernel & raid checks)
- monitoring.server (installs and configures icinga2 server, php support for nginx, icingaweb2 and munin server)
  - webserver (installs and configures nginx)
  - monitoring.common (see above)
Mirroring the HashiCorp repository
As I am working in an air-gapped home lab, my first task was to get the Vault packages into the lab's mirror. I presumed that this would be as simple as adding it to my mirror playbook with my debmirror role. Unfortunately, HashiCorp's Debian repository is stored at the root of the mirror URL, https://apt.releases.hashicorp.com/, and not in a sub-directory (such as /debian). debmirror is unable to mirror such repositories because it does not support an empty (or unspecified) root directory. This has been reported as a bug to both Ubuntu and Debian. The Debian maintainer's, entirely reasonable, stance is that because the official repositories are not structured this way, this is a feature request rather than a bug: "As a root directory is standard on official mirrors, I see the as a wishlist item."
Rather than hacking the debmirror script installed by the package, I found apt-mirror, which does work with these repositories but has fewer options for selecting what is mirrored - I am unsure if this results in a larger mirror than debmirror could produce but, if it does, I am sacrificing disk space for convenience (speed of moving forwards) in this case.
One other thing with apt-mirror is that it always creates a sub-directory named after the host being mirrored, apt.releases.hashicorp.com/ in this case - just something to be mindful of when using it.
I added an apt-mirror role to my mirroring playbook repository (fetch-key.yaml is identical to the debmirror role's - a horrible piece of duplication that could be solved by moving the key fetching to its own role; a sketch of it follows the task list below):
---
- name: Install apt-mirror
  become: yes
  ansible.builtin.package:
    name: apt-mirror
    state: present
- name: Make target directory
  ansible.builtin.file:
    path: "{{ target.directory }}"
    state: directory
- name: Make keys directory
  ansible.builtin.file:
    path: "{{ target.keyring_directory }}/keys.d"
    state: directory
  when: source['keys'] is defined
- name: Download keys for keyring
  include_tasks: fetch-key.yaml
  loop: "{{ source['keys'] | default([]) }}"
  loop_control:
    loop_var: key
- name: Delete any extra keys
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  with_fileglob: "{{ target.keyring_directory }}/keys.d/*.key"
  when: item.split('/')[-1].split('.')[0] not in source['keys'] | map(attribute='name') | list
- name: Delete old keyring
  ansible.builtin.file:
    path: "{{ target.keyring_directory }}/keyring.gpg"
    state: absent
- name: Make keyring
  ansible.builtin.shell: gpg --no-default-keyring --keyring {{ target.keyring_directory }}/keyring.gpg --import {{ item }}
  with_fileglob: "{{ target.keyring_directory }}/keys.d/*.key"
- name: Make configuration directory
  ansible.builtin.file:
    path: "{{ target.mirror_list | dirname }}"
    state: directory
- name: Make temporary directory
  ansible.builtin.tempfile:
    state: directory
  register: apt_mirror_tempdir
- name: Make configuration file
  ansible.builtin.template:
    dest: "{{ target.mirror_list }}"
    src: mirror.list.j2
- name: Run apt-mirror
  ansible.builtin.command:
    argv:
      - apt-mirror
      - "{{ target.mirror_list }}"
- name: Check download path exists for any additional files
  ansible.builtin.file:
    path: "{{ target.directory }}/{{ source.url | urlsplit('hostname') }}/{{ item | dirname }}"
    state: directory
  loop: "{{ selectors.additional_files | default([]) }}"
- name: Download any additional files from the mirror
  ansible.builtin.get_url:
    url: "{{ source.url }}/{{ item }}"
    dest: "{{ target.directory }}/{{ source.url | urlsplit('hostname') }}/{{ item }}"
  loop: "{{ selectors.additional_files | default([]) }}"
- name: Remove temporary files
  ansible.builtin.file:
    path: "{{ apt_mirror_tempdir.path }}"
    state: absent
...
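The fetch-key.yaml task file itself is not reproduced here, but a minimal sketch of what it might contain (assuming it just downloads each key, named per the name field, into keys.d - consistent with the loop variable and globs above) is:
---
- name: Download key
  ansible.builtin.get_url:
    url: "{{ key.url }}"
    dest: "{{ target.keyring_directory }}/keys.d/{{ key.name }}.key"
...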
The mirror list template looks like this:
set base_path "{{ apt_mirror_tempdir.path }}"
set mirror_path "{{ target.directory }}"
{% for suite in selectors.suites %}
deb {{ source.url }} {{ suite }} {{ ' '.join(selectors.components) }}
{% for architecture in selectors.additional_architectures | default([]) %}
deb-{{ architecture }} {{ source.url }} {{ suite }} {{ ' '.join(selectors.components) }}
{% endfor %}
{% endfor %}
clean {{ source.url }}
and the argument specification follows the common pattern I adopted:
---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with apt-mirror
    options:
      target:
        description: Locations to download to
        type: dict
        required: true
        options:
          directory:
            type: str
            required: true
            description: Directory to mirror to
          keyring_directory:
            type: str
            required: true
            description: Directory to download keys and store keyring in.
          mirror_list:
            type: str
            required: true
            description: Location to generate the config file for apt-mirror.
      source:
        description: Where to mirror from
        type: dict
        options:
          url:
            type: str
            default: ftp.debian.org
            description: Hostname to mirror from
          keys:
            type: list
            elements: dict
            options:
              name:
                type: str
                required: true
                description: Name of key (will be used for download filename)
              url:
                type: str
                required: true
                description: Where to fetch the key from
      selectors:
        type: dict
        required: yes
        options:
          suites:
            type: list
            elements: str
            required: true
            description: The list of suites to mirror
          components:
            type: list
            elements: str
            default: ['main']
            description: The list of components to mirror
          additional_architectures:
            type: list
            elements: str
            default: []
            description: The list of architectures to mirror
          additional_files:
            type: list
            elements: str
            default: []
            description: List of additional files (relative to source -> url) to download to the mirror
...
To perform the mirroring, I added the apt and rpm repositories to the list of repositories to mirror (note that I downloaded the GPG key as well):
# HashiCorp
- type: apt-mirror
  target:
    # apt-mirror automatically creates per-host sub-directories
    directory: "{{ mirror_base_path }}/apt-mirror"
    keyring_directory: "{{ mirror_base_path }}/keyrings/hashicorp"
    mirror_list: "{{ mirror_base_path }}/apt-mirror/hashicorp-mirror.list"
  source:
    url: https://apt.releases.hashicorp.com
    keys:
      - name: hashicorp-archive-keyring
        url: https://apt.releases.hashicorp.com/gpg
  selectors:
    additional_architectures:
      - src
    suites:
      - bullseye
    components:
      - main
    additional_files:
      - gpg
- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/hashicorp.yum.conf"
  source:
    repos:
      # EL 7 repo broken 2023-01-23 - see https://discuss.hashicorp.com/t/404-error-from-rhel-repo/14427/9
      # - name: hashicorp-el7
      #   description: HashiCorp-el7
      #   baseurl: https://rpm.releases.hashicorp.com/RHEL/7/x86_64/stable
      #   gpgkey: https://rpm.releases.hashicorp.com/gpg
      - name: hashicorp-el8
        description: HashiCorp-el8
        baseurl: https://rpm.releases.hashicorp.com/RHEL/8/x86_64/stable
        gpgkey: https://rpm.releases.hashicorp.com/gpg
At some point I want to either submit a patch to debmirror to fix this itch, or migrate all of my debmirror-ed repositories to apt-mirror and only maintain one method of mirroring these repositories.
Installing Vault
Once the repository was finally available, I could install Vault. The next question I faced was "where?". Ideally, like the bastion host, this would be a self-contained physical host doing nothing else; however, as I previously described, I don't have the luxury of spare physical systems (or disposable cash to buy more at the moment). The three systems I considered were my router, the monitoring system and the VM host (I do not want to run this in a VM as it will be critical to the infrastructure, so minimising the parts that could break is desirable). Like the bastion host, I concluded the monitoring box was the "least bad": it is the easiest to harden and, although the router is also hardened as the boundary device, putting the Vault on a non-edge system immediately adds an extra layer that has to be broken through to compromise the vault from outside.
I may revisit the decision not to put it in a VM once I have had a chance to fully think through and test a DR plan for Ansible with Vault. Ultimately I am thinking that I may migrate to short-lived certificates for SSH access, using Vault, and I would not want to create a situation where I could not log in to the VM (which might have to be via the virtual console, accessed through the OS of the physical host the VM is running on) if something broke.
First I added the repository (these commands are taken more-or-less directly from HashiCorp's installation tutorial, modified to use /usr/local/share/keyrings instead of spaffing over /usr/share/keyrings and to use the local mirror - I actually used Ansible to do this by writing an apt-repository role that does these same steps in Ansible):
mkdir -p /usr/local/share/keyrings
wget -O- http://mirror/mirrors/apt.releases.hashicorp.com/gpg | gpg --dearmor >/usr/local/share/keyrings/hashicorp-archive-keyring.gpg
gpg --no-default-keyring --keyring /usr/local/share/keyrings/hashicorp-archive-keyring.gpg --fingerprint
# Check the fingerprint matches 798A EC65 4E5C 1542 8C8E 42EE AA16 FCBC A621 E701
echo "deb [signed-by=/usr/local/share/keyrings/hashicorp-archive-keyring.gpg] http://mirror/mirrors/apt.releases.hashicorp.com $(lsb_release -cs) main" >/etc/apt/sources.list.d/hashicorp.list
Then Vault can be installed (again, I did this in Ansible):
apt update
apt install vault
Configuring Vault
Vault is configured in /etc/vault.d/vault.hcl. I changed the default one to look like this:
storage "raft" {
path = "/opt/vault/data"
node_id = "node_hostname"
}
listener "tcp" {
address = "0.0.0.0:8200"
tls_cert_file = "/opt/vault/tls/tls.crt"
tls_key_file = "/opt/vault/tls/tls.key"
}
# Address to advertise to other vault servers for client
# redirection.
# I presumed this needs to resolve to this node specifically,
# rather than a generic alias for the cluster (e.g. "vault.fqdn")
api_addr = "https://node_host_fqdn:8200"
# For now, use loopback address as not clustering
cluster_addr = "http://127.0.0.1:8201"
SSL certificates
Vault needs an SSL certificate for its listener ("In production, Vault should always use TLS to provide secure communication between clients and the Vault server. It requires a certificate file and key file on each Vault host."). In my production environment, I use Let's Encrypt for most of my PKI (notable exceptions are closed systems, such as Icinga and OpenVPN) but this requires internet access.
In the long term, my plan is to create an Automatic Certificate Management Environment (ACME), a.k.a. RFC 8555, server inside my lab (to enable testing with an ACME system, comparable to how the live system works) - perhaps using step-ca or boulder. In the short term, I added vault as a subject alternative name to the Let's Encrypt certificate for the live system and copied those certificates into the lab environment, adding the live hostname to /etc/hosts as a stop-gap.
I also added a new dehydrated code-rack hook to allow the dehydrated user to update the Vault certificates in /opt/vault/tls (as the vault user, which owns the certificates). The hook script is:
#!/bin/bash
SRCDIR=/var/lib/dehydrated/certs
TGTDIR=/opt/vault/tls
set -e
echo "This script ($0) will abort on first error." >&2
cat "$SRCDIR/$DOMAIN/fullchain.pem" | sudo -u vault /usr/bin/tee "$TGTDIR/tls.crt" >/dev/null
cat "$SRCDIR/$DOMAIN/privkey.pem" | sudo -u vault /usr/bin/tee "$TGTDIR/tls.key" >/dev/null
The Sudo permissions required are:
dehydrated hostname=(vault) NOPASSWD: /usr/bin/tee /opt/vault/tls/tls.crt, /usr/bin/tee /opt/vault/tls/tls.key
Firewall
On my most recently installed systems, I have firewalld running instead of using vanilla iptables
(or nftables).
To enable access to vault, I needed to add a new service and allow it:
firewall-cmd --permanent --new-service=vault
firewall-cmd --permanent --service=vault --set-description="HashiCorp Vault"
firewall-cmd --permanent --service=vault --set-short="HashiCorp Vault"
firewall-cmd --permanent --service=vault --add-port=8200/tcp
# This just adds it to the default zone - I will want to narrow this scope down in production
firewall-cmd --permanent --add-service=vault
# Reload to add changes to running config
firewall-cmd --reload
To do this with Ansible, I pushed out a service XML definition to /etc/firewalld/services/vault.xml and used the, rather limited, ansible.posix.firewalld module to add the service to the zone.
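The service definition XML is simple - mine looks something like this (matching the firewall-cmd commands above):
<?xml version="1.0" encoding="utf-8"?>
<service>
  <short>HashiCorp Vault</short>
  <description>HashiCorp Vault</description>
  <port protocol="tcp" port="8200"/>
</service>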
Initialise the Vault
Once Vault was configured and the SSL certificates were in place, I started the server:
systemctl start vault
For ease, I exported the VAULT_ADDR environment variable (otherwise every command needs prefixing with it):
export VAULT_ADDR=https://vault.fqdn:8200
and ran the initialisation command:
vault operator init
This prints out the unseal keys and an initial root token - these will be required to unlock the vault in future so do not lose them!
The vault can then be “unsealed” (this command has to be run 3 times, by default, providing 3 of the 5 keys one at a time):
vault operator unseal
The initial root token can be used with the login command to do further work:
vault login
Exploring the Vault
We can see the enabled secret engines with vault secrets list:
$ vault secrets list
Path          Type         Accessor              Description
----          ----         --------              -----------
cubbyhole/    cubbyhole    cubbyhole_78189996    per-token private secret storage
identity/     identity     identity_ac07951e     identity store
sys/          system       system_adff0898       system endpoints used for control, policy and debugging
Secret engines and their mounts provide two functions:
- The engine provides a capability, for example key/value storage or interaction with external storage (e.g. cloud facilities).
- Isolation - each mount cannot, by design, access another mount. This means there is enforced isolation even between two mounts of the same engine at different paths.
And the authentication methods with vault auth list:
$ vault auth list
Path      Type     Accessor                Description                Version
----      ----     --------                -----------                -------
token/    token    auth_token_68d761f00    token based credentials    n/a
Enabling a secrets engine
For now, I just want to store some certificates. Vault has an entire secrets engine for certificate management, but for now I just want to keep it simple and store/retrieve my existing ones - in the future I absolutely should let Vault manage the certificates dynamically.
Vault also has two versions of the key/value engine: the first stores only one version of each value (and is more performant as a result, both for storage size and speed), while the second stores several versions and has soft-delete and undelete operations. Again, for simplicity, I went with version one.
It is enabled with the vault secrets enable command:
$ vault secrets enable -description="Key/Value secrets" kv
Success! Enabled the kv secrets engine at: kv/
Setting up policies
Access controls are centred around the idea of "policies", which are a bit like access control lists, that are linked to credentials to grant access.
I am going to start with two policies:
- Administrator, which can create, update, delete, list and view secrets inside the kv store.
- Ansible, which can get secrets inside the kv store.
Once configured, this will cover most uses of the vault so I will rely on generating a “root token” using the unsealing shards for any other access, although these should not be routinely needed.
The policies can be listed with:
vault policy list
There are two default policies, root and default. The policy itself can be seen with (in this example, for default):
vault policy read default
Policies are defined in HashiCorp Configuration Language (HCL) or JavaScript Object Notation (JSON). Policies apply the most specific match on the path - more specific matches override less specific (e.g. glob) matches, as shown in the documentation example. The capabilities are generally listed on the policies concepts page, although you also need to check each secrets engine for specifics (for example, not all engines distinguish between create and update). It would have been nice if the documentation for each engine contained a simple list of capabilities they support, for referring to when creating policies.
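As a quick illustration of the specificity rule (my own example, not the one from the documentation), this grants read on everything under kv/ except one path, where the more specific deny rule wins:
path "kv/*" {
  capabilities = ["read"]
}

# More specific than the glob above, so takes precedence for this path
path "kv/icinga/certs/ca/key" {
  capabilities = ["deny"]
}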
So my Administrator policy looks like this:
path "kv/*" {
capabilities = ["create", "update", "read", "list", "delete"]
}
and my Ansible policy is:
path "kv/*" {
capabilities = ["read"]
}
The policy can be created (or updated) by either putting it into a file and loading with:
vault policy write administrator administrator-policy.hcl
or providing it via stdin, for example:
vault policy write ansible - <<EOF
path "kv/*" {
capabilities = ["read"]
}
EOF
Setting up user and application authentication
The accessor for the root token - generated when we initialised the vault and used to authenticate to do this setup - exists in the vault and can be seen with:
vault list /auth/token/accessors
The current token (if token login is being used), including its id (the key), can be found with:
vault token lookup
Information about the token associated with an accessor can be seen, however the id will be hidden:
vault token lookup -accessor some_accessor_value
In general we do not want to use tokens - HashiCorp themselves have a good blog post on this topic - however their model would have Ansible retrieve a wrapped SecretID which is then sent to the system being managed; that system unwraps it, logs in and then retrieves secrets directly. The difficulty with this is that it does not solve the challenge of how to authenticate Ansible in the first place. To make life simple, I will give Ansible a token for now and set up a username/password for my (admin) user.
Longer term, I suspect a token may remain the right way to authenticate Ansible itself, with AppRoles used to pull secrets on the managed clients, further restricting secret access to just those hosts that consume the secrets (c.f. the model of giving Ansible access to everything to push the secrets out). Using a token, compromise of the Ansible token potentially compromises all secrets in the kv store. I have slightly mitigated this by not permitting list, so prior knowledge - which could be got from the Ansible playbook's lookups - of what secrets can be retrieved is also required. In the scenario where the Ansible credentials can only be used to get wrapped secret IDs, and the controlled systems then retrieve their secrets directly (with each host restricted to the secrets it needs to know), one might think this is more secure. However, Ansible can access each system, so compromise of Ansible as a whole (i.e. the vault token, the inventory and playbook(s) - presuming that is sufficient to access/configure all hosts) will still compromise all secrets.
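For reference, the wrapped-SecretID flow from that model looks roughly like this (assuming an AppRole called, say, some-host already exists - which I have not set up here):
# On the Ansible controller: request a wrapped SecretID - the output is a
# single-use wrapping token, not the secret itself
vault write -wrap-ttl=120s -f auth/approle/role/some-host/secret-id

# On the managed host: unwrap it and use it to log in
vault unwrap -field=secret_id <wrapping_token>
vault write auth/approle/login role_id=<role_id> secret_id=<secret_id>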
Creating the token can be done directly:
vault token create -display-name=ansible -policy=ansible -orphan
or by creating a role, which is like a template for tokens, then using that to create the token:
vault write auth/token/roles/ansible allowed_policies=ansible orphan=true
vault token create -display-name=ansible -role=ansible
N.B. Making it an orphan stops it getting deleted when the parent token, which would be the token used to create it (in this case the root token), is deleted.
For the user, I enabled the userpass authentication method:
vault auth enable userpass
Then I created my user (after storing a suitable password in the file passfile, to avoid passing it on the command-line):
vault write auth/userpass/users/laurence policies=administrator password=@passfile
N.B. The password file must not have a trailing newline (\n), or it will be included as part of the password. To save the file without a newline with vim, first set the mode to binary then turn off eol. This will also remove the trailing newline from an already saved file.
:set binary
:set noeol
:wq
I saw tutorials online that show setting user passwords (and storing secrets) by passing these sensitive values on the command-line. I would strongly discourage this - by default on Linux any user can see the command line of any process (via /proc). This can be restricted by setting hidepid on the /proc mount (to 1, process directories are inaccessible to all but root and the owner, or 2, process directories are hidden completely from all except root and their owner) - however, unless the system has been hardened it is unlikely to be set (and it can break some software, particularly installers, that assume all processes are visible to all users).
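For completeness, hidepid can be set on a running system with a remount (and made permanent via /etc/fstab):
# Hide other users' processes until the next reboot
mount -o remount,hidepid=2 /proc
# Or permanently, with a line in /etc/fstab:
# proc  /proc  proc  defaults,hidepid=2  0  0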
Adding secrets to the vault
This can be done with (this example stores the contents of the ca.crt file in the certificate key at the path icinga/certs/ca/certificate):
vault kv put -mount=kv icinga/certs/ca/certificate certificate=@ca.crt
Note that it is not possible to add to the keys at a path - any new put will replace all existing keys, so one could not store the certificate's key and then add the signed certificate at the same path without re-supplying the key to the put command. This is with version 1; version 2 of the Key/Value secrets engine has a patch command which will let you update it. However, in this particular case I might want to apply different access rules to the key and the certificate, so this restriction actually encourages a sensible segregation.
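To illustrate the replace behaviour (hypothetical commands - do not run against real data):
# Store the certificate...
vault kv put -mount=kv icinga/certs/ca/certificate certificate=@ca.crt
# ...then try to add a key at the same path - this REPLACES the contents,
# so the certificate field is now gone:
vault kv put -mount=kv icinga/certs/ca/certificate key=@ca.key
# To keep both, both must be supplied in a single put:
vault kv put -mount=kv icinga/certs/ca/certificate certificate=@ca.crt key=@ca.key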
The secrets (and keys) can be seen with (which doesn't seem to support the new -mount= style recommended with put and get):
vault kv list /kv
and retrieved with:
vault kv get -mount=kv icinga/certs/ca/certificate
or, to get a specific key:
vault kv get -mount=kv -field=certificate icinga/certs/ca/certificate
N.B. At some future point I will investigate using Vault's PKI support to manage the certificate generation automatically, but one step at a time…
Mass import of existing certificates
Each host's current Icinga certificate and key are in a SaltStack Pillar data file named after the host's fully qualified name with the dots (.) replaced by hyphens (-). For example, the certificate and key for somehost.my.domain.tld is in the file somehost-my-domain-tld.sls. As these Pillar files are plain YAML, I wrote a simple Python script to dump the certificates out to plain files, which I could then import directly with a small bash loop.
The format of each host’s certificate data is:
---
icinga2:
  host:
    certificate: |
      certificate_data
    key: |
      key_data
...
Where certificate_data and key_data are the actual certificate and key. Note that in this YAML host is literal (the actual word "host"), so the state just uses pillar.icinga2.host.certificate and pillar.icinga2.host.key with no need to interpolate the hostname. The correct pillar file is included using the {% include 'monitoring/icinga2/certificates/' + opts.id.replace('.', '-') + '.sls' ignore missing %} jinja2 recipe for mapping host-specific values to general settings (like certificates, where the value is host-specific but the same key is used for all hosts) that I have used generally in my pillar.
The Python script to extract the certificate and key, writing the values to files named host.crt and host.key (where host is a placeholder for the actual fully-qualified host name), is this:
#!/usr/bin/env python3
from pathlib import Path

import yaml

for f in Path('.').glob('*.sls'):
    with open(f, 'r') as fh:
        certs = yaml.safe_load(fh)
    # Convert the '-' based filename back to the host's name.
    # Of course this is not right if the original hostname has a hyphen.
    hostname = f.stem.replace('-', '.')
    with open(hostname + '.crt', 'w') as fh:
        fh.write(certs['icinga2']['host']['certificate'])
    with open(hostname + '.key', 'w') as fh:
        fh.write(certs['icinga2']['host']['key'])
    print(hostname, "done")
The bash loop to import them all into vault is then:
for file in *.crt
do
base="$( basename "$file" .crt)"
vault kv put -mount=kv icinga/certs/hosts/$base/certificate certificate=@$base.crt
vault kv put -mount=kv icinga/certs/hosts/$base/key key=@$base.key
done
Generating and importing new certificates
In the lab, the live network's certificates do not match the domain (which has lab. prefixed) so new certificates were required. I generated them using the method documented when I set up icinga2.
After manually creating and importing the CA, I scripted doing the clients (this presumes you are already authenticated to the vault):
#!/bin/bash

# Standard bash script safety:
# - abort on error
# - no uninitialised variable use
# - disable globbing
# - errors in a pipe cause the whole pipe to fail
set -eufo pipefail

# Default to empty so the usage message (not a set -u error) is shown
# when no argument is given
host="${1:-}"

if [[ -z $host ]]
then
    cat >&2 <<EOF
Usage: $0 hostname.domain.tld

hostname.domain.tld should be a fully qualified name (will be the subject of the certificate)
EOF
    exit 1
fi

sudo -u nagios icinga2 pki new-cert --cn "$host" --key "$host.key" --csr "$host.csr"
sudo -u nagios icinga2 pki sign-csr --csr "$host.csr" --cert "$host.crt"

# chown the files so the current user can import them to vault
sudo chown "$USER" "$host".{key,csr,crt}

vault kv put -mount=kv "/icinga/certs/hosts/$host/certificate" certificate=@"$host.crt"
vault kv put -mount=kv "/icinga/certs/hosts/$host/key" key=@"$host.key"

echo "Generated and imported key and certificate for $host"
rm "$host".{crt,key,csr}
The script can be used, e.g. for the current host: bash generate-import-vault-icinga-cert.bash $( hostname -f ).
Integration with Ansible
Ansible includes a Vault lookup plugin, however it has been replaced by a new collection. I haven't yet mirrored any collections into my air-gapped lab, so I stuck with the bundled plugin. It requires the hvac python library to be installed - so I added that to my requirements.txt and installed it.
To use, it is like any other lookup plugin (for example, with the copy module, which accepts inline content):
ansible.builtin.copy:
  # ...
  content: "{{ lookup('hashi_vault', 'secret=kv/icinga/certs/ca/certificate:certificate') }}"
The plugin will use environment variables, such as VAULT_ADDR, so no specific configuration inside the Ansible playbooks is needed. As long as the vault address is set in the environment and authentication has been done with vault login, the lookups will just work.
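For example, a fuller (hypothetical) task that deploys the Icinga CA certificate from the vault - the destination path and ownership here are illustrative:
- name: Deploy Icinga CA certificate from Vault
  become: yes
  ansible.builtin.copy:
    dest: /var/lib/icinga2/certs/ca.crt
    owner: nagios
    group: nagios
    mode: "0644"
    content: "{{ lookup('hashi_vault', 'secret=kv/icinga/certs/ca/certificate:certificate') }}"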
Debugging Vault
To debug vault, I turned on vault’s file audit device:
# Make a directory the `vault` user can write to
mkdir /var/log/vault
chown vault /var/log/vault
vault audit enable file file_path=/var/log/vault/audit.log
It is quite verbose, so I turned it off again once I had it working:
vault audit disable file
Backing up
Obviously backing up the Vault is very important. Built-in automated backups are an Enterprise version feature, so I need to do it manually.
To do this, I first created a new policy (which I called backup) and a token for backups, based on a recipe I found online:
path "sys/storage/raft/snapshot" {
capabilities = ["read"]
}
I then created the token - as before I considered using an AppRole but, at this stage, having a static role and secret id to manage does not really gain anything over a token, as this is for a periodic cron job which will need to authenticate regularly rather than being provided with a time-limited secret on startup and not needing it again:
vault write auth/token/roles/backup allowed_policies=backup orphan=true
vault token create -display-name=backup -role=backup
In the longer term, I think that I will want to deploy an AppRole, with the role id deployed e.g. on install or manually (although Ansible could alert if it is missing) and a wrapped secret id pushed out periodically by Ansible, but this requires having everything in place for Ansible to automatically update the wrapped secret in time for the backup to take place. What I don't want to do is allow the backup to use its own secret to refresh itself, because then compromise of the backup secret allows infinite regeneration of new backup tokens. Whereas requiring a supervisor (e.g. Ansible) to generate a secret that (for example, per the recipe above) grants a ticket that can only be used twice (once to login, once to perform one act - such as dump the backup) significantly limits the damage that can be done with a compromised secret.
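Vault can express such a limited-use ticket directly; for example, a token restricted to two uses (a sketch, not something I have deployed):
# Hypothetically: a token that may only be used twice before being
# automatically revoked (once to login, once to take the snapshot)
vault token create -display-name=backup -role=backup -use-limit=2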
Anyway, for now I used a magic token, which I placed in a file owned, and only readable, by a new user called vault-backup (least privilege - only the script doing the backup needs to know the token, but it also does not need to be root, so a new unprivileged user for the purpose seems the most secure option).
Backup script
I put settings into /etc/vault-backup.conf, which included VAULT_ADDR and VAULT_TOKEN, and made sure the file was owned and only readable by the vault-backup user:
VAULT_ADDR=https://vault.fqdn:8200
VAULT_TOKEN=12345
The script I placed in /usr/local/sbin (called vault-backup):
#!/bin/bash

# Standard bash script safety
set -eufo pipefail

if [[ $UID -eq 0 ]]
then
    echo "This script should be run as the vault-backup user, not root." >&2
    exit 1
fi

# Default settings, which may be overridden in /etc/vault-backup.conf
BACKUP_DIR=/srv/backups/vault  # Where to store the backups?
KEEP_BACKUPS=7  # How many backups to keep?

# Get settings (which should include VAULT_ADDR and VAULT_TOKEN)
source /etc/vault-backup.conf

# Export any 'VAULT_'y variables (variables whose names start
# VAULT_)
export ${!VAULT_*}

target_file="${BACKUP_DIR}/backup-$( date +%FT%H%M ).snap"

echo "Taking vault snapshot, saving to ${target_file}."
vault operator raft snapshot save "${target_file}"
ln -fs "${target_file}" "${BACKUP_DIR}/backup-latest.snap"

# Check to see if there are more than KEEP_BACKUPS backups
set +f  # Enable globbing
backup_files=( "${BACKUP_DIR}"/backup-[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9][0-9][0-9].snap )
set -f  # Done globbing
backup_count="${#backup_files[@]}"

echo "There are now ${backup_count} backups."
if [[ $backup_count -gt $KEEP_BACKUPS ]]
then
    echo "There are more than ${KEEP_BACKUPS} (${backup_count}) backups. Removing some:"
    # As globbing is ascii-betical (per bash man page on Pathname
    # Expansion) the files will be in ascending age order (due to the
    # format of the date and time in the filename) so we can just remove
    # those elements before the ones we want to keep in the array.
    for idx in "${!backup_files[@]}"
    do
        if [[ $idx -lt $(( backup_count - KEEP_BACKUPS )) ]]
        then
            echo "Removing ${backup_files[idx]}..."
            rm "${backup_files[idx]}"
        fi
    done
    echo "Finished removing old backups."
fi
Using my cron-wrapper, which only emails the output if the command fails, I added a cronjob (in /etc/cron.d/vault-backup) to run it daily at 2am:
0 2 * * * vault-backup /usr/local/bin/cron-wrapper /usr/local/sbin/vault-backup
Monitoring backup
Finally, I added a check to Icinga that verifies the backup has been updated in the last 24 hours. This involved adding a service check by creating services-hashicorp-vault.conf:
apply Service "check-vault-backup" {
import "generic-service"
check_command = "file_age"
command_endpoint = host.name // Execute on client
vars.file_age_file = "/srv/backups/vault/backup-latest.snap"
vars.file_age_warning_time = 86400 // 1 day (24 hours, 86,400s)
vars.file_age_critical_time = 129600 // 1.5 days (36 hours, 129,600s)
assign where host.vars.services && "hashicorp-vault" in host.vars.services
}
Then I added hashicorp-vault to the list of services on the relevant host:
object Host "xxxxxx.domain.tld" {
vars.services = [
"hashicorp-vault"
]
}
Finally, I changed the ownership of the /srv/backups/vault directory to vault-backup:nagios with mode 0750 - so the nagios user can read the state of files but not modify (e.g. delete) them. The backup files themselves are owned by vault-backup:vault-backup, so this does not grant the nagios group members access to the contents of the backups.
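That is, the equivalent shell commands:
chown vault-backup:nagios /srv/backups/vault
chmod 0750 /srv/backups/vault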
Revoking the initial root token
Finally, I revoked the initial root token:
vault token revoke -self
If a new root token is required in the future, one can be generated using the unseal keys and the operator command. This process must be started with the -init option, which will print a nonce and one-time password (OTP):
vault operator generate-root -init
The nonce value will be required to continue the process with at least 2 other unseal keys (the command will prompt for the nonce and then unseal key - the nonce can be auto-completed):
vault operator generate-root
The final user will be given the encoded token, which will need to be passed to the -decode option with the OTP to display the actual token:
vault operator generate-root -decode=encoded_token -otp=OTP_from_init
Once Vault was up and running, I carried on with finishing the migration of the monitoring configuration to Ansible.