Icinga2 check for HashiCorp Vault
When restarted, HashiCorp Vault starts “sealed” and has to be unsealed to make its contents accessible. I have, on occasion, forgotten to do this (usually after a reboot for a kernel update), so I want to add a check for this situation to my monitoring. My current monitoring solution is Icinga, so I am creating a check for it.
The check script
This is a simple bash script that uses Vault’s vault command and jq to check that the vault at a given URL is contactable and to report its seal status.
#!/bin/bash
if ! which vault &>/dev/null
then
  echo "Unable to locate vault command line client - cannot continue" >&2
  exit 3 # Usage/internal error
fi
if ! which jq &>/dev/null
then
  echo "Unable to locate jq (to parse vault output) - cannot continue" >&2
  exit 3 # Usage/internal error
fi
if [[ $1 == "--help" ]]
then
  echo "Usage: $0 [-v [-v]] vault_url_to_check"
  echo "This plugin requires vault and jq commands to be installed, and"
  echo "network access to the vault."
  echo "-v: increase the verbosity level one level for each '-v' (up to 2)"
  exit 0
fi
VERBOSITY_LEVEL=0
if [[ $1 == "-v" ]]
then
  # Consume each leading -v, counting how many were given
  while [[ $1 == "-v" ]]
  do
    VERBOSITY_LEVEL=$(( VERBOSITY_LEVEL + 1 ))
    shift
  done
  if [[ $VERBOSITY_LEVEL -gt 2 ]]
  then
    echo "Cannot increase verbosity beyond 2!"
    # Could just set it down to 2 but this is a usage error...
    exit 3 # Usage/internal error
  fi
fi
VAULT_ADDRESS="$1"
if [[ -z "${VAULT_ADDRESS}" ]]
then
  echo "No vault url provided - cannot continue" >&2
  exit 3 # Usage/internal error
fi
if [[ $VERBOSITY_LEVEL -ge 2 ]]
then
  echo "Checking vault at ${VAULT_ADDRESS}."
fi
# Capture stdout and stderr together so connection errors can be detected
VAULT_OUTPUT="$(vault status -address="${VAULT_ADDRESS}" -format=json 2>&1)"
if [[ "${VAULT_OUTPUT}" == "Error"* ]]
then
  echo "CRITICAL: Error connecting to vault: ${VAULT_OUTPUT}"
  exit 2 # Critical
fi
VAULT_SEALED="$( echo "${VAULT_OUTPUT}" | jq -r .sealed )"
VAULT_VERSION="$( echo "${VAULT_OUTPUT}" | jq -r .version )"
if [[ $VERBOSITY_LEVEL -ge 1 ]]
then
  echo "Vault version is ${VAULT_VERSION}."
fi
if [[ "${VAULT_SEALED}" == "false" ]]
then
  echo "OK: Vault is unsealed (version ${VAULT_VERSION})"
  exit 0 # All good
else
  # Vault is sealed...
  VAULT_THRESHOLD="$( echo "${VAULT_OUTPUT}" | jq -r .t )"
  VAULT_UNSEAL_PROGRESS="$( echo "${VAULT_OUTPUT}" | jq -r .progress )"
  echo "WARNING: Vault is sealed" \
    "(${VAULT_UNSEAL_PROGRESS}/${VAULT_THRESHOLD} unseal keys provided," \
    "version ${VAULT_VERSION})"
  exit 1 # Warning
fi
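To make the parsing step a little more concrete, here is a minimal sketch that runs the same jq expressions against a hand-written sample. The field names (sealed, t, n, progress, version) are the ones vault status -format=json reports, but the values in SAMPLE_STATUS are made up for illustration and the real output contains more fields than shown here.

```shell
#!/bin/bash
# Illustrative sample only: the values below are invented for this example,
# and real `vault status -format=json` output has additional fields.
SAMPLE_STATUS='{"type":"shamir","initialized":true,"sealed":true,"t":3,"n":5,"progress":1,"version":"1.15.0"}'

# The same jq expressions the check script uses
sealed="$( echo "${SAMPLE_STATUS}" | jq -r .sealed )"
threshold="$( echo "${SAMPLE_STATUS}" | jq -r .t )"
progress="$( echo "${SAMPLE_STATUS}" | jq -r .progress )"
version="$( echo "${SAMPLE_STATUS}" | jq -r .version )"

SUMMARY="sealed=${sealed} progress=${progress}/${threshold} version=${version}"
echo "${SUMMARY}"
```

With this sample the vault is sealed and one of the three required unseal keys has been entered, which is the situation the script reports as a warning.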
Icinga configuration
This is modelled on the Nextcloud version check I previously created.
Firstly the command needs to be defined, which I did in commands-hashicorp-vault.conf:
object CheckCommand "hashicorp-vault" {
  import "plugin-check-command"
  command = [ PluginContribDir + "/check_hashicorp_vault", "$vault_url$" ]
}
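Before relying on Icinga to run it, it can help to run the plugin by hand in the same way the command definition does (the plugin path followed by the URL as its only argument) and see how the exit status would be interpreted. The sketch below is a hypothetical helper, not part of the check itself: state_name and run_plugin are names I have made up, and the mapping is the usual monitoring plugin convention of 0, 1, 2 and 3 for OK, WARNING, CRITICAL and UNKNOWN. Treating any other status as UNKNOWN is a simplification made for this sketch.

```shell
#!/bin/bash
# Hypothetical helper: map a plugin exit status to the conventional state name.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;  # 3, and (in this sketch) anything unexpected
  esac
}

# Run a plugin with its arguments, then report how its exit status reads.
run_plugin() {
  "$@"
  local rc=$?
  echo "exit status ${rc} => $(state_name "${rc}")"
  return "${rc}"
}
```

For example, run_plugin /usr/lib/nagios/plugins/check_hashicorp_vault https://vault.home.mydomain.tld:8200 would print the plugin's own output followed by the interpreted state. The plugin path here is an assumption; use whatever PluginContribDir resolves to on your system.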
Next, the service is applied to every host that defines a list of vault URLs to check, which I did in services-hashicorp-vault.conf (in addition to the existing backup age check for the vault servers themselves):
apply Service for (vault_url in host.vars.vault_urls) {
  import "generic-service"
  check_command = "hashicorp-vault"
  vars.vault_url = vault_url
}
To monitor a vault, add a list of vault URLs to a host (I attached them to the host that they are served from, as that makes clear where the problem lies):
object Host "myhost.home.domain.tld" {
  // ...
  vars.vault_urls = ["https://vault.home.mydomain.tld:8200"]
  // ...
}
Check screenshots
They say a picture paints a thousand words, so here are some screenshots of the script and configuration (above) in action.
First, everything is fine:
Next, the daemon has been restarted and the vault is sealed:
The daemon has been stopped and the vault service cannot be contacted:
Finally, the vault is being unsealed but insufficient keys have been provided so far: