When restarted, Hasicorp Vault starts “sealed” and has to be unsealed to make the contents accessible. I have, on occasion, forgotten to do this (usually after a reboot for kernel update) and so I want to add checks for this situation to my monitoring. My current monitoring solution is Icinga, so I am creating a check for this.

The check script

This is a simple bash script that uses Vault’s vault command and jq to check the vault at a given URL is contactable and is seal status.

#!/bin/bash

if ! which vault &>/dev/null
then
  echo "Unable to locate vault command line client - cannot continue" >&1
  exit 3  # Usage/internal error
fi

if ! which jq &>/dev/null
then
  echo "Unable to locate jq (to parse vault output) - cannot continue" >&1
  exit 3  # Usage/internal error 
fi

if [[ $1 == "--help" ]]
then
  echo "Usage: $0 [-v [-v]] vault_url_to_check"
  echo "This plugin requires vault and jq commands to be installed, and"
  echo "network access to the vault."
  echo "-v: increase the verbosity level one level for each '-v' (up to 2)"
  exit 0
fi

VERBOSITY_LEVEL=0
if [[ $1 == "-v" ]]
then
  while [[ $1 == "-v" ]]
  do
    VERBOSITY_LEVEL=$(( VERBOSITY_LEVEL + 1 ))
    shift
  done
  if [[ $VERBOSITY_LEVEL -gt 2 ]]
  then
    echo "Cannot increase verbosity beyond 2!"
    # Could just set it down to 2 but this is a usage error...
    exit 3  # Usage/internal error
  fi
fi

VAULT_ADDRESS="$1"

if [[ -z "${VAULT_ADDRESS}" ]]
then
    echo "No vault url provided - cannot continue" >&2
    exit 3  # Usage/internal error
fi

if [[ $VERBOSITY_LEVEL -ge 2 ]]
then
  echo "Checking vault at ${VAULT_ADDRESS}."
fi

VAULT_OUTPUT="$(vault status -address="${VAULT_ADDRESS}" -format=json 2>&1)"

if [[ "${VAULT_OUTPUT}" == "Error"* ]]
then
    echo "CRITICAL: Error connecting to vault: ${VAULT_OUTPUT}"
    exit 2  # Critical
fi

VAULT_SEALED="$( echo "${VAULT_OUTPUT}" | jq -r .sealed )"
VAULT_VERSION="$( echo "${VAULT_OUTPUT}" | jq -r .version )"

if [[ $VERBOSITY_LEVEL -ge 1 ]]
then
  echo "Vault version is ${VAULT_VERSION}."
fi

if [[ "${VAULT_SEALED}" == "false" ]]
then
    echo "OK: Vault is unsealed (version ${VAULT_VERSION})"
    exit 0  # All good
else
    # Vault is sealed...
    VAULT_THRESHOLD="$( echo "${VAULT_OUTPUT}" | jq -r .t )"
    VAULT_UNSEAL_PROGRESS="$( echo "${VAULT_OUTPUT}" | jq -r .progress )"
    echo "WARNING: Vault is sealed" \
      "(${VAULT_UNSEAL_PROGRESS}/${VAULT_THRESHOLD} unseal keys provided," \
      "version ${VAULT_VERSION})"
    exit 1  # Warning
fi

Icinga configuration

This is modelled on the Nextcloud version check I previously created.

Firstly the command needs to be defined, which I did in commands-hashicorp-vault.conf:

object CheckCommand "hashicorp-vault" {
	import "plugin-check-command"

	command = [ PluginContribDir + "/check_hashicorp_vault", "$vault_url$" ]
}

And the service mapped to hosts with a list of vault urls to check defined, which I did in services-hashicorp-vault.conf (in addition to the existing backup age check for the vault servers themselves):

apply Service for (vault_url in host.vars.vault_urls) {
  import "generic-service"

  check_command = "hashicorp-vault"

  vars.vault_url = vault_url
}

To monitor, add a list of servers to a host (I attached them to the host that they are served from - as that makes clear where the problem lies):

object Host "myhost.home.domain.tld" {
    // ...
    vars.vault_urls = ["https://vault.home.mydomain.tld:8200"]
    // ...
}

Check screenshots

They say a picture paints a thousand words, so here’s some screenshots of the script and configuration (above) in action.

First, everything is fine:

Vault OK

Next, the daemon has been restarted and the vault is sealed:

Vault sealed

The daemon has been stopped and the vault service cannot be contacted:

Vault dead

Finally, the vault is being unsealed but insufficient keys have been provided so far:

Unseal in progress