Monitor public and Ceph private network from Proxmox hosts with Icinga
Several times, I have been tripped up by the 2.5G USB3 network dongles not connecting properly in my Proxmox Virtual Environment cluster, affecting performance. To combat this, I have added checks to my Icinga monitoring so all ProxmoxVE hosts now monitor (via ping) all other ProxmoxVE hosts via both their “normal” (1G) network connection and the Ceph-private (2.5G) networks. This was interesting to write, as it turned out to be relatively complicated to append -ceph to the hostname portion of the Icinga Host name (which, as recommended by Icinga’s documentation, are the fully-qualified host names).
For context, the -ceph hostnames (for the Ceph private network addresses) are in /etc/hosts on each of the Proxmox hosts. Those network dongles are plugged into a 2.5G unmanaged switch that has no other devices attached (it is dedicated for the internal Ceph private network).
The final configuration I came up with was this, note that I had to split the name on . then remove the first item (as there is no way to select a slice of an array) before rejoining (because there is no way to limit the number of parts split splits into):
// Ping all proxmox hosts
apply Service "ping-" for (target in get_objects(Host).filter((host) => host.vars.services && "proxmox-ve" in host.vars.services).map((host) => host.name)) {
import "generic-service"
check_command = "ping"
command_endpoint = host.name
vars.ping_address = target
assign where host.vars.services && "proxmox-ve" in host.vars.services
}
// Ping all proxmox hosts' ceph private network interfaces
apply Service "ping-ceph-" for (target in get_objects(Host).filter((host) => host.vars.services && "proxmox-ve" in host.vars.services).map((host) => host.name)) {
import "generic-service"
check_command = "ping"
command_endpoint = host.name
// Extract the domain from the host's FQDN
var domain_parts = target.split(".")
domain_parts.remove(0)
var domain = domain_parts.join(".")
vars.ping_address = target.split(".")[0] + "-ceph." + domain
assign where host.vars.services && "proxmox-ve" in host.vars.services
}
The hosts already had a vars.services array listing some of the services on them (e.g. hashicorp-vault to monitor my Hashicorp Vault cluster), so I just added proxmox-ve to them and the all automatically got these checks to start checking all of the hosts with that service.