Hot on the heels of migrating BackupPC from a bare-metal (router) system to a VM, the next set of services to migrate are my monitoring services. For this I run two separate systems: Icinga2 and Munin. Icinga excels at monitoring for, and alerting on, problems and faults as they happen, while Munin provides resource monitoring and graphing that gives a better view of “what just happened to kill our performance?” (to quote their website), as well as historic views of the same.

The plan

The plan is simple:

  1. build a new VM (I now have this all integrated into SaltStack, from the BackupPC work, so I just need to define it in the pillar for my host)
  2. stop both tools on the old host (the router)
  3. install Icinga and Munin on the new host
  4. copy all the existing data over

The new VM

Munin regularly uses quite a bit of CPU to generate its graphs, but this is a very bursty workload. The memory requirements of the processes, judged by looking at what is actively running on the router, are also very low, so I decided to give them a VM with 2 cores and 2GB of RAM. If this proves insufficient it can easily be increased in the future (and the monitoring systems themselves will help to assess that).
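
As a purely illustrative sketch, the pillar definition mentioned in the plan might look something like this (the file path, minion name and key names are hypothetical, not my actual schema, which follows the layout built during the BackupPC work):

# append a hypothetical VM definition to the host's pillar file
sudo tee -a /srv/pillar/hosts/monitoring.sls <<'EOF'
vm:
  cpus: 2
  memory: 2048  # MB
EOF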

Stop tools on old host

# stop the Icinga daemon
systemctl stop icinga2
# comment out every line of Munin's cron file, so no further runs happen
sed -i -e 's/^/#/' /etc/cron.d/munin
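
Before copying anything, a quick sanity check that both tools are genuinely quiescent (just how I would verify it; adjust to taste):

systemctl is-active icinga2         # expect "inactive"
grep -vc '^#' /etc/cron.d/munin     # expect 0 - every line commented out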

Install on new host

Added the ‘munin.server’ and ‘icinga.server’ roles to the new host and let SaltStack deploy them.
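
For the curious, the role assignment is just more pillar data plus a highstate run; something like this sketch (file path and minion name are placeholders, as above):

sudo tee -a /srv/pillar/hosts/monitoring.sls <<'EOF'
roles:
  - munin.server
  - icinga.server
EOF
sudo salt 'monitoring*' state.apply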

Copy existing data over

On the old system

# archive the live data and configuration for both tools
sudo tar -zcf monitoring-data.tgz -C /var/lib icinga2 munin
sudo tar -zcf monitoring-config.tgz -C /etc icinga2 munin
# dump Icinga's PostgreSQL database
sudo -u postgres pg_dump -f icinga2.pgdump -d icinga2
scp monitoring-data.tgz monitoring-config.tgz icinga2.pgdump new_host:

On the new system:

# unpack the data and configuration into place
sudo tar -C /var/lib -zxf monitoring-data.tgz
sudo tar -C /etc -zxf monitoring-config.tgz
# Note - create icinga2 role and database first
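# for example (role name and ownership assumed to match the Icinga config):
#   sudo -u postgres createuser icinga2
#   sudo -u postgres createdb -O icinga2 icinga2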
sudo -u postgres psql -d icinga2 < icinga2.pgdump
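
With the data restored, Icinga needs a restart to pick it up (assuming the Salt states installed and enabled the usual systemd unit; Munin's master runs from cron, so its next scheduled run picks up the restored databases on its own):

sudo systemctl restart icinga2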

Unforeseen work

I had to do a bit of work to update the hosts in all of the configuration files - the router had been recorded as localhost, 127.0.0.1 or ::1 in most of them. I also had to create new firewall rules to allow the new monitoring server to ping into the management VLAN. Previously these requests did not have to traverse VLANs at all, because the router is present on every VLAN (which is how it routes between them).
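
For flavour, the new rule amounts to something like this iptables sketch (the address and interface name are placeholders, not my real values):

# allow ICMP echo from the monitoring VM into the management VLAN
iptables -A FORWARD -s 192.0.2.10 -o vlan10 -p icmp --icmp-type echo-request -j ACCEPT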

I also had to tweak the VPN configuration to give my cloudy servers fixed VPN IP addresses. The old set-up read the IP addresses from /var/run/openvpn/server.status, but as the VPN endpoint (the router) is now a separate system from the monitoring host, this was no longer easily workable. Instead, I allocated fixed IPs in the VPN subnet and added them to the home network’s DNS. At the same time I added pings to these addresses to the monitoring, resolving a long-standing open issue: I had not been monitoring their VPN connection status.
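
For reference, OpenVPN can pin a client to a fixed address with a client-config-dir entry; a minimal sketch, assuming the server config already has “client-config-dir ccd” and a 10.8.0.0/24 VPN subnet (the client name and address are placeholders):

# pin the client whose certificate CN is "cloudserver1" to 10.8.0.10
echo 'ifconfig-push 10.8.0.10 255.255.255.0' | sudo tee /etc/openvpn/ccd/cloudserver1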