Last night we had a power-cut that lasted approximately 20 minutes. About 5 minutes in, I started my usual routine of manually shutting down systems to shed load from my UPS (and hence prolong runtime for our broadband infrastructure). At around 15 minutes in, I started to get a bit twitchy about how much runtime was left on the UPS and finally got around to doing something about monitoring it (not the ideal conditions to be setting it up, but a good motivator).

I monitor the UPS, an APC Back-UPS BX1400UI, from the Linux router into which it is plugged, using the Network UPS Tools (NUT).

There are two other systems attached to the UPS (plus core and wireless network infrastructure). One is another Linux host running a collection of virtual machines (VMs), on which I installed NUT as a client. The other is a NetGear ReadyNAS 214 NAS, whose built-in UPS support I initially configured to point to the NUT server. My longer-term plan is for the VM host to shut down the VMs (one of which uses an iSCSI volume on the NAS), then the NAS, and then itself, all within the first 5 minutes of a power failure, giving maximum runtime for the kit that allows us to remain connected to the internet.

I also configured my monitoring and alerting tools, Munin and Icinga, to monitor the UPS.

NUT

The server

The server needs both the server and client packages installing:

apt-get install nut-server nut-client

Next, I configured how NUT is deployed - on this system as a network server (other options are standalone and network client) - by setting the MODE in /etc/nut/nut.conf (note I changed this line, this is not the only contents of that file):

MODE=netserver

The locally attached UPS also needs configuring. This is done by adding an entry to /etc/nut/ups.conf (again, this is not the only contents of the file, so be careful to add these lines, not replace the entire file with them):

[bx1400ui]
driver = usbhid-ups
port = auto
desc = "APC Back-UPS BX1400UI"

Beware: some software, such as the ReadyNAS’ system, does not like non-alphanumeric characters (e.g. underscores ‘_’ or dashes ‘-’) in the UPS name.
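
A quick way to check a candidate name before committing to it is a small shell test. This is only a sketch: is_safe_ups_name is a hypothetical helper of my own, and "purely alphanumeric" is my conservative reading of what picky clients will accept rather than a documented rule.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the UPS name is non-empty and
# contains nothing but ASCII letters and digits.
is_safe_ups_name() {
    case "$1" in
        ''|*[!A-Za-z0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try one name that passes and one that contains a dash.
for name in bx1400ui bx1400ui-home; do
    if is_safe_ups_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: contains characters some clients may reject"
    fi
done
```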

You can find the appropriate driver for your UPS from NUT’s hardware compatibility list. You will probably need to specify a port (a serial tty device under /dev, such as /dev/ttyS0) if the UPS is attached via a serial cable.

As this system is going to be a network server for others, I also needed to configure what addresses to listen on in /etc/nut/upsd.conf (192.168.0.1 is an example internal IP, you will need to choose the appropriate one for the network interface you want to listen on):

LISTEN 127.0.0.1 3493
LISTEN ::1 3493
LISTEN 192.168.0.1 3493

Do not forget to update the firewall to allow your clients to connect to port 3493 (NUT’s default port) on the server.

I also configured some users to be able to interact with the NUT server. I created ‘admin’ (full access), ‘monmaster’ (for the master, which should be this machine in this setup) and ‘monslave’ (for the other hosts). There should be only one master. The users go in /etc/nut/upsd.users:

[admin]
password = adminpass
actions = SET
actions = FSD
instcmds = ALL
upsmon master

[monmaster]
password = masterpass
upsmon master

[monslave]
password = slavepass
upsmon slave

Then I configured the local system’s client, by adding a line to tell it which UPS on which NUT server to monitor, to /etc/nut/upsmon.conf (note, again adding this line not replacing the entire file):

MONITOR bx1400ui@localhost 1 monmaster masterpass master

Finally, I restarted the daemons to pick-up their new configuration:

systemctl restart nut-server
systemctl restart nut-monitor

NUT communicating with the UPS can be tested with the upsc command:

upsc bx1400ui@localhost
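
upsc prints one "name: value" pair per line. For scripting, a single value can be picked out of that output. The sketch below assumes that line format; ups_value is a hypothetical helper and the sample values are made up for illustration, not readings from my UPS.

```shell
#!/bin/sh
# Illustrative sample of upsc-style output (made-up values).
sample_output='battery.charge: 100
battery.runtime: 1620
device.model: Back-UPS XS 1400U
ups.status: OL'

# Hypothetical helper: read upsc-style lines on stdin and print the value
# of the variable named in $1 (everything after "name: ").
ups_value() {
    awk -v key="$1" 'index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }'
}

printf '%s\n' "$sample_output" | ups_value battery.charge
printf '%s\n' "$sample_output" | ups_value ups.status

# Against a live server the input would come from upsc instead, e.g.:
#   upsc bx1400ui@localhost | ups_value battery.charge
```

upsc also accepts a variable name as a second argument (for example, upsc bx1400ui@localhost battery.charge), which avoids the parsing when only one value is needed.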

By default, NUT will signal to all clients that they should shut down when the remaining battery charge reaches 10% or the remaining runtime reaches 2 minutes; the server then shuts itself down 15 seconds later. The default polling interval is 5 seconds, so 15 seconds always gives the clients at least 2 check-ins in which to see the signal.
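
The "either threshold" rule can be written out as a small shell test. This is only an illustration of the condition described above, not how NUT implements it (as far as I understand, the low-battery decision is reported by the UPS and driver). is_battery_low is a hypothetical helper, and the 10% and 120 second figures are hard-coded from the defaults mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when either low-battery threshold is reached.
#   $1 = battery charge remaining (percent)
#   $2 = runtime remaining (seconds)
is_battery_low() {
    [ "$1" -le 10 ] || [ "$2" -le 120 ]
}

# 35% charge looks healthy, but 110 seconds of runtime is under 2 minutes.
if is_battery_low 35 110; then
    echo "low battery: clients would be told to shut down"
else
    echo "battery ok"
fi
```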

I have not tested this critical shut-down phase yet. Before testing the whole sequence, I want to implement the early shut-down of the less critical systems, which are on the UPS for data-protection reasons rather than to remain on-line during a power outage.

The client

Like the server, this began with installing the package, but only nut-client is needed on this one:

apt-get install nut-client

Next, I needed to set the mode to network client in /etc/nut/nut.conf (remember this is one line that needs changing; do not replace the entire file with this one line):

MODE=netclient

Then I add the MONITOR line to /etc/nut/upsmon.conf (adding this line, not replacing the entire file):

MONITOR bx1400ui@nut.your.domain.tld 1 monslave slavepass slave

You can use an IP address here; I have added ‘nut’ to my internal DNS in case I want to move where it lives in the future.

And then, I just need to restart the monitoring daemon:

systemctl restart nut-monitor
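
To confirm the client can actually reach the server, upsc can be run from the client against the server's name. The ups.status variable comes back as a set of space-separated flags; the sketch below decodes a few of the common ones. describe_status is a hypothetical helper and the list of flags is deliberately not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: print a plain-English line for each flag in a
# ups.status string such as "OL CHRG" or "OB LB".
describe_status() {
    for flag in $1; do
        case "$flag" in
            OL)   echo "on mains power" ;;
            OB)   echo "on battery" ;;
            LB)   echo "low battery" ;;
            CHRG) echo "charging" ;;
            FSD)  echo "forced shutdown in progress" ;;
            *)    echo "other flag: $flag" ;;
        esac
    done
}

describe_status "OB LB"

# Against the live server (run from the client):
#   describe_status "$(upsc bx1400ui@nut.your.domain.tld ups.status)"
```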

ReadyNAS 214 (as client)

To configure the UPS in the ReadyNAS, I logged in as an administrator user and selected the ‘Power’ tab:

Screenshot 2021-01-10 101147

At the bottom is the UPS section:

Screenshot 2021-01-10 101440

Clicking on the ‘+’ button allows me to add it. The UPS name needs to match the one configured in NUT, and the ReadyNAS will only accept IP addresses, not hostnames, in the address field. ‘Remote UPS’ is the correct setting for a remote NUT server:

Screenshot 2021-01-10 101552

After filling it in and clicking ‘Add’, a spinner appears for a few seconds and then, all being well, it will change to show the configured UPS:

Screenshot 2021-01-10 101817

Monitoring

Munin

Munin comes with several plug-ins for monitoring NUT UPSs. One of them, nutups_, works with locally attached UPSs. Two others, nut_misc and nut_volts, will work with remote ones but require explicit configuration. In my network, I have set up the nutups_ plug-in on the NUT server, and the other two on that system and on my other Linux server that is UPS protected.

nutups_

This plug-in can be used to monitor voltage, charge, frequency and current; however, my model only supports voltage and charge. As with many Munin plug-ins, this is configured by creating symlinks to the plug-in in /etc/munin/plugins, with the UPS name and monitored variable determined from the symlink name.
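
To illustrate the naming convention, here is a sketch of how a name of the form nutups_UPSNAME_VARIABLE can be split on underscores. parse_plugin_name is a hypothetical helper written for this post and is not the plug-in's own code; it simply shows why, under a split like this, a UPS name containing an underscore would be cut in the wrong place.

```shell
#!/bin/sh
# Hypothetical helper: split a symlink name of the form
# nutups_<upsname>_<variable> into its two parts.
parse_plugin_name() {
    name=$(basename "$1")
    ups=$(printf '%s' "$name" | cut -d_ -f2)
    var=$(printf '%s' "$name" | cut -d_ -f3)
    echo "ups=$ups variable=$var"
}

parse_plugin_name /etc/munin/plugins/nutups_bx1400ui_charge
```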

For my UPS, these are the symlinks I created:

ln -s /usr/share/munin/plugins/nutups_ /etc/munin/plugins/nutups_bx1400ui_charge
ln -s /usr/share/munin/plugins/nutups_ /etc/munin/plugins/nutups_bx1400ui_voltage

If you want to monitor frequency and current, you can also create those (but you will require a UPS that supports them):

ln -s /usr/share/munin/plugins/nutups_ /etc/munin/plugins/nutups_myupsname_current
ln -s /usr/share/munin/plugins/nutups_ /etc/munin/plugins/nutups_myupsname_freq

After creating these links, a restart of munin-node is called for to pick up the new plugins:

systemctl restart munin-node

nut_misc / nut_volts

These plug-ins require configuration. By default they monitor a UPS called ‘bertha’ on host ‘127.0.0.1’. As they do the monitoring over the network, without reference to a local NUT ups.conf file, I set them up on all UPS-protected systems.

First, as with all plug-ins, I created the symlinks:

ln -s /usr/share/munin/plugins/nut_volts /usr/share/munin/plugins/nut_misc /etc/munin/plugins/

Then I created a configuration file for each. The configuration file is identical for both except for the plug-in name. You can specify an alternative command for upsc as well, but that was unnecessary in my environment.

/etc/munin/plugin-conf.d/nut_volts:

[nut_volts]
env.upsname bx1400ui@nut.your.domain.tld

/etc/munin/plugin-conf.d/nut_misc:

[nut_misc]
env.upsname bx1400ui@nut.your.domain.tld
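
Since the two files differ only in the section name, they can be generated in a loop. This is a sketch rather than what I actually ran: write_nut_plugin_conf is a hypothetical helper, and it is demonstrated against a scratch directory so that nothing under /etc is touched.

```shell
#!/bin/sh
# Hypothetical helper: write one Munin plug-in configuration file per NUT
# plug-in into the directory $1, pointing each at the UPS given in $2.
write_nut_plugin_conf() {
    for plugin in nut_volts nut_misc; do
        printf '[%s]\nenv.upsname %s\n' "$plugin" "$2" > "$1/$plugin"
    done
}

# Demonstrate against a scratch directory, not the live Munin configuration.
scratch=$(mktemp -d)
write_nut_plugin_conf "$scratch" "bx1400ui@nut.your.domain.tld"
cat "$scratch/nut_misc"
```

On a real system the first argument would be /etc/munin/plugin-conf.d and the helper would need to run as root.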

These changes can then be applied by restarting munin-node:

systemctl restart munin-node

Icinga

Like Munin, Icinga comes with a plug-in for monitoring NUT - check_ups.

I began by adding a variable to each host that is UPS protected, specifying which UPS protects it. This allows for additional UPSs becoming available in the future to protect other systems. It could be expanded to multiple UPSs per host by turning the scalar string into a list, but as I only anticipate having one UPS to protect each system (I have none at the moment that have dual power supplies) I do not need to do this.

object Host "host.my.domain.tld" {
	/* ... */
	vars.protected_by_ups = "bx1400ui-home"
}

Then I added the services. As the plug-in is installed via a Linux distribution package, the command is already configured. Although there is one command, the documentation implies that each service will only alert on one variable, the default being to use the state (online/charging = OK, on battery = WARNING, low battery = CRITICAL). I added battery and load thresholds to the list:

apply Service "APC Back-UPS BX1400UI Status" {
  import "generic-service"

  check_command = "ups"

  vars.ups_address = "nut.my.domain.tld"
  vars.ups_name = "bx1400ui"

  assign where host.vars.protected_by_ups == "bx1400ui-home"
}

apply Service "APC Back-UPS BX1400UI Battery" {
  import "generic-service"

  check_command = "ups"

  vars.ups_address = "nut.my.domain.tld"
  vars.ups_name = "bx1400ui"
  vars.ups_variable = "BATTPCT"
  vars.ups_warning = 50
  vars.ups_critical = 20

  assign where host.vars.protected_by_ups == "bx1400ui-home"
}

apply Service "APC Back-UPS BX1400UI Load" {
  import "generic-service"

  check_command = "ups"

  vars.ups_address = "nut.my.domain.tld"
  vars.ups_name = "bx1400ui"
  vars.ups_variable = "LOADPCT"
  vars.ups_warning = 30
  vars.ups_critical = 60

  assign where host.vars.protected_by_ups == "bx1400ui-home"
}

And then trigger a reload of Icinga’s configuration:

systemctl reload icinga2
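
The battery and load thresholds above run in opposite directions: battery percentage becomes worse as it falls, load becomes worse as it rises. The sketch below spells that out using the same numbers as the services. classify is a hypothetical helper, and treating the boundary values as inclusive is my assumption about the plug-in's behaviour rather than something I have verified.

```shell
#!/bin/sh
# Hypothetical helper mirroring the thresholds in the services above.
#   $1 = variable (BATTPCT or LOADPCT)
#   $2 = current value, $3 = warning threshold, $4 = critical threshold
classify() {
    case "$1" in
        BATTPCT)  # lower is worse
            if [ "$2" -le "$4" ]; then echo CRITICAL
            elif [ "$2" -le "$3" ]; then echo WARNING
            else echo OK; fi ;;
        LOADPCT)  # higher is worse
            if [ "$2" -ge "$4" ]; then echo CRITICAL
            elif [ "$2" -ge "$3" ]; then echo WARNING
            else echo OK; fi ;;
        *) echo UNKNOWN ;;
    esac
}

classify BATTPCT 45 50 20
classify LOADPCT 45 30 60

# The real plug-in can also be run by hand; the path below is where
# Debian-style packages usually put it and may differ on other systems:
#   /usr/lib/nagios/plugins/check_ups -H nut.my.domain.tld -u bx1400ui -v BATTPCT -w 50 -c 20
```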