Back in August I set up fully automated shutdown of my server and NAS on power failure, and the night before last we had a two-hour power outage during which these systems failed to shut down until the UPS reached its critical low-battery state.

During the post-mortem last night, I found this error in the logs:

Jul 18 01:04:32 ds9 upssched[28135]: Failed to connect to parent and failed to create parent: No such file or directory

The problem was that I had configured upssched.conf to use a subdirectory of /var/run/nut that I had created and secured for the purpose:

# This sets the file name of the FIFO that will pass communications between
# processes to start and stop timers.  This should be set to some path where
# normal users can't create the file, due to the possibility of symlinking
# and other evil.
PIPEFN /var/run/nut/upssched/upssched.pipe
LOCKFN /var/run/nut/upssched/upssched.lock

As /var/run is a RAM disk, it is cleared whenever the system is rebooted, so /var/run/nut/upssched was missing (no configuration management run had recreated it since the last reboot). Had I been using Puppet, it would have recreated the directory on its next run (by default the agent applies the configuration every 30 minutes), but there would still be a race condition between the box starting and Puppet fixing the configuration.
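The message in the log is simply the text for ENOENT. A minimal Python sketch, which does not run upssched and uses temporary paths as stand-ins for /var/run/nut/upssched, shows that creating a FIFO (the upssched.conf comment describes PIPEFN as one) inside a directory that no longer exists fails with exactly that error, and that recreating the directory first, as the configuration management run would have, makes it succeed. It assumes upssched does not create missing parent directories itself, which is consistent with the failure above.

```python
import errno
import os
import tempfile


def demo() -> tuple[int, bool]:
    """Try to create a FIFO in a missing directory, then again after recreating it.

    Temporary paths stand in for /var/run/nut/upssched; upssched itself is not run.
    """
    with tempfile.TemporaryDirectory() as run_dir:
        sub_dir = os.path.join(run_dir, "upssched")  # missing, as after a reboot
        pipe_path = os.path.join(sub_dir, "upssched.pipe")

        err = 0
        try:
            os.mkfifo(pipe_path, 0o600)
        except FileNotFoundError as exc:
            err = exc.errno  # ENOENT: "No such file or directory"

        # What the configuration management run would have done: recreate the directory
        os.makedirs(sub_dir, mode=0o770, exist_ok=True)
        os.mkfifo(pipe_path, 0o600)
        created = os.path.exists(pipe_path)
    return err, created


if __name__ == "__main__":
    err, created = demo()
    print(errno.errorcode.get(err), os.strerror(err), created)
```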

When I first set this up, I tested the configuration by manually turning off the power to the UPS, but I did not test it after a reboot without a subsequent run of SaltStack, the configuration management tool that would have recreated the /var/run/nut/upssched directory.
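The test I skipped could be partly automated. A minimal Python sketch like the one below could be run straight after a reboot, before any configuration management run, to confirm that the directories holding PIPEFN and LOCKFN exist. It is a deliberately naive parser that only looks for those two directives and ignores quoting; the function names are my own.

```python
import os


def read_paths(conf_text: str) -> dict[str, str]:
    """Return the PIPEFN and LOCKFN values found in upssched.conf text."""
    paths = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("PIPEFN", "LOCKFN"):
            paths[parts[0]] = parts[1].strip()
    return paths


def missing_parents(conf_text: str) -> list[str]:
    """List the PIPEFN/LOCKFN entries whose parent directory does not exist."""
    return [
        f"{key} {path}"
        for key, path in sorted(read_paths(conf_text).items())
        if not os.path.isdir(os.path.dirname(path))
    ]
```

For example, `missing_parents(open("/etc/nut/upssched.conf").read())` should return an empty list; /etc/nut/upssched.conf is an assumed location, so adjust it for your system. With my original configuration after a reboot it would have reported the PIPEFN entry.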

To correct this, I moved the files directly into /var/run/nut. On the live system this directory is owned by user root and group nut with permissions 770; assuming that is always the case, it is no less secure than the dedicated subdirectory:

PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock
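Rather than simply assuming /var/run/nut always comes up as root:nut with mode 770, that assumption could be checked, for instance as part of the post-reboot test. A minimal Python sketch follows; the expected owner, group and mode are just what I observed on the live system, not something NUT guarantees, and the function names are hypothetical.

```python
import grp
import os
import pwd
import stat

# Expected values observed on the live system (an assumption, not a NUT guarantee)
EXPECTED = ("root", "nut", 0o770)


def describe_dir(path: str) -> tuple[str, str, int]:
    """Return (owner name, group name, permission bits) for path."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    return owner, group, stat.S_IMODE(st.st_mode)


def problems(owner: str, group: str, mode: int) -> list[str]:
    """List every way the directory differs from the expected owner, group and mode."""
    want_owner, want_group, want_mode = EXPECTED
    found = []
    if owner != want_owner:
        found.append(f"owner is {owner}, expected {want_owner}")
    if group != want_group:
        found.append(f"group is {group}, expected {want_group}")
    if mode != want_mode:
        found.append(f"mode is {mode:o}, expected {want_mode:o}")
    return found
```

On the server, `problems(*describe_dir("/var/run/nut"))` should return an empty list; anything else means the pipe and lock file are not as well protected as the dedicated subdirectory was.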