Back in August I set up fully automated shutdown of my server and NAS on power failure. The night before last we had a 2-hour power outage, during which these systems failed to shut down until the UPS reached its critical low-battery state.
During the post-mortem last night, I found this error in the logs:
Jul 18 01:04:32 ds9 upssched: Failed to connect to parent and failed to create parent: No such file or directory
The problem was that I had configured upssched.conf to use a subdirectory in /var/run/nut, created and secured for the purpose:
# This sets the file name of the FIFO that will pass communications between
# processes to start and stop timers. This should be set to some path where
# normal users can't create the file, due to the possibility of symlinking
# and other evil.
PIPEFN /var/run/nut/upssched/upssched.pipe
LOCKFN /var/run/nut/upssched/upssched.lock
/var/run is a RAM disk that is cleared whenever the system is rebooted, so
/var/run/nut/upssched was missing (it had not been re-created by a configuration management run since the last reboot). Had I been using Puppet, it would, out of the box, have recreated the directory on its next run (by default the agent applies the configuration every 30 minutes), but there would still be a race condition between the box starting and Puppet fixing the configuration.
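A check along the following lines, run straight after a reboot, would flag this kind of problem before the next outage does. It is only a minimal sketch: the function name check_upssched_dirs is hypothetical, and it assumes each PIPEFN and LOCKFN directive sits on its own line with a path that contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of NUT): report any PIPEFN/LOCKFN entry in an
# upssched.conf whose parent directory does not currently exist.
# Returns 0 if every parent directory exists, 1 otherwise.
check_upssched_dirs() {
    conf=$1
    status=0
    # Pick out only active (uncommented) PIPEFN / LOCKFN lines and take the path.
    for path in $(awk '$1 == "PIPEFN" || $1 == "LOCKFN" { print $2 }' "$conf"); do
        dir=$(dirname "$path")
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir (needed for $path)"
            status=1
        fi
    done
    return $status
}

# Example (the config path is an assumption and varies by distribution):
# check_upssched_dirs /etc/nut/upssched.conf
```

A non-zero exit status means upssched would be unable to create its pipe or lock file in the configured location.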
When I initially set this up, I tested the configuration by manually turning off the power to the UPS, but I did not test it after a reboot without subsequently running SaltStack, the configuration management tool that would have recreated the directory.
To correct this I moved the files to /var/run/nut directly. On the live system, this directory is owned by user root and group nut with permissions 770; assuming this is always the case, it is no less secure than the dedicated subdirectory:
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock
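Since the security argument rests on the ownership and mode of /var/run/nut staying as observed, that assumption can be checked rather than trusted. The following is a rough sketch only: check_dir_perms is a hypothetical helper name, and it relies on the -c format option of GNU stat, so it would need adjusting on systems with a different stat.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a directory has the expected owner,
# group and octal mode. Assumes GNU stat (the -c format option).
check_dir_perms() {
    dir=$1
    owner=$2
    group=$3
    mode=$4
    # %U = owner name, %G = group name, %a = permissions in octal
    actual=$(stat -c '%U %G %a' "$dir") || return 2
    if [ "$actual" != "$owner $group $mode" ]; then
        echo "$dir is '$actual', expected '$owner $group $mode'"
        return 1
    fi
}

# Example, using the values seen on the live system:
# check_dir_perms /var/run/nut root nut 770
```

Running this after each reboot would show whether the directory really does come back with the same owner, group and permissions every time.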