UPS automated shutdown failure
Back in August I set up fully automated shutdown of my server and NAS on power failure, and the night before last we had a 2-hour power outage during which these systems did not shut down until the UPS reached its critical low-battery state.
During the post-mortem last night, I found this error in the logs:
Jul 18 01:04:32 ds9 upssched[28135]: Failed to connect to parent and failed to create parent: No such file or directory
The problem was that I had configured upssched.conf to use a subdirectory of /var/run/nut, created and secured for the purpose:
# This sets the file name of the FIFO that will pass communications between
# processes to start and stop timers. This should be set to some path where
# normal users can't create the file, due to the possibility of symlinking
# and other evil.
PIPEFN /var/run/nut/upssched/upssched.pipe
LOCKFN /var/run/nut/upssched/upssched.lock
As /var/run is a RAM disk, it is cleared whenever the system reboots, so /var/run/nut/upssched was missing: no configuration management run had recreated it since the last reboot. Had I been using Puppet, which reapplies the configuration periodically (every 30 minutes by default), it would have recreated the directory on its next run, but there would still be a race condition between the machine booting and Puppet fixing the configuration.
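The sample comment in upssched.conf calls PIPEFN a FIFO. Whatever kind of file node upssched actually creates there, the kernel cannot create it inside a directory that does not exist, which is where the "No such file or directory" in the log comes from. The minimal Python sketch below reproduces that error in a temporary directory standing in for /var/run/nut (it does not touch the real one); `make_fifo` is a hypothetical helper written for illustration, not part of NUT.

```python
import errno
import os
import tempfile


def make_fifo(path: str) -> str:
    """Try to create a FIFO at `path`; return "ok" or the errno name on failure."""
    try:
        os.mkfifo(path, 0o600)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"


with tempfile.TemporaryDirectory() as run_dir:
    # Stand-in for /var/run/nut after a reboot: the directory itself exists,
    # but the upssched/ subdirectory inside it has not been recreated.
    nested = os.path.join(run_dir, "upssched", "upssched.pipe")
    print(make_fifo(nested))  # prints "ENOENT" (No such file or directory)

    # Creating the node directly in the surviving directory succeeds.
    direct = os.path.join(run_dir, "upssched.pipe")
    print(make_fifo(direct))  # prints "ok"
```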
When I first set this up, I tested the configuration by manually cutting power to the UPS, but I never tested it after a reboot, before SaltStack, my configuration management tool, had run and recreated the /var/run/nut/upssched directory.
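Because this failure only appears after a reboot, a check run at boot (or as part of a post-reboot test) would have caught it. Below is a minimal Python sketch that reads upssched.conf and reports any PIPEFN or LOCKFN whose parent directory is missing. It assumes the file lives at /etc/nut/upssched.conf (the location varies by distribution) and uses only the simple unquoted `KEY value` form shown above; `upssched_dirs_missing` is a hypothetical helper, not a NUT tool.

```python
import sys
from pathlib import Path

# Assumed location; Debian/Ubuntu use /etc/nut, other systems may differ.
CONF_PATH = Path("/etc/nut/upssched.conf")


def upssched_dirs_missing(conf_text: str) -> list[str]:
    """Return a message for each PIPEFN/LOCKFN whose parent directory is missing.

    Handles only the simple unquoted `KEY value` form; quoted values and
    line continuations are not parsed.
    """
    problems = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("PIPEFN", "LOCKFN"):
            parent = Path(parts[1].strip()).parent
            if not parent.is_dir():
                problems.append(f"{parts[0]}: directory {parent} does not exist")
    return problems


if __name__ == "__main__" and CONF_PATH.exists():
    issues = upssched_dirs_missing(CONF_PATH.read_text())
    for issue in issues:
        print(issue, file=sys.stderr)
    if issues:
        sys.exit(1)  # non-zero status so a boot job or monitor can alert
```

With the original configuration and no SaltStack run since boot, both PIPEFN and LOCKFN would be reported.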
To correct this, I moved the files directly into /var/run/nut. On the live system, this directory is owned by user root and group nut with permissions 770; assuming that is always the case, this is no less secure than the dedicated subdirectory:
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock
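The security argument above rests on /var/run/nut always being root:nut with mode 770, which depends on how the NUT package or init scripts create it at boot. A small Python check makes that assumption something a monitoring or configuration management run can verify rather than trust. This is a sketch; `describe_dir`, `matches_expected` and `EXPECTED` are hypothetical names, and `EXPECTED` simply records what the live system showed.

```python
import grp
import os
import pwd
import stat

EXPECTED = ("root", "nut", 0o770)  # what the live system showed


def describe_dir(path: str) -> tuple[str, str, int]:
    """Return (owner name, group name, permission bits) for `path`."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid with no passwd entry
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid with no group entry
        group = str(st.st_gid)
    return owner, group, stat.S_IMODE(st.st_mode)


def matches_expected(owner: str, group: str, perms: int) -> bool:
    """True only for exactly root:nut with mode 770 (rwxrwx---)."""
    return (owner, group, perms) == EXPECTED


if __name__ == "__main__" and os.path.isdir("/var/run/nut"):
    owner, group, perms = describe_dir("/var/run/nut")
    # stat.filemode renders the bits as e.g. "drwxrwx---" for a 770 directory.
    print(owner, group, stat.filemode(stat.S_IFDIR | perms),
          "OK" if matches_expected(owner, group, perms) else "UNEXPECTED")
```

Requiring an exact match is deliberately strict: any change, even to a tighter mode, is flagged so that a human can decide whether the upssched files are still adequately protected.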