Automatically shutting down server and NAS
At home we have a UPS (an APC Back-UPS BX1400UI) which protects our internet connectivity equipment, core network switch, one server and a network-attached storage (NAS) device. The server runs the backup tool (BackupPC) that backs up all of my local and cloud systems, with the backups stored on an iSCSI volume on the NAS (with monthly “off-site” copies).
In the event of a power outage, the server and NAS are protected to stop them going down hard - the theory being that if one of the other systems does not survive a sudden power failure, the latest backups should be safe. As the iSCSI connection goes over the network, I also protected the switch (which keeps our PoE wireless access points alive), so just adding the router to the UPS gives us internet access during a power outage, at least for the 30 minutes or so before the UPS reaches 10% power and everything attached shuts down.
The biggest power drains are the server and NAS, so shutting these down early will both ensure they are shut down cleanly and extend the battery runtime. Network UPS Tools (NUT) allows customising the threshold at which devices are signalled to shut down; however, this is a server setting, so all clients are signalled at the same time. This is not ideal for me - I want to shut down some clients early (the server and NAS) and the others late (router, network switch), and to shut down some clients in sequence (server, then NAS). Other than a custom script, I did not find a clear way to set this up.
Shutting down the server
The first step is to script shutting down the server. There are several ways to do this; one is to log in to the server and issue a poweroff command (which typically requires root privileges). An alternative, which I decided to go for, is to use the out-of-band management system to send an ACPI shutdown signal to the operating system.
The server is one of my HP MicroServers. I have previously set up automation to update the iLO’s SSL certificate, so I already have experience with the tools needed, which helps when using them for further automation.
Setting up tools to support powering off
- As when I set up the SSL certificates, I created a new iLO user (I called it UPS), this time with “Virtual Power and Reset” as its only privilege.
- On my router, which is where the NUT server daemon runs, I installed the python-hpilo Debian package.
- I created a configuration file for the HP iLO, with the username and password, following the same process as for updating the iLO’s SSL certificate (I put mine in /etc/nut/hp-ilo/, named after the iLO hostname and given permissions so only the nut user can read them).
- Finally, I tested that I could get the power state of the server with the configuration file:
ILO_HOST=my-ilo.home.domain.tld ; hpilo_cli -c "/etc/nut/hp-ilo/${ILO_HOST}.ini" "$ILO_HOST" get_host_power_status
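The configuration file itself is not shown here, so the following is only a minimal sketch of creating one. It assumes hpilo_cli's INI layout of an [ilo] section with login and password keys (worth checking against the python-hpilo documentation for your version). The function name write_ilo_config is hypothetical and the password is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: write an hpilo_cli configuration file for one iLO.
# ASSUMPTION: hpilo_cli reads an [ilo] section with "login" and "password"
# keys; verify this against the python-hpilo documentation.
write_ilo_config() {
    local config_dir=$1 ilo_host=$2 ilo_user=$3 ilo_password=$4
    local config_file="$config_dir/$ilo_host.ini"

    mkdir -p "$config_dir" || return 1
    # Create (or truncate) the file with owner-only permissions before
    # any credentials are written to it.
    install -m 600 /dev/null "$config_file" || return 1
    printf '[ilo]\nlogin = %s\npassword = %s\n' "$ilo_user" "$ilo_password" > "$config_file"
}
```

Run as root, something like write_ilo_config /etc/nut/hp-ilo my-ilo.home.domain.tld UPS 'the_password' followed by chown nut /etc/nut/hp-ilo/my-ilo.home.domain.tld.ini would leave a file that only the nut user can read.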
A script to shut down the server
Once the above was set up and tested, I wrote a little script (/usr/local/sbin/nut-shutdown-hp-microserver) which shuts down the server and then waits a period for it to go down (or errors if the shutdown fails):
#!/bin/bash
# Directory holding one hpilo_cli configuration file per iLO (<ilo_hostname>.ini)
ILO_CONFIG_PATH=/etc/nut/hp-ilo
usage() {
    cat - <<EOF
Usage: $0 ilo_hostname [...]
ilo_hostname: The DNS hostname or IP address of the iLO to shutdown. Must have a corresponding configuration file in $ILO_CONFIG_PATH
EOF
}
if [[ -z $1 ]]
then
    usage
    exit 1
fi
# No error, unless we find one...
exit_status=0
while [[ -n $1 ]]
do
    ILO_HOST="$1"
    # Make sure we remove the current one from the list - we do not
    # want an infinite loop.
    shift
    # Check for a configuration file
    if [[ ! -e "$ILO_CONFIG_PATH/$ILO_HOST.ini" ]]
    then
        echo "No configuration file for $ILO_HOST!" >&2
        exit_status=1
        continue # Skip to next host
    fi
    # Check it is not already off...
    POWER_STATE=$( hpilo_cli -c "$ILO_CONFIG_PATH/${ILO_HOST}.ini" "$ILO_HOST" get_host_power_status | tail -n 1 )
    if [[ $POWER_STATE == "OFF" ]]
    then
        # Host being off already is not an error, so not changing exit status
        echo "Host $ILO_HOST is not on, not shutting down." >&2
        continue # Skip to next host
    elif [[ $POWER_STATE != "ON" ]]
    then
        echo "Host $ILO_HOST is in unrecognised power state $POWER_STATE!" >&2
        exit_status=1
        continue # Skip to next host
    fi
    # Trigger the shutdown (a momentary power button press, which the OS
    # receives as an ACPI shutdown request)
    echo "Shutting down $ILO_HOST..."
    hpilo_cli -c "$ILO_CONFIG_PATH/${ILO_HOST}.ini" "$ILO_HOST" press_pwr_btn
    # Wait for shutdown to complete
    echo "Waiting for $ILO_HOST to power off..."
    counter=1
    okay=0
    # In my testing, it took around 24s to shutdown so 90s is a generous margin
    while [[ $counter -lt 90 ]]
    do
        sleep 1
        POWER_STATE=$( hpilo_cli -c "$ILO_CONFIG_PATH/${ILO_HOST}.ini" "$ILO_HOST" get_host_power_status | tail -n 1 )
        if [[ $POWER_STATE == "ON" ]]
        then
            echo -n .
        elif [[ $POWER_STATE == "OFF" ]]
        then
            echo
            echo "$ILO_HOST powered off after $counter seconds."
            okay=1
            break
        else
            echo
            echo "$ILO_HOST entered unknown power state $POWER_STATE after $counter seconds." >&2
            exit_status=1
            break
        fi
        counter=$(( counter + 1 ))
    done
    if [[ $okay -ne 1 ]]
    then
        echo "Shutdown failed!" >&2
        exit_status=1
    fi
done
exit $exit_status
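The wait loop above counts polls rather than seconds: each pass sleeps for a second and then spends further time in hpilo_cli, so the real timeout is somewhat longer than the counter suggests. A possible alternative, sketched here with the hypothetical helper names wait_until and ilo_is_off (neither is part of the script above), uses bash's built-in SECONDS variable to enforce a wall-clock deadline:

```shell
#!/bin/bash
ILO_CONFIG_PATH=/etc/nut/hp-ilo

# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT
# seconds of wall-clock time have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    # SECONDS is maintained by bash: seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    while (( SECONDS < deadline ))
    do
        if "$@"
        then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Hypothetical predicate: succeeds when the iLO reports the host as OFF.
ilo_is_off() {
    local state
    state=$( hpilo_cli -c "$ILO_CONFIG_PATH/$1.ini" "$1" get_host_power_status | tail -n 1 )
    [[ $state == "OFF" ]]
}
```

With these, the wait becomes a single line such as wait_until 90 1 ilo_is_off "$ILO_HOST" || exit_status=1. Unlike the original loop, this sketch does not distinguish an unknown power state from one that is still ON.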
Shutting down the NAS
This is more complicated, due to the introduction of CSRF tokens in the ReadyNAS’s web interface and the lack of a programmatic method of access.
An alternative method is to enable SSH access, and (according to the internet) you can then run rnutil rn_shutdown to shut it down. However, if you try to enable SSH in the web configuration you get a stark warning that Netgear may refuse to provide warranty support if you do.
A CSRF token can be obtained from /admin/csrf.html, for example with curl:
curl -u "admin:admin_password" -k https://readynas_hostname/admin/csrf.html
Somewhere in the webpage that is returned will be a piece of JavaScript with a CSRF token that can be used in subsequent requests:
<script type="text/javascript">
<!--
csrfInsert("csrfpId", "some_token_here");
//-->
</script>
My first instinct was to reach for Python at this point, and use the requests and Beautiful Soup libraries to parse the file, but it is very straightforward to get the token with a bit of sed:
curl -u "admin:admin_password" -k https://readynas_hostname/admin/csrf.html | sed -n 's/csrfInsert("csrfpId", "\([^"]\+\)");/\1/p'
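The sed expression can be tried without touching the NAS by feeding it the sample snippet shown earlier. The sketch below wraps it in a hypothetical function, extract_csrf_token. The leading and trailing .* are my addition so that any indentation or other text on the same line is dropped rather than printed along with the token.

```shell
#!/bin/bash
# Hypothetical helper: read the csrf.html page on stdin and print the token.
# Uses \+ in a basic regular expression, which is a GNU sed extension.
extract_csrf_token() {
    sed -n 's/.*csrfInsert("csrfpId", "\([^"]\+\)");.*/\1/p' | head -n 1
}

# Offline check against the sample markup shown earlier
extract_csrf_token <<'EOF'
<script type="text/javascript">
<!--
    csrfInsert("csrfpId", "some_token_here");
//-->
</script>
EOF
```

The only line that matches is the csrfInsert call, so the check prints just the token between the second pair of quotes.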
The NAS can then be shut down with this call (modified from a post in the Netgear community forum):
csrf_token=$( curl -Ss -u "admin:admin_password" -k https://readynas_hostname/admin/csrf.html | sed -n 's/csrfInsert("csrfpId", "\([^"]\+\)");/\1/p' )
curl -Ss -u "admin:admin_password" -k https://readynas_hostname/dbbroker -H "csrfpid: $csrf_token" -H "X-Requested-With: XMLHttpRequest" --data '<?xml version="1.0" encoding="UTF-8"?><xs:nml xmlns:xs="http://www.netgear.com/protocol/transaction/NMLSchema-0.9" xmlns="urn:netgear:nas:readynasd" src="dpv_1445852944000" dst="nas"><xs:transaction id="njl_id_2269"><xs:custom id="njl_id_2268" name="Halt" resource-id="Shutdown" resource-type="System"><Shutdown halt="true" fsck="false"/></xs:custom></xs:transaction></xs:nml>'
A script to shut down the ReadyNAS
As with the server shutdown script, I created a configuration file (this time a netrc file) containing the username and password, called /etc/nut/readynas.netrc. As this is a netrc file, it inherently supports storing credentials for multiple hosts in a single file, so there was no need to follow the per-host style used for the server. The basic format of each line of the netrc file is machine <hostname> login <username> password <password>, e.g. for the examples above it would be machine readynas_hostname login admin password admin_password.
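The server script refuses to continue when a host has no configuration file. A similar pre-flight check for the netrc file could look like the sketch below. The function name netrc_has_host is hypothetical, and it assumes the machine keyword and hostname are the first two fields on a line, as in the format just described.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if NETRC_FILE has an entry for HOST.
# ASSUMPTION: "machine" and the hostname are the first two fields on a line.
netrc_has_host() {
    local netrc_file=$1 host=$2
    # An unreadable or missing file cannot contain the entry
    [[ -r $netrc_file ]] || return 1
    awk -v host="$host" '
        $1 == "machine" && $2 == host { found = 1 }
        END { exit !found }
    ' "$netrc_file"
}
```

In the shutdown script this might be used as netrc_has_host "$NETRC_FILE" "$READYNAS_HOST" before the first curl call, reporting an error and skipping the host when it fails.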
I also created a dedicated user, again called ‘UPS’, for the purpose of shutting down the system, but as the ReadyNAS has no fine-grained access control my only option was to make this user an admin.
The script (/usr/local/sbin/nut-shutdown-readynas) is then relatively simple:
#!/bin/bash
NETRC_FILE=/etc/nut/readynas.netrc
usage() {
    cat - <<EOF
Usage: $0 readynas_hostname [...]
readynas_hostname: The DNS hostname or IP address of the ReadyNAS to shutdown.
EOF
}
if [[ -z $1 ]]
then
    usage
    exit 1
fi
# No error, unless we find one...
exit_status=0
while [[ -n $1 ]]
do
    READYNAS_HOST="$1"
    # Make sure we remove the current one from the list - we do not
    # want an infinite loop.
    shift
    # Check it is on (i.e. the web interface accepts connections)
    if ! nc -z -w 5 "$READYNAS_HOST" 443
    then
        echo "$READYNAS_HOST appears to already be off (or not exist?)." >&2
        continue # Skip to next host
    fi
    # Obtain csrf token
    csrf_token=$( curl -Ss --netrc-file "$NETRC_FILE" -k "https://$READYNAS_HOST/admin/csrf.html" | sed -n 's/csrfInsert("csrfpId", "\([^"]\+\)");/\1/p' )
    if [[ -z $csrf_token ]]
    then
        echo "Could not obtain a CSRF token from $READYNAS_HOST." >&2
        exit_status=1
        continue # Skip to next host
    fi
    # Issue shutdown
    result=$( curl -Ss --netrc-file "$NETRC_FILE" -k "https://$READYNAS_HOST/dbbroker" -H "csrfpid: $csrf_token" -H "X-Requested-With: XMLHttpRequest" --data '<?xml version="1.0" encoding="UTF-8"?><xs:nml xmlns:xs="http://www.netgear.com/protocol/transaction/NMLSchema-0.9" xmlns="urn:netgear:nas:readynasd" src="dpv_1445852944000" dst="nas"><xs:transaction id="njl_id_2269"><xs:custom id="njl_id_2268" name="Halt" resource-id="Shutdown" resource-type="System"><Shutdown halt="true" fsck="false"/></xs:custom></xs:transaction></xs:nml>' )
    if ! echo "$result" | grep -q '<xs:response ref-id="njl_id_2268" status="success">'
    then
        echo "An error occurred shutting down the ReadyNAS, the response was: $result" >&2
        exit_status=1
        continue # Skip to next host
    fi
    # Wait for shutdown to complete
    echo "Waiting for ReadyNAS to shutdown..."
    counter=1
    okay=0
    # In my testing, it took around 51s to shutdown so 180s is a generous margin
    while [[ $counter -lt 180 ]]
    do
        sleep 1
        if nc -z -w 5 "$READYNAS_HOST" 443
        then
            echo -n .
        else
            echo
            echo "$READYNAS_HOST appears to have shutdown after $counter seconds (is no longer contactable on web interface)."
            okay=1
            break
        fi
        counter=$(( counter + 1 ))
    done
    if [[ $okay -ne 1 ]]
    then
        echo "Shutdown failed!" >&2
        exit_status=1
    fi
done
exit $exit_status
Automatic shutdown with NUT
Having scripted shutting down the server and NAS, the final piece of the puzzle is to automatically shut them down when the power fails. NUT has quite a few different configuration options, including running a script on certain events. Reacting to every event directly can, however, cause an issue in the event of a brown-out or very short-lived interruption, so NUT bundles a tool, upssched, which smooths these out and can be configured to trigger a handler only after a certain period without power. Doing this is described in section 7.2 “The advanced approach, using upssched” of the NUT user manual.
My first step was to configure NUT to run upssched when an event occurs. This is done by setting NOTIFYCMD to be the upssched program and setting the EXEC flag on each event in /etc/upsmon.conf on the master (although I presume it would work anywhere, the master is guaranteed to remain up longest, or the clients would not be able to continue monitoring it):
NOTIFYCMD /sbin/upssched
NOTIFYFLAG ONLINE SYSLOG+EXEC
NOTIFYFLAG ONBATT SYSLOG+WALL+EXEC
NOTIFYFLAG LOWBATT SYSLOG+WALL+EXEC
NOTIFYFLAG COMMOK SYSLOG+EXEC
NOTIFYFLAG COMMBAD SYSLOG+WALL+EXEC
NOTIFYFLAG REPLBATT SYSLOG+WALL+EXEC
NOTIFYFLAG NOCOMM SYSLOG+WALL+EXEC
Next I configured upssched, firstly creating a handler script (/usr/local/sbin/nut-upssched-handler) which will shut down my servers:
#!/bin/bash
# Called by upssched with the timer or EXECUTE name as the first argument
case $1 in
    onbatt)
        logger -t nut-upssched-handler "The UPS has been on battery for a while - shutting down servers and NAS devices"
        # Shut down the server first, and only proceed to the NAS (which
        # holds the server's iSCSI volume) if that succeeded
        if ! /usr/local/sbin/nut-shutdown-hp-microserver my-servers-ilo.home.domain.tld
        then
            logger -p user.err -t nut-upssched-handler "Shutdown of servers failed, not proceeding to shutdown NAS"
            exit 1
        fi
        # Shutdown NAS
        /usr/local/sbin/nut-shutdown-readynas ready_nas_hostname.home.domain.tld
        ;;
    notify-*)
        # Turn e.g. "notify-on-battery" into "on battery"
        event=$( echo "$1" | sed 's/^notify-//; s/-/ /g' )
        logger -t nut-upssched-handler "Notification for event $event triggered."
        echo "UPS notification - event $event has been triggered." | mail -s "UPS $event" root@localhost
        ;;
    *)
        logger -t nut-upssched-handler "Unrecognized command: $1"
        ;;
esac
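The notify-* branch derives a readable event name from the name upssched passes to the handler. The same transformation can be done with bash parameter expansion instead of sed, which is easy to check in isolation. The function name event_from_arg is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: "notify-low-battery" -> "low battery"
event_from_arg() {
    local event=${1#notify-}   # strip the leading "notify-" prefix
    echo "${event//-/ }"       # replace every remaining "-" with a space
}

event_from_arg notify-low-battery
```

This avoids spawning sed for each notification, although for a handler that runs a few times per outage the difference is purely cosmetic.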
Then I created a secure directory for the lock and pipe files - note the comments in the manual and the upssched.conf man page regarding the need to secure these:
mkdir -p /var/run/nut/upssched
chown -R nut:nut /var/run/nut
chmod 750 /var/run/nut/upssched
And finally created the configuration file for upssched:
CMDSCRIPT /usr/local/sbin/nut-upssched-handler
# This sets the file name of the FIFO that will pass communications between
# processes to start and stop timers. This should be set to some path where
# normal users can't create the file, due to the possibility of symlinking
# and other evil.
PIPEFN /var/run/nut/upssched/upssched.pipe
LOCKFN /var/run/nut/upssched/upssched.lock
# Trigger early shutdowns after being on battery for 15s
AT ONBATT * START-TIMER onbatt 15
AT ONLINE * CANCEL-TIMER onbatt
# Notifications
AT ONLINE * EXECUTE notify-online
AT ONBATT * EXECUTE notify-on-battery
AT LOWBATT * EXECUTE notify-low-battery
AT COMMOK * EXECUTE notify-communication-restored
AT COMMBAD * EXECUTE notify-communication-lost
AT REPLBATT * EXECUTE notify-replace-battery
AT NOCOMM * EXECUTE notify-unavailable
Testing
The final step is to test this. I tested the server and NAS shutdown scripts as I wrote them, so the only test remaining was the power-failure test. I did this by simply turning off the UPS plug at the wall, monitoring the system and mail logs to ensure the notifications were logged and sent and then, after 15s, watching the server and NAS shut down in sequence before I turned the power back on.
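Pulling the plug is the only real end-to-end test, but the handler's sequencing can be rehearsed beforehand without powering anything off. One way, sketched here with a hypothetical DRY_RUN environment variable and run wrapper (neither is part of NUT or of the scripts above), is to print the shutdown commands instead of executing them:

```shell
#!/bin/bash
# Hypothetical wrapper: with DRY_RUN set to a non-empty value, print the
# command that would have been run; otherwise run it unchanged.
run() {
    if [[ -n ${DRY_RUN:-} ]]
    then
        echo "DRY RUN: $*"
    else
        "$@"
    fi
}
```

If the two shutdown calls in the handler were prefixed with run, then DRY_RUN=1 /usr/local/sbin/nut-upssched-handler onbatt would print both shutdown commands in order. Note that upssched would not normally set DRY_RUN, so a real power failure would still perform the shutdowns.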