Backups, “backups” and volatile power

This post is about backups, specifically my backup strategy and its latest evolution.  Yes, backups: that thing I suspect most of us working in technology think about, do something half-hearted about and then forget about.

We recently had a few power outages in relatively quick succession (3 in the space of a few hours) which managed to kill, in “interesting” (read: not immediately apparent) ways, a number of items of hardware in our house.  This included our router (where it eventually turned out that the SSD holding the OS had failed) and the USB hard disk that my backups resided on.  The upshot was that I had no copy of the router’s disk and none of the backups that should have protected against exactly that.

Now, I have not had off-site backups for a very long time, not since my backup was small enough to store on a couple of re-writeable CDs.  This was a risk that, until this happened, I accepted.  My thinking had been that a dual hardware failure (taking out both the original and the backup) seemed very unlikely, and the most plausible routes to a total loss (a burglary in which both the original machine and the backup device were taken, or a fire in our “office” room) were somewhat mitigated by my documents being stored in Dropbox (“other cloud sync-and-share providers are available”).  But the total loss of my key machine’s disk combined with the failure of the drive holding the backup has rattled me, so my opinion has swayed to needing a solution which incorporates both off-site and on-site backups.

I had been toying with creating my own backup solution (ugh!), as I didn’t think anything out there would fit what I wanted – however I think I’ve come up with something workable, using my existing software, that meets all my requirements:

  • On-site and off-site copies of the backups (the on-site copy giving instant access for restoration)
  • No remote client required – I don’t have root on all the systems I back up
  • At least 1 full copy off-site at all times (i.e. no risk of total loss if a disastrous event occurred while the off-site copy was being updated)
  • Secure (I don’t trust Dropbox sufficiently to put anything in it that I wouldn’t be okay with becoming public – my accounts, for example, are held purely within our home)
  • Cost-effective (both in terms of outlay, which largely boils down to being space-efficient with the backup data, but also my time to set up and manage it has a value)

I’ve long been a user of BackupPC and I really, really like the product.  Due to its file-level de-duplication it is extremely space-efficient if one is backing up more than a single machine with the same operating system, a lot of thought has gone into making it an efficient and streamlined system (see how it handles compression with a custom ZLib-based format to minimise memory overhead on inflation, for example) and there is no remote client to install.  The main reason I had been considering writing my own was the struggle to find an off-site solution that copes with BackupPC’s hard-link-based de-duplication within the pool (which makes file-level copying of the pool, with ‘cp’ or ‘rsync’, a very expensive process).

While replacing the failed hardware, I decided to take the plunge and replace my NAS device too; it is a little over 8 years old, reached the limit of its capacity some time ago (at 2TB) and cannot be expanded further.  It’s a little NetGear ReadyNas that has proven itself very reliable, so I replaced it with a newer generation with some bigger disks.  The new one supports creating iSCSI targets, as well as the usual myriad of file-based access methods, so I have created a new BackupPC iSCSI LUN which I’ve partitioned, formatted with ext4 and mounted on my Debian box for BackupPC to use.  While musing about this, and the single point of failure I introduce by putting the NAS shares for file-storage and the iSCSI LUN for backups on the same device, this solution popped into my head over lunch today:

What I plan to do is convert the plain partition on the iSCSI target to an LVM logical volume.  I can then snapshot this (the ReadyNas supports manual snapshots of the iSCSI LUN but these appear to have to be initiated by hand through the web interface, making them hard to script) and use ‘dd’ to duplicate the point-in-time snapshot to a USB disk of the same capacity as the LUN for off-site backup, without the overhead of cp/rsync having to track all the hard-links in memory.  By rotating 2 disks for the off-site backups I meet my “at least one full copy off-site at any one time” requirement.  Thanks to falling disk costs and increased density, having 2 USB disks for off-site backup is now doable for under £100 and in a 2.5″ form factor, which wasn’t possible when I last set up BackupPC.
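As a rough sketch, the snapshot-and-copy step might look something like this (the volume group, logical volume and USB device names – vg_backuppc, backuppc, /dev/sdX – are illustrative placeholders, not my actual setup, and the snapshot size needs to cover however much the pool changes while the copy runs):

#!/bin/bash
# Sketch: copy a point-in-time snapshot of the BackupPC volume to a USB disk.
set -e

VG=vg_backuppc      # volume group on the iSCSI LUN (placeholder name)
LV=backuppc         # logical volume holding the BackupPC pool (placeholder)
USB=/dev/sdX        # the off-site USB disk, same capacity as the LUN

# Freeze a point-in-time view of the pool (10G of copy-on-write space is a
# guess - size it for the churn expected while the copy runs).
lvcreate --snapshot --size 10G --name backuppc-snap "/dev/$VG/$LV"

# Block-level copy - no walking of BackupPC's millions of hard-links.
dd if="/dev/$VG/backuppc-snap" of="$USB" bs=4M conv=fsync

# Discard the snapshot now the USB disk holds a full copy.
lvremove -f "/dev/$VG/backuppc-snap"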


Accessing Debian Sid software from stable

So, I needed to access a file which had been created on my old Mac laptop using a newer version of an open-source application (installed via Homebrew on the Mac) than was currently packaged in Debian stable.

To my mind, I had 3 obvious choices:

  1. Download the source, build it by hand and use it
  2. Download the Debian source package, rebuild it on stable, install and use it
  3. Create a Debian unstable (Sid) chroot, install it there and use it

I decided to go with option 3, which has a number of advantages over the other 2:

  • apt/the developers have already handled any dependencies (and version differences) needed by the new version
  • I don’t pollute my root filesystem with a version other than the one packaged and tested in stable (option 2)
  • I don’t have different versions of the same software installed on the path, in /usr and /usr/local (option 1)
  • If this were more than a one-off I could use apt in the chroot to track and keep the software updated with the current version in Debian unstable
  • I can install other software in the chroot, once it’s setup, direct from the repository

Alternatives I didn’t look at were installing unstable in a virtual machine or a container. I needed this for a quick and dirty one-time task (install the new version, convert the file to the old format, throw away the chroot and use the version installed with stable from now on) so either of these would have been more effort than required (writing this blog post took longer than the actual task, below!).

To get this working, firstly we need debootstrap installed:
apt-get install debootstrap

Make a directory for the chroot:
mkdir /tmp/sid-chroot

Install Debian into the chroot:
debootstrap unstable /tmp/sid-chroot http://ftp.uk.debian.org/debian/

Change into the chroot:
chroot /tmp/sid-chroot

Update software:
apt-get update

Install and use the new version of the software within the chroot.

This was a quick-and-dirty solution to a temporary problem (once opened and saved in the older format, I can use my file with the old version).

The Debian wiki recommends making these configuration changes to a chroot, which I’ve not bothered to do (as it was going to last all of 5 minutes):

  1. Create a /usr/sbin/policy-rc.d file IN THE CHROOT so that dpkg won’t start daemons unless desired. This example prevents all daemons from being started in the chroot.

    cat > ./usr/sbin/policy-rc.d <<EOF
    #!/bin/sh
    exit 101
    EOF
    chmod a+x ./usr/sbin/policy-rc.d
  2. The ischroot command is buggy and does not detect that it is running in a chroot (Debian bug #685034). Several packages depend upon ischroot for determining correct behavior in a chroot and will operate incorrectly during upgrades if it is not fixed. The easiest way to fix it is to replace ischroot with the /bin/true command.

    dpkg-divert --divert /usr/bin/ischroot.debianutils --rename /usr/bin/ischroot
    ln -s /bin/true /usr/bin/ischroot

A more complete chroot solution would involve the use of schroot to manage it, which I’ve done before to get an old ruby-on-rails application working on newer versions of Debian.
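For reference, a minimal schroot definition for a chroot like this one might look something like the following, dropped into a file under /etc/schroot/chroot.d/ (the directory path and user name here are placeholders):

[sid]
description=Debian sid (unstable) chroot
type=directory
directory=/srv/chroot/sid
users=laurence
root-users=root

After which ‘schroot -c sid’ gets you a shell inside it without a manual chroot.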

“Smart” quotes in Outlook on Mac

Spent ages trying to figure out how to stop Outlook 2011 converting my straight quotes into “smart” quotes (curly, UTF-8, and prone to rendering badly on Windows machines).

Turns out it’s not an Outlook setting at all but a system-wide OS X setting in the Keyboard preferences, which controls how all text-input boxes behave:

[Screenshot: the smart-quotes option in OS X’s Keyboard preferences]

Quick and dirty: delete all top-level files in a directory not owned by any user with a currently running process

Handy for shared systems where you want to reap files in, for example, /tmp without affecting currently running processes. There are some unhandled edge cases, such as a file disappearing after the file list is built but before it is deleted (which will raise an uncaught file-not-found exception), but as a quick and dirty first attempt I think it’s not too shabby.

#!/usr/bin/env python

import collections
import logging
import getopt
import os
import shutil
import sys

def usage():
        sys.stderr.write("Usage: {scriptname} [[-h | -? | --help] | [-d | --debug] path [path1 [path2...]]]\n".format(scriptname=sys.argv[0]))

def remove(path):
        if os.path.isdir(path):
                shutil.rmtree(path)
        else:
                os.remove(path)

def reap(directory):

        # Get a list of all files in the directory to consider for reaping, grouped by owner's uid
        user_files = collections.defaultdict(list)
        for file in os.listdir(directory):
                user_files[os.lstat(os.path.join(directory, file)).st_uid].append(file)

        # Get a list of users who have processes running on the box
        users_with_processes = [ os.lstat('/proc/{proc}'.format(proc=proc)).st_uid for proc in os.listdir('/proc') if proc.isdigit() ]

        # Now find the users who do not have running processes, as these are the users whose files we are going to reap (always skip root)
        users_to_reap = [ user for user in user_files.keys() if user != 0 and user not in users_with_processes ]

        # Remove the files (or, in debug mode, just log what would be removed)
        if DEBUG:
                action = logging.debug
        else:
                action = remove
        for user in users_to_reap:
                for file in user_files[user]:
                        action(os.path.join(directory, file))

try:
        optlist, args = getopt.getopt(sys.argv[1:], 'dh?', ['debug', 'help'])
except getopt.GetoptError as err:
        sys.stderr.write(str(err) + "\n")
        usage()
        sys.exit(2)

DEBUG=False

for opt, value in optlist:
        if opt in ('-d', '--debug'):
                DEBUG=True
        elif opt in ('-h', '-?', '--help'):
                usage()
                sys.exit()
        else:
                sys.stderr.write("Unhandled option: {opt}\n".format(opt=opt))
                usage()
                sys.exit(2)

if len(args) == 0:
        usage()
        sys.exit(1)

if DEBUG:
        logging.basicConfig(level=logging.DEBUG)

for directory in args:
        reap(directory)
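To see what it would reap without deleting anything, run it in debug mode first (assuming it has been saved as, say, reap.py – a name I’ve made up for illustration):

python reap.py --debug /tmp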

Preventing git commits as root

This is a quick pre-commit git hook to prevent committing anonymously as root – it refuses to allow root to commit directly and insists that --author is given if a user is committing via sudo.


#!/bin/bash

abort=0

# Check that if committing as root the author is set, as far as possible.
if [ "$UID" -eq 0 ]
then
    echo "Warning: Committing as root." >&2
    if [ -n "$SUDO_USER" ]
    then
        if ! echo "$SUDO_COMMAND" | grep -q -e '--author'
        then
            cat - >&2 <<EOF
Please give an explicit author when committing as root via sudo, e.g.:
git commit --author="Your Name <you@example.com>"
or (if you have committed before - see 'man git-commit'):
git commit --author=some_pattern_that_matches_your_name

Previous authors in repository:
$( git log --all --format='%an <%ae>' | sort -u )

EOF
            abort=1
        fi
    else
        echo "Committing as root, without using sudo. Please do not do this." >&2
        abort=1
    fi
fi

if [ "$abort" -ne 0 ]
then
    echo -e "\n\ncommit aborted\n" >&2
    exit 1
fi
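To use it, save the script as .git/hooks/pre-commit in each repository you want to protect and make it executable:

chmod +x .git/hooks/pre-commit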

Turning off GMail’s draconian spam filter

Okay, so there is no way to turn off GMail’s spam filter (or even turn it down to the point where it stops putting more legitimate emails than spam into the “Spam” folder).

To fix this behavior I have thrown together this short python script that simply moves any email found in the Spam folder into the Inbox. I used to achieve the same thing with a filter which told GMail not to mark as spam any email with an ‘@’ in the ‘From’ address, but GMail has suddenly decided to start ignoring that filter so a more permanent solution is required.

#!/usr/bin/env python

import imaplib

IMAP_USER='<your_user_name>@gmail.com'
IMAP_PASSWORD='<your_password>'


if __name__ == '__main__':
  # Connect to GMail over IMAP and select the special Spam folder
  imap4 = imaplib.IMAP4_SSL('imap.gmail.com')
  imap4.login(IMAP_USER, IMAP_PASSWORD)
  imap4.select('[Gmail]/Spam')
  # Find every message currently in Spam
  typ, data = imap4.search(None, 'ALL')
  for num in data[0].split():
    # Peek at a few headers (without marking the message read) for the log line
    message_subj = imap4.fetch(num, '(BODY.PEEK[HEADER.FIELDS (SUBJECT FROM TO DATE)])')[1]
    print "Moving message '%s' from Spam to INBOX" % (', '.join(message_subj[0][1].rstrip().split("\r\n")))
    # IMAP has no native move: copy to INBOX, then mark the original deleted
    imap4.copy(num, 'INBOX')
    imap4.store(num, '+FLAGS', '\\Deleted')
  # Expunge permanently removes the flagged originals from Spam
  imap4.expunge()
  imap4.close()
  imap4.logout()

Automated notification of uncommitted git-controlled config changes

Yesterday I wrote about version controlling server configuration with GIT.

Inevitably I will change a file under version control but forget that it is under version control. To mitigate this I’ve thrown together this simple bash script:

#!/bin/bash

CONFIG_REPO="/root/vc"
MAILTO="spam@dev.null"

GIT_CMD="git status --porcelain"

tempfile=`mktemp`

# Bail out if the repository directory is missing, rather than running git elsewhere
cd "$CONFIG_REPO" || exit 1
$GIT_CMD > "$tempfile"

if [ -s "$tempfile" ]
then
sendmail -oi -t <<EOF
To: $MAILTO
From: $USER@`hostname -f`
Subject: Uncommitted config files detected on `hostname`

I have detected that 'git' believes there are modified and/or added files which have not been committed on `hostname` in '$CONFIG_REPO'.

Hostname: `hostname -f`
Uname: `uname -a`
Repository location: $CONFIG_REPO

Output of '$GIT_CMD':
-----8<-------------------->8-----
`cat $tempfile`
-----8<-------------------->8-----

EOF
fi

rm "$tempfile"

I’ve put the script in /usr/local/sbin/check-config-git and scheduled it to run daily at midnight via cron. The idea is that if it annoys me every day, as opposed to once a week or even less frequently, I might actually be motivated to do something about it.
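For reference, “daily at midnight” is a one-line cron entry (here sketched for root’s crontab, since the repository lives under /root):

# m h dom mon dow command
0 0 * * * /usr/local/sbin/check-config-git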

Version controlling server configuration with GIT

Often I want to version control certain, usually critical, system configuration files. In the past I’ve either set this up on a directory-by-directory basis or not bothered (which results in me creating a lot of superfluous files by doing a ‘cp config_file config_file.bak_`date +%F`’ before editing).

A colleague and I have come up with this solution that, while not necessarily perfect, is a lot more manageable than the previous alternatives I have used, and has the benefit of creating a centralised repository on a given machine as well as not polluting system directories with ‘.svn’ or ‘.git’ directories. It also avoids having to play with nested repositories (i.e. directories with a ‘.git’ dir under another that also has a ‘.git’ dir), which are unlikely if you’re just version controlling /etc but more common if controlling /home/$USER. I think it’s quite neat and keeps the revision control itself away from the core system files.

  1. The first step is to create a suitable directory to host the version control. We won’t actually be storing files here but it does need to be a “normal” repository (i.e. not a bare repository) as it does represent a working copy of the repository.

    mkdir /root/vc # Calling it 'vc' for 'Version Control'
    cd /root/vc

  2. Step two is to initialise our repository:

    git init

  3. Now, and this is the clever bit, we need to configure the repository to use ‘/’ as the base of its working tree (so we end up with a repository, with all its revision-control files, in /root/vc but really the files under ‘/’ are under revision control). I also excluded everything by default (so git does not list everything as uncontrolled and we can cherry-pick the files we actually care about). UPDATE: I’ve since found the config variable ‘status.showUntrackedFiles’ (reading man-pages FTW!), which achieves the same end in a much saner manner.

    git config core.worktree /
    echo '*' >> .git/info/exclude
    git config status.showUntrackedFiles no

…and that’s it. Just use the ‘/root/vc’ directory as a normal git repository, but tracking files that live under ‘/’. Simple, eh?

There is a drawback, however. Since I have excluded everything, files have to be added to the repository with a ‘-f’ (force) flag:

git add -f /etc/ssh/sshd_config

This also applies when using ‘git add’ to stage modified files, however ‘git commit -a’, which stages modified & deleted files and commits in one step, does not require ‘-f’.
UPDATE: This is no longer an issue when using ‘status.showUntrackedFiles’ to disable showing untracked files by default. There may be other issues with this approach but I’ve not spotted them in my (5 minutes!) of testing/experimentation.

There should be more than one way to do it in Python

Python has a philosophy of ‘There should be one– and preferably only one –obvious way to do it’ (http://www.python.org/dev/peps/pep-0020/) rather than Perl’s ‘There’s more than one way to do it’ (Programming Perl, Third Edition, Larry Wall et al.). This is great, in theory – it leads to greater consistency between disparate programs and makes it easier for individual programmers to pick up someone else’s code.

The problem comes where the obvious way is not, for whatever reason, the practical way. For example, the obvious way to test if a string begins with another string is to use "string1".startswith("string2"). This is, however, significantly less performant than doing "string1"[:7] == "string2", which means when doing a large number of these tests you have to use the latter form, despite it not being the obvious method.
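The difference is easy to check with Python’s own timeit module from a shell (a quick sketch – absolute numbers will vary with machine and Python version):

python -m timeit -s 's = "string1"' 's.startswith("string2")'
python -m timeit -s 's = "string1"' 's[:7] == "string2"'

My understanding is that the slice form wins largely by avoiding the attribute lookup and method call on every iteration.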

Unless you are familiar with Python’s sequence slicing syntax (http://docs.python.org/library/stdtypes.html#typesseq) I do not find "string1"[:7] to be obvious (even though I would expect most programmers to be able to hazard a, probably correct, guess at what it does), which means I would precede that line with a # Check if string1 starts with "string2" comment. When I feel the need to comment on a specific line of code it is usually because I do not think what it is doing is sufficiently obvious, which means it violates Python’s ‘There should be one– and preferably only one –obvious way to do it’.

Just a quickie…

Spent about 20 minutes trying to figure out why this line in fstab allowed me to mount, but not unmount, a cifs share as a “normal” user:

//isolinear/documents /media/isolinear cifs user=laurence,file_mode=0644,dir_mode=0755,noauto,user 0 0

The error I was getting was:

umount: /media/isolinear mount disagrees with the fstab

It turns out that mount.cifs adds an extra ‘/’ to the end of the source mountpoint (i.e. ‘//isolinear/documents’ becomes ‘//isolinear/documents/’) causing the mismatch. Adding the extra ‘/’ in fstab meant I could mount and unmount it happily as a non-root user.
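For reference, the line that both mounts and unmounts cleanly ended up as:

//isolinear/documents/ /media/isolinear cifs user=laurence,file_mode=0644,dir_mode=0755,noauto,user 0 0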

Posted in the hope that someone else might happen upon this entry when experiencing the same issue!