When restarted, HashiCorp Vault starts “sealed” and has to be unsealed to make its contents accessible. I have, on occasion, forgotten to do this (usually after a reboot for a kernel update), so I want to add checks for this situation to my monitoring. My current monitoring solution is Icinga, so I am creating a check for this.
This is another of those posts that I started and, some time (measured in months) later, split up due to not having completed what I set out to do. My overall goal is to have a fully automated install that results in a system with disk encryption set up but which can be remotely unlocked and managed. The intention is that, in most circumstances, any of my systems could be reinstalled headlessly (that is, without plugging in a keyboard or monitor).
While searching for information on browsing down, I found some useful resources from the National Cyber Security Centre: a whitepaper on Security Architecture Anti-Patterns, guidance on secure system administration, and a blog post on protecting management interfaces (which focuses on browsing down).
I just published 33 new blog posts. Nothing more to say really, other than that normal service will hopefully now resume. I have decided that I will adopt a new policy of dating posts when they are published, rather than when I start writing them…
In a previous post I described copying changes from my air-gapped home lab back upstream by copying an entire tar of the repository back and then pushing the changes. While this works, and is fine for small repositories, it is highly inefficient for small changes to large repositories. I wrote the last post in full knowledge there would be a more efficient way, which this post documents. This is useful for other situations where there is some level of network isolation (but not necessarily a full air-gap), such as packing changes to move them via a jump host to another network for pushing to a remote source.
A quick and dirty way to generate a password on a Linux box:
tr -dc '[:print:]' < /dev/urandom | head -c32
This post begins with a rant about Virgin Media ignoring their own contract and cutting us off 17 days before they told us our services would end (just 13 days after we gave the contractual “30 days notice” to leave). It follows with setting up a DrayTek Vigor 130 VDSL2/ADSL modem with a Linux router for Sky’s fibre-to-the-cabinet (FTTC) broadband service.
This post is about deploying custom certificate authority (CA) certificates onto Linux hosts, from an anonymous Windows share, then deploying them to be used by web browsers (which seem to use their own CA stores these days). There are two scripts, one for each of these tasks, as installing to the system store usually requires super-user (i.e. `root`) access but installing to the browser stores is per-user (and should not be done using the super-user account).
On my systems at home I use Icinga2 to monitor health, adding new checks as and when I identify something I think needs checking or when a failure occurs that was not detected. Sometimes it is necessary to do some checks via other means, such as SLURM’s healthcheck program, so it can be useful to have checks in script form. On previous systems, we have used the Nagios plugins that Icinga uses, to minimise the maintenance overhead of having duplicated tests. The script will be written in bash and minimise dependencies on non-Coreutils files, to try and keep it portable to different distributions.
In order to write some portable health-checking scripts, I needed to reliably detect both distribution (or at least distribution family) and version (e.g. to cope with `/usr/bin` in Red Hat 7). I based this on my previous Lua distribution detection script for Lmod. In contrast to the Lmod script, I am not interested in any CPU/architecture detection (at present). I only need this for the Red Hat Enterprise Linux family (including CentOS, Scientific Linux and Rocky), Ubuntu and Debian distributions. Adding others would be trivial; it is just a case of finding their `lsb_release` return values and an appropriate fallback (file) method.
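A minimal sketch of that sort of detection, preferring `lsb_release` and falling back to `/etc/os-release` (this is an illustration, not the author's actual script):

```shell
# Detect distribution family and major version.  Prefer lsb_release,
# fall back to /etc/os-release (present on all modern distributions).
if command -v lsb_release >/dev/null 2>&1; then
    distro="$(lsb_release -is)"            # e.g. Debian, Ubuntu, Rocky
    version="$(lsb_release -rs | cut -d. -f1)"
elif [ -r /etc/os-release ]; then
    . /etc/os-release                      # sets ID and VERSION_ID
    distro="$ID"
    version="${VERSION_ID%%.*}"            # keep the major version only
    [ -n "$version" ] || version="unknown" # rolling releases omit VERSION_ID
else
    distro="unknown"
    version="unknown"
fi
echo "$distro $version"
```

Anything not recognised falls through to “unknown”, which a health-check script can treat as a failure rather than guessing.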
You may have noticed there appear to have been no posts between one on October 30th and one on 5th April. This is not because I have not written any, and a flurry (well, 8 including this one) of new posts has appeared in the last few days, beginning with that one on the 5th April. These are ones that I had “written” in my head but not yet committed to actual posts; the Bank Holiday in the UK, a slight reprieve from day-job work in the evenings leading up to it, and my wife abandoning me to my own devices today have given me an opportunity to catch up. There are also a further 28 posts in draft, which I have written over the last 6 months, that will hopefully land on the live blog soon. They have not yet been published because many of them link together, or cross-reference each other, in such a way that it will be difficult to publish individual ones until they are all finished. Sorry about that.
This post is about automating, end-to-end, reinstalling a booted Rocky Linux system. The automation is done using Ansible but Puppet is used to configure the hosts, so Ansible orchestrates Puppet too. This is closely related to my recent posts on kickstart graphical feedback (tentatively related to the Puppet piece) and generating custom install ISOs (for complete hands-off reinstalls). All of the systems involved are bound to a Microsoft Active Directory [AD], so common credentials can be used, and the computer that is being reinstalled will need its computer object resetting to rebind automatically. The local ssh host key cache is also updated after the reinstall is finished, to avoid “the host key has changed” warnings post-reinstall.
This was something I needed in the `%pre` of my kickstart, but it’s likely to be generally handy so I am noting the recipe for future reference.
This post documents an Ansible process for building custom Rocky Linux (but it should work for any Red Hat Enterprise Linux-compatible distribution) installation ISOs, set to automatically kickstart, with optional kickstart embedding within the ISO image and host-specific kickstart generation.
I discovered, when I attempted to create my first Linux VM, that Hyper-V not only has secure boot enabled (which is a good thing), it is set to only allow Microsoft Windows to boot (which is nasty). To restore standard secure boot behaviour, which permits any binary signed by the Microsoft UEFI CA to boot, the “Template” must be changed to `Microsoft UEFI Certificate Authority` (from the default `Microsoft Windows`). Once this is done, suitably signed (by Microsoft’s chain of trust) bootloaders will start.
This post is about something I created and am genuinely quite proud of. It creates a graphical progress bar to give feedback on the progress of the `%post` section(s) in a kickstart installation script. I have used it with Red Hat and Rocky Linux distributions. It is an alternative to forcing a text install just to be able to programmatically `chvt` to the log in order to display custom script progress. It might seem simple but it took a lot of work and testing to get to a reliable solution.
VirtualBox (my preferred desktop virtualisation tool) and Hyper-V conflict, but Hyper-V is required for some software and features to work correctly on Windows 11. In particular, the wonderful Windows Subsystem for Linux (WSL) version 2 (and I specifically mean “WSL2” is wonderful; the original WSL wasn’t great) uses a Hyper-V VM. Following on from my post on managing Windows with Ansible and customising the taskbar and startup programs with Ansible, I will be Ansibleising installing Hyper-V (and as such, this will be a short post - Ansible makes things like this really easy).
Out of the box, Windows 11 shows a Chat icon in the taskbar for Microsoft Teams, a search box (duplicating the search function in the start menu) and launches the “home” version of Teams on startup. I do not use the home version of Teams, so I want to hide it and save the taskbar space taken up by the search box.
Back in 2021, I added a check for the latest NextCloud version to Icinga based on a forum post from 2019 that suggested scraping the version from a file in their GitHub website repository. Today I was looking at something on Icinga and it occurred to me that I have not installed any NextCloud updates for a while; a quick check determined that the latest version in that file stalled at `23.0.3` (released 21 March 2022, now unsupported and not even the last of the 23.0 series - `23.0.12` is) and I am now 3 major versions behind - the current release is `26.0.0`, released 21 March 2023.
Just a note to say I pushed the red button and disabled my Twitter account today. In 30 days it will be deleted forever, according to the message. The final straw turned out to be the change whereby only paying users will be able to partake in polls and be shown to non-followers in feeds.
I tried to resize my display and discovered the mode I wanted was not available. Interestingly, the current mode was also not listed, but that is a separate issue. Adding a new mode is not as straightforward as I had hoped, so it is worth making a note (this post) of how to do it.
To replicate the Proxmox cluster I am (very slowly) building in my home lab, I have bought a set of 5 HP EliteDesk 800 G2 mini systems. They will be replacing my HP MicroServer and this post documents updating the UEFI firmware to version 2.60 on them.
In preparation for publishing my post about setting up a USB RAID box, I wanted to strip the metadata (mainly GPS location) from the photos I took while setting it up, before publishing them publicly on this blog. Fortunately ImageMagick makes this easy.
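As an illustration of the sort of stripping involved (assuming ImageMagick’s `mogrify -strip`, which removes all profiles and comments; the filename is made up for the demo):

```shell
# Skip quietly if ImageMagick is not available - this is only a sketch.
command -v convert >/dev/null 2>&1 || exit 0
command -v mogrify >/dev/null 2>&1 || exit 0

# Create a small test image carrying a comment (standing in for
# EXIF/GPS data), then strip all metadata from it in place.
convert -size 8x8 xc:red -set comment "pretend GPS data" /tmp/photo.png
mogrify -strip /tmp/photo.png
identify -format '%c' /tmp/photo.png    # comment is gone
```

`-strip` is indiscriminate: it also drops colour profiles, so for photos destined for print a more selective tool may be preferable.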
This is a review I left for The Spread in Hereford on Facebook. I’ve become a little nervous about the longevity of my social media posts, as I contemplate closing my Twitter account, so I decided to reproduce it here. Who knows, maybe it’s the start of me writing more reviews on my blog? I have expanded it slightly to be more of a “blog-first” critical review than a quick “had a great time, would come back” Facebook one.
During the week, I decided to replace the old HP Microservers I bought during 2020 with 5 HP EliteDesk 800 G2 minis, mirroring the Lenovo M72e cluster I bought in February 2022 to build a home-lab but admittedly still have not finished setting up. As the micro-server has 4 3.5” drive bays, in which my backup disks are currently housed, moving to an ultra-small form factor with no drive bays presents a challenge. I found the Icy Box IB-RD3620SU3 online, which allows me to connect the drives via USB 3 or eSATA. It also does RAID internally, offloading it from the host system and reducing the amount of data that needs to be sent over the USB/eSATA connection (c.f. software RAID on the host).
About a month and a half after setting up a proper secrets store with HashiCorp Vault, I am able to pick up the migration again, crossing off a number of roles migrated from SaltStack in the process. This is with no secrets stored in the code, unlike my Salt configuration, which had secrets in the pillar data. With the infrastructure set up and linked to Ansible, I am able to continue with finishing this migration task.
One of the things that has been bugging me, an itch to be scratched if you will, is that I have been changing the inventory to set which host uses the `local` connection type. On my live network I have been running `ansible-playbook` on my laptop, and in the lab environment on a “desktop” with different hardware and hostname. The lab host, specifically, has no `ssh` daemon running so does not work at all if the connection type is not local.
Following on from my NAS disaster recovery post, I decided to fork out for some new hard disks and move the volume I backup to from the NAS to being internal to the server doing the backups. I had been considering this for performance reasons but as the NAS is also backed up, it would have aided recovery to not have to fetch the off-site backups. The flip-side of this option is that it means if the server fails I lose access to the backups - so either way I have a single point of failure.
Last night my ReadyNAS suddenly went read-only with the ominous-sounding log message “Jan 30, 2023 16:54:47 Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.” - it looks like it ran out of RAM, which ultimately caused BTRFS corruption. Despite 3 hours of attempting to recover, I was unable to repair the underlying BTRFS filesystem, so decided the only option was to reset the NAS (which would recreate the filesystem) and execute the disaster recovery (DR) plan.
I have played with using SaltStack to manage Windows but did not follow through with using it properly. I recently installed a NUC10i7FNH and decided to revisit managing Windows, this time with Ansible, rather than doing most things on Windows manually.
Welcome to the fifth post in the rabbit-hole I disappeared down, trying to add a bastion host to my network for Ansible after trying to get started with Ansible for managing iPXE configuration in October. This post is about deploying HashiCorp Vault for secret management, beginning with the PKI certificates for Icinga monitoring.
During the last few days I have been upgrading all of my Debian systems from 10 (Buster) to 11 (Bullseye). In contrast to my many previous upgrades, this went less than smoothly (some of which was down to me rushing) and I encountered a few issues - this post serves just as a note of all of the issues and their resolution. My earlier post on checking Debian update readiness with Ansible is also related to this set of upgrades.
Post about adding tasks to Ansible to print information about the current state of updates across a (Debian) landscape. This is Debian-specific as I do not have anything else in my production environment. On some level this belongs in the monitoring tools (currently Icinga2 in my infrastructure), however when orchestrating infrastructure it can be handy to refer to from, and as a check that everything can be orchestrated by, the orchestration tool (Ansible, in this case). It was done to help work out the landscape during upgrading to Debian 11 (Bullseye).
Quick post noting the bash command to mount a FAT-formatted USB (or any other) drive, with the mount owned by the current user; `lsblk` is your friend for finding the device for the drive.
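The command will have been along these lines (device, mount point and mask options are illustrative here - FAT has no concept of Unix ownership, so it is assigned at mount time):

```shell
# FAT cannot store Unix owners, so uid=/gid= map every file on the
# mounted drive to the current user at mount time; fmask/dmask set
# the file and directory permission masks.
opts="uid=$(id -u),gid=$(id -g),fmask=0022,dmask=0022"

# /dev/sdb1 and /mnt/usb are placeholders - find the real device with lsblk.
echo "sudo mount -t vfat -o $opts /dev/sdb1 /mnt/usb"
```

The command is printed rather than run, since the device name will differ on every machine.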
This post is the fourth in the chain of posts starting with trying to get started with Ansible for managing my own infrastructure in October, heading down a rabbit-hole by trying to work around Ansible not playing nicely with 2 factor sudo authentication. It is the last in a series of three posts that I split out from the second in the series on the 2nd January 2023 and is the blog content I added yesterday explaining taking the monitoring role from SaltStack to Ansible.
This post continues the chain of posts from trying to get started with Ansible for managing my own infrastructure in October and working around Ansible not playing nicely with 2 factor sudo authentication. It is one of three posts that I split out from the second in the series on the 2nd January 2023 and is the blog content I added around the 13th December 2022 - describing bootstrapping the monitoring server and migrating the first role from SaltStack to Ansible.
As I put a new, very large, SSD in my laptop I decided it would be convenient (for playing with virtual machines etc.) to have a sub-set of my mirrors on there. This post explains how I took my latest mirroring scripts and made it flex to do this.
Out of the box AwesomeWM does not provide a mechanism for automatically starting applications when it starts. I have a small piece of code in my configuration file to do this, however it is sub-optimal, not least because (despite using `awful.spawn.once`) it relaunches all of the applications on restart. Instead I applied a snippet from the Arch Linux Wiki’s page on AwesomeWM that autostarts applications using the XDG Autostart Specification.
Sometimes one stumbles across a command that they probably should have known about for a while. Today, for me, that command is `namei` - ‘follow a pathname until a terminal point is found’. As it says, one of the really nice things it can do is resolve chains of symlinks, showing each link in the chain.
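A quick illustration of that symlink-chain resolution (paths made up for the demo):

```shell
# Skip quietly if namei (part of util-linux) is not installed.
command -v namei >/dev/null 2>&1 || exit 0

# Build a two-deep chain of symlinks, then let namei walk the path,
# showing every directory component and each link in the chain.
mkdir -p /tmp/namei-demo
echo hello > /tmp/namei-demo/target
ln -sf /tmp/namei-demo/target /tmp/namei-demo/link1
ln -sf /tmp/namei-demo/link1  /tmp/namei-demo/link2
namei /tmp/namei-demo/link2
```

Each line of the output is one path component, with `l` marking a symlink and an arrow showing where it points - much easier to read than repeated `ls -l` calls.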
In order to install Windows 11, my laptop’s Trusted Platform Module firmware needs upgrading to support TPM 2.0. Dell provide a firmware update for some models, including mine. However installing it is complicated by Windows 10, which it is currently running, re-initialising the TPM on shutdown; breaking the firmware update process, which detects the TPM has data and aborts the update when it goes to apply it on next reboot.
While trying to get started with Ansible I found Ansible really doesn’t like 2 factor sudo authentication. After trying, and failing, to write a new `become` plugin that merges the behaviour of the `sudo` plugin (use sudo) and the `su` plugin (recognise other password prompts), I decided to try a different approach. Instead, I decided to set up a bastion/jumphost that will require 2 factors to login to and then fewer (i.e. just a password, or two (different) passwords, or a certificate and a password) to login and become root on my systems. The bastion can be used as an ssh proxy host and this method is documented by Ansible although, for 2 factor logins to the bastion, we will need to pre-login and configure connection multiplexing (referenced from another post on using a bastion with Ansible) to reuse the connection without re-authenticating.
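The multiplexing part can be sketched in `~/.ssh/config` - hostnames here are illustrative, and `ProxyJump`/`ControlPersist` assume a reasonably recent OpenSSH:

```
# One interactive (2-factor) login to the bastion opens a master
# connection; later sessions reuse its socket without re-authenticating.
Host bastion
    HostName bastion.example.com
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 4h

# Internal hosts are reached through the bastion as an ssh proxy host.
Host *.internal.example.com
    ProxyJump bastion
```

With this in place, one `ssh bastion` by hand completes the 2-factor login, and Ansible’s subsequent connections ride the existing socket.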
This post was going to be about migrating my iPXE configuration from being managed by SaltStack to Ansible. It was supposed to be the start of migrating all of my configuration management over to simplify bare-metal DR and managing test/development infrastructure without first deploying a master, or control, server. Starting with iPXE is motivated by needing to update the configuration following my lab experiment, moving it from duplicating the kernel and initial ramdisks on the tftp server to fetching them directly from a mirror. However, things turned out not to be so simple and I ended up restructuring the network and migrating some other configurations before getting back to iPXE.
It has been a while since I transferred CDs to an audio player. As a result I have a number of CDs, bought in the last few years, that I need to copy to my recently purchased Sony NW-A45 music player. It seems that the tool I used 16 years ago, A Better CD Encoder or `abcde`, is still going strong so I stuck with it.
As I was trying to install a recent version of Ansible in my air-gapped home lab network I discovered that mirroring Python packages, e.g. from PyPI, is quite difficult despite pip’s download option. The `--abi` options are supposed to allow downloading for another platform but finding the right combination of options was tricky - in particular the `cryptography` package (which often causes me issues) would not download for Debian Bullseye on either Buster or macOS with any combination I tried. In the end, I resorted to the approach I adopted for Gentoo: using a Docker container to download for the platform inside the container.
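The Docker approach can be sketched like this (image tag, destination directory and package list are all illustrative, not the exact invocation from the post):

```shell
# Run pip *inside* a container matching the target platform, so that
# binary wheels (like cryptography's) are resolved for Bullseye rather
# than for the host.  The command is printed, not executed, since this
# is only a sketch and needs Docker plus network access to run.
image="python:3.9-bullseye"
cmd="docker run --rm -v \$PWD/wheels:/wheels $image pip download --dest /wheels ansible cryptography"
echo "$cmd"
```

The downloaded wheels directory can then be carried across the air-gap and served as a simple local package index.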
About a month ago I bought a Sony NW-A45 music player from eBay to replace my failed 160GB iPod Classic (2007) whose hard disk failed on 21 June 2018. My main reason for picking this replacement was its battery life - I struggled to find any portable music players on the market that come close to the 40hrs playtime of my iPod, despite 12 years of progress. I discovered that the reason some of my album artwork was not appearing on it was due to it not supporting progressive JPEG images so I had to re-tag the embedded cover art in a few of my FLAC files.
I mentioned in my targeted PXE booting post that I might migrate from maintaining copies of the kernel and initial ramdisks on the tftp server to asking iPXE to fetch them directly. This has been brought to the fore after I updated the mirror in my home lab environment and the installer no longer starts due to not being able to load its kernel modules. When I fetched the image from the mirror and compared its checksum to the one in my configuration management system I saw it had changed.
Not so hot-on-the-heels of my 2021 post on enabling spell checking in VS Code, today I was frustrated by needing to know how many words were in a Markdown file. Fortunately there’s a plugin for that.
This post could be subtitled “down the rabbit hole”. I needed to reboot my router (kernel update) and, when I set about doing so, found that a backup job was still running 4 hours after it had started. Looking into why this was the case, I found the backups were taking on average 5 hours. Looking at the size of the backup, it seemed a little on the large side for what the box does, so I set about seeing whether this could be reduced, which led to finding some missing configurations…
Quick post, using `find` to list all files modified since a particular timestamp (e.g. since the last backup) in a folder:
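The command itself is elided above; a common form, using GNU find’s `-newermt` test (directory and timestamp are examples), looks like:

```shell
# Skip quietly if find does not support -newermt (a GNU extension).
find /tmp -maxdepth 0 -newermt "2020-01-01" >/dev/null 2>&1 || exit 0

# List regular files under the folder modified after the timestamp.
mkdir -p /tmp/find-demo
touch /tmp/find-demo/recent.txt
find /tmp/find-demo -type f -newermt "2020-01-01 00:00:00"
```

`-newermt` compares each file’s modification time against a date string, so there is no need for a reference file as with plain `-newer`.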
Following on from my previous post on PXE booting Debian Installer with network-console (SSH access), I wanted to take this a stage further and fully automate deployments, but in a very targeted way. My idea is that some hosts (e.g. Proxmox nodes) should reinstall and, ultimately, re-add themselves to the cluster automatically, whereas others should continue to just boot into the interactive Debian installer by default. Fortunately this is very easy to achieve with iPXE.
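In iPXE script terms the idea looks something like this (the MAC address and URLs are invented for illustration):

```
#!ipxe
# Hosts matched by MAC chain straight into an automated install;
# anything else falls through to the interactive installer.
iseq ${net0/mac} 52:54:00:12:34:56 && chain http://boot.example.com/auto-install.ipxe ||
chain http://boot.example.com/debian-installer.ipxe
```

Because the branch happens per-client at boot time, adding a host to the “reinstall automatically” set is a one-line change on the boot server.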
Back in 2020, I used the Debian installer’s pre-seed capability to automatically build KVM-based VMs. In my new lab network I have 10 systems that are headless, attached to managed PDUs (so can be remotely power cycled) but have no integrated console or KVM attached. Debian Installer has a network console feature, which allows remote installs over SSH. This post describes automating the setup of that so a network-booted host in the lab network will, by default, boot into the Debian Installer for the current stable distribution ready for me to remotely SSH in and complete the install.
This post describes setting up client-specific wireless network keys (private pre-shared keys or “PPSKs”) and per-client VLAN settings on a single wireless SSID, using FreeRADIUS to provide the key and VLAN information.
Just over 5 months since I bought them, and 2 months since my last blog post it appears, I have finally got around to configuring the APC AP7920 PDUs I reset in April.
In addition to my computer backups I have a large cache of static files on my NAS. Some of these files are very large; the files never change, are relatively rarely added to, and are all retrievable from elsewhere (either by re-downloading from the internet or re-copying from a physical disk). Backing them up is more a convenience to avoid recreating the cache from scratch than a catastrophe-prevention measure, so I chose to create a single off-site copy on some external disks (3 of them, to accommodate all of the files at a sensible size/price point for the external drives) with `rsync`. This is rather than adding more storage to the NAS to increase the size of the local backup volume to accommodate them, which would in turn necessitate buying larger off-site disks.
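The rsync invocation for such a copy is roughly as follows (paths are invented for the demo; `--delete` keeps the external copy an exact mirror):

```shell
# Skip quietly if rsync is not installed - this is only a sketch.
command -v rsync >/dev/null 2>&1 || exit 0

# Stand-ins for the NAS cache and the external disk's mount point.
mkdir -p /tmp/cache /tmp/offsite
echo "static data" > /tmp/cache/file.iso

# -a preserves permissions/times; --delete removes files that have
# disappeared from the source, keeping the off-site copy exact.
rsync -a --delete /tmp/cache/ /tmp/offsite/
```

Note the trailing slash on the source: it copies the *contents* of the directory rather than the directory itself, which matters when rotating between multiple disks.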
It has been a very long time since I installed Gentoo Linux on anything, so I am just noting in a blog post what I did. As always, the basic process is RTFM in the form of the Gentoo Handbook. In this post, I got as far as completing what I will call the “modern stage 1” (i.e. the process for rebuilding everything with an optimised toolchain, which used to be a “stage 1” install but became a “stage 3 with bootstrap.sh script” around 2010) but not configuring/building/installing the kernel and bootloader. Unfortunately I ran out of time to play with this and went back to Debian on the system, in order to get some other work done.
Back in August I set up fully automated shutdown of my server and NAS on power failure, and the night before last we had a 2 hour power outage during which these systems failed to shut down until the UPS reached its critical low battery state.
If you are behind a proxy that breaks SSL (basically it does a man-in-the-middle attack, hopefully with your consent, typically for deep inspection) you may need to tell `pip` to use the system certificate store (presuming that trusts the proxy’s certificate), rather than its embedded one, via the
The number of things being mirrored since I initially setup my mirrors in the home-lab network has grown and the current bash script has become a bit cumbersome to maintain. This post describes the process of replacing the current script and then extending it to mirror more things, including Gentoo and Git repositories - something that took nearly 4 weeks (bearing in mind this is a hobby-project, fitting it in around work and home life). Strap in, this is going to be a long post….
I noticed recently that my blog (not that you need a link, if you are reading it!) has stopped automatically updating when changes are merged into the main branch. On investigation I found that Gitea has stopped including the secret in the payload (which is very good, from a security point of view) and instead now hashes the payload along with the secret and puts that value in the
When I was taught typing at school, we were made to type sentences with two spaces after the full-stop. Opinion has historically been split on this practice but most sources these days are clear that one space is now the only correct option (see 1st source below for a study that asserts 2 spaces promotes faster reading). Recently I have annoyed people at work reviewing my documents, as they have to replace my habitual two spaces with one. The problem from my end is that for nearly 30 years, most of which I have been touch-typing, I have consistently used two spaces. That’s a lot of muscle-memory to overcome!
Until now I have been using SaltStack to apply configuration, although in some cases that means removing default settings. In my new home lab I have deployed systems by doing bare-metal restores from live-system backups. Predominantly due to hardware differences, there are some problems that require undoing configurations that SaltStack applies to the live systems in order to correct. I think of this as “anti-configuration-management”.
Up until now I have been managing APT repositories individually, as individual states in SaltStack (there is an example in one of my previous posts). As the number of 3rd party repositories and combinations of components (`non-free` in the Debian default repositories) has grown, this has become a little unwieldy and pushes some specific configuration into the state that could be moved to pillar.
When confronted with a QR code and no idea what it contains (or to verify it contains what is claimed), one might want to decode it. This can be accomplished easily using the ZBar bar code reader.
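For instance, with ZBar’s `zbarimg` (using `qrencode` here just to make a test image to decode - the availability of both tools is an assumption):

```shell
# Skip quietly if either tool is missing - this is only a sketch.
command -v qrencode >/dev/null 2>&1 || exit 0
command -v zbarimg  >/dev/null 2>&1 || exit 0

# Make a QR code containing known text, then decode it back.
qrencode -o /tmp/test-qr.png "hello world"
zbarimg --quiet --raw /tmp/test-qr.png
```

`--raw` prints only the decoded payload, making it easy to pipe into other commands, and `--quiet` suppresses the statistics line.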
In my original post on setting up my off-site backup, my notes on setting up the off-site disks are a bit sparse. With my home-lab now in the mix, I have decided to add another removable disk to the backup rotation so one can be attached to the lab in order to do restores from the live environment’s backups to refresh the lab’s state from live and test bare-metal disaster recovery in the lab. I originally had 2 disks, so one was always “off-site” even during updates to the other - now I will have at least 1 (usually 2) “off-site”, 1 in the lab and up to 1 (usually 0) being updated. I will keep the newest and oldest off-site and the one in between will be in the lab, rotating them after each time the oldest is refreshed to become the new newest copy.
In my air-gapped lab I have been making some changes to my configuration management configuration that I now need to extract and push back upstream. Since my air-gapping is a defensive measure against accidentally interfering with my live network, rather than any concern about data security/privacy, this is just a case of using a USB drive to export and then push the commits.
Bash script which can be sourced to configure `no_proxy` and related proxy environment variables (used by most internet-capable Linux applications) for a specific proxy. It prompts for domain (defaulting to the local domain if the machine is domain-joined through realmd), username (defaulting to the local username) and password, although it then puts the password into the environment variable, so it is retrievable by anyone with access to the subsequent environment. As a superficial level of security, it exports an `env` bash function that wraps the usual command to redact the password.
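A minimal sketch of that shape of script (proxy host, port and the redaction approach are illustrative; the real script also derives the domain via realmd):

```shell
# Sourceable sketch: prompts unless values are passed as arguments,
# exports the proxy variables, and wraps env to hide the password.
proxy_setup() {
    user="${1:-}"; pass="${2:-}"
    [ -n "$user" ] || read -r -p "Proxy username: " user
    [ -n "$pass" ] || { read -r -s -p "Proxy password: " pass; echo; }
    export http_proxy="http://${user}:${pass}@proxy.example.com:8080"
    export https_proxy="$http_proxy"
    export no_proxy="localhost,127.0.0.1,.example.com"
    _redact="$pass"
    # Superficial protection only: the password is still in the
    # environment - this just stops it appearing in casual env output.
    env() { command env "$@" | sed "s/:${_redact}@/:********@/"; }
}
```

As the post notes, this is convenience rather than security: anything able to read the process environment still sees the real password.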
After restoring the configuration management server I needed to start making some changes specific to the lab environment (initially the MAC addresses of the hosts for the DHCP server). I also want to give the lab a separate domain, like I did when testing Proxmox.
After restoring the router, the next step I needed to complete was to restore my configuration management server in order to start orchestrating the rest. On the live network this resides in a virtual machine on my HP Microserver. This means I need to restore the host and the VM to get this up and running.
Following on from getting my DR “off-site” backup available to restore from, I restored, from scratch, the first machine from that backup: the router (to get DNS and DHCP up and running). After this, I can start deploying other bits.
For my new home lab, I have bought a couple of used APC AP7920 switched PDUs. These are ideal for me as they take a 13-Amp C14 input (often known as a “kettle lead”, colloquially) and provide a maximum of 10A across 8 C13 output sockets.
As I was preparing my new lab environment I found that my recently bought Lenovo M72e machines had a variety of different settings and BIOS levels, including alarm-based scheduled wake-ups so for consistency I wanted to update the BIOS on them and reset to known settings. This turned out to be very difficult, and I am far from alone in this opinion with forum quotes like “Lenovo has turned into a total mess. I miss IBM.” and “Probably my last Lenovo… never experienced such absurd bios issue with my other tiny pc, which is HP EliteDesk 800 G2 Mini”.
As part of working on my new home lab I moved my old core switch, a TP-Link T1600G-28PS, to replace the aged Cisco switches that I was using temporarily. Now I have the same make and generation of switch in the lab as my main network, I can start replicating my core network in the lab for testing and development. I am not sure I have ever bootstrapped a network (bearing in mind this lab is air-gapped) before - I have always started with some sort of router or other existing infrastructure (i.e. some form of DHCP server and DNS) that I have since migrated away from.
In January I had a problem due to having filled my core network switch. Since then I have deployed a very old Cisco switch into my air-gapped home lab which is also less than ideal as I have very different networking hardware on my main network and the lab. To make it easier to do meaningful testing and solve the capacity problem I have bought a used TP-Link T1600G-52PS switch to replace my existing T1600G-28PS. This post is concerned with configuring the new switch to be the same as the old, then swapping it and the old one over.
To show a quick-access window for matching lines in the current file:
I was genuinely surprised how easy this was, provided your source and target systems are the same processor architecture.
10 days after being installed, my “180 day evaluation” version of Windows Server 2019 has decided it is expired and will not activate. After trying unsuccessfully with both the telephone and their new web-based offline activation system (which you get the option to be texted a time-limited link for through the phone system), I rang Microsoft’s support, which resulted in being accused of pirating the software twice (according to them the entire “Microsoft Evaluation Center” section of Microsoft’s website does not exist) before being told that in order to use the 180 day trial I had to first buy a full licence (no, I cannot figure out how that is supposed to work either!).
Following from setting up my first vm and turning it into a Proxmox cluster I wanted to setup Ceph as a decentralised and shared storage infrastructure which should allow more seamless migration of VMs between hosts.
I originally deployed an old Cisco Catalyst 2970 switch in my new home lab environment however it was insufficient for my needs in a number of ways, such as not being able to act as an NTP or DNS server and not supporting SSH for remote management. I also had a 3560, although the model I have has 48 ports (as opposed to the 24 on the 2970 I used) and is physically much longer (both being standard 1U 19” rack-mount height and width), which is why I initially tried the smaller, lower power, 2970.
I installed 2 more Proxmox servers using the process I used to setup the first one and this post is my notes about adding them to the cluster. Note that I already created the cluster during the first node’s setup, although at the time it was a single-node “cluster”.
This post documents my first Proxmox VE host installation, configuration and the building of a Windows Domain Controller VM within my new home lab environment.
I have an old ReadyNAS Duo 2120 (confusingly it says 2120v2 on the bottom, although it is RND2000v1 generation hardware) which I have turned into a webserver to provide a mirror service for my new air gapped home lab network. This is a precursor to setting up Proxmox VE, with both read-only package mirrors and an ISO repository on the NAS. It is updated via a USB hard disk which I sync on my home network, then physically move to the lab environment and mount read-only to update the mirrors from.
Continuing my experimentation with Puppet I created my first class which takes parameters and sets those parameters using data in the master repository (for now).
One of the features of Sublime Text I used a lot was the snippets, where I could save a portion of text that would be substituted when I typed its trigger (in my case, I used a word followed by tab). The venerable VIM has a number of plugins that support snippets however its inbuilt abbreviation support can provide what I need without external dependencies.
This is a quick post - starting with Python 3.6, a secrets module is included “for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets”. This means we can generate good values for Django’s SECRET_KEY setting on any system with Python installed:
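A minimal sketch of doing this from the shell (the token length of 50 bytes is my own illustrative choice):

```shell
# Print a cryptographically strong, URL-safe value suitable for
# Django's SECRET_KEY (50 random bytes, base64url-encoded)
python3 -c "import secrets; print(secrets.token_urlsafe(50))"
```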
I have been using SaltStack for many years (although my current SaltStack configuration Git history goes back to 2013, so I had been using it for at least a few years before the linked post). Prior to that I have had some experience using Puppet and cfengine but this was before I started using SaltStack so my Puppet knowledge is at least 10 years old.
For some external email addresses, I need my mail server to relay mail through their SMTP host in order to pass DKIM/SPF checks (otherwise my mail server is just forging the from address from the point of view of the destination system). These are the changes needed to make this work in my virtual mail setup.
This all started because GMail started blocking my mail server’s IP address. A close inspection of my mail logs showed no unusual activity, and certainly nothing spammy being sent from my system, but there is a strong recommendation to setup DKIM for email domains which I have not done yet (SPF has been in place for many years). This post documents setting up OpenDKIM in my existing virtual mail infrastructure.
Due to port exhaustion I am moving my “desktop” (quoted as it includes some laptops) systems to a secondary switch. This post describes the switch and new VLAN arrangement for these systems.
On Thursday we received our new Virgin Media equipment, including a TV360 box. Despite being physically the same box as their old V6 version (existing customers are getting TV360 pushed out as a software update) the TV360 software no longer works with existing Virgin TV remotes.
Continuing my journey of TerraForm with Microsoft’s Azure cloud, I needed to create some blob storage with NFS enabled (which currently has to be done at the storage account level and can only be turned on or off at account creation time).
Continuing from my last TerraForm post, I have split my TerraForm configuration into a number of files. I am now taking this one step further and creating a module that I can use to deploy a number of identical (or near-identical) resources following a pattern.
This post is about launching a Docker Compose orchestrated collection of containers as a system service using systemd. I provisioned a VM running Docker (it’s a loooong story but short version: Azure Kubernetes and Azure container services cannot be deployed without public IP addresses, which goes against some of our secure-by-design decisions) using Terraform and Ansible to deploy and configure it. The service it is running is a web application made up of two Docker containers and I have written a Docker Compose file that builds and runs the infrastructure.
As I mentioned at the end of yesterday’s TerraForm post, one of the two outstanding tasks with my test setup is to kick-off an initial configuration of the VM.
Following on from my last TerraForm post, my next task is to deploy a VM with the tool. This is more complex than it sounds, as a ‘VM’ requires a number of components (network interface, disk) in addition to the VM as well as me wanting to configure it to do some initial boot-strapping on first start.
Since my post on creating separate EasyBuild development environments I have encountered a piece of software I need to develop some custom EasyBlocks for. My scripts allow this, however my magic module does not initialise the platform-specific installs using a custom EasyBlock source. In order to retain the convenience of architecture-local initialisation on each architecture I modified part of my module.
Following on from my first steps with TerraForm post, this post covers the next steps. This includes external state storage and splitting up the terraform configuration.
Since setting up my Raspberry Pi to boot from a USB SSD, I need to repurpose the SSD I used to set this up. I bought a smaller 240GB SSD to replace it and wanted to copy the existing filesystem over to it rather than reinstall.
One of my current issues is that I do not notice my NextCloud install requires updating until it is no longer supported and the clients stop working with it. I already have Icinga set up as my monitoring solution, so it has been on my to-do list for a while to create a plugin to check the version of my server against the latest release.
At home we have a UPS (an APC Back-UPS BX1400UI) which protects our internet connectivity equipment, core network switch, one server and network-attached storage (NAS) device. The server runs the backup tool (BackupPC) that backs up all of my local and cloud systems, with the backups stored on an iSCSI volume on the NAS (with monthly “off-site” copies).
I switched from Sublime Text, which I have used for many years, to Visual Studio Code (or ‘VS Code’ for short) mainly due to being unhappy with Sublime Text’s new licencing model (which has gone from licences for a particular major version to a subscription model, with a 3 year subscription period). One thing I have missed is Sublime Text’s built-in spell checker but there is a VS Code marketplace extension for that.
This follows on from my post on platform detection with Lmod, using that as a tool from which to create platform-specific builds of software with EasyBuild. This is a long one, as it contains the full setup for getting this running.
Following on from my getting started with Docker on CentOS post, I found that the version of Docker is very old (1.13) and, amongst other things, does not support multi-stage builds which will allow us to cut down our image size (useful, as the Azure registry charges for data transfer).
Managing software starts getting interesting in High-Performance Computing (HPC) when clusters become heterogeneous. One way to manage this, using a common shared filesystem, is to allow the software management tool to detect the current platform and make the appropriate software available. This post shows how to do this with the Lmod tool.
In order for NFS identity mapping to work across both platforms (Linux and Windows), pending either AD integration or fixing Windows NFS client integration with OpenLDAP server (neither of which is currently in place), we are currently maintaining a passwd file for Windows that maps the Windows identity to the corresponding UID/GID.
This post started, like most of my posts, with me making notes as I went along trying to accomplish a technical task. It quickly became a bit of a rant as I encountered numerous issues with Ansible’s Azure integration.
Further to my post on querying AD with LDAP, to search for users by Windows username or email address one can use these commands:
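The commands are along these lines; the server, bind account and base DN below are hypothetical placeholders:

```shell
# Find a user by Windows username (sAMAccountName)
ldapsearch -x -H ldap://dc.example.com \
    -D 'binduser@example.com' -W \
    -b 'dc=example,dc=com' \
    '(&(objectClass=user)(sAMAccountName=jbloggs))' cn mail

# Find a user by email address
ldapsearch -x -H ldap://dc.example.com \
    -D 'binduser@example.com' -W \
    -b 'dc=example,dc=com' \
    '(&(objectClass=user)(mail=joe.bloggs@example.com))' cn sAMAccountName
```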
Every month I produce usage spreadsheets for the HPC service I run. This involves much copying and pasting of data, which I already have a script to export in CSV format, into a bunch of spreadsheets for different people. In total, it takes about 7 hours (essentially a full working day of doing nothing else) to do - this post is about taking some of that 7 hours to get the existing script to generate the required spreadsheets directly. This is definitely going to be a case of getting back more time than I invest.
We had a problem with our network a few months ago, in which the router seemed to be running out of capacity to track connections. In order to try and identify the cause, I came up with this snippet that will display the number of currently tracked connections per-host on the Linux router. The first grep limits it to hosts that start ‘192.168.’, which is specific to the part of my network I was interested in:
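A sketch of the kind of pipeline involved (my reconstruction; the post's exact snippet may differ):

```shell
# Count currently tracked connections per source host, limited to
# addresses starting 192.168. (the part of the network of interest)
count_conns() {
    grep -o 'src=192\.168\.[0-9.]*' |  # first grep: only matching source hosts
        sort | uniq -c | sort -rn      # tally per host, busiest first
}

# Typical usage on the router (needs root; conntrack-tools assumed):
# conntrack -L 2>/dev/null | count_conns
```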
I keep getting this wrong (forgetting the --pty option), so all (Slurm users) recite together now:
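For the record, the incantation being recited (with an interactive bash shell as the example command):

```shell
# --pty allocates a pseudo-terminal for the task, without which an
# interactive shell misbehaves (no prompt, odd line editing)
srun --pty bash -i
```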
This post describes the process of enabling boot-from-USB and then using a USB disk for booting my Raspberry PI 4. It was made from notes I jotted down when I did this, over a week ago, so I apologise if it seems a bit disjointed or bulleted.
This post documents my attempt to send traffic from specific local hosts (Sky TV boxes) via my existing VPN connection, whilst retaining direct access to my broadband connection for all other hosts as a workaround to the annoying problems Sky services have when the client is behind a carrier-level network address translation (carrier-grade NAT or CGNAT).
We have a recurring problem with disk space being exhausted on the root filesystem of a system, the root cause of which is gnome-terminal holding open file-handles to very large deleted temporary files in /tmp. I suspect there is a bug in gnome-terminal not closing the handles to its scrollback buffer (possibly only when set to unlimited scrollback, as some users have).
For many years I have run an OpenVPN server on my home router, which provides access to my home network remotely when I am away from home and a private tunnel to my cloud server through which most monitoring and all command-and-control (via SaltStack at the time of writing) take place.
I have been planning to migrate my monitoring to my Raspberry PI 4 from its current location on a VM, which is fine unless there is a fault with the VM or its host - and this has happened (today, in fact). The monitoring covers those hosts, so it is helpful to know what is going on when there is a problem with them. My intention is to also shut down the VM host in the event of a power outage (to extend UPS battery runtime) and it would be nice for monitoring to continue during the outage.
For the first time in quite a few years I have been really suffering with hay-fever. The glorious sunshine during peak grass season (which is what causes mine) has sent grass-pollen counts through the roof. This is coupled with me walking our two dogs during peak pollen times, in the mornings and evenings (when the weather is cooler for the dogs), but I have been thinking about whether other factors are at play in my unusually high level of suffering this year. These are my entirely unscientific, layman, musings.
Because I keep ending up looking at my old scripts to refresh myself on how to do this by hand, here’s the recipe for querying AD with ldapsearch.
Decrypt a VNC password file in one-line with nothing but openssl:
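A sketch of such a one-liner, assuming the password file is in the usual ~/.vnc/passwd location. The key passed to -K is the fixed, publicly known key VNC uses to obfuscate stored passwords, which is why this works at all:

```shell
# VNC stores the (at most 8-byte) password DES-encrypted with a fixed,
# well-known key, so "decryption" needs no secret at all
openssl enc -des-cbc -nopad -nosalt \
    -K e84ad660c4721ae0 -iv 0000000000000000 \
    -d -in ~/.vnc/passwd
```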
Quick and dirty password generator
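Something along these lines (the length and character set are illustrative choices):

```shell
# Pull random bytes from the kernel, keep only alphanumerics,
# and stop after 16 characters
tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 16; echo
```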
Quick and dirty, how to change the windows password from the command line in an insecure fashion (exposes password on the command-line).
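The command in question is net user, run from an elevated prompt (username and password here are placeholders):

```shell
# Sets jbloggs's password; note it is visible in the command line,
# process list and shell history - hence "insecure fashion"
net user jbloggs NewP@ssw0rd1
```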
Linux has a number of SCSI drivers; many devices are managed by their own driver as well as the sg generic driver. Some tools need access to both the device provided by the specific driver (e.g. sr for CD/DVD drives) and the generic one - requiring knowledge of which device files correspond to the same physical device, for example to pass both through to Docker containers.
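One way to see the mapping, assuming the lsscsi utility is installed, is its -g flag; sysfs exposes the same information per device:

```shell
# Show each SCSI device alongside its corresponding /dev/sg* node
lsscsi -g

# Or, per device, via sysfs (sr0 as an example):
ls /sys/class/block/sr0/device/scsi_generic/
```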
Since at least July 2013 (the initial commit in my current Salt configuration repository at home) I have been using SaltStack to orchestrate and configure my systems, including VMs, at home. In the last few years Ansible has grown in popularity and I have recently been looking closely at it due to its integration with Azure - Microsoft includes Ansible in its Azure documentation and Cloud Shell platform.
One of the things we need more of in my new day-job is automation, something we were good at in my previous role. To get started with GitHub Actions I have decided to start with linting in one of my repositories, Slurm Helpers, and this post documents the journey to getting that working.
For nearly 2 years I have been using Let’s Encrypt (like half the tech world) for SSL certificates on my public-facing projects and services. I have decided to try and extend their use to my internal sites too, and do away with running my own certificate authority except for a few niche cases (OpenVPN, for example).
Last night we had a power-cut that lasted approximately 20 minutes, about 5 minutes into which I started my usual routine of manually shutting down systems to shed load (and hence prolong runtime for our broadband infrastructure) from my UPS. At around 15 minutes into the power-cut I started to get a bit twitchy about how much runtime was left on the UPS and finally got around to doing something about it (not the ideal conditions to be setting it up, but a good motivator).
I have been using a role-based approach to managing my SaltStack states, using lists of roles in the pillar to apply those roles to systems. This has started to break down when I want to dynamically add roles rather than have a set list per host (e.g. to add an operating-system role to all hosts without listing it on each) due to how SaltStack merges pillar data.
As a stepping stone in migrating a CentOS Linux system from local user management to Active Directory (AD) integration, I set up Kerberos authentication to the AD while retaining local identity information.
I had a need to split DNS resolution on a series of hosts between two DNS servers, one local to the environment and one for global address resolution. In the past I have always used BIND 9 as my go-to DNS server, however for this trivial forwarding task it seemed overkill and a good opportunity to give something lighter-weight a go, so I opted to try Dnsmasq instead.
Related to my previous post on monitoring output, I also needed to provide a script to transfer the output files from the scripts to a remote server. For the same reason (so the researcher can maintain his own scripts), this was written in Python rather than Bash.
Related to my previous post on launching 16 scripts in tmux, I also needed to provide a script to monitor the output files from some scripts and alert if they are not updated. For the same reason (so the researcher can maintain his own scripts), this was written in Python rather than Bash.
I was asked to help write a script to automate launching 16 scripts in a tmux session.
Since migrating BackupPC to a VM I have not been doing off-site backups (since I have been working from home full-time). Today I had to visit the office to retrieve some essential adaptors for my work laptop, and while I was there I grabbed my off-site backup disks. Now I have done this, I need to figure out how to pass the device through to my VM in order to update the oldest backup (which has not been updated since December 2019!).
As part of my migration of core services off the router to VMs I am moving the Omada controller. I have already moved the SaltStack controller and Debian pre-seed web site, but as these were straightforward migrations of daemons and data I did not write any notes about the process.
Windows 10 does not come with a telnet client (one of the commands I still have to drop into a WSL session for). Today I rectified that by enabling the optional feature.
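For reference, a sketch of enabling it from an elevated prompt with DISM (there is also a GUI route via “Turn Windows features on or off”):

```shell
# Enable the optional Telnet Client feature on the running system
dism /online /Enable-Feature /FeatureName:TelnetClient
```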
Since my May post, in which I started using PowerShell, I have been using it to do a search I would previously have used a Linux environment for.
Hot on the heels of migrating BackupPC from a bare-metal (router) system to a VM, the next set of services to migrate are my monitoring services. For this I run two separate systems, Icinga2 and Munin. Icinga excels at monitoring and alerting to problems and faults as they happen, Munin provides resource monitoring and graphing that gives a better view of “what just happened to kill our performance?” (quoted from their website) and historic views of the same.
Now that I have deploying VMs sorted I am migrating the first service from my router onto a new VM, my BackupPC server. This should be relatively straightforward as the server set-up is managed in SaltStack and the data resides on an iSCSI volume from my NAS, so configuration and “transfer” should be simple. Should be.
Following the awful time I had setting up a working preseed configuration for Debian installer I am exploring the alternative method of pre-building the disk image using debootstrap.
Over the weekend I bought and collected two 8th-generation HPE Microservers. The two I have bought have been upgraded to 16GB of ECC memory and have had their stock processors replaced with E3-1240 v2 processors. They also came with 10GbE cards fitted, although currently I have no infrastructure to make use of these.
To use your own custom backgrounds, drop the background file(s) in %APPDATA%\Microsoft\Teams\Backgrounds\Uploads (with %APPDATA% as expanded in cmd). They must be PNG files and will just show up in the list of backgrounds to choose from in the user interface.
I have automated monitoring (via Icinga2) of the update status of my various servers, all of which currently run Debian. To date I’ve been using cron to run a daily apt-get update - to my mind the obvious solution to “I need it to update daily” - which updates the local package cache, and the monitoring then picks up whether there’s anything to update. While investigating a problem with another cron job yesterday, I noticed messages from the apt-daily and apt-daily-upgrade services and a quick Google revealed that there is a Debian way to do this.
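The “Debian way” referred to is the systemd timers that drive those services; a couple of commands for poking at them:

```shell
# See when the daily apt jobs last ran and will next run
systemctl list-timers 'apt-daily*'

# Inspect the timer and service definitions themselves
systemctl cat apt-daily.timer apt-daily-upgrade.timer
```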
Yesterday I wrote about getting started with PowerShell, but in order to make certain things persist, such as aliases, they need to be created each time the shell starts, in the “profile” start-up script.
Over the past 6-12 months I have started using a more Microsoft-y environment on Windows, both on my own laptop and work’s. It started with using the bundled Windows OpenSSH client, removing the need for me to launch a Windows Subsystem for Linux (WSL) environment just to ssh to another system. I have also been trying out Microsoft Terminal (yes, it is open source), which was released on Tuesday, and it is a really slick tool - a vast improvement on the old command-line windows and/or PuTTY. Yesterday I switched my default session from (WSL) Debian to PowerShell and have started using it as my main shell environment on Windows.
My Netgear ReadyNAS 214 has two Ethernet ports. It is used as a safe place to store my data as well as hosting an iSCSI target that all my backups are done to (with regular off-site copies made to mitigate against failure or physical damage/loss of the NAS) so some extra bandwidth would not hurt and the web user-interface provides the option to bond them. Since my managed switch also supports bonding, I have decided to do this.
For a very long time I have been running my own Linux-based routers as gateways to the internet. The configuration was set up a very long time ago and has not been revisited, other than to update logins etc., since. I discovered today that there is now a kernel-mode PPPoE module (since kernel 2.4, from what I can gather) whose arrival I had missed, and a more performant driver for it. As I was monitoring the router due to line-speed problems, I noted that I could see the pppoe process appear in top hovering around the 12% CPU mark and wondered if that was expected or not. Googling this led to these changes.
For some systems that I have user accounts on but do not administer, I use Salt SSH (agent-less Salt minion) to manage the same common user-local files that are managed on the systems I do administer. I have previously made notes about it.
This post is just pulling together notes I have made whilst researching eGPUs with this laptop. I currently use a Razer Core X Chroma™ with it, which works very well except for issues with USB-disconnects but this seems to be a general problem with USB devices and Windows 7 onwards.
In order to backup my PS3™, in preparation for replacing the hard disk drive with a larger SSD I have laying around my desk, I needed a FAT32 formatted USB disk. Windows 10 will only allow disks larger than 32GB to be formatted as exFAT (or NTFS), which the PS3™ does not support.
On 1st & 2nd February I was fortunate to attend FOSDEM 2020 at Université Libre de Bruxelles, Brussels. While I was there I took part in the event’s keysigning party and now need to sign the keys I verified (only 8 days before the deadline of 30th April!).
Rawlplug’s UNO range are currently my favourite wall plugs - reliable, and I’m yet to have any problems installing one. I struggle to find this information, particularly supported screw sizes, when I want to refer to it, so here it is:
Setting up the final piece of the new network puzzle, the wireless access points…
Continuing from yesterday’s work on my new network kit, today I’m starting with configuring the new switch.
This post documents the re-introduction of VLANs to our home network. They were removed in 2018 to solve problems with a 4-year-old WAP and power-line adapters; their replacement means we can move back to a more secure and flexible networking set-up.
Following my last post I’ve been going through my DVD collection and fixing the forced subtitles for the films that have them.
As part of an on-going project to convert my DVD and BluRay collection to a hard-disk based media collection (mainly to save shelf-space and having to keep swapping disks) I’ve encountered a disk that has so-called “forced subtitles” to subtitle some foreign language speech into the viewers native language.
In July I replaced my Lenovo ThinkPad x240 with a Dell XPS 13 9370 I impulse bought from the reduced section of John Lewis. Last week I finally got around to installing the new SSD I bought in a Black Friday deal on-line.
Shortly before this time last year I revolutionised my backup infrastructure. This year I finally got around to scripting updating the off-site version.
I had been using my own external hard disk for backing up my work computer, however (despite being 256GB) Time Machine kept complaining it didn't have enough space. Fortunately I also have access to a 3TB network share at work, so here's how I changed my Mac to backup to there instead.
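The change itself can be made with tmutil (the share URL here is a placeholder):

```shell
# Point Time Machine at an SMB network share instead of the local disk
sudo tmutil setdestination 'smb://user:password@nas.example.com/backups'
```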
I've started a Mobile App at work using React Native. Obviously this needs testing. This post is about getting started on that process.
After much suffering from the incompetence of Virgin Media we have switched to PlusNet for our broadband provider. Thankfully FTTC has arrived in our neighbourhood so we can, at long last, get more than 300kbps estimated speed down the phone line (although we had to have a new phone line installed, as we did not already have one).
Trying to debug a missing close paragraph tag somewhere in a 300 line web page, I came across Tidy, which helped with its -e flag to report errors and warnings.
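For example (the filename is illustrative):

```shell
# -e: only report errors and warnings, do not emit cleaned-up markup
# -q: suppress Tidy's usual informational chatter
tidy -e -q page.html
```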
To change a Microsoft Active Directory password on your (non-bound) Mac:
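One way to do this, assuming the Mac can reach the domain's Kerberos KDC (the username and realm are placeholders; the realm must be upper-case):

```shell
# Change the AD password over Kerberos without binding the Mac
kpasswd jbloggs@AD.EXAMPLE.COM
```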
I've been playing around with different ways to get graphical applications working in Docker and in the process created some large images.
After my previous post, on installing Docker, I needed to set up my first container. I chose to set up a container for Firefox first, as that required me to get a graphical application that also needed sound working.
For a while (probably since I installed it) I've had weird graphical glitches on resuming my laptop from deep sleep. Until now I'd settled for rebooting it to fix this, but I went on the hunt for a better solution.
I'm working on digitising some of my DVD collection at the moment, which means creating some pretty large files. Trying to view these over the network to discover what they are is pretty tedious (lots of buffering!), so I've been working on dumping an "index" image file that will helpfully give me enough information to determine what it is without actually loading/playing the file.
I've been using virtual machines (via VirtualBox) for Linux-based testing and development for some time but I've been persuaded that it's high time I joined everyone else and started using containerisation.
Further to my earlier post on automating deployment of my blog I found a small flaw with my method and have decided to fix it.
I sometimes manually install a Debian package, whilst trying to figure out dependencies for something else, and then discover I should have installed a different "parent" package that will pull it in.
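Two apt commands help in this situation (the package name is a placeholder): finding which installed packages pull the package in, and re-marking it as automatically installed so it gets cleaned up once a real parent owns it:

```shell
# Which installed packages depend on libfoo1?
apt-cache rdepends --installed libfoo1

# Mark it as automatically installed, so `apt-get autoremove`
# will remove it if nothing else ends up depending on it
sudo apt-mark auto libfoo1
```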
So, I've started a new blog. I think this is my 3rd or 4th blog, although it's the first new one in over 13 years, since I switched to WordPress in April 2006.
These notes are a bit rough because they have been copied more-or-less directly from my old wiki that was just for my own consumption. They are from when I finally set-up icinga at home. Note that the configuration has moved on substantially since these notes were written, however as it is in a git repository I have not made any more notes or blog posts about it, to date.
For many, many years (at least since 2008) I’ve been using BackupPC to provide backups of my machines, at home and in the cloud. I recently replaced my NAS with one that has a larger capacity (as part of a project to turn my DVD collection into something more convenient to browse and watch) and that NAS has iSCSI support, so I moved my backup solution from a USB attached disk to an iSCSI target and introduced off-site replication of the backup pool for DR purposes.
salt-ssh provides a way to run salt remotely without it being installed on the destination system. This means, for example, that I can use it to manage my user’s dotfiles on the BlueBEAR HPC cluster.
A master-less Salt minion can be used to manage a standalone machine (e.g. my work Linux desktop) or bootstrap any master-controlled minion (or even the master itself) as the salt states will take over management of the minion’s configuration and reconfigure it appropriately on first run.
After just under 10 years flawless service my 160GB iPod Classic has died, and I’m surprised it’s the spinning rust that went (click-click-click of death) before the in-unit battery stopped getting me through multiple days continual use, at work, between charges. :(
This content is dumped, almost directly, from my old wiki and includes notes about 3 different iterations of orchestrating my own CA. From newest to oldest. It is a sister page to my notes on using an OpenSSL Certificate Authority.
Using OpenSSL as a Certificate Authority - ported from my old wiki. See also my OpenSSL certificates in a nutshell post for client-level certificate handling.
These notes are a little rough-and-ready, copied more or less directly from my old wiki which was only intended for my consumption.
OpenVPN setup notes - ported from old wiki (unedited).
Salt is a remote execution and configuration management tool that I have been using to manage the many Linux servers and desktops I have. Its state system also replaces some of my notes as a self-documenting machine-readable description of how each item is configured.
I have one TP-LINK TL-SG3210 JetStream 8-Port Gigabit L2 Managed Switch with 2 SFP Slots. Initially I had to connect via serial cable as it defaults to a static IP address of 192.168.0.1, which conflicts with my existing network.
subscribe via RSS