Adding mirrors to laptop (a.k.a. improving mirror sync 2.0)
Having put a new, very large SSD in my laptop, I decided it would be convenient (for playing with virtual machines etc.) to have a subset of my mirrors on it. This post explains how I took my latest mirroring scripts and adapted them to do this.
Pre-seeding the mirror
I started by copying the current mirror to my laptop, to avoid re-downloading everything. First, as the root user, I created the store directory and a store group, and added my user to the group:
mkdir /srv/mirrors
groupadd store
gpasswd -a laurence store
chown laurence:store /srv/mirrors
chmod 2774 /srv/mirrors # sgid
apt-get install rsync
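Since everything else here ends up in Ansible, the root-only preparation above could also be expressed as tasks. This is a minimal sketch (not something from my playbook), assuming it runs with `become: true`; the user, group, path and mode are exactly those used in the shell commands above.

```yaml
# Sketch: Ansible equivalent of the manual root setup (assumes become: true)
- name: Create the store group
  ansible.builtin.group:
    name: store
    state: present

- name: Add laurence to the store group (keeping existing groups)
  ansible.builtin.user:
    name: laurence
    groups: store
    append: true

- name: Create the mirror directory, setgid so new files inherit the store group
  ansible.builtin.file:
    path: /srv/mirrors
    state: directory
    owner: laurence
    group: store
    mode: '2774'

- name: Install rsync
  ansible.builtin.apt:
    name: rsync
    state: present
```

Quoting the mode matters: an unquoted `2774` would be read by YAML as a decimal integer rather than an octal mode.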
Then, as my user, I copied the existing mirror over (having mounted it at /mnt):
# As laurence
rsync -avP /mnt/mirrors/ /srv/mirrors/
First mirror run
I started by simply changing the mirror_base_path for my laptop. I did this initially with a lookup based on the hostname fact, because using the local connection type on both systems makes it difficult to set variables via the inventory:
- name: Set mirror_base_path based on host
  ansible.builtin.set_fact:
    mirror_base_path: "{{ mirror_path_lookup[ansible_facts['hostname']] }}"
  vars:
    mirror_path_lookup:
      defiant: /srv/mirrors
      galaxy: /mnt/mirrors
In order to get the mirror working, I had to comment out the Gentoo and RPM based mirrors because I did not have Docker or yum-utils installed.
Fixing RPM mirrors
The yum-utils package is not available to install on Debian Bullseye, so I used Docker to create a Rocky Linux container and used that to download the RPM mirrors.
When adding the Docker repository key for Secure Apt, I followed advice I found in one or two places online and did not add the key to /etc/apt/trusted.gpg.d/, where it would be trusted for all repositories. Instead I placed the key in /usr/local/share/keyrings, downloading and converting it to a binary keyring like this:
mkdir -p /usr/local/share/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | /usr/bin/gpg --dearmor --output /usr/local/share/keyrings/docker-archive-keyring.gpg
This is then referenced in the Docker sources.list file, which I put in /etc/apt/sources.list.d/docker.list:
deb [signed-by=/usr/local/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable
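The same key handling can be done from Ansible. The sketch below is an illustration rather than what I actually ran: it keeps the key out of /etc/apt/trusted.gpg.d/ exactly as above, and the intermediate `.asc` filename is my own choice.

```yaml
# Sketch: per-repository Docker key and source, without trusting it globally
- name: Create the local keyring directory
  ansible.builtin.file:
    path: /usr/local/share/keyrings
    state: directory
    mode: '0755'

- name: Fetch Docker's ASCII-armoured signing key
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/debian/gpg
    dest: /usr/local/share/keyrings/docker-archive-keyring.asc  # assumed name
    mode: '0644'

- name: Convert the key to a binary keyring (only if not already done)
  ansible.builtin.command:
    cmd: >-
      gpg --dearmor
      --output /usr/local/share/keyrings/docker-archive-keyring.gpg
      /usr/local/share/keyrings/docker-archive-keyring.asc
    creates: /usr/local/share/keyrings/docker-archive-keyring.gpg

- name: Add the Docker repository, signed only by that keyring
  ansible.builtin.apt_repository:
    repo: deb [signed-by=/usr/local/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable
    filename: docker  # written as /etc/apt/sources.list.d/docker.list
    state: present
```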
The Dockerfile for this is very simple:
FROM rockylinux:8
# Install reposync and create repo
RUN dnf install -y yum-utils createrepo
In the roles/reposync/tasks/main.yaml file, I first added the tasks to build the Docker image:
- name: Make temporary directory for Dockerfile
  ansible.builtin.tempfile:
    state: directory
  register: dockerfile_tmp
- name: Copy Dockerfile to temporary path
  ansible.builtin.copy:
    src: Dockerfile
    dest: "{{ dockerfile_tmp.path }}/Dockerfile"
- name: Build the image
  community.docker.docker_image:
    build:
      path: "{{ dockerfile_tmp.path }}"
    name: reposync
    source: build
    force_source: yes # Always rebuild
- name: Remove temporary path
  ansible.builtin.file:
    path: "{{ dockerfile_tmp.path }}"
    state: absent
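Because the playbook runs with a local connection, the Docker daemon and the role's files are on the same machine, so the temporary directory could arguably be skipped by pointing the build context straight at the role's files directory. A sketch, assuming the Dockerfile lives in roles/reposync/files/ and that everything else in that directory is acceptable as build context:

```yaml
# Sketch: build directly from the role's files/ directory (local Docker assumed)
- name: Build the image
  community.docker.docker_image:
    build:
      path: "{{ role_path }}/files"  # role_path is Ansible's path to this role
    name: reposync
    source: build
    force_source: yes # Always rebuild
```

The trade-off is that every file in files/ is sent to the Docker daemon as build context, which the copy-to-tempdir approach avoids.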
Then I updated the reposync command task to use the new Docker container instead:
- name: Do reposync
  community.docker.docker_container:
    name: do_reposync
    container_default_behavior: no_defaults # Stop warning
    cleanup: yes
    detach: no
    # My old sync script had --downloadcomps and --download-metadata
    # hard-coded - not making them variables until I need to turn them
    # off.
    # DNF reposync plugin seems to be downloading the system's core
    # repos, even with `--config` so limit it to just those we want
    # with repoid too.
    command: >
      dnf reposync
      -y
      --arch={{ architecture }}
      --config=/tmp/yum.conf
      --destdir=/mnt
      {{ "--delete" if delete else "" }}
      {{ "--newest-only" if newest_only else "" }}
      --downloadcomps
      --download-metadata
      --repoid={{ ' --repoid='.join(repos | map(attribute='name')) }}
    image: reposync
    timeout: 900 # Times out with default 60 setting
    mounts:
      - source: "{{ yum_conf }}"
        target: /tmp/yum.conf
        type: bind
        read_only: yes
      - source: "{{ target }}"
        target: /mnt
        type: bind
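The --repoid expression is the least obvious part of that command: it joins the repository names with " --repoid=" and prefixes the first one, giving one --repoid option per repository. A throwaway debug task (illustration only, using two of the Rocky repository names) shows the rendered result, alongside an equivalent that uses only filters:

```yaml
# Illustration: render the --repoid arguments for two sample repositories
- name: Show the --repoid arguments the reposync task would build
  ansible.builtin.debug:
    msg:
      - "--repoid={{ ' --repoid='.join(repos | map(attribute='name')) }}"
      # Filter-only equivalent: prefix each name, then join with spaces
      - "{{ repos | map(attribute='name') | map('regex_replace', '^', '--repoid=') | join(' ') }}"
  vars:
    repos:
      - name: rocky-8-baseos
      - name: rocky-8-appstream
# Both messages: --repoid=rocky-8-baseos --repoid=rocky-8-appstream
```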
Finally, the createrepo call is changed to also use the Docker container:
- name: Do createrepo
  community.docker.docker_container:
    name: do_createrepo
    container_default_behavior: no_defaults # Stop warning
    cleanup: yes
    detach: no
    command: >
      createrepo
      --update
      {{ "-g" if comps_xml_stat.stat.exists else "" }}
      {{ "/mnt/comps.xml" if comps_xml_stat.stat.exists else ""}}
      /mnt
    image: reposync
    mounts:
      - source: "{{ repo }}"
        target: /mnt
        type: bind
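The createrepo task relies on a comps_xml_stat result that isn't shown. A task along these lines would produce it; this is a sketch assuming `repo` is the per-repository directory being mounted, and that --downloadcomps leaves comps.xml at the top of it:

```yaml
# Sketch (assumed): register whether reposync fetched a comps.xml for this repo
- name: Check whether reposync downloaded a comps.xml for this repository
  ansible.builtin.stat:
    path: "{{ repo }}/comps.xml"
  register: comps_xml_stat
```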
I also added a dependency on the docker role in the reposync role’s meta/main.yaml:
---
dependencies:
  - role: docker
...
Making the mirror specification data orientated
In writing my mirroring Ansible playbook, I violated one of my usual principles: rather than being data-driven via the inventory, the specifics of what to download were coded into “meta roles” that depend on the other roles.
As a result I have these roles which are pure “meta roles”:
- centos-mirrors
- debian-mirrors
- gitlab-mirrors
- git-mirrors
- proxmox-mirrors
- rocky-mirrors
and these roles that have some logic in them (i.e. are more than purely “meta roles” in some way):
- dban
- debmirror
- docker
- gentoo-mirror
- git-bare
- pip-mirrors
- reposync
So, to turn this into a data-driven system, I turned the meta roles into data (variables) in the inventory.
I started by creating an inventory and, following good practice, used role-orientated groups to apply the variables to the hosts (galaxy is the host used to create the lab mirror, defiant is my laptop):
---
all:
  hosts:
    defiant:
    galaxy:
laptop_local:
  hosts:
    defiant:
lab_source:
  hosts:
    galaxy:
...
Hitting bugs
My plan was then to replace the existing list of roles with an include_role looped over the mirrors from the inventory data, something like this (where mirrors is a list of dictionaries, each with two elements: the mirror type (i.e. role name, e.g. reposync or git-bare) and the arguments to pass to the role, as a dict):
- name: Do mirror
  ansible.builtin.include_role:
    name: "{{ mirror.type }}"
  vars: "{{ mirror.args }}"
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror
However, this does not work due to a wontfix bug in Ansible: because vars is evaluated early, it won’t expand a dictionary in that way. There are various workarounds, such as using set_fact to set the variables as facts, or rewriting the roles to be “loop aware” and take item as an argument. I am not keen on the latter, as it exposes the loop semantics to the role; in my current approach a role only understands a single item, and the loop semantics are entirely contained in the fact that it is called with loop. That said, standardising the interface for a “mirror role” is probably sensible, and would achieve the same end without exposing to the role the fact that it is being looped over.
In “pseudo Ansible”, this is the interface I adopted for my mirroring roles:
- role:
    name: some-mirror-role
    source: some-role-applicable-source (url/directory/etc.)
    target: target directory for the mirror
    selectors: some-role-applicable-package-selector(s) (e.g. suite/distribution/architectures/etc.)
This made the argument_specs reasonably common for everything except the selectors (which are highly dependent on the type of mirror), although there may be some variation in format (e.g. Gentoo has two different sources, for portage and distfiles):
---
argument_specs:
  main:
    short_description: Mirror some thing
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      source:
        type: str
        required: true
        description: Where to mirror from
      selectors:
        type: dict # In all of my use cases
        required: true
        description: Package selectors
        options:
          # This is where it specialises for the mirror method role
Working around the bugs
The next piece was to create the data to use as the source. Note that I also pulled the target path out, removing that lookup from the playbook. For example, for the laptop I created group_vars/laptop_local.yaml (the lab_source one is simply the same but with different mirrors and mirror_base_path):
---
mirror_base_path: /srv/mirrors/
mirrors:
  - type: gentoo-mirror
    target: "{{ mirror_base_path }}/gentoo-mirror"
    source: {} # Currently unused by the role but required for interface
    selectors:
      profile: default/linux/amd64/17.1/no-multilib/systemd
      make.conf-vars:
        USE:
          - X
          - alsa
          # - doc
        VIDEO_CARDS:
          - fbdev
          - vesa
          - intel
      package:
        use:
          00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
          sys-boot: 'sys-boot/grub:2 device_mapper'
      packages:
        - '@world' # All the stage3 pre-built base packages
        - app-editors/vim
        - app-portage/cpuid2cpuflags
        - dev-vcs/git
        - net-misc/rsync
        - sys-boot/grub
        - sys-kernel/dracut
        - sys-kernel/gentoo-sources
        - sys-fs/cryptsetup
        - sys-fs/e2fsprogs
        - sys-fs/lvm2
...
I modified my existing gentoo-mirror role to use this variable arrangement instead of the old one.
As these are (currently) run against localhost, I needed to add a task that sets the facts for localhost, using the gathered hostname fact to identify which host it is actually running on:
- name: Set mirror facts based on hostname fact, as ansible only has 'localhost'
  ansible.builtin.set_fact:
    mirrors: "{{ hostvars[ansible_facts['hostname']].mirrors }}"
    mirror_base_path: "{{ hostvars[ansible_facts['hostname']].mirror_base_path }}"
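If the playbook is ever run on a machine that isn't in the inventory, that hostvars lookup would presumably fail with a fairly opaque undefined-variable error. A small guard placed before it makes the failure explicit; this is a sketch, not part of my playbook:

```yaml
# Sketch: fail early, with a clear message, if this machine is not in the inventory
- name: Check this machine is in the inventory
  ansible.builtin.assert:
    that:
      - ansible_facts['hostname'] in groups['all']
    fail_msg: "{{ ansible_facts['hostname'] }} is not in the inventory, so has no mirrors defined"
```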
I moved the git_deploy_credentials variable, which provides the credentials used to clone my self-hosted git repositories, to group_vars/all.yaml, as it is common to everything (these credentials only have read access, but even so they should really be looked up rather than hard-coded).
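One way to stop hard-coding them is a lookup in group_vars/all.yaml. The sketch below reads environment variables on the control node; the username/password structure and the variable names are hypothetical, since the real shape of git_deploy_credentials isn't shown. An ansible-vault encrypted value would be another option.

```yaml
# Sketch: hypothetical structure and environment variable names
git_deploy_credentials:
  username: "{{ lookup('ansible.builtin.env', 'GIT_DEPLOY_USERNAME') }}"
  password: "{{ lookup('ansible.builtin.env', 'GIT_DEPLOY_PASSWORD') }}"
```

Because group_vars values are templated lazily, the lookups only run when a task actually uses the credentials.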
My finally-working generic “do mirror” task now looks like this:
- name: Do mirror
  ansible.builtin.include_role:
    name: "{{ mirror.type }}"
  vars:
    source: "{{ mirror.source }}"
    target: "{{ mirror.target }}"
    selectors: "{{ mirror.selectors }}"
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror
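With data this size, Ansible's default per-iteration output prints the whole mirror dictionary for every item. A loop_control label keeps the output readable. This is a suggested tweak to the task above, not part of it:

```yaml
# Sketch: same task, with a short label per loop iteration
- name: Do mirror
  ansible.builtin.include_role:
    name: "{{ mirror.type }}"
  vars:
    source: "{{ mirror.source }}"
    target: "{{ mirror.target }}"
    selectors: "{{ mirror.selectors }}"
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror
    label: "{{ mirror.type }}"  # shown instead of the full mirror dict
```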
Adding Debian
The argument spec for the debmirror role, in the new arrangement, turned out to be more complicated than I anticipated:
---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with debmirror
    options:
      target:
        description: Locations to download to
        type: dict
        required: true
        options:
          directory:
            type: str
            required: true
            description: Directory to mirror to
          keyring_directory:
            type: str
            required: true
            description: Directory to download keys and store keyring in (must be outside target or debmirror will delete it)
      source:
        description: Where to mirror from
        type: dict
        options:
          host:
            type: str
            default: ftp.debian.org
            description: Hostname to mirror from
          root:
            type: str
            default: debian
            description: Location on the remote host that is the root of the Debian archive
          keys:
            type: list
            elements: dict
            options:
              name:
                type: str
                required: true
                description: Name of key (will be used for download filename)
              url:
                type: str
                required: true
                description: Where to fetch the key from
              checksum:
                type: str
                description: Optional checksum for the downloaded key
              fingerprint:
                type: str
                description: Optional fingerprint of the downloaded key
          method:
            type: str
            default: rsync
            choices:
              - ftp
              - http
              - https
              - rsync
            description: Method to use to download (ftp/http/https/rsync)
      selectors:
        type: dict
        required: yes
        options:
          suites:
            type: list
            elements: str
            required: true
            description: The list of suites to mirror
          components:
            type: list
            elements: str
            default: ['main']
            description: The list of components to mirror
          architectures:
            type: list
            elements: str
            default: ['amd64']
            description: The list of architectures to mirror
          rsync_extras:
            type: list
            elements: str
            default: ['trace']
            choices:
              - doc
              - indices
              - tools
              - trace
              - none
            description: Additional files to mirror (with rsync) that are not part of the archive itself
          installers:
            type: bool
            default: false
            description: Fetch the Debian installers (that match the architectures and suites requested)
...
Again, I then modified the role to use this structure instead of the previous arguments. The added challenge here was that, because some of the options are no longer top-level, I had to use | default('foo') to default a value to foo (for example) instead of specifying it in the role's defaults/main.yaml.
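To illustrate the pattern, here is a hypothetical debmirror invocation (not my actual task) where each nested option carries its default inline, matching the defaults in the argument spec. It assumes source and selectors are always supplied, as they are in my data; the keyring filename is made up for the example.

```yaml
# Hypothetical sketch: nested options defaulted with | default(...)
- name: Mirror the archive with debmirror
  ansible.builtin.command:
    argv:
      - debmirror
      - "--host={{ source.host | default('ftp.debian.org') }}"
      - "--root={{ source.root | default('debian') }}"
      - "--method={{ source.method | default('rsync') }}"
      - "--dist={{ selectors.suites | join(',') }}"
      - "--section={{ selectors.components | default(['main']) | join(',') }}"
      - "--arch={{ selectors.architectures | default(['amd64']) | join(',') }}"
      - "--rsync-extra={{ selectors.rsync_extras | default(['trace']) | join(',') }}"
      # Hypothetical keyring filename inside the keyring directory
      - "--keyring={{ target.keyring_directory }}/trustedkeys.gpg"
      - "{{ target.directory }}"
```

Using argv rather than a single command string avoids any shell quoting issues with the templated values.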
Once migrated, I added debian and debian-security mirrors to the group_vars, which entirely replaced the debian-mirrors role:
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/debian"
    keyring_directory: "{{ mirror_base_path }}/keyrings/debian"
  source:
    host: rsync.mirrorservice.org
    root: ftp.debian.org/debian
    keys:
      - name: debian-10-buster
        url: https://ftp-master.debian.org/keys/archive-key-10.asc
        fingerprint: "80D1 5823 B7FD 1561 F9F7 BCDD DC30 D7C2 3CBB ABEE"
      - name: debian-11-bullseye
        url: https://ftp-master.debian.org/keys/archive-key-11.asc
        fingerprint: "1F89 983E 0081 FDE0 18F3 CC96 73A4 F27B 8DD4 7936"
  selectors:
    installers: true
    suites:
      - buster
      - buster-updates
      - bullseye
      - bullseye-updates
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/debian-security"
    keyring_directory: "{{ mirror_base_path }}/keyrings/debian-security"
  source:
    method: http
    host: security.debian.org
    root: debian-security
    keys:
      - name: debian-10-buster-security
        url: https://ftp-master.debian.org/keys/archive-key-10-security.asc
        fingerprint: "5E61 B217 265D A980 7A23 C5FF 4DFA B270 CAA9 6DFA"
      - name: debian-11-bullseye-security
        url: https://ftp-master.debian.org/keys/archive-key-11-security.asc
        fingerprint: "AC53 0D52 0F2F 3269 F5E9 8313 A484 4904 4AAD 5C5D"
  selectors:
    suites:
      - buster/updates
      - bullseye-security
RPM based repositories
I think the argument arrangement I came up with is sub-optimal, but I was trying to fit the existing arguments into the new source/target/selectors interface with minimal changes:
---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with reposync
    options:
      target:
        type: dict
        required: true
        options:
          directory:
            type: str
            required: true
            description: Directory to mirror to (note each repository will be created in a subdirectory matching the repository name by reposync)
          delete:
            type: bool
            default: true
            description: Delete local files no longer present in remote repository
          yum_conf:
            type: str
            required: true
            description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
      source:
        type: dict
        required: true
        options:
          repos:
            type: list
            elements: dict
            description: List of repositories to include in this mirror
            required_one_of:
              - ['baseurl', 'mirrorlist']
            mutually_exclusive:
              - ['baseurl', 'mirrorlist']
            options:
              name:
                type: str
                required: true
                description: Name of repository (will be used for target directory by reposync)
              baseurl:
                type: str
                description: Base url for the repository
              mirrorlist:
                type: str
                description: Mirror list url for the repository
              description:
                type: str
                description: Description of the repository
              repo_gpgcheck:
                type: bool
                default: false
                description: Whether a GPG signature check should be done on the repodata
              gpgcheck:
                type: bool
                default: true
                description: Whether a GPG signature check should be done on the packages
              gpgkey:
                type: str
                description: URL for the ASCII-armoured GPG key for the repository
              sslverify:
                type: bool
                default: true
                description: Should SSL certificates be verified
              metadata_expire:
                type: int
                description: Frequency (in seconds) with which yum metadata will expire
          yum_conf:
            type: str
            required: true
            description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
      selectors:
        type: dict
        options:
          newest_only:
            type: bool
            default: true
            description: Only fetch the latest version if multiple versions of a package are available
          architecture:
            type: str
            default: x86_64
            description: Act as if running the specified arch
...
Migrating the rocky-mirrors role to the data is then a case of doing this:
- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/rocky.yum.conf"
  source:
    repos:
      - name: rocky-8-baseos
        description: Rocky Linux 8 BaseOS
        baseurl: http://rockylinux.mirrorservice.org/8/BaseOS/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-appstream
        description: Rocky Linux 8 AppStream
        baseurl: http://rockylinux.mirrorservice.org/8/AppStream/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-extras
        description: Rocky Linux 8 extras
        baseurl: http://rockylinux.mirrorservice.org/8/extras/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
and likewise for centos-mirrors:
- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/centos.yum.conf"
  source:
    repos:
      - name: centos-6.10-os
        description: CentOS 6.10 OS
        baseurl: https://vault.centos.org/6.10/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-updates
        description: CentOS 6.10 updates
        baseurl: https://vault.centos.org/6.10/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-extras
        description: CentOS 6.10 extras
        baseurl: https://vault.centos.org/6.10/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-7-os
        description: CentOS 7 OS
        baseurl: http://mirror.centos.org/centos/7/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-updates
        description: CentOS 7 updates
        baseurl: http://mirror.centos.org/centos/7/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-extras
        description: CentOS 7 extras
        baseurl: http://mirror.centos.org/centos/7/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
Proxmox mirrors
The Proxmox mirrors are just Debian repositories, and the debmirror role had already been migrated, so they only needed adding to the appropriate group_vars files:
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/pve-no-subscription"
    keyring_directory: "{{ mirror_base_path }}/keyrings/pve-no-subscription"
  source:
    method: http
    host: download.proxmox.com
    root: debian/pve
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
  selectors:
    rsync_extras: ['none']
    suites:
      - bullseye
    components:
      - pve-no-subscription
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/ceph-pacific"
    keyring_directory: "{{ mirror_base_path }}/keyrings/ceph-pacific"
  source:
    method: http
    host: download.proxmox.com
    root: debian/ceph-pacific
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
  selectors:
    rsync_extras: ['none']
    suites:
      - bullseye
    components:
      - main
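Both Proxmox entries repeat the same key definition. Within a single group_vars file, standard YAML anchors and aliases could remove that duplication. This is a sketch of the idea (it only works inside one file, since anchors don't cross file boundaries):

```yaml
# Sketch: define the key list once with an anchor (&) and reuse it with an alias (*)
mirrors:
  - type: debmirror
    target:
      directory: "{{ mirror_base_path }}/pve-no-subscription"
      keyring_directory: "{{ mirror_base_path }}/keyrings/pve-no-subscription"
    source:
      method: http
      host: download.proxmox.com
      root: debian/pve
      keys: &proxmox_keys
        - name: proxmox-keyring
          url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
          checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
    selectors:
      rsync_extras: ['none']
      suites:
        - bullseye
      components:
        - pve-no-subscription
  - type: debmirror
    target:
      directory: "{{ mirror_base_path }}/ceph-pacific"
      keyring_directory: "{{ mirror_base_path }}/keyrings/ceph-pacific"
    source:
      method: http
      host: download.proxmox.com
      root: debian/ceph-pacific
      keys: *proxmox_keys  # Same list as the entry above
    selectors:
      rsync_extras: ['none']
      suites:
        - bullseye
      components:
        - main
```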
DBAN
This one is quite simple, just moving the version setting under selectors:
---
argument_specs:
  main:
    short_description: Mirror DBAN ISO and extract dban image (e.g. for PXE booting)
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      selectors:
        type: dict
        required: true
        options:
          version:
            type: str
            required: true
            description: Version of DBAN to fetch
...
And add it to the list of mirrors in the data:
- type: dban
  target: "{{ mirror_base_path }}/dban-2.3.0"
  selectors:
    version: 2.3.0
GitLab
As with Proxmox, GitLab is just a case of more mirrors using existing mirror types; the key difference is that Proxmox is Debian-only, whereas GitLab has both APT and RPM repositories:
# GitLab
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/gitlab-ee-debian"
    keyring_directory: "{{ mirror_base_path }}/keyrings/gitlab-ee-debian"
  source:
    method: http
    host: packages.gitlab.com
    root: gitlab/gitlab-ee/debian
    keys:
      # GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
      # doesn't verify the key, so I haven't either but not sure that is a good
      # idea!
      - name: gitlab-keyring
        url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
  selectors:
    rsync_extras: ['none']
    suites:
      - bullseye
    components:
      - main
- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/gitlab-ee.yum.conf"
  source:
    repos:
      - name: gitlab_gitlab-ee-el_7
        description: gitlab_gitlab-ee-el_7
        baseurl: https://packages.gitlab.com/gitlab/gitlab-ee/el/7/x86_64
        gpgkey: |-
          https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
          https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey/gitlab-gitlab-ee-3D645A26AB9FBD22.pub.gpg
        metadata_expire: 300 # Not sure if this is needed - was in old sync script
        # Disable these per https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5226
        repo_gpgcheck: no
        gpgcheck: no
        sslverify: no
Git Mirrors
This was ridiculously easy: I just renamed the repository argument to source and it directly conformed to the new structure:
---
argument_specs:
  main:
    short_description: Mirror Git repository (as bare repo)
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      source:
        type: str
        required: true
        description: Repository to clone
...
There are a lot of these, so I’ll just show the first one as an example:
- type: git-bare
  source: https://github.com/mythic-beasts/dehydrated-mythic-dns01.git
  target: "{{ mirror_base_path }}/git/github.com/mythic-beasts/dehydrated-mythic-dns01.git/"
PIP mirrors
Mirroring Python packages with pip continues to be complicated. I moved the list of packages under the source key in the new structure, and the ‘platforms’ (used to select the Docker image in which to do the fetch) into selectors:
---
argument_specs:
  main:
    short_description: Mirror pip packages, using docker to fetch platform-specific version
    options:
      target:
        type: str
        required: true
        description: Base directory to mirror to
      source:
        type: dict
        required: true
        options:
          packages:
            type: list
            required: true
            elements: str
            description: List of packages to fetch with pip (passed directly to pip, so anything pip accepts (e.g. version constraints) can be included)
      selectors:
        type: dict
        required: true
        options:
          platforms:
            type: list
            required: true
            elements: dict
            options:
              image:
                type: str
                required: true
                description: Docker image name to use to fetch pip
              pre-command:
                type: str
                description: Pre-pip command to run (e.g. to install pip in the container). No command will be run if not provided.
              name:
                type: str
                description: Sub-directory name to use (defaults to the `image` value with colons replaced by hyphens)
...
Using it is very similar to the rest:
- type: pip-mirror
  target: "{{ mirror_base_path }}/pip"
  source:
    packages:
      - ansible
  selectors:
    platforms:
      - image: debian:bullseye
        pre-command: 'apt-get update && apt-get -y install python3-pip'
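For a sense of how these arguments fit together, here is a hypothetical sketch of the fetch step; it is not the actual pip-mirrors role. For each platform it runs the optional pre-command and then `pip3 download` inside that platform's image, writing into a sub-directory named as the argument spec describes. It assumes the image provides `sh`, and that `pip3` is available after the pre-command.

```yaml
# Hypothetical sketch of a per-platform pip fetch (not the real role)
- name: Create a sub-directory for each platform (bind mount sources must exist)
  ansible.builtin.file:
    path: "{{ target }}/{{ platform.name | default(platform.image | replace(':', '-')) }}"
    state: directory
  loop: "{{ selectors.platforms }}"
  loop_control:
    loop_var: platform

- name: Download the packages with pip inside each platform's image
  community.docker.docker_container:
    name: pip_download
    image: "{{ platform.image }}"
    container_default_behavior: no_defaults
    cleanup: yes
    detach: no
    # Optional pre-command (e.g. installing pip), then pip download into /mnt
    command: >-
      sh -c "{{ platform['pre-command'] ~ ' && ' if 'pre-command' in platform else '' }}pip3 download --dest /mnt {{ source.packages | join(' ') }}"
    mounts:
      - source: "{{ target }}/{{ platform.name | default(platform.image | replace(':', '-')) }}"
        target: /mnt
        type: bind
  loop: "{{ selectors.platforms }}"
  loop_control:
    loop_var: platform
```

Running pip inside each platform's own image is what makes this worthwhile: `pip download` picks wheels matching the interpreter and OS it runs on, so each sub-directory ends up with platform-appropriate packages.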