As I put a new, very large, SSD in my laptop I decided it would be convenient (for playing with virtual machines etc.) to have a sub-set of my mirrors on there. This post explains how I took my latest mirroring scripts and made them flexible enough to do this.

Pre-seeding the mirror

I started by copying the current mirror to my laptop, to avoid re-downloading everything. First, as the root user, I created the store directory and group, and added my user to the group:

mkdir /srv/mirrors
groupadd store
gpasswd -a laurence store
chown laurence:store /srv/mirrors
chmod 2774 /srv/mirrors # sgid
apt-get install rsync

Then, as my user, I copied the existing mirror over (having mounted it at /mnt):

# As laurence
rsync -avP /mnt/mirrors/ /srv/mirrors/

First mirror run

I started by simply changing the mirror_base_path for my laptop. I did this initially with a lookup based on the hostname fact because using the local connection type on both systems makes it difficult to set variables via the inventory:

- name: Set mirror_base_path based on host
  ansible.builtin.set_fact:
    mirror_base_path: "{{ mirror_path_lookup[ansible_facts['hostname']] }}"
  vars:
    mirror_path_lookup:
      defiant: /srv/mirrors
      galaxy: /mnt/mirrors

In order to get the mirror working, I had to comment out the Gentoo and RPM-based mirrors because I did not have Docker or yum-utils installed.

Fixing RPM mirrors

The yum-utils package is not available to install on Debian Bullseye, so I used Docker to create a Rocky Linux container and used that to download the RPM mirrors.

When adding the Docker repository key for Secure Apt, I followed advice I found in one or two places online and did not add the key to /etc/apt/trusted.gpg.d/, where it would be trusted for all repositories. Instead I placed the key in /usr/local/share/keyrings, downloading and converting it like this:

curl https://download.docker.com/linux/debian/gpg | /usr/bin/gpg --dearmor --output /usr/local/share/keyrings/docker-archive-keyring.gpg

This is then used in the Docker sources.list file, which I put in /etc/apt/sources.list.d/docker.list:

deb [signed-by=/usr/local/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian bullseye stable
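The same setup could be expressed as Ansible tasks, for example in the docker role. This is only a sketch and not what I actually ran: it assumes apt is happy with an ASCII-armoured key (it accepts these for signed-by when the file has a .asc extension, which avoids the gpg --dearmor step), and the .asc filename is my invention for the illustration.

```yaml
# Sketch only: assumes an ASCII-armoured key with a .asc extension is
# acceptable to apt's signed-by option; the key filename is illustrative.
- name: Create keyring directory
  ansible.builtin.file:
    path: /usr/local/share/keyrings
    state: directory
    mode: "0755"
- name: Download Docker repository key (ASCII-armoured)
  ansible.builtin.get_url:
    url: https://download.docker.com/linux/debian/gpg
    dest: /usr/local/share/keyrings/docker-archive-keyring.asc
    mode: "0644"
- name: Add Docker repository, trusting only that key
  ansible.builtin.apt_repository:
    repo: "deb [signed-by=/usr/local/share/keyrings/docker-archive-keyring.asc] https://download.docker.com/linux/debian bullseye stable"
    filename: docker  # Written to /etc/apt/sources.list.d/docker.list
    state: present
```

Keeping the key out of /etc/apt/trusted.gpg.d/ means it is only trusted for the one repository that names it in signed-by.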

The Dockerfile for this is very simple:

FROM rockylinux:8
# Install reposync and create repo
RUN dnf install -y yum-utils createrepo

In the roles/reposync/tasks/main.yaml file, I first added the tasks to build the Docker image:

- name: Make temporary directory for Dockerfile
  ansible.builtin.tempfile:
    state: directory
  register: dockerfile_tmp
- name: Copy Dockerfile to temporary path
  ansible.builtin.copy:
    src: Dockerfile
    dest: "{{ dockerfile_tmp.path }}/Dockerfile"
- name: Build the image
  community.docker.docker_image:
    build:
      path: "{{ dockerfile_tmp.path }}"
    name: reposync
    source: build
    force_source: yes # Always rebuild
- name: Remove temporary path
  ansible.builtin.file:
    path: "{{ dockerfile_tmp.path }}"
    state: absent

Then I updated the reposync command task to use the new Docker container instead:

- name: Do reposync
  community.docker.docker_container:
    name: do_reposync
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    # My old sync script had --downloadcomps and --download-metadata
    # hard-coded - not making them variables until I need to turn them
    # off.
    # DNF reposync plugin seems to be downloading the system's core
    # repos, even with `--config` so limit it to just those we want
    # with repoid too.
    command: >
      dnf reposync
      -y
      --arch={{ architecture }}
      --config=/tmp/yum.conf
      --destdir=/mnt
      {{ "--delete" if delete else "" }}
      {{ "--newest-only" if newest_only else "" }}
      --downloadcomps
      --download-metadata
      --repoid={{ ' --repoid='.join(repos | map(attribute='name')) }}
    image: reposync
    timeout: 900 # Times out with the default setting of 60
    mounts:
      - source: "{{ yum_conf }}"
        target: /tmp/yum.conf
        type: bind
        read_only: yes
      - source: "{{ target }}"
        target: /mnt
        type: bind
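The --repoid line in that command is the least obvious part, so here is a small stand-alone sketch of how the expression expands. The two repository names are just sample data borrowed from my Rocky mirror; only the name attribute matters to the expression.

```yaml
# Sketch: render the same join expression with sample data to see the
# resulting command-line arguments.
- name: Show how the repoid expression expands
  ansible.builtin.debug:
    msg: "--repoid={{ ' --repoid='.join(repos | map(attribute='name')) }}"
  vars:
    repos:
      - name: rocky-8-baseos
      - name: rocky-8-appstream
```

With that sample data the message is "--repoid=rocky-8-baseos --repoid=rocky-8-appstream": the literal prefix supplies the first --repoid= and the join supplies the rest.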

Finally, I changed the createrepo call to also use the Docker container:

- name: Do createrepo
  community.docker.docker_container:
    name: do_createrepo
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    command: >
      createrepo
      --update
      {{ "-g" if comps_xml_stat.stat.exists else "" }}
      {{ "/mnt/comps.xml" if comps_xml_stat.stat.exists else ""}}
      /mnt
    image: reposync
    mounts:
      - source: "{{ repo }}"
        target: /mnt
        type: bind

I also added a dependency on the docker role in the reposync role’s meta/main.yaml:

---
dependencies:
  - role: docker
...

Making the mirror specification data orientated

In making my mirroring Ansible playbook, I violated one of my usual principles: the specifics of what to download are embedded in the roles, coded into “meta roles” that depend on the others, rather than being data-driven via the inventory.

As a result I have these roles which are pure “meta roles”:

  • centos-mirrors
  • debian-mirrors
  • gitlab-mirrors
  • git-mirrors
  • proxmox-mirrors
  • rocky-mirrors

and these roles that have some logic in them (i.e. are more than purely “meta roles” in some way):

  • dban
  • debmirror
  • docker
  • gentoo-mirror
  • git-bare
  • pip-mirrors
  • reposync

So, to turn this into a data-driven system I turned the meta roles into data in the inventory (via variables).

I started by creating an inventory and, following good practice, using role-orientated groups to apply the variables to the hosts (galaxy being the host used to create the lab mirror and defiant being my laptop):

---
all:
  hosts:
    defiant:
    galaxy:
laptop_local:
  hosts:
    defiant:
lab_source:
  hosts:
    galaxy:
...

Hitting bugs

My plan was then to replace the existing list of roles with an include_role looping over the mirrors from the inventory data, something like this (where mirrors is a list of dictionaries, each with two elements: the mirror type (i.e. role name, e.g. reposync or git-bare) and the arguments to pass to the role, as a dict):

- name: Do mirror
  ansible.builtin.include_role:
    name: "{{ mirror.type }}"
  vars: "{{ mirror.args }}"
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror

However, this does not work due to a wontfix bug in Ansible: because vars is evaluated early, it won’t expand a dictionary in that way. There are various workarounds, such as using set_fact to set the variables as facts or rewriting the roles to be “loop aware” and take item as an argument. I am not keen on the latter as it exposes the loop semantics to the role (as opposed to my current approach, where a role only understands a single item and the loop semantics are entirely contained in the fact that it is called with loop). That said, standardising the interface for a “mirror role” is probably sensible and would achieve the same end without exposing to the role the fact that it is being looped over.

In “pseudo Ansible”, this is the interface I adopted for my mirroring roles:

- role:
    name: some-mirror-role
    source: some-role-applicable-source (url/directory/etc.)
    target: target directory for the mirror
    selectors: some-role-applicable-package-selector(s) (e.g. suite/distribution/architectures/etc.)

This makes the argument_specs reasonably common for everything but the selectors (which are highly dependent on the type of mirror), although there might be some variation in the format (e.g. Gentoo has two different sources, for portage and distfiles):

---
argument_specs:
  main:
    short_description: Mirror some thing
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      source:
        type: str
        required: true
        description: Where to mirror from
      selectors:
        type: dict # In all of my use cases
        required: true
        description: Package selectors
        options:
          # This is where it specialises for the mirror method role

Working around the bugs

The next piece was to create the data to use as our source. Note that I also pulled the target path out, removing that lookup from the playbook. For example, for the laptop I created group_vars/laptop_local.yaml (the lab_source one is simply the same but with different mirrors and mirror_base_path):

---
mirror_base_path: /srv/mirrors/
mirrors:
  - type: gentoo-mirror
    target: "{{ mirror_base_path }}/gentoo-mirror"
    source: {} # Currently unused by the role but required for interface
    selectors:
      profile: default/linux/amd64/17.1/no-multilib/systemd
      make.conf-vars:
        USE:
          - X
          - alsa
#            - doc
        VIDEO_CARDS:
          - fbdev
          - vesa
          - intel
      package:
        use:
          00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
          sys-boot: 'sys-boot/grub:2 device_mapper'
      packages:
        - '@world' # All the stage3 pre-built base packages
        - app-editors/vim
        - app-portage/cpuid2cpuflags
        - dev-vcs/git
        - net-misc/rsync
        - sys-boot/grub
        - sys-kernel/dracut
        - sys-kernel/gentoo-sources
        - sys-fs/cryptsetup
        - sys-fs/e2fsprogs
        - sys-fs/lvm2
...

I modified my existing gentoo-mirror role to use this variable arrangement, instead of the old one.

As these are run against localhost (currently), I needed to add a task that sets the fact for localhost, using the hostname gathered fact to identify which host it is actually running on:

- name: Set mirror facts based on hostname fact, as ansible only has 'localhost'
  ansible.builtin.set_fact:
    mirrors: "{{ hostvars[ansible_facts['hostname']].mirrors }}"
    mirror_base_path: "{{ hostvars[ansible_facts['hostname']].mirror_base_path }}"
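If the playbook is run on a machine that is not in the inventory, that hostvars lookup fails with a fairly opaque error. A guard task placed before it could make the failure clearer. This is a sketch rather than something in my playbook; it assumes facts have been gathered and that every real host is listed in the inventory under its short hostname.

```yaml
# Sketch: fail early, with a readable message, when the real hostname
# has no inventory entry (and so no mirrors defined for it).
- name: Check the real hostname is in the inventory before reading hostvars
  ansible.builtin.assert:
    that:
      - ansible_facts['hostname'] in groups['all']
    fail_msg: "{{ ansible_facts['hostname'] }} is not in the inventory, so it has no mirrors defined"
```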

I moved the git_deploy_credentials variable, which provides the credentials to clone my self-hosted git repositories, to group_vars/all.yaml as it is common to everything (these credentials only have read access, but even so they should really be looked up rather than hard-coded).

My, finally working, generic “do mirror” task now looks like this:

- name: Do mirror
  ansible.builtin.include_role:
    name: "{{ mirror.type }}"
  vars:
    source: "{{ mirror.source }}"
    target: "{{ mirror.target }}"
    selectors: "{{ mirror.selectors }}"
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror
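Because the role name now comes from data, a typo in the inventory only shows up when the loop reaches that entry. A pre-flight check before the loop is one way to catch malformed entries early. This is a sketch, not part of my playbook; it only checks type and target because those are the two keys every one of my mirror entries has (DBAN, for instance, has no source).

```yaml
# Sketch: validate the minimum shape of every mirror entry up front.
- name: Check each mirror entry has the common keys
  ansible.builtin.assert:
    that:
      - mirror.type is defined
      - mirror.target is defined
    fail_msg: "Mirror entry {{ mirror }} is missing type or target"
    quiet: true
  loop: "{{ mirrors }}"
  loop_control:
    loop_var: mirror
```

The per-role argument_specs still do the detailed validation; this only guards the part that the generic task itself relies on.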

Adding Debian

The argument spec for the debmirror role, in the new arrangement, turned out to be more complicated than I anticipated:

---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with debmirror
    options:
      target:
        description: Locations to download to
        type: dict
        required: true
        options:
          directory:
            type: str
            required: true
            description: Directory to mirror to
          keyring_directory:
            type: str
            required: true
            description: Directory to download keys and store keyring in (must be outside target or debmirror will delete it)
      source:
        description: Where to mirror from
        type: dict
        options:
          host:
            type: str
            default: ftp.debian.org
            description: Hostname to mirror from
          root:
            type: str
            default: debian
            description: Location on the remote host that is the root of the Debian archive
          keys:
            type: list
            elements: dict
            options:
              name:
                type: str
                required: true
                description: Name of key (will be used for download filename)
              url:
                type: str
                required: true
                description: Where to fetch the key from
              checksum:
                type: str
                description: Optional checksum for the downloaded key
              fingerprint:
                type: str
                description: Optional fingerprint of the downloaded key
          method:
            type: str
            default: rsync
            choices:
              - ftp
              - http
              - https
              - rsync
            description: Method to use to download (ftp/http/https/rsync)
      selectors:
        type: dict
        required: yes
        options:
          suites:
            type: list
            elements: str
            required: true
            description: The list of suites to mirror
          components:
            type: list
            elements: str
            default: ['main']
            description: The list of components to mirror
          architectures:
            type: list
            elements: str
            default: ['amd64']
            description: The list of architectures to mirror
          rsync_extras:
            type: list
            elements: str
            default: ['trace']
            choices:
              - doc
              - indices
              - tools
              - trace
              - none
            description: Additional files to mirror (with rsync) that are not part of the archive itself
          installers:
            type: bool
            default: false
            description: Fetch the Debian installers (that match the architectures and suites requested)
...

Again, I then modified the role to use this instead of the previous structure for the same arguments. The added challenge here was that, because some of the options were no longer top-level, I had to use | default('foo') to default values to foo (for example) instead of specifying them in defaults/main.yaml in the role.
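To illustrate the defaulting pattern, here is a minimal sketch that resolves a few of the nested options, falling back to the defaults documented in the argument spec. The debmirror_* fact names are made up for this example and are not the names used in my role; it also assumes source and selectors are passed as dictionaries.

```yaml
# Sketch: nested options cannot be defaulted in defaults/main.yaml,
# so each one is defaulted at the point of use instead.
# The debmirror_* names are hypothetical.
- name: Resolve nested options, falling back to the documented defaults
  ansible.builtin.set_fact:
    debmirror_method: "{{ source.method | default('rsync') }}"
    debmirror_host: "{{ source.host | default('ftp.debian.org') }}"
    debmirror_components: "{{ selectors.components | default(['main']) }}"
```

The downside of this approach is that the default values end up in two places (the argument spec and the tasks) and have to be kept in step by hand.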

Once migrated, I added debian and debian-security mirrors to the group_vars, which entirely replaced the debian-mirrors role:

- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/debian"
    keyring_directory: "{{ mirror_base_path }}/keyrings/debian"
  source:
    host: rsync.mirrorservice.org
    root: ftp.debian.org/debian
    keys:
      - name: debian-10-buster
        url: https://ftp-master.debian.org/keys/archive-key-10.asc
        fingerprint: "80D1 5823 B7FD 1561 F9F7 BCDD DC30 D7C2 3CBB ABEE"
      - name: debian-11-bullseye
        url: https://ftp-master.debian.org/keys/archive-key-11.asc
        fingerprint: "1F89 983E 0081 FDE0 18F3 CC96 73A4 F27B 8DD4 7936"
  selectors:
    installers: true
    suites:
      - buster
      - buster-updates
      - bullseye
      - bullseye-updates
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/debian-security"
    keyring_directory: "{{ mirror_base_path }}/keyrings/debian-security"
  source:
    method: http
    host: security.debian.org
    root: debian-security
    keys:
      - name: debian-10-buster-security
        url: https://ftp-master.debian.org/keys/archive-key-10-security.asc
        fingerprint: "5E61 B217 265D A980 7A23 C5FF 4DFA B270 CAA9 6DFA"
      - name: debian-11-bullseye-security
        url: https://ftp-master.debian.org/keys/archive-key-11-security.asc
        fingerprint: "AC53 0D52 0F2F 3269 F5E9 8313 A484 4904 4AAD 5C5D"
  selectors:
    suites:
      - buster/updates
      - bullseye-security

RPM based repositories

I think the argument list arrangement I came up with is sub-optimal, but I was trying to group the arguments into the new source/target/selectors interface with minimal changes:

---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with reposync
    options:
      target:
        type: dict
        required: true
        options:
          directory:
            type: str
            required: true
            description: Directory to mirror to (note each repository will be created in a subdirectory matching the repository name by reposync)
          delete:
            type: bool
            default: true
            description: Delete local files no longer present in remote repository
          yum_conf:
            type: str
            required: true
            description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
      source:
        type: dict
        required: true
        options:
          repos:
            type: list
            elements: dict
            description: List of repositories to include in this mirror
            required_one_of:
              - ['baseurl', 'mirrorlist']
            mutually_exclusive:
              - ['baseurl', 'mirrorlist']
            options:
              name:
                type: str
                required: true
                description: Name of repository (will be used for target directory by reposync)
              baseurl:
                type: str
                description: Base url for the repository
              mirrorlist:
                type: str
                description: Mirror list url for the repository
              description:
                type: str
                description: Description of the repository
              repo_gpgcheck:
                type: bool
                default: false
                description: Whether a GPG signature check should be done on the repodata
              gpgcheck:
                type: bool
                default: true
                description: Whether a GPG signature check should be done on the packages
              gpgkey:
                type: str
                description: URL for the ASCII-armoured GPG key for the repository
              sslverify:
                type: bool
                default: true
                description: Should SSL certificates be verified
              metadata_expire:
                type: int
                description: Frequency (in seconds) with which yum metadata will expire
          yum_conf:
            type: str
            required: true
            description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
      selectors: 
        type: dict
        options:
          newest_only:
            type: bool
            default: true
            description: Only fetch the latest version if multiple versions of a package are available
          architecture:
            type: str
            default: x86_64
            description: Act as if running the specified arch
...

Migrating the rocky-mirrors to the data is a case of doing this:

- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/rocky.yum.conf"
  source:
    repos:
      - name: rocky-8-baseos
        description: Rocky Linux 8 BaseOS
        baseurl: http://rockylinux.mirrorservice.org/8/BaseOS/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-appstream
        description: Rocky Linux 8 AppStream
        baseurl: http://rockylinux.mirrorservice.org/8/AppStream/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-extras
        description: Rocky Linux 8 extras
        baseurl: http://rockylinux.mirrorservice.org/8/extras/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial

and centos-mirrors:

- type: reposync
  target:
    directory: "{{ mirror_base_path }}"
    yum_conf: "{{ mirror_base_path }}/yum-configs/centos.yum.conf"
  source:
    repos:
      - name: centos-6.10-os
        description: CentOS 6.10 OS
        baseurl: https://vault.centos.org/6.10/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-updates
        description: CentOS 6.10 updates
        baseurl: https://vault.centos.org/6.10/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-extras
        description: CentOS 6.10 extras
        baseurl: https://vault.centos.org/6.10/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-7-os
        description: CentOS 7 OS
        baseurl: http://mirror.centos.org/centos/7/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-updates
        description: CentOS 7 updates
        baseurl: http://mirror.centos.org/centos/7/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-extras
        description: CentOS 7 extras
        baseurl: http://mirror.centos.org/centos/7/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7

Proxmox mirrors

The Proxmox mirrors are just Debian sources, so they were easy to add. Migrating the actual mirroring role was already done, so they just needed putting in the appropriate group_vars files:

- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/pve-no-subscription"
    keyring_directory: "{{ mirror_base_path }}/keyrings/pve-no-subscription"
  source:
    method: http
    host: download.proxmox.com
    root: debian/pve
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
  selectors:
    rsync_extras: ['none']
    suites:
      - bullseye
    components:
      - pve-no-subscription
- type: debmirror
  target:
    directory: "{{ mirror_base_path }}/ceph-pacific"
    keyring_directory: "{{ mirror_base_path }}/keyrings/ceph-pacific"
  source:
    method: http
    host: download.proxmox.com
    root: debian/ceph-pacific
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
  selectors:
    rsync_extras: ['none']
    suites:
      - bullseye
    components:
      - main

DBAN

This one is quite simple, just moving the version setting under selectors:

---
argument_specs:
  main:
    short_description: Mirror DBAN ISO and extract dban image (e.g. for PXE booting)
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      selectors:
        type: dict
        required: true
        options:
          version:
            type: str
            required: true
            description: Version of DBAN to fetch
...

And add to the list of mirrors in the data:

- type: dban
  target: "{{ mirror_base_path }}/dban-2.3.0"
  selectors:
    version: 2.3.0

GitLab

As for Proxmox, GitLab is just a case of more mirrors using existing mirror types. The key difference is that Proxmox is Debian only, whereas GitLab has both APT and RPM repositories:

  # GitLab
  - type: debmirror
    target:
      directory: "{{ mirror_base_path }}/gitlab-ee-debian"
      keyring_directory: "{{ mirror_base_path }}/keyrings/gitlab-ee-debian"
    source:
      method: http
      host: packages.gitlab.com
      root: gitlab/gitlab-ee/debian
      keys:
        # GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
        # doesn't verify the key, so I haven't either but not sure that is a good
        # idea!
        - name: gitlab-keyring
          url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
    selectors:
      rsync_extras: ['none']
      suites:
        - bullseye
      components:
        - main
  - type: reposync
    target:
      directory: "{{ mirror_base_path }}"
      yum_conf: "{{ mirror_base_path }}/yum-configs/gitlab-ee.yum.conf"
    source:
      repos:
        - name: gitlab_gitlab-ee-el_7
          description: gitlab_gitlab-ee-el_7
          baseurl: https://packages.gitlab.com/gitlab/gitlab-ee/el/7/x86_64
          gpgkey: |-
            https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
                  https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey/gitlab-gitlab-ee-3D645A26AB9FBD22.pub.gpg
          metadata_expire: 300  # Not sure if this is needed - was in old sync script
          # Disable these per https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5226
          repo_gpgcheck: no
          gpgcheck: no
          sslverify: no

Git Mirrors

This was ridiculously easy: I just changed the argument repository to be called source and it directly conformed to the new structure:

---
argument_specs:
  main:
    short_description: Mirror Git repository (as bare repo)
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      source:
        type: str
        required: true
        description: Repository to clone
...

There’s a lot of these, so I’ll just show the first one as an example:

  - type: git-bare
    source: https://github.com/mythic-beasts/dehydrated-mythic-dns01.git
    target: "{{ mirror_base_path }}/git/github.com/mythic-beasts/dehydrated-mythic-dns01.git/"

PIP mirrors

Mirroring Python packages with pip continues to be complicated. I moved the list of packages under the source key in the new structure and the ‘platforms’ (used to select the Docker image to use to do the fetch) into selectors:

---
argument_specs:
  main:
    short_description: Mirror pip packages, using docker to fetch platform-specific version
    options:
      target:
        type: str
        required: true
        description: Base directory to mirror to
      source:
        type: dict
        required: true
        options:
          packages:
            type: list
            required: true
            elements: str
            description: List of packages to fetch with pip (passed directly to pip, so anything pip accepts (e.g. version constraints) can be included)
      selectors:
        type: dict
        required: true
        options:
          platforms:
            type: list
            required: true
            elements: dict
            options:
              image:
                type: str
                required: true
                description: Docker image name to use to fetch pip
              pre-command:
                type: str
                description: Pre-pip command to run (e.g. to install pip in the container). No command will be run if not provided.
              name:
                type: str
                description: sub-directory name to use (defaults to value of `image` value with colons replaced by hyphens)
...

Using it is very similar to the rest:

  - type: pip-mirror
    target: "{{ mirror_base_path }}/pip"
    source:
      packages:
        - ansible
    selectors:
      platforms:
        - image: debian:bullseye
          pre-command: 'apt-get update && apt-get -y install python3-pip'
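The entry above does not set name, so the sub-directory falls back to the image name with colons replaced by hyphens. A sketch of how that default could be computed inside the role follows; it is an illustration of the rule described in the argument spec rather than a copy of my task, and the platform loop variable is a name chosen for the example.

```yaml
# Sketch: derive the per-platform sub-directory name, using `name` when
# given and otherwise the image name with ':' replaced by '-'.
- name: Work out the sub-directory for each platform
  ansible.builtin.debug:
    msg: "{{ platform.name | default(platform.image | replace(':', '-')) }}"
  loop: "{{ selectors.platforms }}"
  loop_control:
    loop_var: platform
```

For the debian:bullseye platform above this gives debian-bullseye.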