The number of things being mirrored has grown since I initially set up my mirrors in the home-lab network, and the current bash script has become a bit cumbersome to maintain. This post describes the process of replacing the current script and then extending it to mirror more things, including Gentoo and Git repositories - something that took nearly 4 weeks (bearing in mind this is a hobby project, fitted in around work and home life). Strap in, this is going to be a long post…

I had been planning to rewrite it to use either bash functions or a series of scripts to mirror specific repositories using generalised patterns for (e.g.) apt and yum repositories. When I came to write it, though, I thought it might be easier to use Ansible - which is rapidly becoming my “go to” tool for this sort of thing when it outgrows a simple bash script.

Replicating the current script

I have 2 types of repository, currently (apt and yum). Once I have improved the process, I will also be adding Gentoo and various git repository clones to the mirror.

The preamble

I began by replicating the start of my script, which ensures the mirror directory exists, aborting if not:

---
- hosts: localhost
  vars:
    mirror_base_path: /mnt/mirrors
  pre_tasks:
    - name: Stat mirrors directory
      ansible.builtin.stat:
        path: '{{ mirror_base_path }}'
      register: mirror_path_stat
    - name: Fail if mirrors directory does not exist
      ansible.builtin.fail:
        msg: 'Please pre-create the mirror target directory (currently set as {{ mirror_base_path }})'
      when: not mirror_path_stat.stat.exists
...

Apt mirrors

I split this part in two: a role called debmirror that actually does the mirroring, and separate roles for each collection of software I wanted to mirror, which use the debmirror role with different arguments.

The debmirror role

The first step is to define all of the arguments required for debmirror in the role’s meta/argument_specs.yml file:

---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with debmirror
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      keyring_directory:
        type: str
        required: true
        description: Directory to download keys and store keyring in (must be outside target or debmirror will delete it)
      source:
        description: Where to mirror from
        options:
          host:
            type: str
            default: ftp.debian.org
            description: Hostname to mirror from
          root:
            type: str
            default: debian
            description: Location on the remote host that is the root of the Debian archive
      keys:
        type: list
        elements: dict
        options:
          name:
            type: str
            required: true
            description: Name of key (will be used for filename)
          url:
            type: str
            required: true
            description: Where to fetch the key from
          checksum:
            type: str
            description: Optional checksum for the downloaded key
          fingerprint:
            type: str
            description: Optional fingerprint of the downloaded key
      method:
        type: str
        default: rsync
        choices:
          - ftp
          - http
          - https
          - rsync
        description: Method to use to download (ftp, http, https or rsync)
      suites:
        type: list
        elements: str
        required: true
        description: The list of suites to mirror
      components:
        type: list
        elements: str
        default: ['main']
        description: The list of components to mirror
      architectures:
        type: list
        elements: str
        default: ['amd64']
        description: The list of architectures to mirror
      rsync_extras:
        type: list
        elements: str
        default: ['trace']
        choices:
          - doc
          - indices
          - tools
          - trace
          - none
        description: Additional files to mirror (with rsync) that are not part of the archive itself
...

I set (what I consider) sensible default values in the role’s defaults/main.yml file, mostly matching the debmirror command’s defaults but with a few set to my preference:

---
method: rsync
# Match debmirror's defaults for source, components and rsync_extras
source:
  host: ftp.debian.org
  root: debian
components:
  - main
  - contrib
  - non-free
  - main/debian-installer
rsync_extras:
  - trace
# Almost all my systems are x86_64
architectures:
  - amd64
...

As a repository might have multiple keys (e.g. containing packages for different distribution versions), I separated the key fetching tasks into their own file (tasks/fetch-key.yml) to make looping a bit neater:

---
- name: Download key {{ key.name }}
  ansible.builtin.get_url:
    dest: "{{ keyring_directory }}/keys.d/{{ key.name }}.key"
    url: "{{ key.url }}"
    checksum: "{{ key.checksum | default('') }}"
- name: Validate {{ key.name }} fingerprint
  block:
    - name: Get fingerprint from file
      ansible.builtin.shell:
        cmd: "gpg --show-keys --with-fingerprint --with-colons {{ keyring_directory }}/keys.d/{{ key.name }}.key | awk -F: '$1 == \"fpr\" { print $10 }'"
      register: fingerprint
    - name: Check fingerprint matches
      ansible.builtin.fail:
        msg: "File fingerprint ({{ fingerprint.stdout }}) does not match specified value ({{ key.fingerprint }})"
      when: key.fingerprint.replace(' ', '') not in fingerprint.stdout.split('\n')
  when: key.fingerprint is defined
...

And finally, the main part of the role - its tasks/main.yml file:

---
- name: Make target directory
  ansible.builtin.file:
    path: "{{ target }}"
    state: directory
- name: Make keys directory
  ansible.builtin.file:
    path: "{{ keyring_directory }}/keys.d"
    state: directory
  when: keys is defined
- name: Download keys for keyring
  include_tasks: fetch-key.yml
  loop: "{{ keys | default([]) }}"
  loop_control:
    loop_var: key
- name: Delete any extra keys
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  with_fileglob: "{{ keyring_directory }}/keys.d/*.key"
  when: (item | basename | splitext | first) not in (keys | default([]) | map(attribute='name') | list)
- name: Delete old keyring
  ansible.builtin.file:
    path: "{{ keyring_directory }}/keyring.gpg"
    state: absent
- name: Make keyring
  ansible.builtin.shell: gpg --no-default-keyring --keyring {{ keyring_directory }}/keyring.gpg --import {{ item }}
  with_fileglob: "{{ keyring_directory }}/keys.d/*.key"
- name: Run debmirror
  ansible.builtin.command:
    argv:
      - debmirror
      - --verbose
      - --precleanup # Remove old files before download - makes mirror inconsistent until debmirror finishes but copes better with limited space
      - --dist={{ ','.join(suites) }}
      - --section={{ ','.join(components) }}
      - --arch={{ ','.join(architectures) }}
      - --method={{ method }}
      - --rsync-extra={{ ','.join(rsync_extras) }}
      - --host={{ source.host }}
      - --root={{ source.root }}
      - --keyring={{ keyring_directory }}/keyring.gpg
      - "{{ target }}"
...

Using the debmirror role

To use the debmirror role, I added new roles which list it as a dependency with the appropriate arguments for the mirror(s) they are creating. The core Debian mirrors are in the role debian-mirrors, whose only file is meta/main.yml (the variable mirror_base_path is set at the play level, but could easily be specified in the inventory or on the command line):

---
dependencies:
  - role: debmirror
    source:
      host: rsync.mirrorservice.org
      root: ftp.debian.org/debian
    target: '{{ mirror_base_path }}/debian'
    keyring_directory: '{{ mirror_base_path }}/keyrings/debian'
    suites:
      - buster
      - buster-updates
      - bullseye
      - bullseye-updates
    keys:
      - name: debian-10-buster
        url: https://ftp-master.debian.org/keys/archive-key-10.asc
        fingerprint: "80D1 5823 B7FD 1561 F9F7 BCDD DC30 D7C2 3CBB ABEE"
      - name: debian-11-bullseye
        url: https://ftp-master.debian.org/keys/archive-key-11.asc
        fingerprint: "1F89 983E 0081 FDE0 18F3 CC96 73A4 F27B 8DD4 7936"
  - role: debmirror
    method: http
    source:
      host: security.debian.org
      root: debian-security
    target: '{{ mirror_base_path }}/debian-security'
    keyring_directory: '{{ mirror_base_path }}/keyrings/debian-security'
    suites:
      - buster/updates
      - bullseye-security
    keys:
      - name: debian-10-buster-security
        url: https://ftp-master.debian.org/keys/archive-key-10-security.asc
        fingerprint: "5E61 B217 265D A980 7A23 C5FF 4DFA B270 CAA9 6DFA"
      - name: debian-11-bullseye-security
        url: https://ftp-master.debian.org/keys/archive-key-11-security.asc
        fingerprint: "AC53 0D52 0F2F 3269 F5E9 8313 A484 4904 4AAD 5C5D"
...

This can be duplicated for other mirrors, for example for GitLab:

---
dependencies:
  - role: debmirror
    method: http
    rsync_extras: ['none']
    source:
      host: packages.gitlab.com
      root: gitlab/gitlab-ee/debian
    target: '{{ mirror_base_path }}/gitlab-ee-debian'
    keyring_directory: '{{ mirror_base_path }}/keyrings/gitlab-ee-debian'
    suites:
      - bullseye
    components:
      - main
    keys:
      # GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
      # doesn't verify the key, so I haven't either but not sure that is a good
      # idea!
      - name: gitlab-keyring
        url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
...

and Proxmox:

---
dependencies:
  - role: debmirror
    method: http
    rsync_extras: ['none']
    source:
      host: download.proxmox.com
      root: debian/pve
    target: '{{ mirror_base_path }}/pve-no-subscription'
    keyring_directory: '{{ mirror_base_path }}/keyrings/pve-no-subscription'
    suites:
      - bullseye
    components:
      - pve-no-subscription
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
  - role: debmirror
    method: http
    rsync_extras: ['none']
    source:
      host: download.proxmox.com
      root: debian/ceph-pacific
    target: '{{ mirror_base_path }}/ceph-pacific'
    keyring_directory: '{{ mirror_base_path }}/keyrings/ceph-pacific'
    suites:
      - bullseye
    components:
      - main
    keys:
      - name: proxmox-keyring
        url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
        checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
...
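
To tie these roles together, the play from the preamble needs a roles list. A minimal sketch, assuming the Proxmox role above is saved as proxmox-mirrors (a name made up here for illustration; debian-mirrors and gitlab-mirrors are the names used in this post) and that each role is given a tag so it can be run on its own (the tag names are also just examples):

```yaml
---
- hosts: localhost
  vars:
    mirror_base_path: /mnt/mirrors
  # The pre_tasks from the preamble (checking the mirror directory
  # exists) would go here.
  roles:
    - role: debian-mirrors
      tags: ['debian']
    - role: gitlab-mirrors
      tags: ['gitlab']
    # Hypothetical role name - use whatever the Proxmox role is called
    - role: proxmox-mirrors
      tags: ['proxmox']
...
```

One thing to watch with this layout: when ansible-playbook is run with --tags, untagged pre_tasks are skipped unless they carry the special always tag.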

RPM mirrors

This follows a similar pattern to the APT mirrors, creating a reposync role for repositories synced using this method.

The reposync role

Again, I began with the arguments and defaults files. As each reposync command can be used with multiple repositories, these are provided as a list of dictionaries, so some of the defaults are set in the template yum configuration (see further below) rather than the role’s defaults file.

meta/argument_specs.yml:

---
argument_specs:
  main:
    short_description: Main entry point for mirroring a repository with reposync
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to (note each repository will be created in a subdirectory matching the repository name by reposync)
      yum_conf:
        type: str
        required: true
        description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
      delete:
        type: bool
        default: true
        description: Delete local files no longer present in remote repository
      newest_only:
        type: bool
        default: true
        description: Only fetch the latest version if multiple versions of a package are available
      architecture:
        type: str
        default: x86_64
        description: Act as if running the specified arch
      repos:
        type: list
        elements: dict
        description: List of repositories to include in this mirror
        required_one_of:
          - ['baseurl', 'mirrorlist']
        mutually_exclusive:
          - ['baseurl', 'mirrorlist']
        options:
          name:
            type: str
            required: true
            description: Name of repository (will be used for target directory by reposync)
          baseurl:
            type: str
            description: Base url for the repository
          mirrorlist:
            type: str
            description: Mirror list url for the repository
          description:
            type: str
            description: Description of the repository
          repo_gpgcheck:
            type: bool
            default: true
            description: Whether a GPG signature check should be done on the repodata
          gpgcheck:
            type: bool
            default: true
            description: Whether a GPG signature check should be done on the packages
          gpgkey:
            type: str
            description: URL for the ASCII-armoured GPG key for the repository
          sslverify:
            type: bool
            default: true
            description: Should SSL certificates be verified
          metadata_expire:
            type: int
            description: Frequency (in seconds) with which yum metadata will expire
...

defaults/main.yml:

---
delete: true
newest_only: true
architecture: x86_64
...

As with the key management for APT, I separated out the createrepo command into its own task file (tasks/createrepo.yml) to make looping easier:

---
- name: Check if comps.xml exists
  ansible.builtin.stat:
    path: '{{ repo }}/comps.xml'
  register: comps_xml_stat
- name: Run createrepo
  ansible.builtin.command:
    cmd: >
      createrepo
      --update
      {{ "-g" if comps_xml_stat.stat.exists else "" }}
      {{ repo + "/comps.xml" if comps_xml_stat.stat.exists else ""}}
      {{ repo }}
...
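
Because cmd is split into arguments on whitespace, a repository path containing spaces would break the command above. A possible variant, offered as an untested sketch rather than what the role actually uses, builds the argument list with argv so each argument is passed through as-is:

```yaml
- name: Run createrepo
  ansible.builtin.command:
    # "-g <repo>/comps.xml" is only added when comps.xml exists
    argv: >-
      {{ ['createrepo', '--update']
      + (['-g', repo + '/comps.xml'] if comps_xml_stat.stat.exists else [])
      + [repo] }}
```

This relies on the same comps_xml_stat result registered by the stat task, so it would slot in as a replacement for the second task only.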

As reposync uses a yum configuration file, I created a template for that from the repos passed to the role in templates/yum.conf.j2:

{% for repo in repos %}
[{{ repo.name }}]
name={{ repo.description | default(repo.name) }}
{# All repos for mirroring need to be enabled #}
enabled=1
{%   if repo.baseurl | default(false) %}
baseurl={{ repo.baseurl }}
{%   else %}
mirrorlist={{ repo.mirrorlist }}
{%   endif %}
gpgcheck={{ '1' if repo.gpgcheck | default(true) else '0' }}
{%   if repo.gpgkey is defined %}
gpgkey={{ repo.gpgkey }}
{%   endif %}
repo_gpgcheck={{ '1' if repo.repo_gpgcheck | default(true) else '0' }}
sslverify={{ '1' if repo.sslverify | default(true) else '0' }}
{%   if repo.metadata_expire is defined %}
metadata_expire={{ repo.metadata_expire }}
{%   endif %}

{% endfor %}

And finally the main task file in tasks/main.yml, which does the mirroring:

---
- name: Make sure yum config directory exists
  ansible.builtin.file:
    path: '{{ yum_conf | dirname }}'
    state: directory
- name: Create yum configuration file
  ansible.builtin.template:
    src: yum.conf.j2
    dest: '{{ yum_conf }}'
- name: Run reposync
  ansible.builtin.command:
    # My old sync script had --downloadcomps and --download-metadata
    # hard-coded - not making them variables until I need to turn them
    # off.
    cmd: >
      reposync
      --arch={{ architecture }}
      --config={{ yum_conf }}
      --download_path={{ target }}
      {{ "--delete" if delete else "" }}
      {{ "--newest-only" if newest_only else "" }}
      --downloadcomps
      --download-metadata
- name: Run createrepo on each repository
  include_tasks: createrepo.yml
  vars:
    repo: '{{ target }}/{{ item.name }}'
  loop: '{{ repos }}'
...

Using the reposync role

Role for CentOS 6 and 7 mirrors (called centos-mirrors):

---
dependencies:
  - role: reposync
    target: '{{ mirror_base_path }}'
    yum_conf: '{{ mirror_base_path }}/yum-configs/centos.yum.conf'
    repos:
      - name: centos-6.10-os
        description: CentOS 6.10 OS
        baseurl: https://vault.centos.org/6.10/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-updates
        description: CentOS 6.10 updates
        baseurl: https://vault.centos.org/6.10/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-6.10-extras
        description: CentOS 6.10 extras
        baseurl: https://vault.centos.org/6.10/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
      - name: centos-7-os
        description: CentOS 7 OS
        baseurl: http://mirror.centos.org/centos/7/os/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-updates
        description: CentOS 7 updates
        baseurl: http://mirror.centos.org/centos/7/updates/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
      - name: centos-7-extras
        description: CentOS 7 extras
        baseurl: http://mirror.centos.org/centos/7/extras/x86_64/
        gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
...

And for Rocky 8 mirrors (called rocky-mirrors):

---
dependencies:
  - role: reposync
    target: '{{ mirror_base_path }}'
    yum_conf: '{{ mirror_base_path }}/yum-configs/rocky.yum.conf'
    repos:
      - name: rocky-8-baseos
        description: Rocky Linux 8 BaseOS
        baseurl: http://rockylinux.mirrorservice.org/8/BaseOS/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-appstream
        description: Rocky Linux 8 AppStream
        baseurl: http://rockylinux.mirrorservice.org/8/AppStream/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
      - name: rocky-8-extras
        description: Rocky Linux 8 extras
        baseurl: http://rockylinux.mirrorservice.org/8/extras/x86_64/os/
        gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
...

For GitLab, I just added a reposync entry to the existing dependencies for the gitlab-mirrors role:

---
dependencies:
  - role: debmirror
    method: http
    rsync_extras: ['none']
    source:
      host: packages.gitlab.com
      root: gitlab/gitlab-ee/debian
    target: '{{ mirror_base_path }}/gitlab-ee-debian'
    keyring_directory: '{{ mirror_base_path }}/keyrings/gitlab-ee-debian'
    suites:
      - bullseye
    components:
      - main
    keys:
      # GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
      # doesn't verify the key, so I haven't either but not sure that is a good
      # idea!
      - name: gitlab-keyring
        url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
  - role: reposync
    target: '{{ mirror_base_path }}'
    yum_conf: '{{ mirror_base_path }}/yum-configs/gitlab-ee.yum.conf'
    repos:
      - name: gitlab_gitlab-ee-el_7
        description: gitlab_gitlab-ee-el_7
        baseurl: https://packages.gitlab.com/gitlab/gitlab-ee/el/7/x86_64
        gpgkey: |-
          https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
                 https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey/gitlab-gitlab-ee-3D645A26AB9FBD22.pub.gpg
        metadata_expire: 300  # Not sure if this is needed - was in old sync script
        # Disable these per https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5226
        repo_gpgcheck: no
        gpgcheck: no
        sslverify: no
...

New functionality

My old script was only able to mirror some apt and yum repositories. Now I come to extending the new version to create some mirrors that were not in the original script.

Gentoo mirrors

My original thought was to mirror Gentoo, like I have done for Debian and CentOS/Rocky above. However, the full size of the Gentoo source mirror is over 550GiB, which pushes the balance away from the convenience of having all packages immediately available to install and towards selectively mirroring at a wanted-package level (being a source distribution, the notion of mirroring at an architecture or version level is largely meaningless).

In order to do this, I need to adopt an approach similar to one I previously used with EasyBuild - fetching the sources in an internet-connected environment then teleporting them (in this case on my USB mirror update drive) into the isolated environment to do the building. The next complication is the need to use the emerge command to do the download - without an existing Gentoo install to hand, this required a little thinking to get around. I decided to use the official Gentoo Docker image to do the downloading, as the system I am using to do the mirroring happened to already have Docker installed.

In order for Ansible to talk to Docker, I had to install the full ansible package (I had been using ansible-core up until this point) and the docker Python package.

The Gentoo mirroring is very specific to the distribution, so I created a single gentoo-mirror role for it. Its arguments are a target directory to place the mirrored files in, the portage configuration to use and a list of packages to download sources for. I made it download all of the sources for the bootstrap script as well.

meta/argument_specs.yml:

---
argument_specs:
  main:
    short_description: Mirror Gentoo portage database and wanted distfile sources
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      config:
        type: dict
        required: true
        description: Configuration information for portage environment
        options:
          profile:
            type: str
            required: true
            description: Name of the portage profile (from /var/db/repos/gentoo/profiles)
          make.conf-vars:
            type: dict
            required: true
            description: Dictionary of keys -> list of values (values will be concatenated with spaces separating them)
          package:
            type: dict
            required: false
            description: Dictionary of file contents for each of package.use, package.mask and package.accept_keywords directories
            options:
              use:
                type: dict
                description: Dictionary of filenames to (string) content for package.use directory
              mask:
                type: dict
                description: Dictionary of filenames to (string) content for package.mask directory
              accept_keywords:
                type: dict
                description: Dictionary of filenames to (string) content for package.accept_keywords directory
      packages:
        type: list
        required: true
        description: Packages to download sources for (in addition to bootstrap)
        elements: str
...

I set up a task file that would install Docker, tasks/setup-docker.yml, making sure it matched my current configuration management (SaltStack)’s configuration exactly to stop them trying to stamp on each other’s changes:

---
- name: Ensure Docker repository is available
  become: yes
  ansible.builtin.apt_repository:
    # Match what is currently in SaltStack (repo and filename) so they
    # do not end up fighting.
    repo: deb https://download.docker.com/linux/debian {{ ansible_distribution_release }} stable
    filename: docker
    state: present
- name: Ensure Docker is installed
  become: yes
  ansible.builtin.package:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - python3-docker # So Ansible can manage docker
    state: present
...

In order to install Gentoo, I need a “stage” file as well as the packages. These are built nightly, so I created a task file, tasks/download-stage-file.yml, that will download one and create/update a symlink to point to it, deleting the file it used to point to if the target changes:

---
- name: Fetch file {{ stage_file }}
  ansible.builtin.get_url:
    url: "{{ base_url }}{{ stage_file }}"
    dest: "{{ target }}/{{ stage_file | basename }}"
- name: Stat symlink
  ansible.builtin.stat:
    path: "{{ target }}/{{ symlink }}"
  register: symlink_stat
- name: Remove old file
  ansible.builtin.file:
    # lnk_target is relative to the symlink's directory, so use the resolved path
    path: "{{ symlink_stat.stat.lnk_source }}"
    state: absent
  when: symlink_stat.stat.exists and symlink_stat.stat.islnk and symlink_stat.stat.lnk_target | basename != stage_file | basename
- name: Create new symlink
  ansible.builtin.file:
    path: "{{ target }}/{{ symlink }}"
    src: "{{ stage_file | basename }}"
    state: link
...

The latest nightly is listed in a text file, which can contain several stage files to download, so I wrapped this task file with another task file, tasks/download-stage.yml, that loops it to fetch each one in the file on Gentoo’s mirror:

---
- name: Fetch text file with location for {{ target_stage }}
  ansible.builtin.uri:
    url: "{{ base_url }}latest-{{ target_stage }}.txt"
    return_content: yes
  register: target_info
- name: Download the files in the index text file
  ansible.builtin.include_tasks: download-stage-file.yml
  vars:
    stage_file: "{{ target_line | split(' ') | first }}"
    symlink: latest-{{ target_stage }}.{{ target_line | split(' ') | first | split('.') | last }}
  when: target_line
  loop: "{{ target_info.content | split('\n') | map('regex_search', '^[^#].*$') }}"
  loop_control:
    loop_var: target_line
...
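
To see what the filter chain in the loop does, here is a small stand-alone play that runs it against inline sample text instead of a download. This is only a sketch: the sample_index content is made up, in the shape the tasks above expect (comment lines starting with #, then lines of a path followed by a space and a size).

```yaml
---
- hosts: localhost
  gather_facts: no
  vars:
    target_stage: stage3-amd64-openrc
    # Made-up example content, not a real index file
    sample_index: |
      # Example comment line
      20220101T000000Z/stage3-amd64-openrc-20220101T000000Z.tar.xz 123456789
  tasks:
    - name: Show the stage file and symlink name derived from each line
      ansible.builtin.debug:
        msg:
          stage_file: "{{ target_line | split(' ') | first }}"
          symlink: "latest-{{ target_stage }}.{{ target_line | split(' ') | first | split('.') | last }}"
      # Comment and blank lines do not match the regex, so they come
      # through as empty values and are skipped here
      when: target_line
      loop: "{{ sample_index | split('\n') | map('regex_search', '^[^#].*$') }}"
      loop_control:
        loop_var: target_line
...
```

For the sample line, only the final extension is kept, so the symlink name comes out as latest-stage3-amd64-openrc.xz rather than ending in .tar.xz.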

I needed to create a Dockerfile, files/Dockerfile, for my docker image to download the files - using the official Gentoo stage3 docker image, I just need to copy in some settings to make sure emerge correctly resolves all of the dependencies when fetching the sources:

FROM gentoo/stage3
ARG profile

# Copy our make.conf in, so emerge is using the correct useflags and architecture settings
COPY make.conf /etc/portage/make.conf
COPY package.accept_keywords/ /etc/portage/package.accept_keywords/
COPY package.mask/ /etc/portage/package.mask/
COPY package.use/ /etc/portage/package.use/
RUN ln -sf /var/db/repos/gentoo/profiles/$profile /etc/portage/make.profile

For make.conf, a template in templates/make.conf.j2 generates it from the role arguments:

{% for (key, values) in config['make.conf-vars'].items() %}
{{ key }}="{{ ' '.join(values) }}"
{% endfor %}

Finally, the role’s main task file, tasks/main.yml, pulls together installing docker, fetching the stage files then creates the docker image and uses it to download the bootstrap and each package file.

I tagged emerge --sync separately because Gentoo sync netiquette dictates not syncing more than once per day, so this makes it easy to exclude with ansible-playbook’s --skip-tags emerge-sync - as a future improvement I should skip this based on whether a file timestamp has been updated in the last 24 hours.

All of my Docker commands run with become: yes to run as the root user - because giving a user access to a Docker daemon running as root is the same as giving root access directly (as the Docker documentation puts it: “The docker group grants privileges equivalent to the root user.”), so I only allow root to interact with docker.

---
- name: Fetch stage tarballs
  ansible.builtin.include_tasks: download-stage.yml
  vars:
    base_url: http://www.mirrorservice.org/sites/distfiles.gentoo.org/releases/amd64/autobuilds/
  loop:
    - install-amd64-minimal
    - stage3-amd64-openrc
    - stage3-amd64-systemd
    - stage3-amd64-nomultilib-openrc
    - stage3-amd64-nomultilib-systemd
  loop_control:
    loop_var: target_stage
- name: Setup Docker
  ansible.builtin.include_tasks: setup-docker.yml
- name: Make temporary directory for Dockerfile (and associated gubbins)
  ansible.builtin.tempfile:
    state: directory
  register: dockerfile_tmp
- name: Copy Dockerfile to temporary path
  ansible.builtin.copy:
    src: Dockerfile
    dest: "{{ dockerfile_tmp.path }}/Dockerfile"
- name: Ensure necessary directories exist (even if empty) for Docker image creation
  ansible.builtin.file:
    path: "{{ dockerfile_tmp.path }}/{{ item }}"
    state: directory
  loop:
    - package.accept_keywords
    - package.mask
    - package.use
- name: Create make.conf
  ansible.builtin.template:
    dest: "{{ dockerfile_tmp.path }}/make.conf"
    src: make.conf.j2
- name: Create package.use files
  ansible.builtin.copy:
    dest: "{{ dockerfile_tmp.path }}/package.use/{{ item.key }}"
    content: "{{ item.value }}"
  loop: "{{ config.package.use | default({}) | dict2items }}"
- name: Create package.mask files
  ansible.builtin.copy:
    dest: "{{ dockerfile_tmp.path }}/package.mask/{{ item.key }}"
    content: "{{ item.value }}"
  loop: "{{ config.package.mask | default({}) | dict2items }}"
- name: Create package.accept_keywords files
  ansible.builtin.copy:
    dest: "{{ dockerfile_tmp.path }}/package.accept_keywords/{{ item.key }}"
    content: "{{ item.value }}"
  loop: "{{ config.package.accept_keywords | default({}) | dict2items }}"
- name: Build the image
  become: yes
  community.docker.docker_image:
    build:
      args:
        profile: "{{ config.profile }}"
      path: "{{ dockerfile_tmp.path }}"
    name: gentoo-emerge
    source: build
    force_source: yes # Always rebuild
- name: Remove temporary path
  ansible.builtin.file:
    path: "{{ dockerfile_tmp.path }}"
    state: absent
- name: Make target portage database directory
  ansible.builtin.file:
    path: "{{ target }}/portage-db"
    state: directory
- name: Make target distfiles directory
  ansible.builtin.file:
    path: "{{ target }}/distfiles"
    state: directory
- name: Do emerge sync
  become: yes
  community.docker.docker_container:
    name: do_emerge_sync
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    command: emerge --sync
    image: gentoo-emerge
    mounts:
      - source: "{{ target }}/portage-db"
        target: /var/db/repos
        type: bind
      - source: "{{ target }}/distfiles"
        target: /var/cache/distfiles
        type: bind
  tags: emerge-sync
- name: Download bootstrap packages
  become: yes
  community.docker.docker_container:
    name: do_emerge_sync
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    command: /var/db/repos/gentoo/scripts/bootstrap.sh --fetchonly
    image: gentoo-emerge
    mounts:
      - source: "{{ target }}/portage-db"
        target: /var/db/repos
        type: bind
      - source: "{{ target }}/distfiles"
        target: /var/cache/distfiles
        type: bind
- name: Fetch sources
  become: yes
  community.docker.docker_container:
    name: do_emerge_sync
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    # Might be able to optimise (remove some of) these options to emerge.
    command: emerge --oneshot --fetchonly --newuse --emptytree {{ ' '.join(packages | default([])) }}
    image: gentoo-emerge
    mounts:
      - source: "{{ target }}/portage-db"
        target: /var/db/repos
        type: bind
      - source: "{{ target }}/distfiles"
        target: /var/cache/distfiles
        type: bind
...
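
The future improvement mentioned earlier (skipping the sync if one has run in the last 24 hours) could be sketched along these lines. It assumes a marker file, .last-emerge-sync, in the target directory; that filename is hypothetical and not something Portage creates. The whole block keeps the emerge-sync tag so --skip-tags emerge-sync still works as before.

```yaml
- name: Sync the portage tree at most once per day
  tags: emerge-sync
  block:
    - name: Stat marker file recording the last successful sync
      ansible.builtin.stat:
        path: "{{ target }}/.last-emerge-sync"  # hypothetical marker file
      register: sync_marker_stat
    - name: Decide whether a sync is due (no marker, or marker older than 24 hours)
      ansible.builtin.set_fact:
        emerge_sync_due: "{{ not sync_marker_stat.stat.exists or (now().timestamp() - sync_marker_stat.stat.mtime) > 86400 }}"
    - name: Do emerge sync
      become: yes
      community.docker.docker_container:
        name: do_emerge_sync
        container_default_behavior: no_defaults
        cleanup: yes
        detach: no
        command: emerge --sync
        image: gentoo-emerge
        mounts:
          - source: "{{ target }}/portage-db"
            target: /var/db/repos
            type: bind
          - source: "{{ target }}/distfiles"
            target: /var/cache/distfiles
            type: bind
      when: emerge_sync_due | bool
    - name: Record the time of this sync
      ansible.builtin.file:
        path: "{{ target }}/.last-emerge-sync"
        state: touch
      when: emerge_sync_due | bool
```

The marker is only touched after the sync task has completed without error, so a failed sync is retried on the next run.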

The role can then be used by adding it to a list of roles:

    - role: gentoo-mirror
      tags: ['gentoo']
      target: "{{ mirror_base_path }}/gentoo"
      config:
          profile: default/linux/amd64/17.1/no-multilib/systemd
          make.conf-vars:
            USE:
              - X
              - alsa
#              - doc
            VIDEO_CARDS:
              - fbdev
              - vesa
              - intel
          package:
            use:
              00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
              sys-boot: 'sys-boot/grub:2 device_mapper'
      packages:
        - '@world' # All the stage3 pre-built base packages
        - app-editors/vim
        - app-portage/cpuid2cpuflags
        - dev-vcs/git
        - net-misc/rsync
        - sys-boot/grub
        - sys-kernel/dracut
        - sys-kernel/gentoo-sources
        - sys-fs/cryptsetup
        - sys-fs/e2fsprogs
        - sys-fs/lvm2

In the lab network, after enabling read-only NFS access to the share on the NAS hosting the mirrors and installing an NFS client, the distfiles can be mounted read-only (e.g. with mount -t nfs -o ro mirror:/mirrors/gentoo/distfiles /var/cache/shared_distfiles) and the mount point added to PORTAGE_RO_DISTDIRS in /etc/portage/make.conf. Portage then treats that directory as a read-only source: if a required file is in the read-only directory but not in the read-write one, Portage symlinks to it from the read-write directory instead of downloading it again.
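
The client-side steps above could also be expressed as Ansible tasks. This is only a minimal sketch, not part of the mirror playbook: the gentoo_clients host group is hypothetical, and it assumes the ansible.posix collection is available and that PORTAGE_RO_DISTDIRS is not already set to something else in make.conf.

```yaml
---
# Hypothetical client-side play; 'gentoo_clients' is an assumed inventory group.
- hosts: gentoo_clients
  become: yes
  tasks:
    # 'mounted' creates the mount point, adds an fstab entry and mounts it.
    - name: Mount the shared distfiles read-only over NFS
      ansible.posix.mount:
        src: mirror:/mirrors/gentoo/distfiles
        path: /var/cache/shared_distfiles
        fstype: nfs
        opts: ro
        state: mounted
    # Replaces any existing PORTAGE_RO_DISTDIRS line, or appends one if absent.
    - name: Point Portage at the read-only distfiles directory
      ansible.builtin.lineinfile:
        path: /etc/portage/make.conf
        regexp: '^PORTAGE_RO_DISTDIRS='
        line: 'PORTAGE_RO_DISTDIRS="/var/cache/shared_distfiles"'
...
```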

Git repository mirrors

Creating read-only mirrors of Git repositories is really easy - as Git can work over http(s), all that is required is to expose a “bare” clone via a webserver.

To create the mirrors, the first step is a role to create bare clones. Its meta/argument_specs.yml looks like this:

---
argument_specs:
  main:
    short_description: Mirror Git repository (as bare repo)
    options:
      target:
        type: str
        required: true
        description: Directory to mirror to
      repository:
        type: str
        required: true
        description: Repository to clone
...

Only two tasks are needed: the first clones the repository and the second runs git update-server-info so that it can be served directly over “dumb” protocols (e.g. http(s)):

---
# Use 'target' in the name output as 'repository' may contain credentials.
- name: Clone repository {{ target }}
  ansible.builtin.git:
    bare: yes
    repo: '{{ repository }}'
    dest: '{{ target }}'
  register: git_clone_result
- name: Update server info (to allow serving using "dumb" protocols like http(s))
  ansible.builtin.command:
    cmd: git update-server-info
    chdir: '{{ target }}'
  # Only when revision changed
  when: git_clone_result.after != git_clone_result.before
...
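
Once a bare clone prepared like this is exposed by a webserver, a machine in the lab should be able to clone from it with the ordinary Git tooling. A minimal sketch, assuming the webserver serves the mirror directory at http://mirror/ (the hostname, URL layout and destination path are illustrative assumptions):

```yaml
---
- hosts: localhost
  tasks:
    # The URL is an assumption: it presumes the webserver exposes the
    # mirror base directory at the root of http://mirror/.
    - name: Clone cron-wrapper from the lab mirror
      ansible.builtin.git:
        repo: http://mirror/git/github.com/loz-hurst/cron-wrapper.git
        dest: /tmp/cron-wrapper
...
```

Git falls back to the “dumb” protocol by itself when the server does not offer the smart one, so nothing special is needed on the client.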

I then created a meta-role, like those for the various yum/Debian repositories (which use the reposync/debmirror roles underneath), to mirror all the repositories I need in my lab:

---
dependencies:
  - role: git-bare
    repository: https://github.com/mythic-beasts/dehydrated-mythic-dns01.git
    target: '{{ mirror_base_path }}/git/github.com/mythic-beasts/dehydrated-mythic-dns01.git/'
  - role: git-bare
    repository: https://github.com/loz-hurst/dehydrated-code-rack.git
    target: '{{ mirror_base_path }}/git/github.com/loz-hurst/dehydrated-code-rack.git/'
  - role: git-bare
    repository: https://github.com/loz-hurst/cron-wrapper.git
    target: '{{ mirror_base_path }}/git/github.com/loz-hurst/cron-wrapper.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@git.my.domain.tld/path/to/salt-home.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/salt-home.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/debian-installer-preseed.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/debian-installer-preseed.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/icinga2-zones-config.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/icinga2-zones-config.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/blog.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/blog.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/mailadm.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/mailadm.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/py-budget.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/py-budget.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/py-entek.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/py-entek.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/hack-dilbert.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/hack-dilbert.git/'
  - role: git-bare
    repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/puppet.git
    target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/puppet.git/'
...

In my site.yml file, I added the deployment credentials and the meta-role:

---
- hosts: localhost
  vars:
    #...
    my_git_deploy_credentials: 'user:password'
  #...
  roles:
    #...
    - { role: git-mirrors, tags: ['git'] }
...
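
If keeping the credentials in site.yml in plain text is a concern, one alternative is to load them from an Ansible Vault encrypted file instead. This is only a sketch of that option: vault.yml is a hypothetical file name, assumed to have been created with ansible-vault create vault.yml and to define my_git_deploy_credentials.

```yaml
---
- hosts: localhost
  vars_files:
    # Hypothetical vault-encrypted file that would contain:
    #   my_git_deploy_credentials: 'user:password'
    - vault.yml
  roles:
    - { role: git-mirrors, tags: ['git'] }
...
```

The playbook would then be run with --ask-vault-pass (or --vault-password-file) so that Ansible can decrypt the file.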

Ansible site.yml

The final site.yml file looks like this:

---
- hosts: localhost
  vars:
    mirror_base_path: /mnt/mirrors
    my_git_deploy_credentials: 'user:password'
  pre_tasks:
    - name: Stat mirrors directory
      ansible.builtin.stat:
        path: '{{ mirror_base_path }}'
      register: mirror_path_stat
    - name: Fail if mirrors directory does not exist
      ansible.builtin.fail:
        msg: 'Please pre-create the mirror target directory (currently set as {{ mirror_base_path }})'
      when: not mirror_path_stat.stat.exists
    - name: Make sure keyrings directory exists
      ansible.builtin.file:
        path: '{{ mirror_base_path }}/keyrings'
        state: directory
  roles:
    - { role: debian-mirrors, tags: ['debian'] }
    - { role: proxmox-mirrors, tags: ['proxmox'] }
    - { role: gitlab-mirrors, tags: ['gitlab'] }
    - { role: centos-mirrors, tags: ['centos'] }
    - { role: rocky-mirrors, tags: ['rocky'] }
    - role: gentoo-mirror
      tags: ['gentoo']
      target: "{{ mirror_base_path }}/gentoo"
      config:
        profile: default/linux/amd64/17.1/no-multilib/systemd
        make.conf-vars:
          USE:
            - X
            - alsa
#            - doc
          VIDEO_CARDS:
            - fbdev
            - vesa
            - intel
        package:
          use:
            00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
            sys-boot: 'sys-boot/grub:2 device_mapper'
      packages:
        - '@world' # All the stage3 pre-built base packages
        - app-editors/vim
        - app-portage/cpuid2cpuflags
        - dev-vcs/git
        - net-misc/rsync
        - sys-boot/grub
        - sys-kernel/dracut
        - sys-kernel/gentoo-sources
        - sys-fs/cryptsetup
        - sys-fs/e2fsprogs
        - sys-fs/lvm2
    - { role: git-mirrors, tags: ['git'] }
...

Pre-requisites

These are listed in requirements.txt:

ansible
docker # For Gentoo mirror

Usage

I put some basic instructions into README.md:

# Ansible playbook to create mirrors

A collection of roles and a playbook that will create mirrors for various Linux distributions.

## Usage

Do all mirrors except Gentoo:

```bash
ansible-playbook site.yml --skip-tags gentoo
```

**N.B.** Gentoo mirrors use Docker to run emerge.  `ansible-playbook` can be run with `-K` to prompt for the sudo password but will hang if the sudo TOTP secret expires before completing the run.  To avoid this, authenticate with sudo (e.g. run `sudo -l`) before running `ansible-playbook` without `-K` and, due to the local connection, it will be able to use your session's active sudo authorisation (which is longer than the TOTP's validity) until it expires.

Do just Gentoo mirrors:

```bash
ansible-playbook site.yml -t gentoo
```

Do just Gentoo but skip `emerge --sync` (which Gentoo netiquette suggests doing no more than once per day):

```bash
ansible-playbook site.yml -t gentoo --skip-tags emerge-sync
```

## Initial setup

Create a virtualenv:

```bash
virtualenv -p python3 /tmp/venv
```

Install ansible and requirements:

```bash
# Either
. /tmp/venv/bin/activate
pip install -r requirements.txt
# or
/tmp/venv/bin/pip install -r requirements.txt
```

Either activate the virtualenv or run `/tmp/venv/bin/ansible-playbook` directly to use.