Improving mirror sync
The number of things being mirrored since I initially set up my mirrors in the home-lab network has grown, and the current bash script has become a bit cumbersome to maintain. This post describes the process of replacing the current script and then extending it to mirror more things, including Gentoo and Git repositories - something that took nearly 4 weeks (bearing in mind this is a hobby project, fitted in around work and home life). Strap in, this is going to be a long post…
I had been planning to rewrite it to use either bash functions or a series of scripts to mirror specific repositories using generalised patterns for (e.g.) apt and yum repositories. When I came to write it, though, I thought it might be easier to use Ansible - which is rapidly becoming my “go to” tool for this sort of thing when it outgrows a simple bash script.
Replicating the current script
I have 2 types of repository, currently (apt and yum). Once I have improved the process, I will also be adding Gentoo and various git repository clones to the mirror.
The preamble
I began by replicating the start of my script, which ensures the mirror directory exists, aborting if not:
---
- hosts: localhost
  vars:
    mirror_base_path: /mnt/mirrors
  pre_tasks:
    - name: Stat mirrors directory
      ansible.builtin.stat:
        path: '{{ mirror_base_path }}'
      register: mirror_path_stat
    - name: Fail if mirrors directory does not exist
      ansible.builtin.fail:
        msg: 'Please pre-create the mirror target directory (currently set as {{ mirror_base_path }})'
      when: not mirror_path_stat.stat.exists
...
Apt mirrors
I split this part in two - a role called debmirror that actually does the mirroring, then separate roles for each collection of software that I wanted to mirror, which use the debmirror role with different arguments to perform the mirroring.
The debmirror role
The first step is to define all of the arguments required for debmirror in the role’s meta/argument_specs.yml file:
---
argument_specs:
main:
short_description: Main entry point for mirroring a repository with debmirror
options:
target:
type: str
required: true
description: Directory to mirror to
keyring_directory:
type: str
required: true
description: Directory to download keys and store keyring in (must be outside target or debmirror will delete it)
source:
description: Where to mirror from
options:
host:
type: str
default: ftp.debian.org
description: Hostname to mirror from
root:
type: str
default: debian
description: Location on the remote host that is the root of the Debian archive
keys:
type: list
elements: dict
options:
name:
type: str
required: true
description: Name of key (will be used for filename)
url:
type: str
required: true
description: Where to fetch the key from
checksum:
type: str
description: Optional checksum for the downloaded key
fingerprint:
type: str
description: Optional fingerprint of the downloaded key
method:
type: str
default: rsync
choices:
- ftp
- http
- https
- rsync
description: Method to use to download (ftp/http/https/rsync)
suites:
type: list
elements: str
required: true
description: The list of suites to mirror
components:
type: list
elements: str
default: ['main']
description: The list of components to mirror
architectures:
type: list
elements: str
default: ['amd64']
description: The list of architectures to mirror
rsync_extras:
type: list
elements: str
default: ['trace']
choices:
- doc
- indices
- tools
- trace
- none
description: Additional files to mirror (with rsync) that are not part of the archive itself
...
I set (what I consider) sensible default values, mostly matching the debmirror command’s defaults but with a few set to my preference, in the role’s defaults/main.yml file:
---
method: rsync
# Match debmirror's defaults for source, components and rsync_extras
source:
host: ftp.debian.org
root: debian
components:
- main
- contrib
- non-free
- main/debian-installer
rsync_extras:
- trace
# Almost all my systems are x86_64
architectures:
- amd64
...
As a repository might have multiple keys (e.g. containing packages for different distribution versions), I separated the key fetching tasks into their own file (tasks/fetch-key.yml) to make looping a bit neater:
---
- name: Download key {{ key.name }}
  ansible.builtin.get_url:
    dest: "{{ keyring_directory }}/keys.d/{{ key.name }}.key"
    url: "{{ key.url }}"
    checksum: "{{ key.checksum | default('') }}"
- name: Validate {{ key.name }} fingerprint
  block:
    - name: Get fingerprint from file
      ansible.builtin.shell:
        cmd: "gpg --show-keys --with-fingerprint --with-colons {{ keyring_directory }}/keys.d/{{ key.name }}.key | awk -F: '$1 == \"fpr\" { print $10 }'"
      register: fingerprint
    - name: Check fingerprint matches
      ansible.builtin.fail:
        msg: "File fingerprint ({{ fingerprint.stdout }}) does not match specified value ({{ key.fingerprint }})"
      when: key.fingerprint.replace(' ', '') not in fingerprint.stdout.split('\n')
  when: key.fingerprint is defined
...
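The fingerprint check boils down to pulling field 10 out of every fpr record in gpg's colon-delimited output and comparing it with the expected value once the spaces are removed. A minimal Python sketch of the same logic, for illustration only (the function names are hypothetical and the sample output is hand-written in the shape of gpg's output, not captured from a real run):

```python
def fingerprints_from_colons(colon_output: str) -> list[str]:
    """Return field 10 of every 'fpr' record, like the awk filter does."""
    found = []
    for line in colon_output.splitlines():
        fields = line.split(":")
        if fields[0] == "fpr" and len(fields) >= 10:
            found.append(fields[9])  # awk's $10 is index 9 here
    return found


def fingerprint_matches(expected: str, colon_output: str) -> bool:
    """Compare after stripping the spaces fingerprints are usually written with."""
    return expected.replace(" ", "") in fingerprints_from_colons(colon_output)


# Hypothetical sample: an 'fpr' record has nine empty-ish fields before the value.
SAMPLE = "\n".join([
    "pub:-:4096:1:DC30D7C23CBBABEE:",
    "fpr" + ":" * 9 + "80D15823B7FD1561F9F7BCDDDC30D7C23CBBABEE:",
    "uid:-::::::::Example key:",
])

if __name__ == "__main__":
    wanted = "80D1 5823 B7FD 1561 F9F7 BCDD DC30 D7C2 3CBB ABEE"
    print(fingerprint_matches(wanted, SAMPLE))
```

Checking membership in the list of fingerprints, rather than comparing against a single value, matters because a key file can contain subkeys, each with its own fpr record.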
And finally, the main part of the role - its tasks/main.yml file:
---
- name: Make target directory
  ansible.builtin.file:
    path: "{{ target }}"
    state: directory
- name: Make keys directory
  ansible.builtin.file:
    path: "{{ keyring_directory }}/keys.d"
    state: directory
  when: keys is defined
- name: Download keys for keyring
  ansible.builtin.include_tasks: fetch-key.yml
  loop: "{{ keys | default([]) }}"
  loop_control:
    loop_var: key
- name: Delete any extra keys
  ansible.builtin.file:
    path: "{{ item }}"
    state: absent
  with_fileglob: "{{ keyring_directory }}/keys.d/*.key"
  # default([]) so this does not fail when no keys were passed to the role
  when: item.split('/')[-1].split('.')[0] not in keys | default([]) | map(attribute='name') | list
- name: Delete old keyring
  ansible.builtin.file:
    path: "{{ keyring_directory }}/keyring.gpg"
    state: absent
- name: Make keyring
  ansible.builtin.shell: gpg --no-default-keyring --keyring {{ keyring_directory }}/keyring.gpg --import {{ item }}
  with_fileglob: "{{ keyring_directory }}/keys.d/*.key"
- name: Run debmirror
  ansible.builtin.command:
    argv:
      - debmirror
      - --verbose
      - --precleanup # Remove old files before download - makes mirror inconsistent until debmirror finishes but copes better with limited space
      - --dist={{ ','.join(suites) }}
      - --section={{ ','.join(components) }}
      - --arch={{ ','.join(architectures) }}
      - --method={{ method }}
      - --rsync-extra={{ ','.join(rsync_extras) }}
      - --host={{ source.host }}
      - --root={{ source.root }}
      - --keyring={{ keyring_directory }}/keyring.gpg
      - "{{ target }}"
...
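Passing the command as an argv list means nothing goes through a shell, so paths containing spaces need no quoting. As a rough sketch of how the role's arguments map onto the debmirror command line (plain Python rather than Ansible; the function name is hypothetical, while the flags and defaults are those from the argument spec and task file):

```python
def debmirror_argv(target, keyring_directory, suites, *, components=("main",),
                   architectures=("amd64",), method="rsync",
                   rsync_extras=("trace",), host="ftp.debian.org", root="debian"):
    """Build the debmirror command line as a list of arguments."""
    return [
        "debmirror",
        "--verbose",
        "--precleanup",
        "--dist=" + ",".join(suites),
        "--section=" + ",".join(components),
        "--arch=" + ",".join(architectures),
        "--method=" + method,
        "--rsync-extra=" + ",".join(rsync_extras),
        "--host=" + host,
        "--root=" + root,
        "--keyring=" + keyring_directory + "/keyring.gpg",
        target,  # the mirror directory is the final positional argument
    ]


if __name__ == "__main__":
    print(" ".join(debmirror_argv(
        "/mnt/mirrors/debian",
        "/mnt/mirrors/keyrings/debian",
        ["buster", "bullseye"],
    )))
```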
Using the debmirror role
To use the debmirror role, I added new roles which list it as a dependency with the appropriate arguments for the mirror(s) they are creating. The core Debian mirrors are in the role debian-mirrors, whose only file is meta/main.yml (the variable mirror_base_path is set at the play level, but could easily be specified in the inventory or on the command line):
---
dependencies:
- role: debmirror
source:
host: rsync.mirrorservice.org
root: ftp.debian.org/debian
target: '{{ mirror_base_path }}/debian'
keyring_directory: '{{ mirror_base_path }}/keyrings/debian'
suites:
- buster
- buster-updates
- bullseye
- bullseye-updates
keys:
- name: debian-10-buster
url: https://ftp-master.debian.org/keys/archive-key-10.asc
fingerprint: "80D1 5823 B7FD 1561 F9F7 BCDD DC30 D7C2 3CBB ABEE"
- name: debian-11-bullseye
url: https://ftp-master.debian.org/keys/archive-key-11.asc
fingerprint: "1F89 983E 0081 FDE0 18F3 CC96 73A4 F27B 8DD4 7936"
- role: debmirror
method: http
source:
host: security.debian.org
root: debian-security
target: '{{ mirror_base_path }}/debian-security'
keyring_directory: '{{ mirror_base_path }}/keyrings/debian-security'
suites:
- buster/updates
- bullseye-security
keys:
- name: debian-10-buster-security
url: https://ftp-master.debian.org/keys/archive-key-10-security.asc
fingerprint: "5E61 B217 265D A980 7A23 C5FF 4DFA B270 CAA9 6DFA"
- name: debian-11-bullseye-security
url: https://ftp-master.debian.org/keys/archive-key-11-security.asc
fingerprint: "AC53 0D52 0F2F 3269 F5E9 8313 A484 4904 4AAD 5C5D"
...
This can be duplicated for other mirrors, for example for GitLab:
---
dependencies:
- role: debmirror
method: http
rsync_extras: ['none']
source:
host: packages.gitlab.com
root: gitlab/gitlab-ee/debian
target: '{{ mirror_base_path }}/gitlab-ee-debian'
keyring_directory: '{{ mirror_base_path }}/keyrings/gitlab-ee-debian'
suites:
- bullseye
components:
- main
keys:
# GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
# doesn't verify the key, so I haven't either but not sure that is a good
# idea!
- name: gitlab-keyring
url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
...
and Proxmox:
---
dependencies:
- role: debmirror
method: http
rsync_extras: ['none']
source:
host: download.proxmox.com
root: debian/pve
target: '{{ mirror_base_path }}/pve-no-subscription'
keyring_directory: '{{ mirror_base_path }}/keyrings/pve-no-subscription'
suites:
- bullseye
components:
- pve-no-subscription
keys:
- name: proxmox-keyring
url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
- role: debmirror
method: http
rsync_extras: ['none']
source:
host: download.proxmox.com
root: debian/ceph-pacific
target: '{{ mirror_base_path }}/ceph-pacific'
keyring_directory: '{{ mirror_base_path }}/keyrings/ceph-pacific'
suites:
- bullseye
components:
- main
keys:
- name: proxmox-keyring
url: https://enterprise.proxmox.com/debian/proxmox-release-bullseye.gpg
checksum: 'sha512:7fb03ec8a1675723d2853b84aa4fdb49a46a3bb72b9951361488bfd19b29aab0a789a4f8c7406e71a69aabbc727c936d3549731c4659ffa1a08f44db8fdcebfa'
...
RPM mirrors
This follows a similar pattern to the APT mirrors, creating a reposync role for repositories synced using this method.
The reposync role
Beginning again with the arguments and defaults files. Each reposync command can be used with multiple repositories, so these are provided as a list of dictionaries; as a result, some of the per-repository defaults are set in the template yum configuration (see further below) rather than in the role’s defaults file.
meta/argument_specs.yml:
---
argument_specs:
main:
short_description: Main entry point for mirroring a repository with reposync
options:
target:
type: str
required: true
description: Directory to mirror to (note each repository will be created in a subdirectory matching the repository name by reposync)
yum_conf:
type: str
required: true
description: Path of yum configuration file (does not need to be called yum.conf) for this repository (will be created/updated before sync)
delete:
type: bool
default: true
description: Delete local files no longer present in remote repository
newest_only:
type: bool
default: true
description: Only fetch the latest version if multiple versions of a package are available
architecture:
type: str
default: x86_64
description: Act as if running the specified arch
repos:
type: list
elements: dict
description: List of repositories to include in this mirror
required_one_of:
- ['baseurl', 'mirrorlist']
mutually_exclusive:
- ['baseurl', 'mirrorlist']
options:
name:
type: str
required: true
description: Name of repository (will be used for target directory by reposync)
baseurl:
type: str
description: Base url for the repository
mirrorlist:
type: str
description: Mirror list url for the repository
description:
type: str
description: Description of the repository
repo_gpgcheck:
type: bool
default: true
description: Whether a GPG signature check should be done on the repodata
gpgcheck:
type: bool
default: true
description: Whether a GPG signature check should be done on the packages
gpgkey:
type: str
description: URL for the ASCII-armoured GPG key for the repository
sslverify:
type: bool
default: true
description: Should SSL certificates be verified
metadata_expire:
type: int
description: Frequency (in seconds) with which yum metadata will expire
...
defaults/main.yml:
---
delete: true
newest_only: true
architecture: x86_64
...
Like with the key management for APT, I separated out the createrepo command into its own task file (tasks/createrepo.yml) to make looping easier:
---
- name: Check if comps.xml exists
  ansible.builtin.stat:
    path: '{{ repo }}/comps.xml'
  register: comps_xml_stat
- name: Run createrepo
  ansible.builtin.command:
    cmd: >
      createrepo
      --update
      {{ "-g" if comps_xml_stat.stat.exists else "" }}
      {{ repo + "/comps.xml" if comps_xml_stat.stat.exists else "" }}
      {{ repo }}
...
As reposync uses a yum configuration file, I created a template for that from the repos passed to the role in templates/yum.conf.j2:
{% for repo in repos %}
[{{ repo.name }}]
name={{ repo.description }}
{# All repos for mirroring need to be enabled #}
enabled=1
{% if repo.baseurl | default(false) %}
baseurl={{ repo.baseurl }}
{% else %}
mirrorlist={{ repo.mirrorlist }}
{% endif %}
gpgcheck={{ '1' if repo.gpgcheck | default(true) else '0' }}
gpgkey={{ repo.gpgkey }}
repo_gpgcheck={{ '1' if repo.repo_gpgcheck | default(true) else '0' }}
sslverify={{ '1' if repo.sslverify | default(true) else '0' }}
{% if repo.metadata_expire is defined %}
metadata_expire={{ repo.metadata_expire }}
{% endif %}
{% endfor %}
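To make the template's behaviour concrete, here is a minimal Python sketch that renders one repository stanza the same way: boolean options fall back to true when omitted, baseurl wins over mirrorlist, and metadata_expire only appears when given. The function name is hypothetical and this is an illustration of the logic, not a replacement for the Jinja2 template.

```python
def render_repo(repo: dict) -> str:
    """Render one repository stanza the way the Jinja2 template does."""

    def flag(key: str) -> str:
        # Boolean options default to true when not given.
        return "1" if repo.get(key, True) else "0"

    lines = [
        f"[{repo['name']}]",
        f"name={repo['description']}",
        "enabled=1",  # all repos for mirroring need to be enabled
    ]
    if repo.get("baseurl"):
        lines.append(f"baseurl={repo['baseurl']}")
    else:
        lines.append(f"mirrorlist={repo['mirrorlist']}")
    lines += [
        f"gpgcheck={flag('gpgcheck')}",
        f"gpgkey={repo['gpgkey']}",
        f"repo_gpgcheck={flag('repo_gpgcheck')}",
        f"sslverify={flag('sslverify')}",
    ]
    if "metadata_expire" in repo:
        lines.append(f"metadata_expire={repo['metadata_expire']}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_repo({
        "name": "rocky-8-baseos",
        "description": "Rocky Linux 8 BaseOS",
        "baseurl": "http://rockylinux.mirrorservice.org/8/BaseOS/x86_64/os/",
        "gpgkey": "https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial",
    }))
```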
And finally the main task file in tasks/main.yml, which does the mirroring:
---
- name: Make sure yum config directory exists
  ansible.builtin.file:
    path: '{{ yum_conf | dirname }}'
    state: directory
- name: Create yum configuration file
  ansible.builtin.template:
    src: yum.conf.j2
    dest: '{{ yum_conf }}'
- name: Run reposync
  ansible.builtin.command:
    # My old sync script had --downloadcomps and --download-metadata
    # hard-coded - not making them variables until I need to turn them
    # off.
    cmd: >
      reposync
      --arch={{ architecture }}
      --config={{ yum_conf }}
      --download_path={{ target }}
      {{ "--delete" if delete else "" }}
      {{ "--newest-only" if newest_only else "" }}
      --downloadcomps
      --download-metadata
- name: Run create repo on each new repository
  ansible.builtin.include_tasks: createrepo.yml
  vars:
    repo: '{{ target }}/{{ item.name }}'
  loop: '{{ repos }}'
...
Using the reposync role
Role for CentOS 6 and 7 mirrors (called centos-mirrors):
---
dependencies:
- role: reposync
target: '{{ mirror_base_path }}'
yum_conf: '{{ mirror_base_path }}/yum-configs/centos.yum.conf'
repos:
- name: centos-6.10-os
description: CentOS 6.10 OS
baseurl: https://vault.centos.org/6.10/os/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
- name: centos-6.10-updates
description: CentOS 6.10 updates
baseurl: https://vault.centos.org/6.10/updates/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
- name: centos-6.10-extras
description: CentOS 6.10 extras
baseurl: https://vault.centos.org/6.10/extras/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-6
- name: centos-7-os
description: CentOS 7 OS
baseurl: http://mirror.centos.org/centos/7/os/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
- name: centos-7-updates
description: CentOS 7 updates
baseurl: http://mirror.centos.org/centos/7/updates/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
- name: centos-7-extras
description: CentOS 7 extras
baseurl: http://mirror.centos.org/centos/7/extras/x86_64/
gpgkey: https://www.centos.org/keys/RPM-GPG-KEY-CentOS-7
...
And for Rocky 8 mirrors (called rocky-mirrors):
---
dependencies:
- role: reposync
target: '{{ mirror_base_path }}'
yum_conf: '{{ mirror_base_path }}/yum-configs/rocky.yum.conf'
repos:
- name: rocky-8-baseos
description: Rocky Linux 8 BaseOS
baseurl: http://rockylinux.mirrorservice.org/8/BaseOS/x86_64/os/
gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
- name: rocky-8-appstream
description: Rocky Linux 8 AppStream
baseurl: http://rockylinux.mirrorservice.org/8/AppStream/x86_64/os/
gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
- name: rocky-8-extras
description: Rocky Linux 8 extras
baseurl: http://rockylinux.mirrorservice.org/8/extras/x86_64/os/
gpgkey: https://dl.rockylinux.org/pub/rocky/RPM-GPG-KEY-rockyofficial
...
For GitLab, I just added a reposync entry to the existing dependencies for the gitlab-mirrors role:
---
dependencies:
- role: debmirror
method: http
rsync_extras: ['none']
source:
host: packages.gitlab.com
root: gitlab/gitlab-ee/debian
target: '{{ mirror_base_path }}/gitlab-ee-debian'
keyring_directory: '{{ mirror_base_path }}/keyrings/gitlab-ee-debian'
suites:
- bullseye
components:
- main
keys:
# GitLab's own script (https://packages.gitlab.com/install/repositories/gitlab/gitlab-ee/script.deb.sh)
# doesn't verify the key, so I haven't either but not sure that is a good
# idea!
- name: gitlab-keyring
url: https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
- role: reposync
target: '{{ mirror_base_path }}'
yum_conf: '{{ mirror_base_path }}/yum-configs/gitlab-ee.yum.conf'
repos:
- name: gitlab_gitlab-ee-el_7
description: gitlab_gitlab-ee-el_7
baseurl: https://packages.gitlab.com/gitlab/gitlab-ee/el/7/x86_64
gpgkey: |-
https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey
https://packages.gitlab.com/gitlab/gitlab-ee/gpgkey/gitlab-gitlab-ee-3D645A26AB9FBD22.pub.gpg
metadata_expire: 300 # Not sure if this is needed - was in old sync script
# Disable these per https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/5226
repo_gpgcheck: no
gpgcheck: no
sslverify: no
...
New functionality
My old script was able to mirror some apt and yum repositories, but now I come to extending the new version to create some mirrors that were not in the original script.
Gentoo mirrors
My original thought was to mirror Gentoo in full, like I have done for Debian and CentOS/Rocky above. However, the full size of the Gentoo source mirror is over 550GiB, which pushes the balance away from the convenience of having all packages immediately available to install and towards selectively mirroring at a wanted-package level (being a source distribution, the notion of mirroring at an architecture or version level is largely meaningless).
In order to do this, I need to adopt an approach similar to one I previously used with EasyBuild - fetching the sources in an internet-connected environment then teleporting them (in this case on my USB mirror update drive) into the isolated environment to do the building. The next complication is the need to use the emerge command to do the download - without an existing Gentoo install to hand, this required a little thinking to get around. I decided to use the official Gentoo Docker image to do the downloading, as the system I am using to do the mirroring happened to already have Docker installed.
In order for Ansible to talk to Docker, I had to install the full ansible package (I had been using ansible-core up until this point) and the docker Python package.
The Gentoo mirroring is very specific to the distribution, so I created a single gentoo-mirror role for it. Its arguments are a target directory to place the mirrored files in, the Portage configuration to use and a list of packages to download sources for. I made it download all of the sources for the bootstrap script as well.
meta/argument_specs.yml:
---
argument_specs:
main:
short_description: Mirror Gentoo portage database and wanted distfile sources
options:
target:
type: str
required: true
description: Directory to mirror to
config:
type: dict
required: true
description: Configuration information for portage environment
options:
profile:
type: str
required: true
description: Name of the portage profile (from /var/db/repos/gentoo/profiles)
make.conf-vars:
type: dict
required: true
description: Dictionary of keys -> list of values (values will be concatenated with spaces separating them)
package:
type: dict
required: false
description: Dictionary of file contents for each of package.use, package.mask and package.accept_keywords directories
options:
use:
type: dict
description: Dictionary of filenames to (string) content for package.use directory
mask:
type: dict
description: Dictionary of filenames to (string) content for package.mask directory
accept_keywords:
type: dict
description: Dictionary of filenames to (string) content for package.accept_keywords directory
packages:
type: list
required: true
description: Packages to download sources for (in addition to bootstrap)
elements: str
...
I set up a task file that would install Docker, tasks/setup-docker.yml, making sure it matched the configuration applied by my current configuration management (SaltStack) exactly, to stop them stamping on each other’s changes:
---
- name: Ensure Docker repository is available
become: yes
ansible.builtin.apt_repository:
# Match what is currently in SaltStack (repo and filename) so they
# do not end up fighting.
repo: deb https://download.docker.com/linux/debian {{ ansible_distribution_release }} stable
filename: docker
state: present
- name: Ensure Docker is installed
become: yes
ansible.builtin.package:
name:
- docker-ce
- docker-ce-cli
- containerd.io
- python3-docker # So Ansible can manage docker
state: present
...
In order to install Gentoo, I need a “stage” file as well as the packages. These are built nightly, so I created a task file, tasks/download-stage-file.yml, that will download one and create/update a symlink to point to it, deleting the file it used to point to if the target changes:
---
- name: Fetch file {{ stage_file }}
  ansible.builtin.get_url:
    url: "{{ base_url }}{{ stage_file }}"
    dest: "{{ target }}/{{ stage_file | basename }}"
- name: Stat symlink
  ansible.builtin.stat:
    path: "{{ target }}/{{ symlink }}"
  register: symlink_stat
- name: Remove old file
  ansible.builtin.file:
    # lnk_target is relative to the symlink's directory, so anchor it to target
    path: "{{ target }}/{{ symlink_stat.stat.lnk_target | basename }}"
    state: absent
  when: symlink_stat.stat.exists and symlink_stat.stat.islnk and symlink_stat.stat.lnk_target | basename != stage_file | basename
- name: Create new symlink
  ansible.builtin.file:
    path: "{{ target }}/{{ symlink }}"
    src: "{{ stage_file | basename }}"
    state: link
...
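The symlink handling in that task file amounts to the following, sketched in Python for illustration (the function name is hypothetical). One point the sketch makes explicit: the link stores only a file name, so the old file has to be resolved relative to the directory the link lives in.

```python
import os


def rotate_symlink(target_dir: str, symlink: str, new_file: str) -> None:
    """Point target_dir/symlink at new_file's basename, removing the file the
    link used to point to if it was a different one."""
    link_path = os.path.join(target_dir, symlink)
    new_name = os.path.basename(new_file)
    if os.path.islink(link_path):
        old_name = os.path.basename(os.readlink(link_path))
        old_path = os.path.join(target_dir, old_name)
        if old_name != new_name and os.path.exists(old_path):
            os.remove(old_path)  # superseded stage file
        os.remove(link_path)
    # Relative link, so the mirror directory can be moved or re-mounted elsewhere.
    os.symlink(new_name, link_path)
```

Calling it twice with different file names leaves only the newer file and a link pointing at it; calling it again with the same name leaves the file in place.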
The latest nightly is listed in a text file, which can contain several stage files to download, so I wrapped this task file with another task file, tasks/download-stage.yml, that loops over it to fetch each one listed in the file on Gentoo’s mirror:
---
- name: Fetch text file with location for {{ target_stage }}
  ansible.builtin.uri:
    url: "{{ base_url }}latest-{{ target_stage }}.txt"
    return_content: yes
  register: target_info
- name: Download the files in the index text file
  ansible.builtin.include_tasks: download-stage-file.yml
  vars:
    stage_file: "{{ target_line | split(' ') | first }}"
    symlink: latest-{{ target_stage }}.{{ target_line | split(' ') | first | split('.') | last }}
  when: target_line
  loop: "{{ target_info.content | split('\n') | map('regex_search', '^[^#].*$') }}"
  loop_control:
    loop_var: target_line
...
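The filter chain in the loop is fairly dense, so here is the same parsing spelled out as a small Python sketch: comment and blank lines are dropped, the first space-separated field is the file path, and the symlink name takes only the final extension of that path. The function name and the sample index content are hypothetical.

```python
def stage_files(index_text: str, target_stage: str) -> list[tuple[str, str]]:
    """Return (path, symlink name) pairs for each entry in a latest-*.txt index."""
    result = []
    for line in index_text.split("\n"):
        # Mirrors the '^[^#].*$' regex: skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        path = line.split(" ")[0]
        extension = path.split(".")[-1]  # only the last part, e.g. "xz" not "tar.xz"
        result.append((path, f"latest-{target_stage}.{extension}"))
    return result


# Hypothetical index content in the "path size" layout the tasks expect.
SAMPLE_INDEX = (
    "# Latest as of some date\n"
    "# ts=0\n"
    "20220101T170531Z/stage3-amd64-openrc-20220101T170531Z.tar.xz 240000000\n"
)

if __name__ == "__main__":
    for path, link in stage_files(SAMPLE_INDEX, "stage3-amd64-openrc"):
        print(link, "->", path)
```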
I needed to create a Dockerfile, files/Dockerfile, for my Docker image to download the files - using the official Gentoo stage3 Docker image, I just need to copy in some settings to make sure emerge correctly resolves all of the dependencies when fetching the sources:
FROM gentoo/stage3
ARG profile
# Copy our make.conf in, so emerge is using the correct useflags and architecture settings
COPY make.conf /etc/portage/make.conf
COPY package.accept_keywords/ /etc/portage/package.accept_keywords/
COPY package.mask/ /etc/portage/package.mask/
COPY package.use/ /etc/portage/package.use/
RUN ln -sf /var/db/repos/gentoo/profiles/$profile /etc/portage/make.profile
For make.conf, a template in templates/make.conf.j2 generates it from the role arguments:
{% for (key, values) in config['make.conf-vars'].items() %}
{{ key }}="{{ ' '.join(values) }}"
{% endfor %}
Finally, the role’s main task file, tasks/main.yml, pulls together installing Docker and fetching the stage files, then creates the Docker image and uses it to download the bootstrap and each package file.
I tagged emerge --sync separately because Gentoo sync netiquette dictates not syncing more than once per day, so this makes it easy to exclude with ansible-playbook’s --skip-tags emerge-sync - as a future improvement I should skip this based on whether a file timestamp has been updated in the last 24 hours.
All of my Docker commands run with become: yes to run as the root user - because giving a user access to a Docker daemon running as root is the same as giving root access directly (as the Docker documentation puts it: “The docker group grants privileges equivalent to the root user.”), so I only allow root to interact with Docker.
---
- name: Fetch stage tarballs
ansible.builtin.include_tasks: download-stage.yml
vars:
base_url: http://www.mirrorservice.org/sites/distfiles.gentoo.org/releases/amd64/autobuilds/
loop:
- install-amd64-minimal
- stage3-amd64-openrc
- stage3-amd64-systemd
- stage3-amd64-nomultilib-openrc
- stage3-amd64-nomultilib-systemd
loop_control:
loop_var: target_stage
- name: Setup Docker
ansible.builtin.include_tasks: setup-docker.yml
- name: Make temporary directory for Dockerfile (and associated gubbins)
ansible.builtin.tempfile:
state: directory
register: dockerfile_tmp
- name: Copy Dockerfile to temporary path
ansible.builtin.copy:
src: Dockerfile
dest: "{{ dockerfile_tmp.path }}/Dockerfile"
- name: Ensure necessary directories exist (even if empty) for Docker image creation
ansible.builtin.file:
path: "{{ dockerfile_tmp.path }}/{{ item }}"
state: directory
loop:
- package.accept_keywords
- package.mask
- package.use
- name: Create make.conf
ansible.builtin.template:
dest: "{{ dockerfile_tmp.path }}/make.conf"
src: make.conf.j2
- name: Create package.use files
ansible.builtin.copy:
dest: "{{ dockerfile_tmp.path }}/package.use/{{ item.key }}"
content: "{{ item.value }}"
loop: "{{ config.package.use | default({}) | dict2items }}"
- name: Create package.mask files
ansible.builtin.copy:
dest: "{{ dockerfile_tmp.path }}/package.mask/{{ item.key }}"
content: "{{ item.value }}"
loop: "{{ config.package.mask | default({}) | dict2items }}"
- name: Create package.accept_keywords files
ansible.builtin.copy:
dest: "{{ dockerfile_tmp.path }}/package.accept_keywords/{{ item.key }}"
content: "{{ item.value }}"
loop: "{{ config.package.accept_keywords | default({}) | dict2items }}"
- name: Build the image
become: yes
community.docker.docker_image:
build:
args:
profile: "{{ config.profile }}"
path: "{{ dockerfile_tmp.path }}"
name: gentoo-emerge
source: build
force_source: yes # Always rebuild
- name: Remove temporary path
ansible.builtin.file:
path: "{{ dockerfile_tmp.path }}"
state: absent
- name: Make target portage database directory
ansible.builtin.file:
path: "{{ target }}/portage-db"
state: directory
- name: Make target distfiles directory
ansible.builtin.file:
path: "{{ target }}/distfiles"
state: directory
- name: Do emerge sync
become: yes
community.docker.docker_container:
name: do_emerge_sync
container_default_behavior: no_defaults # Stop warning
cleanup: yes
detach: no
command: emerge --sync
image: gentoo-emerge
mounts:
- source: "{{ target }}/portage-db"
target: /var/db/repos
type: bind
- source: "{{ target }}/distfiles"
target: /var/cache/distfiles
type: bind
tags: emerge-sync
- name: Download bootstrap packages
become: yes
community.docker.docker_container:
name: do_emerge_sync
container_default_behavior: no_defaults # Stop warning
cleanup: yes
detach: no
command: /var/db/repos/gentoo/scripts/bootstrap.sh --fetchonly
image: gentoo-emerge
mounts:
- source: "{{ target }}/portage-db"
target: /var/db/repos
type: bind
- source: "{{ target }}/distfiles"
target: /var/cache/distfiles
type: bind
- name: Fetch sources
become: yes
community.docker.docker_container:
name: do_emerge_sync
container_default_behavior: no_defaults # Stop warning
cleanup: yes
detach: no
# Might be able to optimise (remove some of) these options to emerge.
command: emerge --oneshot --fetchonly --newuse --emptytree {{ ' '.join(packages | default([])) }}
image: gentoo-emerge
mounts:
- source: "{{ target }}/portage-db"
target: /var/db/repos
type: bind
- source: "{{ target }}/distfiles"
target: /var/cache/distfiles
type: bind
...
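The "skip the sync if it ran in the last 24 hours" idea could be built on a simple modification-time check. A minimal Python sketch of that check follows; the function name is hypothetical, and which file to examine is left open (a marker file touched after each successful sync would be one option, and that is an assumption rather than something the role does today).

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def sync_is_due(stamp_path, max_age_seconds=ONE_DAY, now=None):
    """Return True if stamp_path is missing or was last modified at least
    max_age_seconds ago, i.e. another sync would respect the once-a-day rule."""
    now = time.time() if now is None else now
    try:
        last_sync = os.path.getmtime(stamp_path)
    except FileNotFoundError:
        return True  # never synced
    return now - last_sync >= max_age_seconds
```

In Ansible terms the equivalent would be a stat task followed by a when condition on the emerge --sync task, comparing the file's mtime against the current time.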
It can then be used by adding the role to a list of roles:
- role: gentoo-mirror
tags: ['gentoo']
target: "{{ mirror_base_path }}/gentoo"
config:
profile: default/linux/amd64/17.1/no-multilib/systemd
make.conf-vars:
USE:
- X
- alsa
# - doc
VIDEO_CARDS:
- fbdev
- vesa
- intel
package:
use:
00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
sys-boot: 'sys-boot/grub:2 device_mapper'
packages:
- '@world' # All the stage3 pre-built base packages
- app-editors/vim
- app-portage/cpuid2cpuflags
- dev-vcs/git
- net-misc/rsync
- sys-boot/grub
- sys-kernel/dracut
- sys-kernel/gentoo-sources
- sys-fs/cryptsetup
- sys-fs/e2fsprogs
- sys-fs/lvm2
In the lab network, after enabling read-only NFS access to the share on the NAS hosting the mirrors and installing an NFS client, the distfiles can be mounted read-only (e.g. with mount -t nfs -o ro mirror:/mirrors/gentoo/distfiles /var/cache/shared_distfiles) and the path added to PORTAGE_RO_DISTDIRS in /etc/portage/make.conf. Gentoo’s Portage will use the directory as a read-only source; if the required files are in the read-only directory, and not in the read-write one, Portage will symlink to them from the read-write directory.
Git repository mirrors
Creating read-only mirrors of Git repositories is really easy - as Git can work over http(s), all that is required is to expose a “bare” clone via a webserver.
To create the mirrors, the first step is a role to create bare clones. Its meta/argument_specs.yml looks like this:
---
argument_specs:
main:
short_description: Mirror Git repository (as bare repo)
options:
target:
type: str
required: true
description: Directory to mirror to
repository:
type: str
required: true
description: Repository to clone
...
Only two tasks are needed: the first to clone the repository and the second to run git update-server-info so that it can be served directly over “dumb” protocols (e.g. http(s)):
---
# Use 'target' in the name output as 'repository' may contain credentials.
- name: Clone repository {{ target }}
  ansible.builtin.git:
    bare: yes
    repo: '{{ repository }}'
    dest: '{{ target }}'
  register: git_clone_result
- name: Update server info (to allow serving using "dumb" protocols like http(s))
  ansible.builtin.command:
    cmd: git update-server-info
    chdir: '{{ target }}'
  # Only when revision changed
  when: git_clone_result.after != git_clone_result.before
...
I then created a meta-role, like for the various yum/debian repositories which use the reposync/debmirror roles underneath, to mirror all the repositories I need in my lab:
---
dependencies:
- role: git-bare
repository: https://github.com/mythic-beasts/dehydrated-mythic-dns01.git
target: '{{ mirror_base_path }}/git/github.com/mythic-beasts/dehydrated-mythic-dns01.git/'
- role: git-bare
repository: https://github.com/loz-hurst/dehydrated-code-rack.git
target: '{{ mirror_base_path }}/git/github.com/loz-hurst/dehydrated-code-rack.git/'
- role: git-bare
repository: https://github.com/loz-hurst/cron-wrapper.git
target: '{{ mirror_base_path }}/git/github.com/loz-hurst/cron-wrapper.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@git.my.domain.tld/path/to/salt-home.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/salt-home.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/debian-installer-preseed.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/debian-installer-preseed.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/icinga2-zones-config.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/icinga2-zones-config.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/blog.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/blog.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/mailadm.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/mailadm.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/py-budget.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/py-budget.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/py-entek.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/py-entek.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/hack-dilbert.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/hack-dilbert.git/'
- role: git-bare
repository: https://{{ my_git_deploy_credentials }}@my.domain.tld/path/to/puppet.git
target: '{{ mirror_base_path }}/git/my.domain.tld/path/to/puppet.git/'
...
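The targets in this list mostly follow a git/host/path pattern, and the repository URLs may embed credentials that should never end up in paths or log output. A small Python sketch of both ideas, using only the standard library; the helper names are hypothetical and deriving the target automatically is an assumption for illustration (the role itself takes the target as an explicit argument).

```python
from urllib.parse import urlsplit


def redact(repository: str) -> str:
    """Return the repository URL with any user:password@ part removed."""
    parts = urlsplit(repository)
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    return parts._replace(netloc=host).geturl()


def mirror_target(base: str, repository: str) -> str:
    """Derive a target directory of the form base/git/host/path/."""
    parts = urlsplit(repository)
    return f"{base}/git/{parts.hostname}{parts.path}/"


if __name__ == "__main__":
    url = "https://user:password@my.domain.tld/path/to/blog.git"
    print(redact(url))
    print(mirror_target("/mnt/mirrors", url))
```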
In my site.yml file, I added the deployment credentials and the meta-role:
---
- hosts: localhost
  vars:
    #...
    my_git_deploy_credentials: 'user:password'
    #...
  roles:
    #...
    - { role: git-mirrors, tags: ['git'] }
...
Ansible site.yml
The final site.yml file looks like this:
---
- hosts: localhost
vars:
mirror_base_path: /mnt/mirrors
my_git_deploy_credentials: 'user:password'
pre_tasks:
- name: Stat mirrors directory
ansible.builtin.stat:
path: '{{ mirror_base_path }}'
register: mirror_path_stat
- name: Fail if mirrors directory does not exist
ansible.builtin.fail:
msg: 'Please pre-create the mirror target directory (currently set as {{ mirror_base_path }})'
when: not mirror_path_stat.stat.exists
- name: Make sure keyrings directory exists
ansible.builtin.file:
path: '{{ mirror_base_path }}/keyrings'
state: directory
roles:
- { role: debian-mirrors, tags: ['debian'] }
- { role: proxmox-mirrors, tags: ['proxmox'] }
- { role: gitlab-mirrors, tags: ['gitlab'] }
- { role: centos-mirrors, tags: ['centos'] }
- { role: rocky-mirrors, tags: ['rocky'] }
- role: gentoo-mirror
tags: ['gentoo']
target: "{{ mirror_base_path }}/gentoo"
config:
profile: default/linux/amd64/17.1/no-multilib/systemd
make.conf-vars:
USE:
- X
- alsa
# - doc
VIDEO_CARDS:
- fbdev
- vesa
- intel
package:
use:
00cpu-flags: '*/* CPU_FLAGS_X86: avx f16c mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3'
sys-boot: 'sys-boot/grub:2 device_mapper'
packages:
- '@world' # All the stage3 pre-built base packages
- app-editors/vim
- app-portage/cpuid2cpuflags
- dev-vcs/git
- net-misc/rsync
- sys-boot/grub
- sys-kernel/dracut
- sys-kernel/gentoo-sources
- sys-fs/cryptsetup
- sys-fs/e2fsprogs
- sys-fs/lvm2
- { role: git-mirrors, tags: ['git'] }
...
Pre-requisites
These are listed in requirements.txt:
ansible
docker # For Gentoo mirror
Usage
I put some basic instructions into README.md:
# Ansible playbook to create mirrors
A collection of roles and a playbook that will create mirrors for various Linux distributions.
## Usage
Do all mirrors except Gentoo:
```bash
ansible-playbook site.yml --skip-tags gentoo
```
**N.B.** Gentoo mirrors use Docker to run emerge. `ansible-playbook` can be run with `-K` to prompt for the sudo password but will hang if the sudo TOTP secret expires before completing the run. To avoid this, authenticate with sudo (e.g. run `sudo -l`) before running `ansible-playbook` without `-K` and, due to the local connection, it will be able to use your session's active sudo authorisation (which is longer than the TOTP's validity) until it expires.
Do just Gentoo mirrors:
```bash
ansible-playbook site.yml -t gentoo
```
Do just Gentoo but skip `emerge --sync` (which Gentoo netiquette suggests doing no more than once per day):
```bash
ansible-playbook site.yml -t gentoo --skip-tags emerge-sync
```
## Initial setup
Create a virtualenv:
```bash
virtualenv -p python3 /tmp/venv
```
Install ansible and requirements:
```bash
# Either
source /tmp/venv/bin/activate
pip install -r requirements.txt
# or
/tmp/venv/bin/pip install -r requirements.txt
```
Either activate the virtualenv or run `/tmp/venv/bin/ansible-playbook` directly to use it.