As I was trying to install a recent version of Ansible in my air-gapped home lab network, I discovered that mirroring Python packages, e.g. from PyPI, is quite difficult despite pip’s download option. The --platform, --python-version, --implementation and --abi options are supposed to allow downloading for another platform, but finding the right combination of options was tricky. In particular, the cryptography package (which often causes me issues) would not download for Debian Bullseye on either Buster or macOS with any combination I tried. In the end, I resorted to the approach I adopted for Gentoo: using a Docker container to download for the platform inside the container.
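
For illustration, a cross-platform download attempt looks something like the sketch below. The tag values (manylinux2014_x86_64 and CPython 3.9, which is Bullseye's default Python) and the function name cross_download are assumptions chosen for the example, not the exact combinations I tried, and there is no guarantee they succeed for every package. pip insists on either --only-binary=:all: or --no-deps when these options are used.

```shell
# Hypothetical wrapper around a cross-platform `pip download`.
# usage: cross_download <dest-dir> <package>...
cross_download() {
    dest=$1
    shift
    # The platform/ABI values below are illustrative assumptions.
    pip download \
        --only-binary=:all: \
        --platform manylinux2014_x86_64 \
        --python-version 3.9 \
        --implementation cp \
        --abi cp39 \
        -d "$dest" "$@"
}

# Example (needs network access):
#   cross_download ./wheels cryptography
```

Because --only-binary=:all: rules out source distributions, any dependency without a matching wheel makes the whole download fail, which is one way this approach can come unstuck.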

docker role

I started by separating the setup-docker sub-task file (tasks/setup-docker.yaml) from the gentoo-mirror role into a role in its own right, called docker. I now need Docker in more than just that one role so, applying the DRY (don’t repeat yourself) principle, now is the time to pull it out.

The role takes no arguments; the only variable is a built-in fact for the distribution’s release, so all that is needed is the task file. This remains Debian-specific for the host; I should look at creating a version that works on macOS (and maybe Windows) too. The tasks go in the role’s tasks/main.yaml file:

---
- name: Ensure Docker repository is available
  become: yes
  ansible.builtin.apt_repository:
    # Match what is currently in SaltStack (repo and filename) so they
    # do not end up fighting.
    repo: deb https://download.docker.com/linux/debian {{ ansible_facts.distribution_release }} stable
    filename: docker
    state: present
- name: Ensure Docker is installed
  become: yes
  ansible.builtin.package:
    name:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - python3-docker # So Ansible can manage docker
    state: present
...

Updating gentoo-mirror role

In gentoo-mirror’s tasks/main.yaml I removed the following line (and deleted the tasks/setup-docker.yaml file):

  ansible.builtin.include_tasks: setup-docker.yaml

and created meta/main.yaml with the contents:

---
dependencies:
  - role: docker
...

Creating the pip-mirrors role

This has the same dependency, so needs an identical meta/main.yaml file:

---
dependencies:
  - role: docker
...

This role does take arguments: where to mirror to, a list of platforms to download for and a list of packages to download. The specification for these arguments goes in meta/argument_specs.yaml:

---
argument_specs:
  main:
    short_description: Mirror pip packages, using Docker to fetch platform-specific versions
    options:
      target:
        type: str
        required: true
        description: Base directory to mirror to
      platforms:
        type: list
        required: true
        elements: dict
        options:
          image:
            type: str
            required: true
            description: Docker image name to use to fetch the pip packages
          pre-command:
            type: str
            description: Pre-pip command to run (e.g. to install pip in the container). No command will be run if not provided.
          name:
            type: str
            description: Sub-directory name to use (defaults to the value of `image` with colons replaced by hyphens)
      packages:
        type: list
        required: true
        elements: str
        description: List of packages to fetch with pip (passed directly to pip, so anything pip accepts (e.g. version constraints) can be included)
...
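
The default for `name` is worth spelling out, since it decides the directory (and later the URL) each platform's packages end up under. A minimal sketch of the same substitution in shell, where platform_dir_name is a hypothetical helper and not part of the role:

```shell
# Hypothetical helper mirroring the role's default for `name`:
# the image name with every colon replaced by a hyphen.
platform_dir_name() {
    printf '%s\n' "$1" | tr ':' '-'
}

platform_dir_name debian:bullseye    # prints debian-bullseye
```

An image name that contains a slash (for example one pulled from a private registry) would keep the slash and so produce a nested directory; setting `name` explicitly avoids that.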

The tasks to fetch the packages just use the Docker images from the platforms list to download all of the packages. This goes in tasks/main.yaml:

---
- name: Make sure the target directories exist
  ansible.builtin.file:
    path: "{{ target }}/{{ item.name | default(item.image | replace(':', '-')) }}"
    state: directory
  loop: '{{ platforms }}'
- name: Fetch packages
  become: yes
  community.docker.docker_container:
    name: do_pip_fetch
    container_default_behavior: no_defaults  # Stop warning
    cleanup: yes
    detach: no
    # This originally used `pip download -d` instead of `pip wheel -w`
    # but that caused problems with missing build dependencies in
    # isolated (air-gapped) environments. `pip wheel` will produce
    # built binaries for the platform, even if that means building
    # locally.
    command: "bash -c '{{ item['pre-command'] | default('/bin/true') }} && pip wheel -w /mnt/ {{ packages | map('quote') | join(' ') }}'"
    image: '{{ item.image }}'
    mounts:
      - source: "{{ target }}/{{ item.name | default(item.image | replace(':', '-')) }}"
        target: /mnt
        type: bind
  loop: '{{ platforms }}'
...
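
For debugging outside Ansible, one iteration of the "Fetch packages" task is roughly equivalent to the docker run call sketched below. This is an approximation under my own assumptions, not what the module literally executes: the function name pip_fetch and its argument order are made up for the example, --rm stands in for `cleanup: yes`, and running in the foreground stands in for `detach: no`.

```shell
# Hypothetical shell approximation of one loop iteration of the task.
# usage: pip_fetch <absolute-host-dir> <image> <pre-command> <package>...
pip_fetch() {
    host_dir=$1
    image=$2
    pre_command=${3:-/bin/true}    # same fallback as the role uses
    shift 3
    mkdir -p "$host_dir"
    # Bind-mount the host directory at /mnt and build wheels into it.
    # "$*" joins the remaining arguments (the packages) with spaces.
    docker run --rm \
        --mount "type=bind,source=$host_dir,target=/mnt" \
        "$image" \
        bash -c "$pre_command && pip wheel -w /mnt/ $*"
}

# Example (may need root, as the task uses `become: yes`):
#   pip_fetch /srv/mirror/pip/debian-bullseye debian:bullseye \
#       'apt-get update && apt-get -y install python3-pip' ansible
```

The package list is interpolated into the inner bash command line, so a requirement containing shell metacharacters (such as `>` in a version constraint) would need extra quoting in this sketch.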

Using pip-mirrors role

To use this role to download Ansible for Debian Bullseye (current stable, at time of writing):

    - role: pip-mirrors
      target: '{{ mirror_base_path }}/pip'
      platforms:
        - image: debian:bullseye
          pre-command: 'apt-get update && apt-get -y install python3-pip'
      packages:
        - ansible
      tags: ['pip']

N.B. This will never remove packages from the mirror, which might become a problem over time.

Installing from mirror

Unfortunately, simply uploading the files to a web server doesn’t produce the layout required by pip, which requires normalising the names of the packages and placing them in the appropriate locations. The easiest way I found to use the mirror (until it becomes large) was to download the packages with wget’s recursion:

# Create a temporary folder
outdir=$(mktemp -d)
# Fetch with wget in temporary directory:
# -r = recursive
# -A whl = download only files with the extension 'whl' (was
#          `-A whl,gz,tar` when using `pip download` instead of
#          `pip wheel` to create mirror but now will always be whl)
# -np = do not recurse to parent directories
# -nd = do not create the server's directory hierarchy locally
pushd "$outdir"
wget -r -A whl -np -nd http://mirror/mirrors/pip/debian-bullseye/
popd
# Install Ansible
# -f = where to look for packages
pip install -f "$outdir" --no-index ansible
# Tidy up
rm -rf "$outdir"
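
For reference, the layout pip expects from an index (the "simple" repository layout described in PEP 503) is one directory per project, named with the normalised project name, containing a page of links to that project's files. A minimal sketch of building that from a flat directory of wheels follows; the function names and both paths are hypothetical, and I have not wired this into the role.

```shell
# Normalise a project name as PEP 503 describes: lowercase, with each
# run of '-', '_' or '.' collapsed to a single '-'.
normalise_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed -E 's/[-_.]+/-/g'
}

# Copy wheels from a flat <src> directory into <dest>/<project>/ and
# write an index.html of links in each project directory.
# usage: build_simple_index <src> <dest>
build_simple_index() {
    src=$1
    dest=$2
    for wheel in "$src"/*.whl; do
        [ -e "$wheel" ] || continue    # glob matched nothing
        file=$(basename "$wheel")
        # In a wheel filename the distribution name is everything
        # before the first '-'.
        project=$(normalise_name "${file%%-*}")
        mkdir -p "$dest/$project"
        cp "$wheel" "$dest/$project/$file"
    done
    # Regenerate each project's page from the files now present.
    for dir in "$dest"/*/; do
        [ -d "$dir" ] || continue
        : > "${dir}index.html"
        for wheel in "$dir"*.whl; do
            [ -e "$wheel" ] || continue
            file=$(basename "$wheel")
            printf '<a href="%s">%s</a><br>\n' "$file" "$file" >> "${dir}index.html"
        done
    done
}

# Example:
#   build_simple_index /srv/mirror/pip/debian-bullseye /srv/mirror/pip/simple
```

With the result served over HTTP, pip's --index-url option can point at the directory holding the per-project folders, and pip then requests the normalised project name beneath it.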

A simpler solution than fetching the wheels with wget would probably be to NFS-mount the mirror; I’m sure it could be used by the OS that way too.
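
I have not set this up, but the NFS variant could look roughly like the sketch below. The function name install_from_nfs is hypothetical, as are the export and mount point in the example, which assume the mirror host exports the pip directory; mounting needs root.

```shell
# Hypothetical helper: mount the mirror read-only, install, unmount.
# usage: install_from_nfs <server:/export> <mount-point> <platform-dir> <package>...
install_from_nfs() {
    nfs_export=$1
    mount_point=$2
    platform=$3
    shift 3
    mkdir -p "$mount_point"
    mount -t nfs -o ro "$nfs_export" "$mount_point" || return 1
    # -f = where to look for packages, --no-index = never contact PyPI
    pip install --no-index -f "$mount_point/$platform" "$@"
    status=$?
    umount "$mount_point"
    return "$status"
}

# Example (export path is an assumption):
#   install_from_nfs mirror:/mirrors/pip /mnt/pip debian-bullseye ansible
```

This skips the temporary copy entirely, since pip's -f option accepts a local directory of wheels.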