In my new network design, I created a new VLAN that, for security, has no default route and no routed access to anything outside my network. However, the new VM hosts within it still need to be able to get their packages and updates. So, I want to provide limited internet access via a very restrictive “whitelist” proxy with strict rules on what can be accessed (initially just deb.debian.org). I may even make this an authenticating proxy in the future.

As a key security control, I also do not want to allow arbitrary DNS from within the secure VLAN. I have seen first-hand how easily unrestricted DNS lookups can be used to exfiltrate data, and anecdotally heard of people who regularly use them to bypass captive portals without paying or giving out their personal data. I found an article which gives a cursory introduction for anyone not familiar with how DNS can be abused in this way.

Attempt 1 - squid proxy

My first thought was to use Squid as a transparent proxy so that existing preseeded installs, scripts, etc. “just work” in this locked-down environment.

Setting up Squid

For a quick initial test in a couple of VMs, I set up one with two interfaces (one to the real network, one on an internal VM-only network) and the other connected only to the internal VM-only network. The dual-homed VM did not have IP forwarding enabled and the one connected only to the internal network had no default route specified.

To install and configure Squid for a test on the dual-homed VM, I ran these commands:

apt install squid
cat - >/etc/squid/conf.d/package-management.conf <<EOF
acl package_sources dstdomain deb.debian.org
# Only required for https, which apt does not seem to need
#acl package_sources_ssl ssl::server_name deb.debian.org
http_access allow localnet package_sources
EOF

I then tested a Debian network install using the squid proxy, which worked fine so I decided this configuration was good enough to use as a starting point in the live network.
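
To sanity-check the ACLs from the internal-only VM, a couple of curl requests via the explicit proxy show the allow and deny cases (the dual-homed VM’s internal IP here is made up for illustration; 3128 is Squid’s default port):

# Permitted destination - expect a 200 (or redirect) from deb.debian.org:
curl -x http://192.168.56.1:3128 -sI http://deb.debian.org/debian/ | head -n1
# Anything else - expect Squid's 403 Forbidden:
curl -x http://192.168.56.1:3128 -sI http://example.com/ | head -n1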

Make the proxy transparent

Configure Bind9 DNS server to respond with proxy IP for permitted domains

My live network uses the bind9 DNS server. There’s a good basic tutorial on response policy zones online, but I later found a really good tutorial for this coupled with using views to deal with different client groups, which is what I needed to do. The official documentation also has a useful guide on how ACLs and views work together, as well as a clear, concise explanation of ACLs.

Create an acl that matches the networks to be restricted:

acl management {
  192.168.1.0/24;
};

Add the view so that matched clients get the false IP for proxied URLs:

view "restricted-internet" {
  match-clients { management; };
  response-policy {
    zone "rpz.local";
  };

  zone "rpz.local" {
    type master;
    file "/etc/bind/rpz.local";
    allow-query { localhost; };
    allow-transfer { localhost; };
  };
};

Once one view is used, all zones must be inside views. This means that include "/etc/bind/named.conf.default-zones"; must be commented out (or removed) from /etc/bind/named.conf.
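
As a sketch, the top of /etc/bind/named.conf ends up looking something like this (the named.conf.views filename is hypothetical - any name will do, so long as every zone now lives inside a view):

include "/etc/bind/named.conf.options";
// All zones must now be inside views, so the default zones can no
// longer be included at the top level:
//include "/etc/bind/named.conf.default-zones";
// Hypothetical file holding the "restricted-internet" view above
// (plus a view for everyone else):
include "/etc/bind/named.conf.views";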

In my test VM I also had to turn off DNSSEC validation in /etc/bind/named.conf.options to make it work with my router as a forwarder:

options {
  //...
  forwarders {
    // Upstream router (via VM NAT interface)
    192.168.10.250;
  };
  dnssec-validation no;
  //...
};

Finally the zone itself:

$TTL 86400
@ IN SOA localhost. root.localhost. (
        1   ; Serial
   604800   ; Refresh
    86400   ; Retry
  2419200   ; Expire
    86400 ) ; Negative cache TTL
@ IN  NS localhost.

deb.debian.org A     192.168.254.254
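
Before pointing clients at it, the configuration and zone file can be syntax-checked, and the rewrite verified with dig (assuming, for illustration, the DNS server answers on 192.168.1.1):

named-checkconf
named-checkzone rpz.local /etc/bind/rpz.local
# From a client in the management VLAN - should print 192.168.254.254:
dig @192.168.1.1 deb.debian.org +short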

Configure isc-dhcp-server to tell clients how to route proxy traffic to proxy server

My live network uses the ISC DHCP Server. Using a specific IP just for proxy traffic makes it easy to craft a route just for that traffic, which the DHCP server can hand out. This configuration is based on a ServerFault answer, but I did not set up ms-classless-static-routes (option 249) as this is only required by Windows XP and Server 2003 and I do not have any Microsoft systems older than Windows 10 on the network.

In /etc/dhcp/dhcpd.conf:

option rfc3442-classless-static-routes code 121 = array of integer 8;
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;
  # Decodes as: prefix length 32, destination 192.168.254.254,
  # gateway 192.168.1.1 - i.e. route 192.168.254.254/32 via 192.168.1.1.
  option rfc3442-classless-static-routes 32, 192, 168, 254, 254, 192, 168, 1, 1;
}
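
The configuration can be validated before restarting the DHCP server, and the route checked from a client after it renews its lease (the gateway and interface shown are illustrative):

# Server side - syntax-check the configuration:
dhcpd -t -cf /etc/dhcp/dhcpd.conf
# Client side - confirm the /32 route was installed:
ip route get 192.168.254.254
# expect something like: 192.168.254.254 via 192.168.1.1 dev eth0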

Transparently intercepting http

Squid needs to be told that it is intercepting requests and proxying them, rather than being addressed as a proxy by a client. This is done by adding a new http_port with intercept:

http_port 3129 intercept

Some documents suggest adding intercept to the default http_port but this will not work (the systemd unit will fail to start).

I also turned off client_dst_passthru to try to make the proxy perform DNS lookups and ignore the (dummy) IP address and port provided by the client - this was not effective, see below:

client_dst_passthru off

Redirect traffic to the proxy

The traffic is bound for a dummy IP, which is not configured on the box, but because a route via the server’s real IP is handed out to clients, they send the traffic to it. This can then be intercepted and redirected to the intercepting proxy port.

Most instructions online are for the legacy iptables command; I worked out the equivalent nft commands from a variety of sources, including the nftables wiki.

nft add table nat
nft 'add chain nat prerouting { type nat hook prerouting priority -100; }'
nft add rule nat prerouting iif eth1 ip daddr 192.168.254.254/32 tcp dport 80 redirect to 3129
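
These nft rules only last until reboot. One way (of several) to persist them on Debian is to save the live ruleset into the file that the stock nftables service loads at boot - note this overwrites any existing /etc/nftables.conf:

nft list ruleset > /etc/nftables.conf
systemctl enable nftables.service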

Where it all fell down

This setup worked perfectly apart from one detail - traffic for http://deb.debian.org was intercepted and transparently redirected to the proxy…which tried to use the dummy 192.168.254.254 destination. On further reading of the squid documentation for client_dst_passthru I found that:

Regardless of this option setting, when dealing with intercepted traffic Squid will verify the Host: header and any traffic which fails Host verification will be treated as if this option were ON.

see host_verify_strict for details on the verification process.

The documentation for host_verify_strict makes it clearer that if the destination IP does not match the Host header domain or IP, it will use the client-provided address (in this case the dummy “proxy traffic” IP):

Regardless of this option setting, when dealing with intercepted traffic, Squid always verifies that the destination IP address matches the Host header domain or IP (called ‘authority form URL’).

[…] When set to OFF (the default):

[…]

  • Intercepted requests which fail verification are sent to the client original destination instead of DIRECT. This overrides ‘client_dst_passthru off’.

After some searching, and finding quite a few other people struggling with trying to build similar setups, I concluded using Squid to do what I wanted was not possible. I did find reports of success with Privoxy instead but I decided to go a different route…

HTTPS transparent proxy

Because of the problem I encountered, I did not try setting this up. According to the internet (so it must be true) - ServerFault specifically - it is possible to use Squid’s SSL bump and peek/splice features to filter https targets via SNI. The suggested configuration from the ServerFault link above is:

acl denylist_ssl ssl::server_name google.com # NOT dstdomain
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump splice !denylist_ssl # allow everything not in the denylist
ssl_bump terminate all # block everything else
https_port 3129 intercept ssl-bump cert=/etc/squid/dummy.pem

Attempt 2 - nginx

After the problem with Squid, I re-evaluated my approach and decided I was overcomplicating the solution. A reverse proxy to the real repositories, presented to the secure network via a local dual-homed NGINX server, would be a simpler way to remove the need for public DNS resolution while still denying general internet access from within.

I already have provision for using a local mirror instead of public ones in my Ansible playbooks (via a local_mirror variable) so utilising this required no change to my existing Ansible tasks, beyond setting it up.

With Ansible, I defined a new role called mirror and added it to my site.yaml for systems in a new mirror_servers group:

- hosts: mirror_servers
  tags: mirror
  roles:
    - mirror

In my inventory, I put the router into this group - actually any host would have done and I may move this in the future but I don’t want to create a situation where, e.g., building the hosts for the virtual machines is dependent on something running on those virtual machines (creating a catch-22 during disaster recovery):

mirror_servers:
  hosts:
    xxx:

In the domain group variables file for the live network, I set the local mirror to what will become the internal caching server:

local_mirror:
  debian:
    uri: http://mirror/debian
  debian-security:
    uri: http://mirror/debian-security

Adding proxy caching to nginx configuration

In my existing webserver role, I added a new entry point and tasks file called proxy_cache to add a cache. The cache configuration in NGINX is at the protocol (http) level, and cannot be defined within the virtual server or location sections, although multiple caches can be defined.

The proxy_cache.yaml tasks create a named cache, in a configuration file named for the cache being defined, to support multiple caches. For the time being, I hardcoded the key zone size to 10 megabytes - according to the documentation this should be sufficient for ~80k keys. Even though I plan to create a large cache for the package proxy, it should hold a smallish number of large files, and therefore need a relatively low number of cache keys. In short, I only added variables for the things I wanted to customise for this (mirror proxy) use-case for now - I may turn more settings into variables in the future.

---
- name: Add proxy_cache requested to nginx
  become: true
  ansible.builtin.copy:
    dest: /etc/nginx/conf.d/proxy-cache-{{ proxy_cache_name }}.conf
    owner: root
    group: root
    mode: '0440'
    # See docs for info on keys_zone size:
    # "One megabyte zone can store about 8 thousand keys."
    # (https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path)
    # Therefore 10m = 80k keys - which seems more than sufficient.
    content: >-
      proxy_cache_path {{ proxy_cache_path }}
      keys_zone={{ proxy_cache_name }}:10m
      levels=1:2
      {% if proxy_cache_inactive | default(false) %}inactive={{ proxy_cache_inactive }}{% endif %}
      {% if proxy_cache_max_size | default(false) %}max_size={{ proxy_cache_max_size }}{% endif %}
      ;
  notify: Restart nginx
...
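
With the values the mirror role passes in later (cache name mirror, path /srv/mirror-cache, one month inactivity, 20G cap), this template should render to roughly the following single nginx directive:

proxy_cache_path /srv/mirror-cache keys_zone=mirror:10m levels=1:2 inactive=1M max_size=20G ;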

The argument specification, added to the role’s meta/argument_specs.yaml, is:

proxy_cache:
  short_description: Add a proxy cache to the webserver
  author: Laurence Alexander Hurst
  options:
    proxy_cache_name:
      required: true
      type: str
      description: >-
        Name for this cache - will be used as part of the
        configuration filename.
    proxy_cache_path:
      required: true
      type: str
      description: >-
        Path on the filesystem where this cache will be stored
        (should exist and be writable by the webserver)
    proxy_cache_inactive:
      required: false
      type: str
      description: >-
        Files that are not accessed within this time (NGINX defaults
        to 10 minutes) are removed regardless of their freshness.
    proxy_cache_max_size:
      required: false
      type: str
      description: >-
        Maximum size of the cache - once reached, the least recently
        accessed files will start to be removed.

The mirror role

The mirror role requires (because it uses) the webserver role, so I started by declaring that dependency in the role’s meta/main.yaml:

---
dependencies:
  - role: webserver
...

The role takes a number of arguments, defined in meta/argument_specs.yaml, and I provided some default values including mirrors for Debian (the main repository and security updates):

---
argument_specs:
  main:
    short_description: Sets up a mirror webserver
    author: Laurence Alexander Hurst
    options:
      mirror_root:
        description: Root path for the web root
        type: str
        default: /srv/mirror
      mirror_server_name:
        description: Server name for the webserver configuration
        type: str
        default: 'mirror.{{ ansible_facts.domain }} mirror'
      mirror_proxies:
        description: Map of folder names to upstream proxies (only used with method=proxy)
        type: list
        elements: dict
        options:
          name:
            type: str
            required: true
            description: Location (folder) for this proxy mirror.
          upstream:
            type: str
            required: true
            description: Upstream server to proxy requests to.
          description:
            type: str
            description: Optional description of the mirror for the mirror index page.
        default:
          - name: debian
            upstream: http://deb.debian.org/debian/
            description: Debian main repositories
          - name: debian-security
            upstream: http://security.debian.org/debian-security/
            description: Debian security repository
          - name: icons
            upstream: http://deb.debian.org
            description: Icons for Debian repositories (which use '/icons' absolute URL)
...

The corresponding defaults/main.yaml file:

---
mirror_root: /srv/mirror
mirror_server_name: 'mirror.{{ ansible_facts.domain }} mirror'
mirror_proxies:
  - name: debian
    upstream: http://deb.debian.org/debian/
    description: Debian main repositories
  - name: debian-security
    upstream: http://security.debian.org/debian-security/
    description: Debian security repository
  # This one is just to get the icons in the Apache folder view from
  # (e.g.) debian and debian-security.
  - name: icons
    # Without the trailing '/' nginx will forward the original path
    # (/icons/...).
    upstream: http://deb.debian.org
    description: Icons for Debian repositories (which use '/icons' absolute URL)
...

The tasks file, tasks/main.yaml, sets up NGINX to proxy to the real mirrors by:

  • Creating the proxy cache directory
  • Setting up a 20G cache whose files are removed after one month of inactivity
  • Ensuring the mirror web root directory exists
  • Creating an index file listing all local mirrors (i.e. directories within the root) and proxied mirrors
  • Configuring the webserver to proxy mirrors to their respective upstreams
---
- name: Mirror proxy cache path exists
  become: true
  ansible.builtin.file:
    path: /srv/mirror-cache
    owner: www-data
    group: www-data
    mode: '750'
    state: directory
  when: mirror_proxies | length > 0
- name: Mirror proxy cache path is absent
  become: true
  ansible.builtin.file:
    path: /srv/mirror-cache
    state: absent
  when: mirror_proxies | length == 0
- name: Mirror proxy cache is setup
  ansible.builtin.include_role:
    name: webserver
    tasks_from: proxy_cache
  vars:
    proxy_cache_path: /srv/mirror-cache
    proxy_cache_name: mirror
    # Keep cached files for up to 1 month
    proxy_cache_inactive: 1M
    # Maximum 20G cache size
    proxy_cache_max_size: 20G
  when: mirror_proxies | length > 0
- name: Mirror proxy cache is removed
  become: true
  ansible.builtin.file:
    path: /etc/nginx/conf.d/proxy-cache-mirror.conf
    state: absent
  when: mirror_proxies | length == 0
- name: Mirror root exists and is accessible by web server
  become: true
  ansible.builtin.file:
    path: '{{ mirror_root }}'
    group: www-data
    # Will be publicly available over http, no need to restrict
    # world read access to files.
    mode: '0755'
    state: directory
- name: List of directories (mirrors) in {{ mirror_root }} is known
  become: true
  # Webserver user should be able to see this
  become_user: www-data
  ansible.builtin.find:
    recurse: false
    file_type: directory
    follow: true
    path: '{{ mirror_root }}'
  register: mirror_root_directories
- name: Mirror index file exists
  become: true
  ansible.builtin.template:
    dest: '{{ mirror_root }}/index.html'
    src: index.html
    group: www-data
    mode: '0660'
  vars:
    local_mirror_locations: "{{ mirror_root_directories.files | map(attribute='path') | map('ansible.builtin.relpath', mirror_root) }}"
- name: Mirror configuration is set to baseline
  ansible.builtin.set_fact:
    nginx_mirror_config:
      - location: /
        configuration: |
          autoindex on;
          try_files $uri $uri/ =404;
- name: Mirror proxies are in nginx configuration
  ansible.builtin.set_fact:
    nginx_mirror_config: >-
      {{
        nginx_mirror_config +
        [{
          'location': '/' + item.name + '/',
          'configuration': 'proxy_pass ' + item.upstream + ';',
        }]
      }}
  loop: '{{ mirror_proxies }}'
- name: Mirror server is configured
  ansible.builtin.include_role:
    name: webserver
    tasks_from: add_site
  vars:
    site_name: mirror
    root: '{{ mirror_root }}'
    server_name: '{{ mirror_server_name }}'
    nginx:
      extra_configuration: |
        {% if mirror_proxies | length > 0 %}
        # Default to considering 200 responses valid for 2 weeks
        proxy_cache_valid      200  2w;
        # If we get certain errors then allow the use of stale cache
        # entries instead of failing.
        proxy_cache_use_stale  error timeout invalid_header updating
                               http_500 http_502 http_503 http_504;
        {% endif %}
      locations: '{{ nginx_mirror_config }}'
...

The template for the index file is:

<!DOCTYPE html>
<html>
<head>
  <title>Mirrors</title>
  <meta charset="utf-8" />
</head>

<body>
  <p>These are the available local mirrors:</p>
  <ul>
    {% for location in local_mirror_locations %}
    <li><a href="{{ location }}/">{{ location }}/</a></li>
    {% endfor %}
  </ul>
  <p>These are the available proxied mirrors:</p>
  <ul>
    {% for proxy_mirror in mirror_proxies %}
    <li><a href="{{ proxy_mirror.name }}/">{{ proxy_mirror.name }}/{% if proxy_mirror.description | default(false) %} - {{ proxy_mirror.description }}{% endif %}</a></li>
    {% endfor %}
  </ul>
</body>
</html>

Usage

As this is simply a reverse proxy to the real mirror, using it (e.g. in the preseed configuration) is just a case of pointing at the URL of the NGINX webserver:

d-i mirror/http/hostname string mirror
d-i mirror/http/directory string /debian

(or, for the Ansible template:)

d-i mirror/http/hostname string {{ local_mirror.debian.uri | default('http://deb.debian.org/debian') | urlsplit('hostname') }}
d-i mirror/http/directory string {{ local_mirror.debian.uri | default('http://deb.debian.org/debian') | urlsplit('path') }}
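
On an already-installed machine, pointing apt at the mirror is equally simple - a minimal /etc/apt/sources.list sketch, assuming the bookworm release:

deb http://mirror/debian bookworm main
deb http://mirror/debian-security bookworm-security main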