Mirror proxy for limited access networks
In my new network design, I created a new VLAN that, for security, I did not want to give a default route, or routed access to anything outside my network. However, the new VM hosts within it still need to be able to get their packages and updates. So, I want to provide limited internet access via a very restrictive “whitelist” proxy with strict rules on what can be accessed (initially just deb.debian.org). I may even make this an authenticating proxy in the future.
As a key security control, I also do not want to allow arbitrary DNS from within the secure VLAN. I have seen first-hand how easily unrestricted DNS lookups can be used to exfiltrate data, and have anecdotally heard of people who regularly use them to bypass captive portals without paying or giving out their personal data. I found an article which gives a cursory introduction for anyone not familiar with how DNS can be abused in this way.
Attempt 1 - squid proxy
My first thought was to use Squid as a transparent proxy so that existing preseeded installs, scripts, etc. “just work” in this locked-down environment.
Setting up Squid
For a quick initial test in a couple of VMs, I set up one with two interfaces (one to the real network, one on an internal VM-only network) and the other connected only to the internal VM-only network. The dual-homed VM did not have IP forwarding enabled and the internal-only one had no default route specified.
To install and configure Squid for a test on the dual-homed VM, I ran these commands:
apt install squid
cat - >/etc/squid/conf.d/package-management.conf <<EOF
acl package_sources dstdomain deb.debian.org
# Only required for https, which apt does not seem to need
#acl package_sources_ssl ssl::server_name deb.debian.org
http_access allow localnet package_sources
EOF
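Before making anything transparent, the proxy can be tested explicitly from the internal-only VM. A minimal sketch, assuming the dual-homed VM's internal address is 192.168.1.1 and Squid is listening on its default port 3128:
cat - >/etc/apt/apt.conf.d/80proxy <<EOF
Acquire::http::Proxy "http://192.168.1.1:3128/";
EOF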
I then tested a Debian network install using the Squid proxy, which worked fine, so I decided this configuration was good enough to use as a starting point in the live network.
Make the proxy transparent
Configure Bind9 DNS server to respond with proxy IP for permitted domains
My live network uses the bind9 DNS server. There’s a good basic tutorial on response policy zones online, but I later found a really good tutorial for this coupled with using views to deal with different client groups, which is what I needed to do. The official documentation also has a useful guide on how acls and views work together, as well as a clear, concise explanation of acls.
Create an acl that matches the networks to be restricted:
acl management {
192.168.1.0/24;
};
Add the view so that matched clients get the false IP for proxied URLs:
view "restricted-internet" {
match-clients { management; };
response-policy {
zone "rpz.local";
};
zone "rpz.local" {
type master;
file "/etc/bind/rpz.local";
allow-query { localhost; };
allow-transfer { localhost; };
};
};
Once one view is used, all zones must be in views. This means that include "/etc/bind/named.conf.default-zones"; must be commented out (or removed) from /etc/bind/named.conf.
In my test VM I also had to turn off DNSSEC validation in /etc/bind/named.conf.options to work with my router as a forwarder:
options {
//...
forwarders {
// Upstream router (via VM NAT interface)
192.168.10.250;
};
dnssec-validation no;
//...
};
Finally, the zone itself:
$TTL 86400
@ IN SOA localhost. root.localhost. (
1 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
86400 ) ; Negative cache TTL
@ IN NS localhost.
deb.debian.org A 192.168.254.254
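As a quick sanity check, the configuration and zone can be validated and reloaded, then queried from a restricted client (assuming the DNS server's management VLAN address is 192.168.1.1):
named-checkconf
named-checkzone rpz.local /etc/bind/rpz.local
rndc reload
# From a client in the management VLAN - should return the dummy proxy IP:
dig +short deb.debian.org @192.168.1.1
# 192.168.254.254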
Configure isc-dhcp-server to tell clients how to route proxy traffic to the proxy server
My live network uses the ISC DHCP Server. Using a specific IP just for proxy traffic makes it easy to craft a route just for that traffic, which the DHCP server can hand out. This configuration is based on a ServerFault answer, but I did not set up ms-classless-static-routes (option 249) as this is only required by Windows XP and Server 2003 and I do not have any Microsoft systems older than Windows 10 on the network.
In /etc/dhcp/dhcpd.conf:
option rfc3442-classless-static-routes code 121 = array of integer 8;
subnet 192.168.1.0 netmask 255.255.255.0 {
  range 192.168.1.100 192.168.1.200;
  # Decodes as: prefix length 32, destination 192.168.254.254,
  # gateway 192.168.1.1 - a host route for the dummy proxy IP via
  # the proxy server's address on this VLAN.
  option rfc3442-classless-static-routes 32, 192, 168, 254, 254, 192, 168, 1, 1;
}
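After a client renews its lease, the host route can be confirmed (a sketch - the interface name and exact output will vary):
ip route get 192.168.254.254
# 192.168.254.254 via 192.168.1.1 dev ens18 src 192.168.1.100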
Transparently intercepting http
Squid needs to be told that it is intercepting requests and proxying them, rather than being addressed as a proxy by a client. This is done by adding a new http_port with intercept:
http_port 3129 intercept
Some documents suggest adding intercept to the default http_port but this will not work (the systemd unit will fail to start).
I also turned off client_dst_passthru to try to make the proxy perform DNS lookups and ignore the (dummy) IP address and port provided by the client - this was not effective, see below:
client_dst_passthru off
Redirect traffic to the proxy
The traffic is bound for a dummy IP, which is not configured on the box, but because a route via the server’s real IP is handed out to clients, they send the traffic to it. This can be intercepted and redirected to the intercepting proxy port.
Most instructions online are for the now-deprecated iptables command; I worked out the equivalent nft commands from a variety of sources including the nftables wiki.
# Create a NAT table and a prerouting chain hooked in before routing
nft add table nat
nft 'add chain nat prerouting { type nat hook prerouting priority -100; }'
# Redirect HTTP traffic for the dummy proxy IP to Squid's intercept port
nft add rule nat prerouting iif eth1 ip daddr 192.168.254.254/32 tcp dport 80 redirect to 3129
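These runtime commands do not survive a reboot; the same rules can be written as a fragment of /etc/nftables.conf (loaded by the nftables service). A sketch, assuming the internal interface is eth1 as above:
table ip nat {
  chain prerouting {
    type nat hook prerouting priority -100;
    iif "eth1" ip daddr 192.168.254.254 tcp dport 80 redirect to :3129
  }
}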
Where it all fell down
This setup worked perfectly apart from one detail - traffic for http://deb.debian.org was intercepted and transparently redirected to the proxy…which tried to use the dummy 192.168.254.254 destination. On further reading of the Squid documentation for client_dst_passthru I found that:
Regardless of this option setting, when dealing with intercepted traffic Squid will verify the Host: header and any traffic which fails Host verification will be treated as if this option were ON.
see host_verify_strict for details on the verification process.
The documentation for host_verify_strict makes it clearer that if the destination IP does not match the Host header domain or IP, Squid will use the client-provided address (in this case the dummy “proxy traffic” IP):
Regardless of this option setting, when dealing with intercepted traffic, Squid always verifies that the destination IP address matches the Host header domain or IP (called ‘authority form URL’).
[…] When set to OFF (the default):
[…]
- Intercepted requests which fail verification are sent to the client original destination instead of DIRECT. This overrides ‘client_dst_passthru off’.
After some searching, and finding quite a few other people struggling to build similar setups, I concluded that using Squid to do what I wanted was not possible. I did find reports of success with Privoxy instead, but I decided to go a different route…
HTTPS transparent proxy
Because of the problem I encountered, I did not try setting this up. According to the internet (so it must be true) - ServerFault specifically - it is possible to use Squid’s SSL bump and peek/splice features to filter https targets via SNI. The suggested configuration from the ServerFault link above is:
acl denylist_ssl ssl::server_name google.com # NOT dstdomain
acl step1 at_step SslBump1
ssl_bump peek step1
ssl_bump splice !denylist_ssl # allow everything not in the denylist
ssl_bump terminate all # block everything else
https_port 3129 intercept ssl-bump cert=/etc/squid/dummy.pem
Attempt 2 - nginx
After the problem with Squid, I re-evaluated my approach and decided I was overcomplicating the solution. A simple reverse proxy to the real repositories, presented to the secure network via a local dual-homed NGINX server, would negate the need for public DNS resolution while denying general internet access from within.
I already have provision for using a local mirror instead of public ones in my Ansible playbooks (via a local_mirror variable), so utilising this required no change to my existing Ansible tasks beyond setting it up.
With Ansible, I defined a new role called mirror and added it to my site.yaml for systems in a new mirror_servers group:
- hosts: mirror_servers
tags: mirror
roles:
- mirror
In my inventory, I put the router into this group. Actually, any host would have done and I may move this in the future, but I don’t want to create a situation where, e.g., building the hosts for the virtual machines is dependent on something running on those virtual machines (creating a catch-22 during disaster recovery):
mirror_servers:
hosts:
xxx:
In the domain group variables file for the live network, I set the local mirror to point at what will become the internal caching server:
local_mirror:
debian:
uri: http://mirror/debian
debian-security:
uri: http://mirror/debian-security
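For illustration only, a sources.list template consuming this variable might look like the sketch below - my actual tasks differ, and debian_release is a placeholder for however the suite is selected:
deb {{ local_mirror.debian.uri }} {{ debian_release }} main
deb {{ local_mirror['debian-security'].uri }} {{ debian_release }}-security main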
Adding proxy caching to nginx configuration
In my existing webserver role, I added a new entry point and tasks file called proxy_cache to add a cache. The cache configuration in NGINX is at the protocol (http) level, and cannot be defined within the virtual server or location sections, although multiple caches can be defined.
The proxy_cache.yaml tasks create a named cache, in a configuration file named for the cache being defined, to support multiple caches. For the time being, I hardcoded the key zone size to 10 megabytes - according to the documentation this should be sufficient for ~80k keys. Even though I plan to create a large cache for the package proxy, it should hold a smallish number of large files, so a relatively low number of cache keys. In short, I only added variables for the things I wanted to customise for this (mirror proxy) use-case for now - I may turn more settings into variables in the future.
---
- name: Add proxy_cache requested to nginx
become: true
ansible.builtin.copy:
dest: /etc/nginx/conf.d/proxy-cache-{{ proxy_cache_name }}.conf
owner: root
group: root
    mode: '0440'
# See docs for info on keys_zone size:
# "One megabyte zone can store about 8 thousand keys."
# (https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_cache_path)
# Therefore 10m = 80k keys - which seems more than sufficient.
content: >-
proxy_cache_path {{ proxy_cache_path }}
keys_zone={{ proxy_cache_name }}:10m
levels=1:2
{% if proxy_cache_inactive | default(false) %}inactive={{ proxy_cache_inactive }}{% endif %}
{% if proxy_cache_max_size | default(false) %}max_size={{ proxy_cache_max_size }}{% endif %}
;
notify: Restart nginx
...
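For the values the mirror role passes in later, this template renders to a single directive along these lines:
proxy_cache_path /srv/mirror-cache keys_zone=mirror:10m levels=1:2 inactive=1M max_size=20G;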
The argument specification, added to the role’s meta/argument_specs.yaml, is:
proxy_cache:
short_description: Add a proxy cache to the webserver
author: Laurence Alexander Hurst
options:
proxy_cache_name:
required: true
type: str
description: >-
Name for this cache - will be used as part of the
configuration filename.
proxy_cache_path:
required: true
type: str
description: >-
Path on the filesystem where this cache will be stored
(should exist and be writable by the webserver)
proxy_cache_inactive:
required: false
type: str
description: >-
Files that are not accessed within this time (NGINX defaults
to 10 minutes) are removed regardless of their freshness.
proxy_cache_max_size:
required: false
type: str
description: >-
Maximum size of the cache - once reached, the least recently
accessed files will start to be removed.
The mirror role
The mirror role requires (because it uses) the webserver role, so I started by creating that dependency in the role’s meta/main.yaml:
---
dependencies:
- role: webserver
...
The role takes a number of arguments, defined in meta/argument_specs.yaml, and I provided some default values, including mirrors for Debian (the main repository and security updates):
---
argument_specs:
main:
short_description: Sets up a mirror webserver
author: Laurence Alexander Hurst
options:
mirror_root:
description: Root path for the web root
type: str
default: /srv/mirror
mirror_server_name:
description: Server name for the webserver configuration
type: str
default: 'mirror.{{ ansible_facts.domain }} mirror'
mirror_proxies:
        description: List of folder names and their upstream proxies (only used with method=proxy)
type: list
elements: dict
options:
name:
type: str
required: true
description: Location (folder) for this proxy mirror.
upstream:
type: str
required: true
description: Upstream server to proxy requests to.
description:
type: str
description: Optional description of the mirror for the mirror index page.
default:
- name: debian
upstream: http://deb.debian.org/debian/
description: Debian main repositories
- name: debian-security
upstream: http://security.debian.org/debian-security/
description: Debian security repository
- name: icons
upstream: http://deb.debian.org
description: Icons for Debian repositories (which use '/icons' absolute URL)
...
The corresponding defaults/main.yaml file:
---
mirror_root: /srv/mirror
mirror_server_name: 'mirror.{{ ansible_facts.domain }} mirror'
mirror_proxies:
- name: debian
upstream: http://deb.debian.org/debian/
description: Debian main repositories
- name: debian-security
upstream: http://security.debian.org/debian-security/
description: Debian security repository
# This one is just to get the icons in the Apache folder view from
# (e.g.) debian and debian-security.
- name: icons
# Without the trailing '/' nginx will forward the original path
# (/icons/...).
upstream: http://deb.debian.org
description: Icons for Debian repositories (which use '/icons' absolute URL)
...
The tasks file, tasks/main.yaml, sets up NGINX to proxy to the real mirrors by:
- Creating the proxy cache directory
- Setting up a 20G cache with one month validity on cached files
- Ensuring the mirror web root directory exists
- Creating an index file listing all local mirrors (i.e. directories within the root) and proxied mirrors
- Configuring the webserver to proxy mirrors to their respective upstreams
---
- name: Mirror proxy cache path exists
become: true
ansible.builtin.file:
path: /srv/mirror-cache
owner: www-data
group: www-data
mode: '750'
state: directory
when: mirror_proxies | length > 0
- name: Mirror proxy cache path is absent
become: true
ansible.builtin.file:
path: /srv/mirror-cache
state: absent
when: mirror_proxies | length == 0
- name: Mirror proxy cache is setup
ansible.builtin.include_role:
name: webserver
tasks_from: proxy_cache
vars:
proxy_cache_path: /srv/mirror-cache
proxy_cache_name: mirror
    # Keep cached files for up to 1 month
proxy_cache_inactive: 1M
# Maximum 20G cache size
proxy_cache_max_size: 20G
when: mirror_proxies | length > 0
- name: Mirror proxy cache is removed
become: true
ansible.builtin.file:
path: /etc/nginx/conf.d/proxy-cache-mirror.conf
state: absent
when: mirror_proxies | length == 0
- name: Mirror root exists and is accessible by web server
become: true
ansible.builtin.file:
path: '{{ mirror_root }}'
group: www-data
# Will be publicly available over http, no need to restrict
# world read access to files.
mode: 0755
state: directory
- name: List of directories (mirrors) in {{ mirror_root }} is known
become: true
# Webserver user should be able to see this
become_user: www-data
ansible.builtin.find:
recurse: false
file_type: directory
follow: true
path: '{{ mirror_root }}'
register: mirror_root_directories
- name: Mirror index file exists
become: true
ansible.builtin.template:
dest: '{{ mirror_root }}/index.html'
src: index.html
group: www-data
mode: 0660
vars:
local_mirror_locations: "{{ mirror_root_directories.files | map(attribute='path') | map('ansible.builtin.relpath', mirror_root) }}"
- name: Mirror configuration is set to baseline
ansible.builtin.set_fact:
nginx_mirror_config:
- location: /
configuration: |
autoindex on;
try_files $uri $uri/ =404;
- name: Mirror proxies are in nginx configuration
ansible.builtin.set_fact:
nginx_mirror_config: >-
{{
nginx_mirror_config +
[{
'location': '/' + item.name + '/',
'configuration': 'proxy_pass ' + item.upstream + ';',
}]
}}
loop: '{{ mirror_proxies }}'
- name: Mirror server is configured
ansible.builtin.include_role:
name: webserver
tasks_from: add_site
vars:
site_name: mirror
root: '{{ mirror_root }}'
server_name: '{{ mirror_server_name }}'
nginx:
extra_configuration: |
{% if mirror_proxies | length > 0 %}
# Default to considering 200 responses valid for 2 weeks
proxy_cache_valid 200 2w;
# If we get certain errors then allow the use of stale cache
# entries instead of failing.
proxy_cache_use_stale error timeout invalid_header updating
http_500 http_502 http_503 http_504;
{% endif %}
locations: '{{ nginx_mirror_config }}'
...
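I have not shown the webserver role’s add_site tasks, but the rendered NGINX site should end up roughly like this sketch (the exact layout depends on that role’s template; mirror.example.org stands in for my real domain, and I have assumed the role adds the proxy_cache directive that activates the named zone):
server {
  listen 80;
  server_name mirror.example.org mirror;
  root /srv/mirror;

  # Assumed to be added by the webserver role - activates the named cache
  proxy_cache mirror;
  # From extra_configuration above
  proxy_cache_valid 200 2w;
  proxy_cache_use_stale error timeout invalid_header updating
    http_500 http_502 http_503 http_504;

  location / {
    autoindex on;
    try_files $uri $uri/ =404;
  }

  # One location per entry in mirror_proxies, e.g.:
  location /debian/ {
    proxy_pass http://deb.debian.org/debian/;
  }
}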
The template for the index file is:
<!DOCTYPE html>
<html>
<head>
<title>Mirrors</title>
<meta charset="utf-8" />
</head>
<body>
<p>These are the available local mirrors:</p>
<ul>
{% for location in local_mirror_locations %}
<li><a href="{{ location }}/">{{ location }}/</a></li>
{% endfor %}
</ul>
<p>These are the available proxied mirrors:</p>
<ul>
{% for proxy_mirror in mirror_proxies %}
<li><a href="{{ proxy_mirror.name }}/">{{ proxy_mirror.name }}/{% if proxy_mirror.description | default(false) %} - {{ proxy_mirror.description }}{% endif %}</a></li>
{% endfor %}
</ul>
</body>
</html>
Usage
As this is just a reverse proxy to the real mirror, using it (e.g. in the preseed configuration) is just a case of pointing at the URL of the NGINX webserver:
d-i mirror/http/hostname string mirror
d-i mirror/http/directory string /debian
(or, for the Ansible template:)
d-i mirror/http/hostname string {{ local_mirror.debian.uri | default('http://deb.debian.org/debian') | urlsplit('hostname') }}
d-i mirror/http/directory string {{ local_mirror.debian.uri | default('http://deb.debian.org/debian') | urlsplit('path') }}
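Installed systems can then use the same mirror in their apt sources; for example (assuming a bookworm system and that the server resolves as mirror):
deb http://mirror/debian bookworm main
deb http://mirror/debian-security bookworm-security main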