Efficiently copying git changes across isolating network boundaries

In a previous post I described copying changes from my air-gapped home lab back upstream by copying a tar of the entire repository across and then pushing the changes. While this works and is fine for small repositories, it is highly inefficient for small changes to large repositories. I wrote the last post in full knowledge that there would be a more efficient way, which this post documents. The approach is also useful in other situations where there is some level of network isolation (but not necessarily a full air-gap), such as packing changes to move them via a jump host to another network, for pushing to a remote from there.

Generating the pack

To create the pack file, we first need to know the last common commit between the two repositories - if there is none, this is a non-starter. Once we have that commit reference, we can use git rev-list to list the revisions between that commit and the current HEAD, and feed that list to git pack-objects to create a pack of all the objects in those revisions. The --objects option makes rev-list include the trees and blobs as well as the commits. (This process will also work with another target, but for my purposes I want the HEAD of the current branch.) In this example 1a2b3c4 is a placeholder for the common starting commit and export is simply a base name for the pack files:

git rev-list --objects 1a2b3c4..HEAD | git pack-objects export

This will create two new files, export-sha1.pack and export-sha1.idx (where sha1 is a hash based on the pack content).

On my repository, this generated two files with a total size of 128K, representing several months of changes that I had not yet copied over, from a repository with 768K of code and a total size of 3.5M (including version history). While this is a small-scale example (to test the principle), the advantages are clear when the deltas are small and the repository is large - I work on one repository that is a number of gigabytes in size.

Now (and this is important), make a note of the commit id of the current HEAD (or whatever your target was) - for this example, I shall imagine it was 5d6e7f8:

git rev-parse --short HEAD
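
An abbreviated id is only chosen to be unique within the repository that produced it, so one option is to record the full id in a file that travels with the pack. A small sketch, where save_export_head and the file name export-head.txt are hypothetical names chosen for illustration:

```shell
# save_export_head FILE [TARGET]
# Hypothetical helper: writes the full commit id of TARGET (default HEAD)
# to FILE, failing if TARGET does not resolve to a commit.
save_export_head() {
    git rev-parse --verify "${2:-HEAD}^{commit}" > "$1"
}
```

For example, save_export_head export-head.txt writes the id next to the pack files, and it can be read back on the other side with cat export-head.txt.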

Applying the pack

To apply the pack to the existing codebase, I first unpacked the objects exported from the isolated network into my existing clone (the glob, *, is just for the example - I explicitly gave the actual filename of my export, in case I had left an old one behind):

git unpack-objects <export-*.pack

Now I created a new branch at the HEAD of the exported repository (which is why it was important to note its commit id):

git checkout -b import 5d6e7f8
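
Before creating the branch it can be worth confirming that the noted commit really arrived with the pack. A minimal sketch, assuming the id was carried over separately; the function name import_branch and its default branch name are illustrative only:

```shell
# import_branch COMMIT [BRANCH]
# Hypothetical helper: creates BRANCH (default "import") at COMMIT, but only
# if COMMIT exists in this repository and is a commit object.
import_branch() {
    commit=$1
    branch=${2:-import}

    # cat-file -e exits non-zero if the object is missing or is not a commit.
    if ! git cat-file -e "$commit^{commit}" 2>/dev/null; then
        echo "error: commit $commit not found - was the pack unpacked here?" >&2
        return 1
    fi

    git checkout -b "$branch" "$commit"
}
```

With the example id above this would be called as import_branch 5d6e7f8, which fails with a message instead of leaving you guessing when the objects are not there.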

Alternatively, git fsck --lost-found might show you the dangling commit id, although if there is other garbage in your clone it might be lost in the noise. Running git gc (the garbage collector) first might tidy things up and leave the imported HEAD as the only dangling commit, or it might not - it really depends on how and when you have used the repository. Be aware that it might also permanently delete other “lost” commits that you really wanted.
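
To make the fsck output easier to scan, the dangling commits can be listed with their date and subject line. This is only a sketch, with list_dangling_commits as a hypothetical name; it assumes the usual "dangling commit <id>" lines in the fsck output:

```shell
# list_dangling_commits
# Hypothetical helper: prints short id, date and subject for each dangling
# commit that git fsck reports in the current repository.
# Note: --lost-found also saves the dangling objects under .git/lost-found.
list_dangling_commits() {
    git fsck --lost-found 2>/dev/null |
        awk '$1 == "dangling" && $2 == "commit" { print $3 }' |
        while read -r id; do
            git --no-pager log -1 --date=short --format='%h %ad %s' "$id"
        done
}
```

A freshly imported HEAD should then stand out by its recent date and familiar commit message.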

The next thing to do is to merge in the current version of your main branch. In my case, this resulted in a number of conflicts to resolve, as I had been doing parallel development on my desktop systems' configuration management in the live network:

git pull origin main

Once this is all merged, the import branch can be pushed and (if applicable) a pull-request started to merge into the main branch:

git push -u origin import

You might need to repeat this process in reverse to copy the merged result back - in the case of my air-gapped network, it gets a read-only mirror of the merged repository to pull from via my existing mirror synchronisation process.