Efficiently copying git changes across isolating network boundaries
In a previous post I described copying changes from my air-gapped home lab back upstream by copying a tar of the entire repository back and then pushing the changes. While this works, and is fine for small repositories, it is highly inefficient for small changes to large repositories. I wrote that post knowing there would be a more efficient way, which this post documents. The approach is also useful in other situations with some level of network isolation (but not necessarily a full air-gap), such as packing changes to move them via a jump host to another network, from which they can be pushed to a remote repository.
Generating the pack
To create the pack file, we first need to know the last common commit between the two repositories; if there is none, this is a non-starter. Once we have that commit reference, we can use git rev-list --objects to list the revisions between that commit and the current HEAD, along with the trees and blobs they need that are not already reachable from the starting commit (this process will also work with another target, but for my purposes I want to use the HEAD of the current branch). We then feed that list to git pack-objects to create a pack of all those objects (in this example 1a2b3c4 is a placeholder for the common starting commit, and export is simply a base-name for the pack files):
git rev-list --objects 1a2b3c4..HEAD | git pack-objects export
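The same step can be wrapped in a small shell function that first checks the starting commit really is in the target's history, since without a common commit the approach is a non-starter. This is a minimal sketch: the function name export_pack and its argument order are my own invention, not part of git. If the isolated clone tracks a mirror of upstream, git merge-base HEAD origin/main may give you the common commit, assuming the remote is called origin and the branch main.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): pack every object reachable from a
# target revision but not from a common starting commit.
#   $1 - common starting commit (e.g. 1a2b3c4)
#   $2 - base-name for the pack files (e.g. export)
#   $3 - target revision, defaults to HEAD
export_pack() {
    base_commit=$1
    pack_base=$2
    target=${3:-HEAD}

    # Without a common commit the receiving side cannot use the pack, so stop
    # early if the starting commit is not an ancestor of the target.
    if ! git merge-base --is-ancestor "$base_commit" "$target" 2>/dev/null; then
        echo "error: $base_commit is not an ancestor of $target" >&2
        return 1
    fi

    # rev-list --objects lists the new commits plus the trees and blobs they
    # need; pack-objects writes <base>-<hash>.pack/.idx and prints <hash>.
    pack_hash=$(git rev-list --objects "$base_commit..$target" |
        git pack-objects "$pack_base") || return 1

    # Print the name of the .pack file that was written.
    echo "${pack_base}-${pack_hash}.pack"
}
```

For example, export_pack 1a2b3c4 export would print the name of the new .pack file, with the matching .idx alongside it.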
This will create two new files, export-sha1.pack and export-sha1.idx (where sha1 is a hash based on the pack content, which git pack-objects also prints on standard output).
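Before carrying the pack across, it can be worth checking that it really contains the commit you expect. A minimal sketch using git show-index, which reads a pack index on standard input and prints one line per object (offset, object id and, for version 2 indexes, a CRC); the helper name pack_contains is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a full object id is listed in a pack index.
#   $1 - path to the .idx file (e.g. export-<hash>.idx)
#   $2 - full object id to look for, e.g. from "git rev-parse HEAD"
pack_contains() {
    # show-index prints "<offset> <object id> (<crc32>)" per object; keep
    # only the id column and look for an exact, whole-line match.
    git show-index < "$1" | awk '{ print $2 }' | grep -qx "$2"
}
```

Because the range excludes the starting commit, that commit itself should not appear in the index, while the exported HEAD should.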
On my repository, this generated two files with a total size of 128K, representing several months of changes that I have not yet copied over, on a repository with 768K of code and a total size of 3.5M (including version history). While this is a small-scale example (to test the principle), the advantage is clear when the change deltas are small and the repository is large; I work on one repository that is several gigabytes in size.
Now (and this is important), make a note of the commit id of the current HEAD (or whatever your target was); for this example, I shall imagine it was 5d6e7f8:
git rev-parse --short HEAD
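Rather than relying on a note, the commit id can travel with the pack. A minimal sketch, assuming a hypothetical convention of a <base-name>.head text file next to the pack files; it records the full id rather than the --short one, since a short id is only known to be unique in the repository that produced it.

```shell
#!/bin/sh
# Hypothetical helper: write the full commit id of the exported target into
# <base-name>.head so it can be copied alongside the .pack file.
#   $1 - base-name used for the pack files (e.g. export)
#   $2 - target revision, defaults to HEAD
record_target() {
    pack_base=$1
    target=${2:-HEAD}
    # --verify --quiet fails silently unless the revision resolves to one
    # object; ^{commit} ensures that object is (or peels to) a commit.
    id=$(git rev-parse --verify --quiet "${target}^{commit}") || return 1
    printf '%s\n' "$id" > "${pack_base}.head"
}
```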
Applying the pack
To apply the pack to the existing codebase, I first unpacked the objects exported from the isolated network (the glob, *, is just for the example; I explicitly listed the actual filename from my export, in case I had left an old one behind):
git unpack-objects <export-*.pack
Now I created a new branch at the HEAD of the exported repository (which is why it was important to note its commit id):
git checkout -b import 5d6e7f8
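The unpack and branch steps can be combined, with a check that the noted commit actually arrived before a branch is created on it. A minimal sketch; import_pack is a hypothetical name, and the default branch name import simply follows the example above.

```shell
#!/bin/sh
# Hypothetical helper: unpack an exported pack into the current repository and
# create a branch at the exported target commit.
#   $1 - path to the .pack file
#   $2 - commit id noted on the exporting side (e.g. 5d6e7f8)
#   $3 - branch name to create, defaults to "import"
import_pack() {
    pack_file=$1
    target_commit=$2
    branch=${3:-import}

    # unpack-objects reads the pack on standard input and writes loose
    # objects; objects the repository already has are skipped.
    git unpack-objects -q < "$pack_file" || return 1

    # Make sure the target commit is now present before branching from it.
    if ! git cat-file -e "${target_commit}^{commit}" 2>/dev/null; then
        echo "error: commit $target_commit not found after unpacking" >&2
        return 1
    fi

    git checkout -b "$branch" "$target_commit"
}
```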
Alternatively, git fsck --lost-found might show you the dangling commit id; however, if there is other garbage in your current git clone, it might be lost in the noise. Running git gc (the garbage collector) might tidy things up and leave the imported HEAD as the only dangling commit, or it might not; it really depends on how and when you have used the repository, and you might permanently lose some other “lost” commits you really wanted.
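If the commit id was not noted, a small filter over git fsck can at least make the dangling commits easier to tell apart by showing each one's date and subject line. A minimal sketch; list_dangling_commits is a hypothetical name, and it only reads the repository (it does not pass --lost-found, so nothing is written under .git/lost-found, and it does not run git gc).

```shell
#!/bin/sh
# Hypothetical helper: print abbreviated id, committer date and subject for
# every dangling commit, so an imported HEAD stands out from older garbage.
list_dangling_commits() {
    # fsck reports unreferenced objects as "dangling <type> <id>" on stdout;
    # its progress messages go to stderr and are discarded here.
    git fsck --dangling 2>/dev/null |
        awk '$1 == "dangling" && $2 == "commit" { print $3 }' |
        while read -r id; do
            git log -1 --format='%h %ci %s' "$id"
        done
}
```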
The next thing to do is to merge in the current version of the main branch. In my case, this resulted in a number of conflicts to resolve, as I had been doing parallel development on my desktop system's configuration management in the live network:
git pull origin main
Once this is all merged, the import branch can be pushed and (if applicable) a pull request opened to merge it into the main branch:
git push -u origin import
You might need to repeat this process in reverse to copy the merged result back. In the case of my air-gapped network, it will get a read-only mirror of the merged repository to pull from via my existing mirror synchronisation process.