Storing Blobs on the GitHub Container Registry
Recently, I needed to access some virtual machine images in one of my GitHub Actions workflows. As they came in at more than 1 GB, I was hard-pressed to find a place to store them that was also easy to access from a GitHub Actions workflow. If only I could upload a ZIP archive to GitHub Packages!
It turns out that it is not only possible but also much easier than I imagined it to be. Let’s learn how.
Container Images Are Nothing but Fancy Tarballs
The key is to realise that container images (those things that you feed to docker run) are nothing but tarballs and some metadata bundled together. There is no rule[1] that says “Thou shalt only store operating systems in container images.” So nothing stops us from storing basically anything in a container image. And such images can be uploaded to the GitHub Container Registry (GHCR), which is part of the GitHub Packages offering, or to any other container registry.
Now, we only need to find a tool that makes putting anything into a container image easy. Enter oras.
Upload Anything with ORAS
oras is a nifty command-line tool made by the ORAS project. ORAS stands for “OCI Registry As Storage”, and OCI is the Open Container Initiative, the standards body that specifies the format of container images, the registry API, and so on. Its specifications ensure that any container image can be run by any runtime (Docker, Podman, containerd, …) and uploaded to any registry (GHCR, Docker Hub, …). Thanks to this standardisation, oras is compatible with a dozen different registries.
oras itself is available for all major operating systems and most common platforms. You can either download and run the binary from GitHub Releases or use one of the other installation methods.
I have prepared a directory called data that I want to store on GHCR. It looks like this:
$ tree data
data
├── 50m.bin
└── a-directory
    └── hello.txt

2 directories, 2 files
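If you want to follow along, you can quickly create a directory with the same shape. The following is a minimal sketch that assumes GNU coreutils. The file contents are made up, and only the layout and the 50 MiB size mirror the example. I use random bytes on the assumption that the original 50m.bin is incompressible: its gzip-compressed layer further below is slightly larger than the file itself. That is a guess, though.

```shell
#!/usr/bin/env bash
# Recreate a directory shaped like the example: a 50 MiB file at the top and
# a small text file in a subdirectory. The contents are hypothetical.
set -euo pipefail

mkdir -p data/a-directory
# 50 MiB = 52428800 bytes of random data, so compression cannot shrink it.
head -c 52428800 /dev/urandom > data/50m.bin
printf 'Hello, world\n' > data/a-directory/hello.txt
```

The data/10m.bin file used later can be created the same way with 10485760 bytes.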
Uploading it to GHCR is as simple as running:
$ oras push ghcr.io/example/data:1 data
To download the files, create an empty directory and change into it. Then run:
$ oras pull ghcr.io/example/data:1
That’s it! There is now a directory called data in the current working directory that looks exactly like the one I uploaded above:
$ tree data
data
├── 50m.bin
└── a-directory
    └── hello.txt

2 directories, 2 files
While this is not exactly “uploading a ZIP archive”, it is actually much better: you do not have to create the ZIP archive yourself, checksum verification is built in, and in some circumstances, it is even possible to change the container image without re-uploading everything.
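The built-in checksum verification rests on content addressing. A registry identifies every blob (layer, config, or manifest) by a digest, typically the string sha256: followed by the hex SHA-256 of the blob's bytes. A client can recompute that value after downloading. Below is a minimal hand-rolled sketch of such a check, assuming GNU coreutils' sha256sum. verify_blob is a hypothetical helper and not part of oras, which performs the equivalent check on its own.

```shell
#!/usr/bin/env bash
# verify_blob FILE DIGEST
# Succeeds if FILE's bytes match DIGEST, given as "sha256:<64 hex chars>",
# the content-addressed form registries use to identify blobs.
verify_blob() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<hex>  <name>"; keep only the hex part.
  actual="sha256:$(sha256sum -- "$file" | cut -d ' ' -f 1)"
  [ "$actual" = "$expected" ]
}
```

Layers that oras uploads as plain files are a useful test case. As far as I can tell from the sizes, the per-file push later in this post stores data/50m.bin unmodified: its layer size equals the file's 52428800 bytes. So verify_blob data/50m.bin followed by the digest from that manifest should succeed against the file that produced it.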
Saving Space and Bandwidth with Layers
In the background, oras takes the directory and turns it into an OCI container image before uploading it. That is the same kind of image that Docker uses. If you have used Docker before, you have probably heard about “layers”. In a nutshell, a container image consists of one or more layers, and “layer” is just a fancy term for a tarball. Having multiple tarballs in an image instead of a single one helps with caching and reduces the amount of storage that all those images consume in a registry. oras creates layers, too, as we can see when we look at the manifest[2] of the container image we uploaded:
$ oras manifest fetch ghcr.io/example/data:1 | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:8c9d7307a4263817ad8dd2b845c4bac3a4a59621d98a063d3482df77763e7cee",
      "size": 52445175,
      "annotations": {
        "io.deis.oras.content.digest": "sha256:7520f1358115aa8ffd0ca65b22ba5bf9ef4555e9f9212032f65f8cf91e7ec93a",
        "io.deis.oras.content.unpack": "true",
        "org.opencontainers.image.title": "data"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-07-04T15:18:29Z"
  }
}
There it is, a single layer with the SHA-256 checksum 8c9d7307a4263817ad8dd2b845c4bac3a4a59621d98a063d3482df77763e7cee. There is even an annotation called org.opencontainers.image.title with the folder’s name: data. The manifest of a “normal” Docker image does not look much different:
$ oras manifest fetch --platform linux/amd64 docker.io/library/postgres:16.3 | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:f23dc7cd74bd7693fc164fd829b9a7fa1edf8eaaed488c117312aef2a48cafaa",
    "size": 10091
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:f11c1adaa26e078479ccdd45312ea3b88476441b91be0ec898a7e07bfd05badc",
      "size": 29126278
    },
    // Many more layers omitted.
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:95c2c2ef9f02d7666e80992c98c53c9ec7b5e8ccf244d00a5c85e46bbc2820ae",
      "size": 184
    }
  ],
  "annotations": {
    "com.docker.official-images.bashbrew.arch": "amd64",
    "org.opencontainers.image.base.digest": "sha256:39868a6f452462b70cf720a8daff250c63e7342970e749059c105bf7c1e8eeaf",
    "org.opencontainers.image.base.name": "debian:bookworm-slim",
    "org.opencontainers.image.created": "2024-05-09T18:58:11Z",
    "org.opencontainers.image.revision": "d08757ccb56ee047efd76c41dbc148e2e2c4f68f",
    "org.opencontainers.image.source": "https://github.com/docker-library/postgres.git#d08757ccb56ee047efd76c41dbc148e2e2c4f68f:16/bookworm",
    "org.opencontainers.image.url": "https://hub.docker.com/_/postgres",
    "org.opencontainers.image.version": "16.3"
  }
}
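Two details of the oras manifest above can be reproduced with standard tools. First, its config is the empty JSON object {}: the data field holds those two bytes in Base64, and the digest is their SHA-256. Second, its single layer is a gzip-compressed tarball of data. The following is a minimal sketch assuming GNU tar and coreutils. The helper names and the archive name data.tar.gz are hypothetical.

```shell
#!/usr/bin/env bash
# empty_config_data: Base64 of the empty JSON object "{}" (two bytes, no
# trailing newline), i.e. the inline "data" field of the config descriptor.
empty_config_data() {
  printf '{}' | base64
}

# empty_config_digest: SHA-256 of the same two bytes in registry notation,
# i.e. the config descriptor's "digest" field.
empty_config_digest() {
  printf 'sha256:%s\n' "$(printf '{}' | sha256sum | cut -d ' ' -f 1)"
}

# make_layer DIR OUT: pack DIR into the gzip-compressed tarball OUT (the same
# kind of blob as the single oras layer) and print that blob's digest.
make_layer() {
  local dir=$1 out=$2
  tar -czf "$out" "$dir" || return 1
  printf 'sha256:%s\n' "$(sha256sum -- "$out" | cut -d ' ' -f 1)"
}
```

empty_config_data prints e30=, and empty_config_digest prints sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a, matching the config entry. make_layer data data.tar.gz also prints a digest, but do not expect it to match the one oras reported. Timestamps, ownership, and compression settings all end up in the bytes, so two tools rarely produce identical tarballs. Running tar -tzf data.tar.gz shows that the layer contains the paths data/50m.bin and data/a-directory/hello.txt.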
Back to those layers. As I mentioned before, it is possible to change the container image created by oras in some circumstances without re-uploading everything. Those circumstances have a lot to do with those layers. When you run oras push <name> <file> [...], oras creates a separate layer per argument. Let’s upload the same directory as before, but this time, specify every file as a separate argument:
$ oras push ghcr.io/example/data:1 data/50m.bin data/a-directory/hello.txt
While the result on disk is the same when we download the image again, the manifest looks different:
$ oras manifest fetch ghcr.io/example/data:1 | jq
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:1e4c2dd682422beba2fa33db0f926935afe1414f722ee54be7788c6a6c40ebca",
      "size": 52428800,
      "annotations": {
        "org.opencontainers.image.title": "data/50m.bin"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340",
      "size": 13,
      "annotations": {
        "org.opencontainers.image.title": "data/a-directory/hello.txt"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-07-05T13:51:19Z"
  }
}
There is now one layer per file, two in total. That means you can now change the contents of the image in place by adding or removing layers. Let’s add another file, data/10m.bin:
$ oras push ghcr.io/example/data:1 data/50m.bin data/10m.bin data/a-directory/hello.txt
You will see that oras only uploads data/10m.bin, because the other two files (layers) are already part of the image. Omit data/50m.bin, and oras only removes its layer from the image, leaving everything else in place:
$ oras push ghcr.io/example/data:1 data/10m.bin data/a-directory/hello.txt
So if you expect to update parts of the container image frequently, or you want to save space[3] when storing multiple images that share some contents, it might be beneficial to put every file, or at least every folder, in a separate layer, as shown in the preceding examples. If you want to save on typing, find and xargs can help:
$ find data -type f -print0 | xargs -r -0 oras push ghcr.io/example/data:1
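One caveat applies to xargs: once the file list gets long enough, xargs may split it across several invocations of oras push. Each of those runs would replace the tag with a manifest containing only its share of the files. Below is a minimal bash sketch (bash 4.4 or later, for mapfile -d '') that always issues a single oras push and sorts the file names so that repeated pushes of the same files keep the same layer order. A list too long for one command then fails loudly with "Argument list too long" instead of silently producing a partial image. push_each_file and DRY_RUN are hypothetical names.

```shell
#!/usr/bin/env bash
# push_each_file DIR REF
# Push every regular file below DIR as its own layer in ONE oras invocation,
# so the tag never ends up with only part of the files. If DRY_RUN is set to
# any non-empty value, print the command instead of running it.
push_each_file() {
  local dir=$1 ref=$2
  local -a files
  # NUL-separated and sorted: safe for any file name, stable layer order.
  mapfile -d '' files < <(find "$dir" -type f -print0 | sort -z)
  if [ "${#files[@]}" -eq 0 ]; then
    echo "push_each_file: no files below $dir" >&2
    return 1
  fi
  ${DRY_RUN:+echo} oras push "$ref" "${files[@]}"
}
```

For example, DRY_RUN=1 push_each_file data ghcr.io/example/data:1 prints the complete oras push command without contacting the registry.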
oras has more to offer, but what we have seen so far should suffice for everyday use.
ORAS in a GitHub Actions Workflow
This is a minimal GitHub Actions workflow that downloads our data folder onto the runner:
name: Build
on:
  push:
jobs:
  build:
    name: Build
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: read # Required to access GHCR
    steps:
      - name: Install oras
        run: |
          sudo snap install oras --classic
      - name: Download data
        run: |
          oras login --username "${{ github.actor }}" --password "${{ secrets.GITHUB_TOKEN }}" ghcr.io
          oras pull ghcr.io/example/data:1
The highlights:
- You need to declare the permission packages: read to access GHCR. If you use a different container registry, you can omit it.
- I install oras using snap. Any other method is fine, too.
- Log into GHCR with ${{ github.actor }} as the username and ${{ secrets.GITHUB_TOKEN }} as the password. This also saves you some money[4].
Then, you can use oras as usual.
If the container images you are accessing are private (and they are private by default), you also have to link the image to the repository that the GitHub Actions workflow is part of. Otherwise, you get permission errors. There are two ways to do this:
- Connect a repository to a package using the GitHub UI.
- Add the annotation org.opencontainers.image.source to the container image. Assuming you want to access the image in https://github.com/example/my-repository, the command looks as follows:
  $ oras push ghcr.io/example/data:1 \
      -a "org.opencontainers.image.source=https://github.com/example/my-repository" \
      data/50m.bin data/10m.bin data/a-directory/hello.txt
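If you push the image from a GitHub Actions workflow, you do not have to hard-code the repository URL in that annotation. The runner's default environment variables GITHUB_SERVER_URL (for example https://github.com) and GITHUB_REPOSITORY (owner/name) combine into it. Here is a small sketch. source_annotation is a hypothetical helper, and I have not checked how GHCR treats images pushed this way beyond what the annotation itself does.

```shell
#!/usr/bin/env bash
# source_annotation: print the annotation that links an image to the current
# repository, built from default GitHub Actions environment variables, e.g.
# org.opencontainers.image.source=https://github.com/example/my-repository
source_annotation() {
  if [ -z "${GITHUB_SERVER_URL:-}" ] || [ -z "${GITHUB_REPOSITORY:-}" ]; then
    echo "source_annotation: GITHUB_SERVER_URL/GITHUB_REPOSITORY not set" >&2
    return 1
  fi
  printf 'org.opencontainers.image.source=%s/%s\n' \
    "$GITHUB_SERVER_URL" "$GITHUB_REPOSITORY"
}
```

A push step could then pass -a "$(source_annotation)" to oras push.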
Docker Can Do This, Too
Before I stumbled upon ORAS, I tried my luck with a normal image builder. My preferred tool is Buildah, and it can actually do it. The key is to use the empty base image scratch. The equivalent to oras push ghcr.io/example/data:1 data looks as follows:
$ export newcontainer=$(buildah from scratch)
$ buildah unshare
$ buildah copy $newcontainer data /data
$ buildah unmount $newcontainer
$ buildah commit $newcontainer data
$ buildah rm $newcontainer
$ buildah push data:latest ghcr.io/example/data:1
If you think that this is kinda gross, it absolutely is.
Docker does not fare better. First, we need a Dockerfile next to the data folder:
FROM scratch
COPY data /data
Then, we can build the image and push it to GHCR:
$ docker build -t ghcr.io/example/data:1 .
$ docker image push ghcr.io/example/data:1
Skopeo is probably the best tool (relatively speaking) to get the data folder back:
$ skopeo copy docker://ghcr.io/example/data:1 dir:output
This command writes the image’s manifest and layers as individual files into the pre-existing folder output. Unfortunately, we are far from done. We still have to look into the manifest to figure out which file contains the filesystem layer and which compression algorithm was used. Then, we can extract it with tar to get our data folder back.
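For completeness, the remaining steps could look like the following sketch. It makes several assumptions. It expects the skopeo dir: layout to contain a manifest.json next to blob files named after the hex part of their digests. It falls back to a .tar suffix in case a layout version appends one, which I have not verified. It needs jq (as used earlier in this post) and GNU tar. restore_layers is a hypothetical name. The function handles the gzip-compressed and uncompressed layer media types of both OCI and Docker manifests and gives up on anything else, such as zstd. It also ignores whiteout files, which a scratch image with a single COPY should not contain.

```shell
#!/usr/bin/env bash
# restore_layers SRC DEST
# Unpack every layer listed in SRC/manifest.json, as written by
# "skopeo copy docker://... dir:SRC", into DEST in manifest order.
restore_layers() {
  local src=$1 dest=$2 media digest blob
  [ -f "$src/manifest.json" ] || { echo "restore_layers: no manifest.json in $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  while read -r media digest; do
    # Blob files are named after the hex part of the digest.
    blob="$src/${digest#*:}"
    # Assumed fallback for layouts that append ".tar" to blob names.
    [ -f "$blob" ] || blob="$blob.tar"
    case "$media" in
      # OCI and Docker spellings of a gzip-compressed layer.
      *tar+gzip | *tar.gzip) tar -xzf "$blob" -C "$dest" ;;
      # Uncompressed layers.
      *layer.v1.tar | *diff.tar) tar -xf "$blob" -C "$dest" ;;
      *)
        echo "restore_layers: unsupported media type: $media" >&2
        return 1
        ;;
    esac || return 1
  done < <(jq -r '.layers[] | "\(.mediaType) \(.digest)"' "$src/manifest.json")
}
```

Applied to the folder from the skopeo command, restore_layers output restored should leave the data folder in restored/data.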
This is absolutely no fun, and nobody should do it. I only wanted to mention it. After all, you never know when this otherwise useless knowledge might come in handy.
[1] GitHub itself showcases that Homebrew stores at least half a petabyte of binaries on GHCR. If you are curious how Homebrew does it: Homebrew writes the OCI image itself and then uploads it using skopeo.
[2] A manifest is a piece of metadata that describes the contents of a container image.
[3] Container registries usually store each layer only once, even if it is part of hundreds or thousands of images.
[4] Data transfer is free of charge when GHCR is accessed with GITHUB_TOKEN.