Is Your Container Image Really Distroless?
Containerization helped drastically improve the security of applications by providing engineers with greater control over the runtime environment of their applications. However, a significant time investment is required to maintain the security posture of those applications, given the daily discovery of new vulnerabilities as well as regular releases of languages and frameworks.
The concept of “distroless” images offers the promise of greatly reducing the time needed to keep applications secure by eliminating most of the software contained in typical container images. This approach also reduces the amount of time teams spend remediating vulnerabilities, allowing them to focus only on the software they are using.
In this article, we explain what makes an image distroless, describe tools that make the creation of distroless images practical, and discuss whether distroless images live up to their potential.
What’s a distro?
A Linux distribution is a complete operating system built around the Linux kernel, comprising a package management system, GNU tools and libraries, additional software, and often a graphical user interface.
Common Linux distributions include Debian, Ubuntu, Arch Linux, Fedora, Red Hat Enterprise Linux, CentOS, and Alpine Linux (which is more common in the world of containers). These Linux distributions, like most Linux distros, treat security seriously, with teams working diligently to release frequent patches and updates to known vulnerabilities. A key challenge that all Linux distributions must face involves the usability/security dilemma.
On its own, the Linux kernel is not very usable, so many utility commands are included in distributions to cover a large array of use cases. Having the right utilities included in the distribution without having to install additional packages greatly improves a distro’s usability. The downside of this increase in usability, however, is an increased attack surface area to keep up to date.
A Linux distro must strike a balance between these two elements, and different distros have different approaches to doing so. A key aspect to keep in mind is that a distro that emphasizes usability is not “less secure” than one that does not emphasize usability. What it means is that the distro with more utility packages requires more effort from its users to keep it secure.
Multi-stage builds
Multi-stage builds allow developers to separate build-time dependencies from runtime ones. Developers can now start from a full-featured build image with all the necessary components installed, perform the necessary build step, and then copy only the result of those steps to a more minimal or even an empty image, called “scratch”. With this approach, there’s no need to clean up dependencies and, as an added bonus, the build stages are also cacheable, which can considerably reduce build time.
The following example shows a Go program taking advantage of multi-stage builds. Because the Golang runtime is compiled into the binary, only the binary and root certificates need to be copied to the blank slate image.
FROM golang:1.21.5-alpine as build
WORKDIR /
COPY go.* .
RUN go mod download
COPY . .
RUN go build -o my-app
FROM scratch
COPY --from=build
/etc/ssl/certs/ca-certificates.crt
/etc/ssl/certs/ca-certificates.crt
COPY --from=build /my-app /usr/local/bin/my-app
ENTRYPOINT ["/usr/local/bin/my-app"]
BuildKit
BuildKit, the current engine used by docker build, helps developers create minimal images thanks to its extensible, pluggable architecture. It provides the ability to specify alternative frontends (with the default being the familiar Dockerfile) to abstract and hide the complexity of creating distroless images. These frontends can accept more streamlined and declarative inputs for builds and can produce images that contain only the software needed for the application to run.
The following example shows the input for a frontend for creating Python applications:
#syntax=cmdjulian/mopy
apiVersion: v1
python: 3.9.2
build-deps:
- libopenblas-dev
- gfortran
- build-essential
envs:
MYENV: envVar1
pip:
- numpy==1.22
- slycot
- ./my_local_pip/
- ./requirements.txt
labels:
foo: bar
fizz: ${mopy.sbom}
project: my-python-app/
So, is your image really distroless?
Thanks to new tools for creating container images like multi-stage builds and BuildKit, it is now a lot more practical to create images that only contain the required software and its runtime dependencies.
However, many images claiming to be distroless still include a shell (usually Bash) and/or BusyBox, which provides many of the commands a Linux distribution does — including wget — that can leave containers vulnerable to Living off the land (LOTL) attacks. This raises the question, “Why would an image trying to be distroless still include key parts of a Linux distribution?” The answer typically involves container initialization.
Developers often have to make their applications configurable to meet the needs of their users. Most of the time, those configurations are not known at build time so they need to be configured at run time. Often, these configurations are applied using shell initialization scripts, which in turn depend on common Linux utilities such as sed, grep, cp, etc. When this is the case, the shell and utilities are only needed for the first few seconds of the container’s lifetime. Luckily, there is a way to create true distroless images while still allowing initialization using tools available from most container orchestrators: init containers.
Init containers
In Kubernetes, an init container is a container that starts and must complete successfully before the primary container can start. By using a non-distroless container as an init container that shares a volume with the primary container, the runtime environment and application can be configured before the application starts.
The lifetime of that init container is short (often just a couple seconds), and it typically doesn’t need to be exposed to the internet. Much like multi-stage builds allow developers to separate the build-time dependencies from the runtime dependencies, init containers allow developers to separate initialization dependencies from the execution dependencies.
The concept of init container may be familiar if you are using relational databases, where an init container is often used to perform schema migration before a new version of an application is started.
Kubernetes example
Here are two examples of using init containers. First, using Kubernetes:
apiVersion: v1
kind: Pod
metadata:
name: kubecon-postgress-pod
labels:
app.kubernetes.io/name: KubeConPostgress
spec:
containers:
- name: postgress
image: laurentgoderre689/postgres-distroless
securityContext:
runAsUser: 70
runAsGroup: 70
volumeMounts:
- name: db
mountPath: /var/lib/postgresql/data/
initContainers:
- name: init-postgress
image: postgres:alpine3.18
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: kubecon-postgress-admin-pwd
key: password
command: ['docker-ensure-initdb.sh']
volumeMounts:
- name: db
mountPath: /var/lib/postgresql/data/
volumes:
- name: db
emptyDir: {}
- - -
> kubectl apply -f pod.yml && kubectl get pods
pod/kubecon-postgress-pod created
NAME READY STATUS RESTARTS AGE
kubecon-postgress-pod 0/1 Init:0/1 0 0s
> kubectl get pods
NAME READY STATUS RESTARTS AGE
kubecon-postgress-pod 1/1 Running 0 10s
Docker Compose example
The init container concept can also be emulated in Docker Compose for local development using service dependencies and conditions.
services:
db:
image: laurentgoderre689/postgres-distroless
user: postgres
volumes:
- pgdata:/var/lib/postgresql/data/
depends_on:
db-init:
condition: service_completed_successfully
db-init:
image: postgres:alpine3.18
environment:
POSTGRES_PASSWORD: example
volumes:
- pgdata:/var/lib/postgresql/data/
user: postgres
command: docker-ensure-initdb.sh
volumes:
pgdata:
- - -
> docker-compose up
[+] Running 4/0
✔ Network compose_default Created
✔ Volume "compose_pgdata" Created
✔ Container compose-db-init-1 Created
✔ Container compose-db-1 Created
Attaching to db-1, db-init-1
db-init-1 | The files belonging to this database system will be owned by user "postgres".
db-init-1 | This user must also own the server process.
db-init-1 |
db-init-1 | The database cluster will be initialized with locale "en_US.utf8".
db-init-1 | The default database encoding has accordingly been set to "UTF8".
db-init-1 | The default text search configuration will be set to "english".
db-init-1 | [...]
db-init-1 exited with code 0
db-1 | 2024-02-23 14:59:33.191 UTC [1] LOG: starting PostgreSQL 16.1 on aarch64-unknown-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
db-1 | 2024-02-23 14:59:33.191 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
db-1 | 2024-02-23 14:59:33.191 UTC [1] LOG: listening on IPv6 address "::", port 5432
db-1 | 2024-02-23 14:59:33.194 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
db-1 | 2024-02-23 14:59:33.196 UTC [9] LOG: database system was shut down at 2024-02-23 14:59:32 UTC
db-1 | 2024-02-23 14:59:33.198 UTC [1] LOG: database system is ready to accept connections
As demonstrated by the previous example, an init container can be used alongside a container to remove the need for general-purpose software and allow the creation of true distroless images.
Conclusion
This article explained how Docker build tools allow for the separation of build-time dependencies from run-time dependencies to create “distroless” images. For example, using init containers allows developers to separate the logic needed to configure a runtime environment from the environment itself and provide a more secure container. This approach also helps teams focus their efforts on the software they use and find a better balance between security and usability.
Source: Laurent Goderre