Managing Docker Images
In the previous posts of this series, we discussed Docker images at length. As we've seen, we can take existing images, made available to the general public on Docker Hub, and run them or reuse them for our own purposes. Images help us streamline our processes and reduce the work we need to do.
In the next few posts, we are going to take a more in-depth look at images and how to work with them on our system. We're going to discuss how images can be better organized and tagged, understand how different layers of images work, and set up registries that are both public and private to further reuse the images we have created.
Docker images are also ideal for application development because each image contains a complete, self-contained version of the application along with all its dependencies. This enables developers to create an image locally and deploy it in development or testing environments to verify compatibility with other parts of the application. If testing is successful, the same image can be pushed to the production environment for users to use. It's crucial to maintain consistency when using these images, especially when collaborating within larger developer teams.
Docker Layers and Caching
A registry is a way to store and distribute Docker images. When you pull a Docker image from a registry, you might notice that the image is pulled in pieces and not as a single image. The same thing happens when you build an image on your system.
This is because Docker images are structured in layers, each representing a stage of the image's construction. Each layer of an image represents a specific action or change made when building the image with a Dockerfile. These layers are organized on top of a base image, capturing every change to the filesystem that occurs with each instruction in the Dockerfile. This setup is structured in a way that Docker can use caching efficiently.
When you instantiate an image as a container, Docker adds a writable layer on top of the existing read-only layers. This writable layer, often referred to as the container layer, allows the container to modify and persist changes during runtime without affecting the underlying image.
As we will see in the following examples, when you build a Docker image from a Dockerfile, Docker shows the execution of each command specified in the Dockerfile. These commands contribute to creating layers in the Docker image, each represented by a unique ID generated during the build process. After successfully building the image, we can inspect the layers using the docker history command, which provides a detailed view including the image name or ID alongside the commands that formed each layer.
Note that as you set up your build environment and progress in development, the number of layers in the Docker image grows. More layers generally mean larger images, which can lead to longer build times.
When you build an image from a Dockerfile, each instruction contributes to the creation of layers in the image. Layers are created explicitly when commands like RUN, ADD and COPY are executed. These commands make changes to the filesystem within the image, resulting in new layers being added.
On the other hand, commands like FROM, ENV, WORKDIR and CMD do not directly create filesystem changes. Instead, they modify the environment or configure settings within the image without altering the filesystem itself. As a result, these commands generate intermediate layers. These layers have a size of 0 bytes because they don't introduce any new filesystem change. They serve as metadata or configuration layers that help define how an image behaves or is structured, but they don't increase the size of the final Docker image.
When building our Docker images, we can use the docker history command with the image name or ID to see the layers used to create the image. The output provides details on the command used to generate each layer as well as the size of the layer.
docker history <image_name|image_id>
The docker image inspect command is useful for finding out where the layers of our images are located:
docker image inspect <image_id>
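As a rough sketch of how these two commands can be combined, the helper below prints the size and creating command of every history entry, followed by the digests of the image's filesystem layers. The name show_layers is invented for this illustration, and the sketch assumes a Docker CLI that supports the --format templates shown.

```shell
# show_layers IMAGE  (hypothetical helper, not part of Docker)
show_layers() {
  image="$1"
  # One row per build instruction: the layer size, then the command that created it.
  docker history --no-trunc --format '{{.Size}}  {{.CreatedBy}}' "$image"
  # Digests of the layers that make up the image's root filesystem.
  docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$image"
}
```

Once the function is defined, it can be called with any local image name or ID, for example show_layers alpine.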
Working with Docker Image Layers
In this example, we are going to work with some basic Dockerfiles to see how Docker uses layers to build images. We will start by creating a Dockerfile and building a new image. We will then rebuild the image to see the advantages of caching and how the build time is reduced due to its use.
Create a file named Dockerfile and add the following directives:
FROM alpine
RUN apk update
RUN apk add wget
Save the Dockerfile and then, from the command line, make sure you are in the same directory as the Dockerfile you created. Use the docker build command to create the new image, using the -t option to name it basic-example:
docker build -t basic-example .
If the image is built successfully, you should see output similar to the following. Each step is built as an intermediate layer and, if it completes successfully, it is then transferred to a read-only layer:
[+] Building 9.0s (8/8) FINISHED docker:default
=> [internal] load .dockerignore 0.1s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.1s
=> => transferring dockerfile: 84B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 3.6s
=> [auth] library/alpine:pull token for registry-1.docker.io 0.0s
=> [1/3] FROM docker.io/library/alpine@sha256:b89d9c93e9ed3597455c90a0b88a8bbb5cb7188438f70953fede212a0c4394e0 1.1s
=> => resolve docker.io/library/alpine@sha256:b89d9c93e9ed3597455c90a0b88a8bbb5cb7188438f70953fede212a0c4394e0 0.1s
=> => sha256:a606584aa9aa875552092ec9e1d62cb98d486f51f389609914039aabd9414687 1.47kB / 1.47kB 0.0s
=> => sha256:ec99f8b99825a742d50fb3ce173d291378a46ab54b8ef7dd75e5654e2a296e99 3.62MB / 3.62MB 0.6s
=> => sha256:b89d9c93e9ed3597455c90a0b88a8bbb5cb7188438f70953fede212a0c4394e0 1.85kB / 1.85kB 0.0s
=> => sha256:dabf91b69c191a1a0a1628fd6bdd029c0c4018041c7f052870bb13c5a222ae76 528B / 528B 0.0s
=> => extracting sha256:ec99f8b99825a742d50fb3ce173d291378a46ab54b8ef7dd75e5654e2a296e99 0.2s
=> [2/3] RUN apk update 2.1s
=> [3/3] RUN apk add wget 1.7s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:2fc965ee555abcf548a268e3622ee031366479ddefb080e6920847c46a8848b9 0.0s
=> => naming to docker.io/library/basic-example
Use the docker history command with the image name basic-example to see the different layers of the image:
docker history basic-example
The history gives you creation details, including the size of each layer
IMAGE CREATED CREATED BY SIZE COMMENT
2fc965ee555a 7 minutes ago RUN /bin/sh -c apk add wget # buildkit 3.07MB buildkit.dockerfile.v0
<missing> 7 minutes ago RUN /bin/sh -c apk update # buildkit 2.32MB buildkit.dockerfile.v0
<missing> 5 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 5 days ago /bin/sh -c #(nop) ADD file:33ebe56b967747a97… 7.8MB
The docker history command shows the layers of the original image used in the Dockerfile FROM command as <missing>. They show as missing because they were created on a different system and then pulled onto ours.
Run the build again without making any changes
docker build -t basic-example .
This time the build is done using the layers stored in the Docker image cache, which speeds up our build. Although this is a small image, a much larger image would show a far more significant reduction in build time:
[+] Building 4.3s (8/8) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 84B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 4.1s
=> [auth] library/alpine:pull token for registry-1.docker.io 0.0s
=> [1/3] FROM docker.io/library/alpine@sha256:b89d9c93e9ed3597455c90a0b88a8bbb5cb7188438f70953fede212a0c4394e0 0.0s
=> CACHED [2/3] RUN apk update 0.0s
=> CACHED [3/3] RUN apk add wget 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:2fc965ee555abcf548a268e3622ee031366479ddefb080e6920847c46a8848b9 0.0s
=> => naming to docker.io/library/basic-example 0.0s
Let's add the curl package as part of our image creation by modifying the Dockerfile as follows:
FROM alpine
RUN apk update
RUN apk add wget
RUN apk add curl
Build the image again, and now you'll see the image was created with a mix of cached and new layers
docker build -t basic-example .
The above command should create the following output
[+] Building 7.1s (9/9) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 102B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 4.0s
=> [auth] library/alpine:pull token for registry-1.docker.io 0.0s
=> [1/4] FROM docker.io/library/alpine@sha256:b89d9c93e9ed3597455c90a0b88a8bbb5cb7188438f70953fede212a0c4394e0 0.0s
=> CACHED [2/4] RUN apk update 0.0s
=> CACHED [3/4] RUN apk add wget 0.0s
=> [4/4] RUN apk add curl 2.8s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:f2bb77eb9898954a27c2bd12838c3ed0fdfe19ed78a7440189c28bdb0cbfbf8d 0.0s
=> => naming to docker.io/library/basic-example
Run the docker images command again:
docker images
You will now notice an image whose name and tag are both <none>, which shows that we have created a dangling image:
REPOSITORY TAG IMAGE ID CREATED SIZE
basic-example latest f2bb77eb9898 52 seconds ago 18.9MB
<none> <none> 2fc965ee555a 16 hours ago 13.2MB
onbuild-child-example latest 9fb3629a292e 5 days ago 222MB
onbuild-parent-example latest 4a6360882fb6 6 days ago 222MB
In Docker, dangling images are those that are represented by <none> in the image list. These images occur when a layer in the image hierarchy no longer corresponds to any tagged or referenced image in the system. Essentially, they are orphaned or unused layers that have lost their connection to any active image.
Dangling images can accumulate over time as you build and rebuild Docker images, and they occupy disk space without serving any meaningful purpose. Even though each individual dangling image might be relatively small, such as the 13.2 MB image in our example, these sizes can add up significantly over time, especially in development and production environments where frequent image builds and updates occur.
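To keep an eye on how many dangling images have built up, something like the following sketch may help. The function name count_dangling is made up for this example; it relies on the dangling=true filter and the -q option of docker images.

```shell
# count_dangling (hypothetical helper): print the number of dangling images.
count_dangling() {
  # --filter dangling=true keeps only dangling images; -q prints just their IDs.
  # grep -c . counts the non-empty lines; "|| true" stops a zero count
  # from being reported as a failure.
  docker images --filter dangling=true -q | grep -c . || true
}
```

Running count_dangling after a few rebuilds gives a quick indication of whether a clean-up is worthwhile.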
Run the docker image inspect command with the image ID to see where the dangling image is stored on the system:
docker image inspect 2fc965ee555a
And you should get an output similar to the following
...
"Data":{
"LowerDir":"/var/lib/docker/overlay2/p9vnb2sakx8pcxhnabhpbbghs/diff:/var/lib/docker/overlay2/f41cbe299d47005328bcfbf0aa9c958ead5148eca0e0f65679aaabb38f9db96a/diff",
"MergedDir":"/var/lib/docker/overlay2/sbm1ykpcezy3eydzy6eyxhz08/merged",
"UpperDir":"/var/lib/docker/overlay2/sbm1ykpcezy3eydzy6eyxhz08/diff",
"WorkDir":"/var/lib/docker/overlay2/sbm1ykpcezy3eydzy6eyxhz08/work"
}
...
All of our images are stored in the same location, so any dangling images waste space on our system.
Run the docker images command again, this time with the -a option:
docker images -a
It will also show any intermediate layers used when our image was built (in the output below, no additional entries appear):
basic-example latest f2bb77eb9898 23 minutes ago 18.9MB
<none> <none> 2fc965ee555a 17 hours ago 13.2MB
onbuild-child-example latest 9fb3629a292e 5 days ago 222MB
Run the docker image prune command to remove all the dangling images. We could use docker rmi, but the docker image prune command is the easier way to do it:
docker image prune
You should get an output looking like the following
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y
Deleted Images:
deleted: sha256:2fc965ee555abcf548a268e3622ee031366479ddefb080e6920847c46a8848b9
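In a script, the confirmation prompt gets in the way. A minimal sketch, assuming the --force option of docker image prune and the docker system df summary command, could look like this (prune_dangling is a name chosen for this illustration):

```shell
# prune_dangling (hypothetical helper): remove dangling images without prompting.
prune_dangling() {
  # --force skips the "Are you sure you want to continue?" prompt.
  docker image prune --force
  # Summarize how much disk space Docker objects still use afterwards.
  docker system df
}
```

Because nothing asks for confirmation here, this is best reserved for build machines where dangling images are known to be disposable.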
Run the docker images command again:
docker images
You will see we no longer have the dangling image in our list of images
REPOSITORY TAG IMAGE ID CREATED SIZE
basic-example latest f2bb77eb9898 28 minutes ago 18.9MB
onbuild-child-example latest 9fb3629a292e 5 days ago 222MB
onbuild-parent-example latest 4a6360882fb6 6 days ago 222MB
This example used small images, but dangling images are definitely something to keep in mind when running production and development environments. In the next example, we will look further at our layers and caching to see how they can be used to speed up the image build process.
Increasing Build Speed and Reducing Layers
We've been working with small projects up until now. As our apps get bigger and more complex, though, we'll want to start thinking about the size and number of layers in our Docker images, along with how quickly we're building them. In this example, we'll focus on speeding up build times, shrinking those image sizes, and using the --cache-from option to make things even faster.
First, let's clean up any existing images on your system. We'll use the docker rmi -f $(docker images -a -q) command, which will force-remove all images currently on your system. This will give you a clean slate to work with.
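If no images exist, the command substitution is empty and docker rmi complains that it needs at least one argument. The sketch below adds a guard for that case; remove_all_images is a hypothetical name, and the behavior is otherwise meant to match the one-liner above.

```shell
# remove_all_images (hypothetical helper): force-remove every local image.
remove_all_images() {
  # Collect the IDs of all images, including intermediate ones.
  ids=$(docker images -a -q)
  if [ -z "$ids" ]; then
    echo "no images to remove"
    return 0
  fi
  # Word splitting is intentional: each ID becomes a separate argument.
  docker rmi -f $ids
}
```

As with the original command, this deletes every image on the machine, so it only belongs on a system where that is acceptable.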
Create a new Dockerfile with the following content. It will simulate a simple web server, and also print the contents of our Dockerfile during the build process:
FROM alpine
RUN apk update
RUN apk add wget curl
RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
CMD mkdir /var/www/
CMD mkdir /var/www/html/
WORKDIR /var/www/html/
COPY Dockerfile.tar.gz /tmp/
RUN tar -zxvf /tmp/Dockerfile.tar.gz -C /var/www/html
RUN rm /tmp/Dockerfile.tar.gz
RUN cat Dockerfile
Download the alpine base image using docker pull so that we can start with the same image for each test we do:
docker pull alpine
Create a TAR file to be added to our image
tar zcvf Dockerfile.tar.gz Dockerfile
Build a new image named basic-server. We put the time command at the start of the command line to gauge how long the build takes:
time docker build -t basic-server .
The output will return something similar to the following
#0 building with "default" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 451B done
#1 DONE 0.0s
#2 [internal] load .dockerignore
#2 transferring context: 2B done
#2 DONE 0.1s
#3 [internal] load metadata for docker.io/library/alpine:latest
#3 DONE 0.0s
#4 [1/9] FROM docker.io/library/alpine
#4 DONE 0.0s
#5 [2/9] RUN apk update
#5 CACHED
#6 [internal] load build context
#6 transferring context: 396B done
#6 DONE 0.0s
#7 [3/9] RUN apk add wget curl
#7 0.323 fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz
#7 1.054 fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz
#7 2.267 (1/12) Installing ca-certificates (20240226-r0)
#7 2.393 (2/12) Installing brotli-libs (1.1.0-r2)
#7 2.638 (3/12) Installing c-ares (1.28.1-r0)
#7 2.723 (4/12) Installing libunistring (1.2-r0)
#7 3.061 (5/12) Installing libidn2 (2.3.7-r0)
#7 3.155 (6/12) Installing nghttp2-libs (1.62.1-r0)
#7 3.240 (7/12) Installing libpsl (0.21.5-r1)
#7 3.316 (8/12) Installing zstd-libs (1.5.6-r0)
#7 3.526 (9/12) Installing libcurl (8.8.0-r0)
#7 3.709 (10/12) Installing curl (8.8.0-r0)
#7 3.831 (11/12) Installing pcre2 (10.43-r0)
#7 4.008 (12/12) Installing wget (1.24.5-r0)
#7 4.157 Executing busybox-1.36.1-r29.trigger
#7 4.160 Executing ca-certificates-20240226-r0.trigger
#7 4.183 OK: 14 MiB in 26 packages
#7 DONE 4.3s
#8 [4/9] RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
#8 0.404 --2024-07-01 07:34:51-- https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
#8 0.414 Resolving github.com (github.com)... 140.82.121.3
#8 0.561 Connecting to github.com (github.com)|140.82.121.3|:443... connected.
#8 0.685 HTTP request sent, awaiting response... 200 OK
#8 1.109 Length: unspecified [text/html]
#8 1.109 Saving to: 'randomdata.txt'
#8 1.109
#8 1.109 0K .......... .......... .......... .......... .......... 437K
#8 1.223 50K .......... .......... .......... .......... .......... 881K
#8 1.280 100K .......... .......... .......... .......... .......... 2.02M
#8 1.304 150K .......... .......... .......... .......... .......... 720K
#8 1.373 200K .......... .......... .......... .......... .......... 35.1M
#8 1.375 250K .......... .......... ....... 481K=0.3s
#8 1.432
#8 1.432 2024-07-01 07:34:52 (858 KB/s) - 'randomdata.txt' saved [284319]
#8 1.432
#8 DONE 1.5s
#9 [5/9] WORKDIR /var/www/html/
#9 DONE 0.0s
#10 [6/9] COPY Dockerfile.tar.gz /tmp/
#10 DONE 0.0s
#11 [7/9] RUN tar -zxvf /tmp/Dockerfile.tar.gz -C /var/www/html
#11 0.365 Dockerfile
#11 DONE 0.4s
#12 [8/9] RUN rm /tmp/Dockerfile.tar.gz
#12 DONE 0.5s
#13 [9/9] RUN cat Dockerfile
#13 0.450 FROM alpine
#13 0.450
#13 0.450 RUN apk update
#13 0.450 RUN apk add wget curl
#13 0.450
#13 0.450 RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
#13 0.450
#13 0.450 CMD mkdir /var/www/
#13 0.450 CMD mkdir /var/www/html/
#13 0.450
#13 0.450 WORKDIR /var/www/html/
#13 0.450
#13 0.450 COPY Dockerfile.tar.gz /tmp/
#13 0.450
#13 0.450 RUN tar -zxvf /tmp/Dockerfile.tar.gz -C /var/www/html
#13 0.450 RUN rm /tmp/Dockerfile.tar.gz
#13 0.450
#13 0.450 RUN cat Dockerfile
#13 DONE 0.5s
#14 exporting to image
#14 exporting layers
#14 exporting layers 0.2s done
#14 writing image sha256:971477cab5a251188e82e14baaf9268de83ecdd04429e5818c617ffe0803921c done
#14 naming to docker.io/library/basic-server done
#14 DONE 0.2s
And the time will be:
...
real 0m10.468s
user 0m0.060s
sys 0m0.276s
Run the docker history command on the new basic-server image:
docker history basic-server
The output should be something like the following
IMAGE CREATED CREATED BY SIZE COMMENT
971477cab5a2 8 minutes ago RUN /bin/sh -c cat Dockerfile # buildkit 0B buildkit.dockerfile.v0
<missing> 8 minutes ago RUN /bin/sh -c rm /tmp/Dockerfile.tar.gz # b… 0B buildkit.dockerfile.v0
<missing> 8 minutes ago RUN /bin/sh -c tar -zxvf /tmp/Dockerfile.tar… 412B buildkit.dockerfile.v0
<missing> 8 minutes ago COPY Dockerfile.tar.gz /tmp/ # buildkit 350B buildkit.dockerfile.v0
<missing> 8 minutes ago WORKDIR /var/www/html/ 0B buildkit.dockerfile.v0
<missing> 8 minutes ago CMD ["/bin/sh" "-c" "mkdir /var/www/html/"] 0B buildkit.dockerfile.v0
<missing> 8 minutes ago CMD ["/bin/sh" "-c" "mkdir /var/www/"] 0B buildkit.dockerfile.v0
<missing> 8 minutes ago RUN /bin/sh -c wget -O randomdata.txt https:… 284kB buildkit.dockerfile.v0
<missing> 8 minutes ago RUN /bin/sh -c apk add wget curl # buildkit 8.75MB buildkit.dockerfile.v0
<missing> 4 days ago RUN /bin/sh -c apk update # buildkit 2.32MB buildkit.dockerfile.v0
<missing> 10 days ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 10 days ago /bin/sh -c #(nop) ADD file:33ebe56b967747a97… 7.8MB
As you can see, there are 12 layers in our new image. The RUN, COPY and ADD commands create layers whose size reflects the command being run or the files being added, while all of the other commands produce layers of size 0 B.
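Counting rows by eye gets tedious as images grow. As a small sketch, the helper below counts the history entries of an image and how many of them have a non-zero size. The name count_layers is invented here, and the sketch assumes that empty layers are printed as 0B, as they are in the table above.

```shell
# count_layers IMAGE (hypothetical helper): print total and non-empty history entries.
count_layers() {
  image="$1"
  # -q prints one ID (or <missing>) per history entry.
  total=$(docker history -q "$image" | grep -c . || true)
  # Count the entries whose reported size is anything other than 0B.
  nonempty=$(docker history --format '{{.Size}}' "$image" | grep -vc '^0B$' || true)
  echo "total=$total non-empty=$nonempty"
}
```

For the history shown above, count_layers basic-server should report 12 entries in total, 6 of them with a non-zero size.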
We can slim down our image by merging some of the commands in the Dockerfile we created earlier. Combine the two RUN commands that call apk into one, and then merge the two CMD commands that create directories into one. This will reduce the number of layers in our image, making it more efficient.
After making these changes, your Dockerfile should look like this:
FROM alpine
RUN apk update && apk add wget curl
RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
CMD mkdir -p /var/www/html/
WORKDIR /var/www/html/
COPY Dockerfile.tar.gz /tmp/
RUN tar -zxvf /tmp/Dockerfile.tar.gz -C /var/www/html/
RUN rm /tmp/Dockerfile.tar.gz
RUN cat Dockerfile
If we rebuild our Docker image now, we'll see that the number of layers has decreased from 12 to 10. This is because we combined several commands into single lines, even though the same actions are still being performed.
We can further optimize our Dockerfile by replacing the COPY instruction and the two RUN instructions that extract and remove the archive with a single ADD command, which unpacks a local tar archive into the destination directory automatically. Here's the updated Dockerfile:
FROM alpine
RUN apk update && apk add wget curl
RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
CMD mkdir -p /var/www/html/
WORKDIR /var/www/html/
ADD Dockerfile.tar.gz /var/www/html/
RUN cat Dockerfile
Rebuilding our Docker image with the ADD command in place reduces the number of layers from 10 to 8, making it even more streamlined.
You might have noticed that a significant portion of the build time is spent running apk update, installing wget and curl, and fetching content from websites (the RUN apk and RUN wget instructions in our Dockerfile). While this isn't a major issue for a few builds, it can become a bottleneck when creating multiple images.
To address this, we can create a base image that already includes these tools and dependencies. By using this base image as our starting point, we can eliminate these lines from our Dockerfile altogether, further improving build times and image efficiency.
All right, let's create a dedicated base image to streamline our Docker builds. First, navigate to a new directory:
mkdir base-image
cd base-image
Now, create a new Dockerfile in this directory with the following contents:
FROM alpine
RUN apk update && apk add --no-cache wget curl
RUN wget -O randomdata.txt https://github.com/Kalkwst/Docker-Workshop-Repository/blob/master/Dockerfiles/create-base-image2/random_data.txt
This Dockerfile does three things:
- Pulls the base image: It starts with the alpine:latest image as its foundation.
- Runs the apk commands: It updates the package index and installs wget and curl using apk add --no-cache. The --no-cache option prevents apk from storing the downloaded packages in the local cache, keeping the image smaller.
- Runs the wget command: Finally, it downloads the randomdata.txt file from the remote URL.
Build the new image from this Dockerfile and name it basic-base:
docker build -t basic-base .
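Before relying on the base image, it can be worth confirming that the tools really are in it. The sketch below runs which inside a throwaway container. The name check_tools is hypothetical, and the sketch assumes the image provides a which command, as Alpine's BusyBox does.

```shell
# check_tools IMAGE TOOL... (hypothetical helper)
check_tools() {
  image="$1"
  shift
  # Run "which" in a temporary container; --rm deletes the container afterwards.
  docker run --rm "$image" which "$@"
}
```

Calling check_tools basic-base wget curl should print the path of each tool that is installed in the image.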
Perfect! Now that we have a base image ready, let's update our original Dockerfile. First, head back to the directory where your original Dockerfile resides:
cd ../<original_project_directory>
Now, open your Dockerfile and make the following changes:
- Update the FROM command: Change the base image from alpine to the name of your custom base image (i.e., basic-base).
- Remove the apk commands: Delete the line that starts with RUN apk update..., since wget and curl are now installed in our base image.
- Remove the wget command: Delete the RUN wget line, since the base image already downloads randomdata.txt.
Your updated Dockerfile should now look similar to this:
FROM basic-base
CMD mkdir -p /var/www/html/
WORKDIR /var/www/html/
ADD Dockerfile.tar.gz /var/www/html/
RUN cat Dockerfile
Run the build again for your new Dockerfile. Using the time command again, you should see the build complete in just under 3 seconds:
time docker build -t basic-server .
You'll likely notice a significant speed improvement compared to our previous builds. This is because we've offloaded the time-consuming package installations to our base image, streamlining the build process for our main image.
real 0m2.924s
user 0m0.000s
sys 0m0.262s
Throughout this example, we've seen firsthand how the build cache and image layers work together to significantly speed up the Docker build process. We've been starting our builds by pulling images from Docker Hub, but you have the flexibility to start with your own custom images. This gives you even more control over the build process and allows for further optimization.
By creating and maintaining your own base images, you can tailor them to your specific needs, pre-installing common dependencies and configurations. This not only reduces build times but also ensures consistency across your projects.
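The --cache-from option mentioned at the start of this example applies the same idea across machines: a previously pushed image serves as a cache source for a new build. The following is only a sketch under stated assumptions. It assumes BuildKit is in use, that you can push to a registry, and that the image reference you pass in (for instance a placeholder such as registry.example.com/team/basic-server) is yours. The function name is invented for this illustration.

```shell
# build_with_registry_cache IMAGE [CONTEXT]  (hypothetical helper)
# IMAGE is an assumed registry reference that you are allowed to push to.
build_with_registry_cache() {
  image="$1"
  context="${2:-.}"
  # Fetch the previous build if there is one; a failure on the first run is fine.
  docker pull "$image" || true
  # Reuse layers from the earlier image and embed cache metadata in the new one.
  docker build --cache-from "$image" --build-arg BUILDKIT_INLINE_CACHE=1 -t "$image" "$context"
  # Publish the result so the next build, possibly on another machine, can reuse it.
  docker push "$image"
}
```

On a fresh CI runner with no local cache, this pattern lets the build start from the layers of the last pushed image instead of from scratch.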
Summary
This post demonstrated how Docker lets users package their applications together with a working environment as images that can be moved across different environments. You've also seen how Docker uses layers and caching to improve build speed, and how you can work with these layers to save resources and disk space.
Source: Kostas Kalafatis