
Multi-Stage Docker Build Arg Example

Today I needed access to an environment variable whilst building a Docker image from a Dockerfile. To make the example concrete, I wanted to get access to the Git commit hash exposed in a CircleCI build. The idea was to then use this commit hash inside my build as a cache buster on client side JavaScript assets.

A cache buster is something like src=/my/asset.js?version=some_random_string. You don’t want the string to be truly random, though, because then it would change on every single page load. Instead it should change only when the file contents change (e.g. when a new build is deployed), which forces all clients to re-download the file only when it has changed… in other words, it busts the cache.

The use case doesn’t really matter, though. The real questions are: how can we get access to environment variables during a Docker build? And how do multi-stage Docker builds handle environment variables?

Well, I found the way this worked to be a little confusing. Hence the blog post.

A broken container, a fine example of how Docker can make you feel.

A Simple Docker Image Example

Let’s start with a basic example.

Here’s our Dockerfile:

FROM alpine

ENV MY_ENV=default_value

CMD ["sh", "-c", "echo \"my env var value: $MY_ENV\""]

We’re using the Alpine image as it’s tiny and so makes a good base for this example.

ENV MY_ENV=default_value sets an environment variable named MY_ENV with the value default_value.

CMD ["sh", "-c", "echo \"my env var value: $MY_ENV\""] looks messy, but the sh -c wrapper and the escaped quotes are essential: without a shell, $MY_ENV would never be expanded, and so our environment variable would not actually get printed.
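To illustrate why the sh -c wrapper matters, here is a minimal sketch comparing three ways of writing that CMD. Only the last CMD in a Dockerfile takes effect, so the two alternatives are left as comments. Exec form (the JSON array) does not start a shell on its own, which is why a bare echo would not expand $MY_ENV.

```dockerfile
FROM alpine

ENV MY_ENV=default_value

# Exec form without a shell: nothing expands the variable,
# so this would print the literal text $MY_ENV.
# CMD ["echo", "my env var value: $MY_ENV"]

# Shell form: Docker wraps the command in /bin/sh -c for us,
# so the variable is expanded when the container starts.
# CMD echo "my env var value: $MY_ENV"

# Exec form that starts the shell explicitly (the approach used in this post).
CMD ["sh", "-c", "echo \"my env var value: $MY_ENV\""]
```

Either of the last two forms should print the value. I have stuck with the explicit sh -c version throughout this post.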

Build And Run

We need to build and run this:

➜  docker build -t my-docker-image .
[+] Building 0.4s (5/5) FINISHED                                                                    docker:default
 => [internal] load build definition from Dockerfile                                                          0.0s
 => => transferring dockerfile: 525B                                                                          0.0s
 => [internal] load .dockerignore                                                                             0.0s
 => => transferring context: 2B                                                                               0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                              0.3s
 => CACHED [1/1] FROM docker.io/library/alpine@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fda  0.0s
 => exporting to image                                                                                        0.0s
 => => exporting layers                                                                                       0.0s
 => => writing image sha256:d7d8aed46eb9c83ecda6b72062f7e67a7b2ebad531a8f091a7fe364ae964a263                  0.0s
 => => naming to docker.io/library/my-docker-image                                                            0.0s

➜  docker run --rm my-docker-image       
my env var value: default_value

Not the most thrilling example, but it gets the job done.

Override The Environment Variable At Runtime

At runtime we aren’t restricted to the default value set in the Dockerfile during build.

➜ docker run --rm -e MY_ENV=new_value my-docker-image
my env var value: new_value

By passing in a new environment variable when actually creating a container from the image, we can override the default value quite easily.

That’s fine, but it doesn’t directly answer my original problem. I won’t have access to the Git commit hash at runtime. That is something that is only available to me at build time, very specifically inside the CircleCI environment.

Ideally what we need to do is pass in the Git commit hash that CircleCI provides during the build, and then expose that string as an environment variable inside the running container.

So let’s keep going.

Arg and Env, Why and When?

You may be aware you can pass Docker arguments via the command line when both building an image, and running a container.

An example during the build phase could be:

docker build -t my-docker-image --build-arg BASE_IMAGE_TAG=latest .

And an example when running a container:

docker run -e MY_ENV=some_value my-docker-image

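In the examples above, BASE_IMAGE_TAG would be declared as an ARG, and MY_ENV typically as an ENV, inside the Dockerfile that you use to create the image. The Dockerfile below shows both kinds of declaration, using MY_ARG and MY_ENV as the names: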

FROM alpine:latest

ARG MY_ARG="a build argument value"
ENV MY_ENV="an environment variable"

# Write ARG and ENV values to a text file inside the image
RUN echo "Build Argument: $MY_ARG" > /output.txt \
    && echo "Environment Variable: $MY_ENV" >> /output.txt

# Display the contents of the text file when running a container from this image
CMD ["cat", "/output.txt"]

ENV vars are intended to be like system environment variables. They are for setting environment variables that are available during the runtime of the container.

However, ARG is for build time variables.

ARG defines variables that we can pass at build-time to the builder with the docker build command using the --build-arg flag. These variables are only available during the build stage.
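A minimal sketch of that difference, using made-up values: the ARG is visible to RUN during the build, but only the ENV survives into the running container.

```dockerfile
FROM alpine

# Build time only: visible to RUN, but not set in the running container
ARG MY_ARG=build_only

# Stored in the image: visible to RUN and to the running container
ENV MY_ENV=build_and_run

# Both values are visible here, during the build.
# (Use docker build --progress=plain to see the echoed line in the build log.)
RUN echo "during build: arg='$MY_ARG' env='$MY_ENV'"

# At runtime only MY_ENV is still set, so I would expect arg to print as ''
CMD ["sh", "-c", "echo \"at runtime: arg='$MY_ARG' env='$MY_ENV'\""]
```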

Placement Matters

Perhaps more importantly, the placement of the ARG definition inside your Dockerfile matters.

In the example below we will allow the user who builds an image from our Dockerfile to specify a particular version of the alpine image to use. If they don’t provide a specific version, we will specify 3.14 as a fallback / default version.

# Set up our argument with a default value
ARG BASE_IMAGE_TAG=3.14

# Start with the specified base image - seemingly all good
FROM alpine:$BASE_IMAGE_TAG

WORKDIR /app

# Try to write the build argument value to a file in the resulting image
RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file

# Display the content of the file during the run command
# gotcha: it will be blank
CMD ["sh", "-c", "cat /app/build_arg_file"]

If we build and run this, the build ARG BASE_IMAGE_TAG won’t be written out to /app/build_arg_file inside the resulting image.

Yet the image will build successfully, and use the version of the Alpine image that we ask for.

In the example below, we do not provide a build-arg to override the default:

➜  docker build -t my-docker-image .
[+] Building 0.5s (7/7) FINISHED                                                                              docker:default
 => [internal] load build definition from Dockerfile                                                                    0.0s
 => => transferring dockerfile: 889B                                                                                    0.0s
 => [internal] load .dockerignore                                                                                       0.0s
 => => transferring context: 2B                                                                                         0.0s
 => [internal] load metadata for docker.io/library/alpine:3.14                                                          0.4s
 => [1/3] FROM docker.io/library/alpine:3.14@sha256:0f2d5c38dd7a4f4f733e688e3a6733cb5ab1ac6e3cb4603a5dd564e5bfb80eed    0.0s
 => CACHED [2/3] WORKDIR /app                                                                                           0.0s
 => [3/3] RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file                                                0.2s
 => exporting to image                                                                                                  0.0s
 => => exporting layers                                                                                                 0.0s
 => => writing image sha256:0028f791e59b42ce06bbd9c2b9700281c072345edc9c1b17ce09d8e73ae729d6                            0.0s
 => => naming to docker.io/library/my-docker-image                                                                      0.0s

➜  docker run --rm my-docker-image
Build arg value: ''

As the load metadata for docker.io/library/alpine:3.14 line above shows, the default build argument value is used as expected.

However the build argument value written to /app/build_arg_file is empty.

That is confusing.

It certainly confused me.

We can see that the build ARG is taken when used as part of the Dockerfile configuration by looking at the build log printed to the terminal below.

In this example, note that the default build argument has been explicitly overridden with --build-arg BASE_IMAGE_TAG=latest in the initial build command:

➜  docker build -t my-docker-image --build-arg BASE_IMAGE_TAG=latest .
[+] Building 1.1s (7/7) FINISHED                                                                              docker:default
 => [internal] load .dockerignore                                                                                       0.0s
 => => transferring context: 2B                                                                                         0.0s
 => [internal] load build definition from Dockerfile                                                                    0.0s
 => => transferring dockerfile: 889B                                                                                    0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                        0.9s
 => [1/3] FROM docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48  0.0s
 => CACHED [2/3] WORKDIR /app                                                                                           0.0s
 => [3/3] RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file                                                0.2s
 => exporting to image                                                                                                  0.0s
 => => exporting layers                                                                                                 0.0s
 => => writing image sha256:2f86808f32450644e936f4b6237a7055a488a302ee314a7d2c4f21b2a0756dc5                            0.0s
 => => naming to docker.io/library/my-docker-image                                                                      0.0s

➜  docker run --rm my-docker-image
Build arg value: ''

It should be possible to use build arguments in this way. But for confusing reasons, that build argument is somehow lost.

This is adapted somewhat from an example on the official Docker docs – only they don’t cover the second part of this process in their example, writing out or later referencing that ARG as we do here.

After a lot of trial and error and head scratching, it turns out to be due to the placement of the ARG definition in the Dockerfile.

As best I understand this, an ARG declared before the first FROM sits outside of any build stage. It can be used in FROM lines, but once Docker starts processing the stage itself, that ARG is no longer in scope unless it is declared again inside the stage.

It certainly wasn’t immediately obvious to me that even with this seemingly very simple Dockerfile, I had effectively created a multi-stage build:

# effective stage 1
ARG BASE_IMAGE_TAG=3.14

FROM alpine:$BASE_IMAGE_TAG

# effective stage 2
WORKDIR /app

RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file

CMD ["sh", "-c", "cat /app/build_arg_file"]

I’m still not entirely sure whether this technically counts as a multi-stage build (there is only one FROM, after all), but the placement of the ARG certainly makes it behave like one.

Also this is something of an edge case. The only reason I need to include some configurable build argument before the first FROM is to specify the image tag I want to base my container off of. I can’t think of any other time in my previous Docker experience where I haven’t been able to move the ARG line after the first FROM.
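For what it’s worth, the Docker docs describe a way to keep the ARG before the FROM and still read it afterwards: declare it a second time, without a value, inside the stage. A minimal sketch, reusing the names from this section:

```dockerfile
# Declared before the first FROM: only usable in FROM lines
ARG BASE_IMAGE_TAG=3.14

FROM alpine:$BASE_IMAGE_TAG

# Declared again, with no value, inside the stage.
# This brings the value (the default, or whatever --build-arg supplied) into scope here.
ARG BASE_IMAGE_TAG

WORKDIR /app

RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file

CMD ["sh", "-c", "cat /app/build_arg_file"]
```

With the second ARG line in place I would expect docker run to print Build arg value: '3.14' by default, and 'latest' when built with --build-arg BASE_IMAGE_TAG=latest.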

In the example above the ARG is provided with a default value.

You do not need to provide a default value, in which case the ARG would become mandatory in this example:

ARG BASE_IMAGE_TAG

FROM alpine:$BASE_IMAGE_TAG

WORKDIR /app

RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file

CMD ["sh", "-c", "cat /app/build_arg_file"]

Try to build this now without providing a BASE_IMAGE_TAG and we get an error:

➜  docker build -t my-docker-image .

Dockerfile:77
--------------------
  75 |     ARG BASE_IMAGE_TAG
  76 |     
  77 | >>> FROM alpine:$BASE_IMAGE_TAG
  78 |     
  79 |     # effective stage 2
--------------------
ERROR: failed to solve: failed to parse stage name "alpine:": invalid reference format

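The line numbers in the error refer to lines of the Dockerfile being built, not to anything in the base image. My working Dockerfile was longer than the snippet shown here, which is why the error points at line 77.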

If we explicitly provide the build-arg then it works as expected:

➜  docker build --build-arg BASE_IMAGE_TAG=latest -t my-docker-image .

[+] Building 0.7s (7/7) FINISHED                                                                              docker:default
 => [internal] load .dockerignore                                                                                       0.0s
 => => transferring context: 2B                                                                                         0.0s
 => [internal] load build definition from Dockerfile                                                                    0.0s
 => => transferring dockerfile: 2.21kB                                                                                  0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                        0.7s
 => [1/3] FROM docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48  0.0s
 => CACHED [2/3] WORKDIR /app                                                                                           0.0s
 => CACHED [3/3] RUN echo "Build arg value: '$BASE_IMAGE_TAG'" > build_arg_file                                         0.0s
 => exporting to image                                                                                                  0.0s
 => => exporting layers                                                                                                 0.0s
 => => writing image sha256:5d270043702b8c8e2615c7bc917eff249f646657ddb68f0990c6b8feee3997ac                            0.0s
 => => naming to docker.io/library/my-docker-image

A Multi Stage Docker Image Example

Let’s take the original example and expand upon it.

In this example the placement issue with ARG still occurs, but in a different (and, in my opinion, more intuitive) way.

Here’s the Dockerfile:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output

# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
ARG MY_ARG=another_value
RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]

Based on what we covered above, the following build and run output should be unsurprising:

➜  docker build -t my-docker-image .
# output omitted for brevity

➜  docker run --rm my-docker-image
stage 1 arg value: default_value
stage 2 arg value: another_value

As we covered, so long as the ARG statement occurs after the FROM, that value will be available during that phase of the build.

We can, indeed we must, define the ARG line again for each build phase, if we continue to need access to it.

Here’s another example to illustrate this:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output

# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app

# comment out the MY_ARG definition here
#ARG MY_ARG=another_value

RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]

And now if we run the same commands:

➜  docker build -t my-docker-image .
# output omitted for brevity

➜  docker run --rm my-docker-image
stage 1 arg value: default_value
stage 2 arg value: 

Remember, ARG works up to the next FROM, at which point it effectively goes out of scope and will no longer be available.

Even if we explicitly provide a --build-arg here, it won’t be available:

➜  docker build -t my-docker-image --build-arg MY_ARG=override .

➜  docker run --rm my-docker-image
stage 1 arg value: override
stage 2 arg value: 

And for completeness, if we uncomment the ARG lines once more:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output

# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
ARG MY_ARG=another_value
RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]

And we run the build with an overridden build-arg:

➜  docker build -t my-docker-image --build-arg MY_ARG=override .

➜  docker run --rm my-docker-image
stage 1 arg value: override
stage 2 arg value: override

Persist Build Arguments Beyond The Build Stage

Going back to the original problem: I needed to access the Git commit hash in my resulting container, but it was only being passed from CircleCI as a --build-arg.

We’ve seen that ARG behaves a bit strangely, and is only available during the build phase.

Fortunately there is a little workaround – is it a hack? – that means we can persist Docker build arguments to be then made available in the resulting container.

Here’s the starting example:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_arg_value
ENV MY_ENV=default_env_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
    && echo "stage 1 env var: $MY_ENV" >> output.txt

# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
WORKDIR /app
ARG MY_ARG=another_arg_value
ENV MY_ENV=another_env_value
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
    && echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]

The idea here is that we have both ARG and ENV in both stages, and they are all different. This is the easiest example to start from, and gives the most obvious output when run:

➜  docker build -t my-docker-image .

➜  docker run --rm my-docker-image
stage 1 arg value: default_arg_value
stage 1 env var: default_env_value
stage 2 arg value: another_arg_value
stage 2 env var: another_env_value

The more real world scenario, I have found, is that we want the ARG and ENV value to be the same:

FROM alpine AS build
ARG MY_ARG=some_default_value
ENV MY_ENV=some_default_value

However, we don’t want to repeat the definition twice.

So we can reference one from the other:

FROM alpine AS build
ARG MY_ARG=some_default_value
ENV MY_ENV=${MY_ARG}

And for completeness, sometimes (perhaps even more often?) the ARG will not have a default. So how do we deal with that?

FROM alpine AS build

ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}

WORKDIR /app

RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
    && echo "stage 1 env var: $MY_ENV" >> output.txt

CMD ["sh", "-c", "cat /app/output.txt"]

In this example, ARG MY_ARG is not provided with a default.

ENV MY_ENV will first try to use any value set by MY_ARG, or fall back to another default value. Note the colon dash syntax there for setting the default.

➜  docker build -t my-docker-image .

➜  docker run --rm my-docker-image
stage 1 arg value: 
stage 1 env var: stage 1 default value

And again, for completeness:

➜  docker build -t my-docker-image  --build-arg MY_ARG=something .

➜  docker run --rm my-docker-image
stage 1 arg value: something
stage 1 env var: something

An Unfortunate Circumstance

Whilst we are able to provide defaults and base values off previous values, as a result of all of this I am not aware of a way to pass ARG or ENV values between stages in such a way that defaults are preserved.

What do I mean by this?

Time for yet another Dockerfile example:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
    && echo "stage 1 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]

# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
    && echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]

In this example we set both ARG and ENV in the first stage.

Then we hit another FROM statement, and try to access both MY_ARG and MY_ENV that we defined in the previous stage.

Neither is available:

➜  docker build -t my-docker-image .

➜  docker run --rm my-docker-image
stage 1 arg value: 
stage 1 env var: stage 1 default value
stage 2 arg value: 
stage 2 env var: 

Hopefully by now you trust me that explicitly passing a --build-arg would not alter this outcome.

We can use the same build argument in more than one stage.

And we can redefine the same environment variables between stages.

But we have to be explicit:

# Stage 1: Build
FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
    && echo "stage 1 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]

# Stage 2: Production
FROM alpine AS production
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 2 default value"}
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
    && echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]

Which gives:

➜  docker build -t my-docker-image .

➜  docker run --rm my-docker-image
stage 1 arg value: 
stage 1 env var: stage 1 default value
stage 2 arg value: 
stage 2 env var: stage 2 default value

The downside being that we have to repeat any default value, but the upside being we can make the default value stage specific.
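If repeating the default in every stage is a problem, one possible alternative is to lean on the behaviour from the Placement Matters section: an ARG declared before the first FROM can be pulled into any stage by declaring it again without a value. The sketch below assumes that documented behaviour, and the default value shared_default is made up for illustration.

```dockerfile
# Declared once, before the first FROM, with the shared default
ARG MY_ARG=shared_default

# Stage 1: Build
FROM alpine AS build
# No value here: pulls in the global value (the default, or a --build-arg)
ARG MY_ARG
ENV MY_ENV=${MY_ARG}
WORKDIR /app
RUN echo "stage 1 env var: $MY_ENV" > output.txt

# Stage 2: Production
FROM alpine AS production
# Same again: each stage still has to opt in explicitly
ARG MY_ARG
ENV MY_ENV=${MY_ARG}
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
```

The trade-off is that the default is no longer stage specific, and every stage still needs its own ARG and ENV lines.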

No Need To Declare Your Arg In Every Prior Stage

Bringing this home, from CircleCI we have a config file something like this:

jobs:
    build_docker:
        steps:
            - run:
                command: |
                    docker build --pull \
                      --build-arg GIT_REF=$CIRCLE_SHA1 \
                      .
                name: Build container image

I have removed much for brevity and to focus only on the specifics to this post.

The gist is that CircleCI (or any other CI pipeline) will build our Docker image, from our Dockerfile, and can pass in interesting bits of data that we might want to keep around.

As above, we pass in the SHA1 that Circle exposes, which is the last commit hash from the current build.

This is given to the build process as a build-arg.

Our actual Dockerfile may / likely will contain multiple stages as we covered above. The reasoning for this is to keep the final Docker image as streamlined as possible.

We may have an initial Stage to set up user accounts, set the time zone, ensure any required dependencies are updated, and so on. This stage doesn’t care about the commit hash.

Then we might have a stage to run npm commands such as npm ci. This stage also doesn’t need to know about the commit hash.

After that we might do the final build, copying over only the production assets from the previous stage, and starting the webserver. This stage does care about the commit hash, because we ultimately want to reference that as our cache buster string.

So here’s a very simplified example, covering everything we learned along the way:

# Stage 1: Build
FROM alpine AS build
WORKDIR /app
# do some interesting things

# Stage 2: Test
FROM build AS test
COPY --from=build /app /app
WORKDIR /app
# do more interesting things

# Stage 3: Production
# we go back to build as we don't need any of the test stage gubbins
FROM build AS production
COPY --from=build /app /app
WORKDIR /app
ARG GIT_REF
ENV GIT_REF=${GIT_REF:-'fallback_value'}
# do more interesting things
CMD ["printenv"]

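We only need the commit hash in the third stage, so we do not need to specify ARG or ENV prior to this stage. As we saw earlier, declaring them in an unrelated stage would only affect that stage, up until the next FROM statement. One caveat: test and production both start FROM build rather than from a fresh alpine image, so an ENV set in the build stage would carry over into them.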

And as our final proof:

➜  docker build -t my-docker-image .

➜  docker run --rm my-docker-image

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8aaeb6db90d6
GIT_REF=fallback_value
HOME=/root

Oh, and for completeness:

➜  docker build -t my-docker-image --build-arg GIT_REF=8a33493a83e5fddf90fb65ad76a1780ff0c51598 .

➜  docker run --rm my-docker-image

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8aaeb6db90d6
GIT_REF=8a33493a83e5fddf90fb65ad76a1780ff0c51598
HOME=/root
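To close the loop on the cache buster idea from the start of the post, here is a rough sketch of how GIT_REF might be consumed once it is an environment variable. The asset path /my/asset.js comes from the earlier example, and printing a script tag from CMD is only a stand-in for whatever your application really does to render its HTML.

```dockerfile
FROM alpine

ARG GIT_REF
ENV GIT_REF=${GIT_REF:-fallback_value}

# Stand-in for the real application: print a script tag whose
# version query string is the commit hash baked in at build time.
CMD ["sh", "-c", "echo \"<script src='/my/asset.js?version=$GIT_REF'></script>\""]
```

Because the hash only changes when a new build is deployed, the version string stays stable between page loads and changes exactly when the assets might have.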
