Today I needed access to an environment variable whilst building a Docker image from a Dockerfile. To make the example concrete, I wanted to get access to the Git commit hash exposed in a CircleCI build. The idea was to then use this commit hash inside my build as a cache buster on client side JavaScript assets.
A cache buster is something like src=/my/asset.js?version=some_random_string. Only you don’t want the string to be truly random, changing on every single page load. Rather, it should change only when the file contents change (e.g. when a new build is deployed), which forces all clients to re-download the file only when it changes… in other words, it busts the cache.
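As a rough sketch of the idea, assuming the commit hash ends up in a shell variable: the variable name GIT_REF, the helper asset_url and the "dev" fallback are illustrative choices of mine, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: append a build-specific version string to an asset path.
# GIT_REF is an assumed variable name; fall back to "dev" when it is unset or empty.
asset_url() {
  printf '%s?version=%s\n' "$1" "${GIT_REF:-dev}"
}

GIT_REF=8a33493a83e5fddf90fb65ad76a1780ff0c51598
asset_url /my/asset.js
# prints: /my/asset.js?version=8a33493a83e5fddf90fb65ad76a1780ff0c51598
```

Because the version string only changes when the commit changes, browsers keep their cached copy until a new build is deployed.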
The use case doesn’t really matter, though. The real questions are: how can we get access to environment variables during a Docker build? And how do multi-stage Docker builds handle environment variables?
Well, I found the way this worked to be a little confusing. Hence the blog post.
A Simple Docker Image Example
Let’s start with a basic example.
Here’s our Dockerfile:
FROM alpine
ENV MY_ENV=default_value
CMD ["sh", "-c", "echo \"my env var value: $MY_ENV\""]
We’re using the Alpine image as it’s tiny and so makes a good base for this example.
ENV MY_ENV=default_value sets an environment variable named MY_ENV with the value default_value.
CMD ["sh", "-c", "echo \"my env var value: $MY_ENV\""] looks messy, but running the command through sh -c is what lets a shell expand $MY_ENV, and the inner quotes have to be escaped because the exec form is a JSON array.
Build And Run
We need to build and run this:
➜ docker build -t my-docker-image .
[+] Building 0.4s (5/5) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 525B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.3s
=> CACHED [1/1] FROM docker.io/library/alpine@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fda 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:d7d8aed46eb9c83ecda6b72062f7e67a7b2ebad531a8f091a7fe364ae964a263 0.0s
=> => naming to docker.io/library/my-docker-image 0.0s
➜ docker run --rm my-docker-image
my env var value: default_value
Not the most thrilling example, but it gets the job done.
Override The Environment Variable At Runtime
At runtime we aren’t restricted to the default value set in the Dockerfile during build.
➜ docker run --rm -e MY_ENV=new_value my-docker-image
my env var value: new_value
By passing in a new environment variable when actually creating a container from the image, we can override the default value quite easily.
That’s fine, but it doesn’t directly answer my original problem. I won’t have access to the Git commit hash at runtime. That is something that is only available to me at build time, very specifically inside the CircleCI environment.
Ideally what we need to do is pass in the Git commit hash that CircleCI provides during the build, and then expose that string as an environment variable inside the running container.
So let’s keep going.
Arg and Env, Why and When?
You may be aware you can pass Docker arguments via the command line when both building an image, and running a container.
An example during the build phase could be:
docker build -t my-docker-image --build-arg BASE_IMAGE_TAG=latest .
And an example when running a container:
docker run -e MY_ENV=some_value my-docker-image
In the examples above, BASE_IMAGE_TAG and MY_ENV are names that the Dockerfile itself declares, using ARG and ENV respectively. The Dockerfile below declares one of each, here named MY_ARG and MY_ENV:
FROM alpine:latest
ARG MY_ARG="a build argument value"
ENV MY_ENV="an environment variable"
# Write ARG and ENV values to a text file inside the image
RUN echo "Build Argument: $MY_ARG" > /output.txt \
&& echo "Environment Variable: $MY_ENV" >> /output.txt
# Display the contents of the text file when running a container from this image
CMD ["cat", "/output.txt"]
ENV vars are intended to be like system environment variables. They set environment variables that are available during the runtime of the container, and to later instructions in the same build stage.
However, ARG is for build time variables.
ARG defines variables that we can pass at build-time to the builder with the docker build command using the --build-arg flag. These variables are only available while the image is being built; they are not set in containers run from the image.
Placement Matters
Perhaps more importantly, the placement of the ARG definition inside your Dockerfile matters.
In the example below we will allow the user who builds an image from our Dockerfile to specify a particular version of the alpine image to use. If they don’t provide a specific version, we will specify 3.14 as a fallback / default version.
# Set up our argument with a default value
ARG BASE_IMAGE=3.14
# Start with the specified base image - seemingly all good
FROM alpine:$BASE_IMAGE
WORKDIR /app
# Try to write the build argument value to a file in the resulting image
RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file
# Display the content of the file during the run command
# gotcha: it will be blank
CMD ["sh", "-c", "cat /app/build_arg_file"]
If we run this, the build ARG BASE_IMAGE won’t be written out to /app/build_arg_file inside the resulting image.
Yet the image will build successfully, and use the version of the Alpine image that we ask for.
In the example below, we do not provide a build-arg to override the default:
➜ docker build -t my-docker-image .
[+] Building 0.5s (7/7) FINISHED docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 889B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:3.14 0.4s
=> [1/3] FROM docker.io/library/alpine:3.14@sha256:0f2d5c38dd7a4f4f733e688e3a6733cb5ab1ac6e3cb4603a5dd564e5bfb80eed 0.0s
=> CACHED [2/3] WORKDIR /app 0.0s
=> [3/3] RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file 0.2s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:0028f791e59b42ce06bbd9c2b9700281c072345edc9c1b17ce09d8e73ae729d6 0.0s
=> => naming to docker.io/library/my-docker-image 0.0s
➜ docker run --rm my-docker-image
Build arg value: ''
As shown above on the load metadata line (docker.io/library/alpine:3.14), the default build argument value is used as expected.
However the build argument value written to /app/build_arg_file is empty.
That is confusing.
It certainly confused me.
We can see that the build ARG is used as part of the Dockerfile configuration by looking at the build log printed to the terminal below.
In this example, note that the default build argument has been explicitly overridden with --build-arg BASE_IMAGE=latest in the initial build command:
➜ docker build -t my-docker-image --build-arg BASE_IMAGE=latest .
[+] Building 1.1s (7/7) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 889B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.9s
=> [1/3] FROM docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48 0.0s
=> CACHED [2/3] WORKDIR /app 0.0s
=> [3/3] RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file 0.2s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:2f86808f32450644e936f4b6237a7055a488a302ee314a7d2c4f21b2a0756dc5 0.0s
=> => naming to docker.io/library/my-docker-image 0.0s
➜ docker run --rm my-docker-image
Build arg value: ''
It feels like it should be possible to use build arguments in this way, yet the value is lost somewhere between the FROM line and the RUN line.
This is adapted somewhat from an example on the official Docker docs – only their example doesn’t cover the second part of this process: writing out or later referencing that ARG as we do here.
After a lot of trial and error and head scratching, it turns out to be due to the placement of the ARG definition in the Dockerfile.
An ARG declared before the first FROM sits outside of any build stage. It can be used in FROM lines, but it is not visible to the instructions that come after FROM unless the stage declares it again with its own ARG line. A bare ARG BASE_IMAGE with no value, placed after FROM, picks up the earlier default.
It certainly wasn’t immediately obvious to me that even this seemingly very simple Dockerfile behaves as if it had two stages:
# effective stage 1
ARG BASE_IMAGE=3.14
FROM alpine:$BASE_IMAGE
# effective stage 2
WORKDIR /app
RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file
CMD ["sh", "-c", "cat /app/build_arg_file"]
I wouldn’t call this a true multi-stage build, as there is only one FROM, but an ARG placed before the first FROM certainly behaves as if it lived in a stage of its own.
Also this is something of an edge case. The only reason I need to include some configurable build argument before the first FROM is to specify the image tag I want to base my container off of. I can’t think of any other time in my previous Docker experience where I haven’t been able to move the ARG line after the first FROM.
In the example above the ARG is provided with a default value.
You do not need to provide a default value, in which case the ARG would become mandatory in this example:
ARG BASE_IMAGE
FROM alpine:$BASE_IMAGE
WORKDIR /app
RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file
CMD ["sh", "-c", "cat /app/build_arg_file"]
Try to build this now without providing a BASE_IMAGE and we get an error:
➜ docker build -t my-docker-image .
Dockerfile:77
--------------------
75 | ARG BASE_IMAGE
76 |
77 | >>> FROM alpine:$BASE_IMAGE
78 |
79 | # effective stage 2
--------------------
ERROR: failed to solve: failed to parse stage name "alpine:": invalid reference format
The line number, 77, is a position within the Dockerfile that was built, which was longer than the snippet shown here. It does not include anything from the base image.
If we explicitly provide the build-arg then it works as expected:
➜ docker build --build-arg BASE_IMAGE=latest -t my-docker-image .
[+] Building 0.7s (7/7) FINISHED docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 2.21kB 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 0.7s
=> [1/3] FROM docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48 0.0s
=> CACHED [2/3] WORKDIR /app 0.0s
=> CACHED [3/3] RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:5d270043702b8c8e2615c7bc917eff249f646657ddb68f0990c6b8feee3997ac 0.0s
=> => naming to docker.io/library/my-docker-image
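For what it’s worth, the Dockerfile reference describes a way to see a pre-FROM default inside the stage: repeat the ARG name, without a value, after FROM. Below is a minimal sketch of that, wrapped in a shell script so it can be tried in a throwaway directory. The temp directory and the arg-scope-demo image tag are names I made up for the illustration, and the build only runs if a Docker daemon is reachable.

```shell
#!/bin/sh
# Sketch only: the temp directory and the image tag arg-scope-demo are illustrative names.
workdir=$(mktemp -d)

# The second, value-less ARG line pulls the pre-FROM default into the stage.
cat > "$workdir/Dockerfile" <<'EOF'
ARG BASE_IMAGE=3.14
FROM alpine:$BASE_IMAGE
ARG BASE_IMAGE
WORKDIR /app
RUN echo "Build arg value: '$BASE_IMAGE'" > build_arg_file
CMD ["sh", "-c", "cat /app/build_arg_file"]
EOF

# Only build when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t arg-scope-demo "$workdir"
  docker run --rm arg-scope-demo
else
  echo "docker not available; Dockerfile written to $workdir/Dockerfile"
fi
```

With the extra ARG line in place I would expect the container to print Build arg value: '3.14' rather than an empty pair of quotes.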
A Multi Stage Docker Image Example
Let’s take the original example and expand upon it.
During this example the placement issue with ARG still occurs, but in a different – and in my opinion – more intuitive way.
Here’s the Dockerfile:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output
# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
ARG MY_ARG=another_value
RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]
Based on what we covered above, the following build and run output should hopefully be what you would expect:
➜ docker build -t my-docker-image .
# output omitted for brevity
➜ docker run --rm my-docker-image
stage 1 arg value: default_value
stage 2 arg value: another_value
As we covered, so long as the ARG statement occurs after the FROM, that value will be available during that phase of the build.
We can, indeed we must, define the ARG line again for each build phase, if we continue to need access to it.
Here’s another example to illustrate this:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output
# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
# comment out the MY_ARG definition here
#ARG MY_ARG=another_value
RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]
And now if we run the same commands:
➜ docker build -t my-docker-image .
# output omitted for brevity
➜ docker run --rm my-docker-image
stage 1 arg value: default_value
stage 2 arg value:
Remember, ARG works up to the next FROM, at which point it effectively goes out of scope and will no longer be available.
Even if we explicitly provide a --build-arg here, it won’t be available:
➜ docker build -t my-docker-image --build-arg MY_ARG=override .
➜ docker run --rm my-docker-image
stage 1 arg value: override
stage 2 arg value:
And for completeness, if we uncomment the ARG line once more:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > /app/stage_1_output
# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
ARG MY_ARG=another_value
RUN echo "stage 2 arg value: $MY_ARG" > /app/stage_2_output
CMD ["sh", "-c", "cat /app/stage_1_output && cat /app/stage_2_output"]
And we run the build with an overridden build-arg:
➜ docker build -t my-docker-image --build-arg MY_ARG=override .
➜ docker run --rm my-docker-image
stage 1 arg value: override
stage 2 arg value: override
Persist Build Arguments Beyond The Build Stage
Let’s go back to the original problem: I needed to access the Git commit hash in my resulting container, but this was only being passed from CircleCI as a --build-arg.
We’ve seen that ARG behaves a bit strangely, and is only available during the build phase.
Fortunately there is a little workaround – is it a hack? – that means we can persist Docker build arguments to be then made available in the resulting container.
Here’s the starting example:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG=default_arg_value
ENV MY_ENV=default_env_value
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
&& echo "stage 1 env var: $MY_ENV" >> output.txt
# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
WORKDIR /app
ARG MY_ARG=another_arg_value
ENV MY_ENV=another_env_value
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
&& echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
The idea here is that we have both ARG and ENV in both stages, and they are all different. This is the easiest example to start from, and gives the most obvious output when run:
➜ docker build -t my-docker-image .
➜ docker run --rm my-docker-image
stage 1 arg value: default_arg_value
stage 1 env var: default_env_value
stage 2 arg value: another_arg_value
stage 2 env var: another_env_value
The more real world scenario, I have found, is that we want the ARG and ENV value to be the same:
FROM alpine AS build
ARG MY_ARG=some_default_value
ENV MY_ENV=some_default_value
However, we don’t want to repeat the definition twice.
So we can reference one from the other:
FROM alpine AS build
ARG MY_ARG=some_default_value
ENV MY_ENV=${MY_ARG}
And for completeness, sometimes (perhaps even more often?) the ARG will not have a default. So how do we deal with that?
FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
&& echo "stage 1 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
In this example, ARG MY_ARG is not provided with a default.
ENV MY_ENV will first try to use any value set by MY_ARG, or fall back to another default value. Note the colon dash syntax there for setting the default.
➜ docker build -t my-docker-image .
➜ docker run --rm my-docker-image
stage 1 arg value:
stage 1 env var: stage 1 default value
And again, for completeness:
➜ docker build -t my-docker-image --build-arg MY_ARG=something .
➜ docker run --rm my-docker-image
stage 1 arg value: something
stage 1 env var: something
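The colon dash form follows the same rules as POSIX shell parameter expansion, so the fallback behaviour can be checked in a plain shell without building anything. This is only a sketch: resolve_env is a made-up helper whose first argument plays the role of MY_ARG.

```shell
#!/bin/sh
# Hypothetical helper: $1 stands in for the MY_ARG build argument.
# ${1:-...} yields the fallback when $1 is unset OR empty.
resolve_env() {
  printf '%s\n' "${1:-stage 1 default value}"
}

resolve_env              # no argument at all
# prints: stage 1 default value
resolve_env ""           # argument present but empty
# prints: stage 1 default value
resolve_env "something"  # argument with a value
# prints: something
```

Note that when a value is supplied, the fallback text is not mixed into the result: the value replaces it entirely.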
An Unfortunate Circumstance
Whilst we are able to provide defaults and base values off previous values, an ARG or ENV declared inside one stage is not visible in a later stage that starts from a fresh base image. So I am not aware of a way to carry those stage-level values, defaults included, from one stage to the next.
What do I mean by this?
Time for yet another Dockerfile example:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
&& echo "stage 1 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
# Stage 2: Production
FROM alpine AS production
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
&& echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
In this example we set both ARG and ENV in the first stage.
Then we hit another FROM statement, and try to access both MY_ARG and MY_ENV that we defined in the previous stage.
Neither is available:
➜ docker build -t my-docker-image .
➜ docker run --rm my-docker-image
stage 1 arg value:
stage 1 env var: stage 1 default value
stage 2 arg value:
stage 2 env var:
Hopefully by now you trust me that explicitly passing a --build-arg would not change the stage 2 lines: the value would show up in stage 1 only.
We can use the same build argument in more than one stage.
And we can redefine the same environment variables in more than one stage.
But we have to be explicit:
# Stage 1: Build
FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 1 default value"}
WORKDIR /app
RUN echo "stage 1 arg value: $MY_ARG" > output.txt \
&& echo "stage 1 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
# Stage 2: Production
FROM alpine AS production
ARG MY_ARG
ENV MY_ENV=${MY_ARG:-"stage 2 default value"}
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 arg value: $MY_ARG" >> output.txt \
&& echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
Which gives:
➜ docker build -t my-docker-image .
➜ docker run --rm my-docker-image
stage 1 arg value:
stage 1 env var: stage 1 default value
stage 2 arg value:
stage 2 env var: stage 2 default value
The downside is that we have to repeat any default value; the upside is that we can make the default value stage specific.
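If a single shared default is preferable to stage-specific ones, one option that may be worth trying is to give the ARG its default once, before the first FROM, and have each stage opt in with a value-less ARG line. The sketch below assumes that approach; the temp directory, the shared-default-demo tag and shared_default_value are all names invented for the illustration, and the build only runs if a Docker daemon is reachable.

```shell
#!/bin/sh
# Sketch only: the temp directory and the image tag shared-default-demo are illustrative names.
workdir=$(mktemp -d)

# MY_ARG gets its default once, before the first FROM; each stage opts in
# with a value-less ARG line and copies the value into an ENV.
cat > "$workdir/Dockerfile" <<'EOF'
ARG MY_ARG=shared_default_value

FROM alpine AS build
ARG MY_ARG
ENV MY_ENV=${MY_ARG}
WORKDIR /app
RUN echo "stage 1 env var: $MY_ENV" > output.txt

FROM alpine AS production
ARG MY_ARG
ENV MY_ENV=${MY_ARG}
COPY --from=build /app /app
WORKDIR /app
RUN echo "stage 2 env var: $MY_ENV" >> output.txt
CMD ["sh", "-c", "cat /app/output.txt"]
EOF

# Only build when a Docker daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker build -t shared-default-demo "$workdir"
  docker run --rm shared-default-demo
else
  echo "docker not available; Dockerfile written to $workdir/Dockerfile"
fi
```

I would expect both output lines to show shared_default_value, and both to show the passed value when the build is run with --build-arg MY_ARG=... instead. The trade-off is that the default can no longer differ per stage.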
No Need To Declare Your Arg In Every Prior Stage
Bringing this home, from CircleCI we have a config file something like this:
jobs:
  build_docker:
    steps:
      - run:
          command: |
            docker build --pull \
              --build-arg GIT_REF=$CIRCLE_SHA1 \
              .
          name: Build container image
I have removed much for brevity and to focus only on the specifics to this post.
The gist is that CircleCI (or any other CI pipeline) will build our Docker image, from our Dockerfile, and can pass in interesting bits of data that we might want to keep around.
As above, we pass in the SHA1 that Circle exposes, which is the last commit hash from the current build.
This is given to the build process as a build-arg.
Our actual Dockerfile may / likely will contain multiple stages as we covered above. The reasoning for this is to keep the final Docker image as streamlined as possible.
We may have an initial stage to set up user accounts, set the time zone, ensure any required dependencies are updated, and so on. This stage doesn’t care about the commit hash.
Then we might have a stage to run npm commands such as npm ci. This stage also doesn’t need to know about the commit hash.
After that we might do the final build, copying over only the production assets from the previous stage, and starting the webserver. This stage does care about the commit hash, because we ultimately want to reference that as our cache buster string.
So here’s a very simplified example, covering everything we learned along the way:
# Stage 1: Build
FROM alpine AS build
WORKDIR /app
# do some interesting things
# Stage 2: Test
FROM build AS test
COPY --from=build /app /app
WORKDIR /app
# do more interesting things
# Stage 3: Production
# we go back to build as we don't need any of the test stage gubbins
FROM build AS production
COPY --from=build /app /app
WORKDIR /app
ARG GIT_REF
ENV GIT_REF=${GIT_REF:-'fallback_value'}
# do more interesting things
CMD ["printenv"]
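We only need the commit hash in the third stage. Therefore we do not need to specify ARG or ENV prior to this stage. Even if we did declare the ARG earlier, it would only apply to that stage, up until the next FROM statement. (An ENV set in the build stage would be inherited here, because both later stages start FROM build.)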
And as our final proof:
➜ docker build -t my-docker-image .
➜ docker run --rm my-docker-image
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8aaeb6db90d6
GIT_REF=fallback_value
HOME=/root
Oh, and for completeness:
➜ docker build -t my-docker-image --build-arg GIT_REF=8a33493a83e5fddf90fb65ad76a1780ff0c51598 .
➜ docker run --rm my-docker-image
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=8aaeb6db90d6
GIT_REF=8a33493a83e5fddf90fb65ad76a1780ff0c51598
HOME=/root