[Part 2] - Docker Volumes - Volumes


In the previous video we looked at Docker bind mounts.

Bind mounts allow us to let our running Docker containers access folders on our local computer / dev box / host (whatever you want to call it), as though they are "real" absolute paths.

Volumes allow us to do this, too.

Rather than relying on the absolute path, Docker volumes make use of a special storage directory on the Docker host machine, and Docker manages all of this for you.

There are a bunch of reasons why this is good. And there are some big reasons why this is more complicated, especially for beginners.

Here's what Docker's docs have to say on the subject:

Volumes have several advantages over bind mounts:

  • Volumes are easier to back up or migrate than bind mounts.
  • You can manage volumes using Docker CLI commands or the Docker API.
  • Volumes work on both Linux and Windows containers.
  • Volumes can be more safely shared among multiple containers.
  • Volume drivers allow you to store volumes on remote hosts or cloud providers, to encrypt the contents of volumes, or to add other functionality.
  • A new volume’s contents can be pre-populated by a container.

What isn't mentioned is that on a Mac, for example, figuring out where your underlying data really lives is a complete pain. This is because your data won't live directly on the Mac's file system, but inside the Linux virtual machine that Docker runs behind the scenes. It's this kind of confusion that makes Docker dealings difficult.

The real power of using Docker Volumes becomes apparent as our systems grow in complexity. Remember that Docker is a tool aimed at Enterprise. To make certain aspects of a project's life easier (scaling, testing, deployment, etc.), we do need to change some of the fundamentals of how we work in order to get the most from Docker.

The way we organise and think about data is a large part of this.

As we will see later on in our journey with Docker, it's best to use the Volumes approach as our lives will be made easier when it gets to deployment / going to production.

Docker Volume Examples

Let's start by recreating the bind mount approach, but this time use a Docker Volume instead:

First, let's create ourselves a Docker Volume:

$ docker volume create crv

crv

We can see our new volume:

$ docker volume ls

DRIVER              VOLUME NAME
local               crv

And remove it:

$ docker volume rm crv

crv

$ docker volume ls

DRIVER              VOLUME NAME

We kinda need that volume though, so let's create it again:

$ docker volume create crv

crv

# accidentally up arrow, and return
$ docker volume create crv

crv

That's ok, the command is idempotent. If the volume already exists it just returns us the name of the existing volume. No harm, no foul.
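
Because docker volume create quietly succeeds when the name is already taken, it can be handy to check for a volume before creating it. Here's a minimal sketch, assuming a POSIX shell. The helper names volume_exists and create_volume_once are made up for illustration and are not part of Docker:

```shell
# volume_exists NAME
# Succeeds (exit status 0) if `docker volume inspect` can find the volume.
# The function name is hypothetical, not a Docker command.
volume_exists() {
    docker volume inspect "$1" >/dev/null 2>&1
}

# create_volume_once NAME
# Creates the volume only if it is genuinely new, and says which case it hit.
create_volume_once() {
    if volume_exists "$1"; then
        echo "volume $1 already exists"
    else
        docker volume create "$1" >/dev/null && echo "volume $1 created"
    fi
}
```

Calling create_volume_once crv twice in a row should report the volume as created the first time, and as already existing the second.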

And we can also inspect the volume by name, to learn a little more about it:

$ docker volume inspect crv
[
    {
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/crv/_data",
        "Name": "crv",
        "Options": {},
        "Scope": "local"
    }
]

This command tells us a bunch of useful info.

Most useful to us immediately is the Mountpoint.
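
If the Mountpoint is all you are after, the --format option of docker volume inspect can pull out that one field. A small sketch, where the function name volume_mountpoint is my own invention:

```shell
# volume_mountpoint NAME
# Prints only the Mountpoint field of a volume, using the Go-template
# --format option of `docker volume inspect`.
# The function name is hypothetical.
volume_mountpoint() {
    docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

Running volume_mountpoint crv should print the same path that appears under "Mountpoint" in the JSON above.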

However, by default we don't own the created directory, nor do we have group execute access to it. As such we can't even ls the contents :/

We must use elevated privileges on the Docker host (our own machine, in this case):

$ sudo ls -la /var/lib/docker/volumes

total 744
drwx------ 57 root root 131072 Sep  2 20:52 .
drwx--x--x 12 root root   4096 Sep  2 09:06 ..
drwxr-xr-x  3 root root   4096 Sep  2 20:52 crv

# you're probably curious
$ sudo ls -la /var/lib/docker/volumes/crv

total 140
drwxr-xr-x  3 root root   4096 Sep  2 20:52 .
drwx------ 57 root root 131072 Sep  2 20:52 ..
drwxr-xr-x  2 root root   4096 Sep  2 20:52 _data

$ sudo ls -la /var/lib/docker/volumes/crv/_data
total 8
drwxr-xr-x 2 root root 4096 Sep  2 20:52 .
drwxr-xr-x 3 root root 4096 Sep  2 20:52 ..

Basically a pre-configured, but empty directory structure.

Trying to do this on OSX / a Mac is, frustratingly, more involved.

You may have to use screen.

$ sudo screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty

I cannot claim to have figured this out, of course. I found the solution on Stack Overflow, like any other good Googler :)

Soapbox, Feel Free To Skip :)

My main gripe with Volumes over bind mounts comes down to this point:

By default, your Mountpoint is going to be wherever Docker decided to install itself, which differs from OS to OS.

I use Linux.

The way I use Linux is to put all the Linux-y stuff on a small, but fast SSD. Linux itself can fit on a small drive with next to no issues. No point wasting money on a bigger drive when I want it 100% devoted to OS tasks.

My various other data go over other drives. Some slow, some fast. But my choice.

My dev work, for example, goes onto a dedicated SSD.

I can bind mount to my dev SSD and know it's fast, and know that the disk itself is vast in size, and that even if it does grow to gargantuan proportions, I'm not risking mucking up my OS disk, or any other disk, by my own silly actions.

Not so with the default Docker mountpoint.

In my case this ends up on my OS drive by default.

There is a fix to this. It's just it becomes another chore when setting up a new system, or it simply gets forgotten. And then your OS disk partition reports 98% capacity one Tuesday morning and you end up wasting your best productivity cycles chasing garbage problems like this.

How do I change the default Docker image / container / volume storage location?

Good question.

Here's how I did it.

Edit your /etc/default/docker file (Ubuntu), and add the -g option to DOCKER_OPTS. Be aware that on recent Docker releases the -g flag is deprecated in favour of --data-root:

DOCKER_OPTS="--dns 8.8.8.8 --dns 8.8.4.4 -g /some/path/to/fast/disk"

Note: I learned how to do this here.
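
Whichever way you change the location, it's worth confirming where Docker actually ended up storing its data, and how full that disk is. A rough sketch, assuming a POSIX shell with df available. The two function names are hypothetical:

```shell
# docker_root_dir
# Prints the directory the Docker daemon keeps images, containers and
# volumes under, as reported by `docker info`.
docker_root_dir() {
    docker info --format '{{ .DockerRootDir }}'
}

# docker_root_usage
# Shows how full the disk holding that directory is.
docker_root_usage() {
    df -h "$(docker_root_dir)"
}
```

If docker_root_dir still prints /var/lib/docker after a restart of the daemon, the new setting has not been picked up.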

Recreating With Volumes What We Had With Bind Mounts

Remember earlier we used a bind mount, and the absolute path of /tmp/docker-test. This directory contained a file, which we deleted, called my-file.

We're going to recreate this using two approaches:

First, we will mount the new, empty crv volume onto a new container instance, and from within the container we will write to that volume by replicating the my-file from earlier.

This is good, and useful.

However, second, and a touch more real world, we will look at how to populate the volume with data from our host system. This is much more likely to be the task you need to carry out.

# note this uses slightly different syntax to that we used previously
# the only reason for this is to show some allowable variations
# more: https://docs.docker.com/engine/admin/volumes/bind-mounts/#choosing-the--v-or-mount-flag

$ docker run -d \
  --mount type=volume,src=crv,dst=/var/www/madeup \
  nginx

f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422
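
As an aside on the syntax, the same mount can be written with the shorter -v flag. The sketch below puts the two forms side by side. Both helper names are made up for this example:

```shell
# Two hypothetical helpers that start nginx with the same named volume
# mounted at the same path; only the flag syntax differs.

# Long form: explicit key=value pairs.
run_with_mount() {
    docker run -d --mount "type=volume,src=$1,dst=$2" nginx
}

# Short form: NAME:PATH. A name without a leading slash is treated as a
# volume, whereas an absolute host path would make it a bind mount.
run_with_v() {
    docker run -d -v "$1:$2" nginx
}
```

Either run_with_mount crv /var/www/madeup or run_with_v crv /var/www/madeup should give a container with crv mounted at /var/www/madeup.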

$ docker ps -a
CONTAINER ID    IMAGE   COMMAND                  CREATED          STATUS          PORTS    NAMES
f229b05498dc    nginx   "nginx -g 'daemon ..."   20 minutes ago   Up 19 minutes   80/tcp   upbeat_brahmagupta

I show the output of docker ps -a not because anything has changed. It hasn't. Using a Volume rather than a bind mount has no noticeable impact on how the container is listed.

We can, however, inspect the container and see loads of information, along with a section that contains very similar output to that which we saw from our docker volume inspect crv command earlier:

$ docker inspect f22

[
    {
        "Id": "f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422",
        "Created": "2017-09-02T19:52:18.546923724Z",
        "Path": "nginx",
        "Args": [
            "-g",
            "daemon off;"
        ],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 32007,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-09-02T19:52:18.789628682Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:b8efb18f159bd948486f18bd8940b56fd2298b438229f5bd2bcf4cedcf037448",
        "ResolvConfPath": "/var/lib/docker/containers/f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422/hostname",
        "HostsPath": "/var/lib/docker/containers/f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422/hosts",
        "LogPath": "/var/lib/docker/containers/f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422/f229b05498dc9d56c7935ea1e4e1cf5304f4a38c666fe637bcefab35fa9ca422-json.log",
        "Name": "/upbeat_brahmagupta",
        "RestartCount": 0,
        "Driver": "aufs",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": null,
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {}
            },
            "NetworkMode": "default",
            "PortBindings": {},
            "RestartPolicy": {
                "Name": "no",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": [],
            "DnsOptions": [],
            "DnsSearch": [],
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": null,
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 0,
            "Memory": 0,
            "NanoCpus": 0,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 0,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "Mounts": [
                {
                    "Type": "volume",
                    "Source": "crv",
                    "Target": "/var/www/madeup"
                }
            ]
        },
        "GraphDriver": {
            "Data": null,
            "Name": "aufs"
        },
        "Mounts": [
            {
                "Type": "volume",
                "Name": "crv",
                "Source": "/var/lib/docker/volumes/crv/_data",
                "Destination": "/var/www/madeup",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],
        "Config": {
            "Hostname": "f229b05498dc",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "80/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "NGINX_VERSION=1.13.3-1~stretch",
                "NJS_VERSION=1.13.3.0.1.11-1~stretch"
            ],
            "Cmd": [
                "nginx",
                "-g",
                "daemon off;"
            ],
            "ArgsEscaped": true,
            "Image": "nginx",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {},
            "StopSignal": "SIGTERM"
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "5a76a873f642862687e9dbf18cfdff3e7f0f6641d49866cdcaed39ba9d47fc0f",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "80/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/5a76a873f642",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "cc2c74a8567ed1e89c24152d74de50293b66e1c66d0fcda720da7ab557d9d937",
            "Gateway": "172.17.0.1",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "172.17.0.2",
            "IPPrefixLen": 16,
            "IPv6Gateway": "",
            "MacAddress": "02:42:ac:11:00:02",
            "Networks": {
                "bridge": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "a316823e219862782eb6f34587579e7b68e201a28445829b635ea8ebd859e5dd",
                    "EndpointID": "cc2c74a8567ed1e89c24152d74de50293b66e1c66d0fcda720da7ab557d9d937",
                    "Gateway": "172.17.0.1",
                    "IPAddress": "172.17.0.2",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:ac:11:00:02",
                    "DriverOpts": null
                }
            }
        }
    }
]

It's potentially useful to have access to all of this information, and in a nice JSON format too. But we don't need it right now, nor will we do so during this series.

The interesting section, if it wasn't obvious, is this:

"Mounts": [
    {
        "Type": "volume",
        "Name": "crv",
        "Source": "/var/lib/docker/volumes/crv/_data",
        "Destination": "/var/www/madeup",
        "Driver": "local",
        "Mode": "",
        "RW": true,
        "Propagation": ""
    }
],
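
Rather than scrolling through the full output, the --format option of docker inspect can print just this section. A minimal sketch, with a function name of my own choosing:

```shell
# container_mounts CONTAINER
# Prints only the top-level Mounts array of a container, as JSON,
# using the Go-template `json` function of `docker inspect --format`.
# The function name is hypothetical.
container_mounts() {
    docker inspect --format '{{ json .Mounts }}' "$1"
}
```

Running container_mounts f22 should print the Mounts array on a single line.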

Now let's create a file on the container, and see what happens:

$ docker exec f22 ls -la /var/www/madeup

total 8
drwxr-xr-x 2 root root 4096 Sep  2 19:52 .
drwxr-xr-x 3 root root 4096 Sep  2 19:52 ..

Ok, so we know that the my-file file does not exist. But the directory structure does.

I'm going to use bash inside the container to complete the next steps:

$ docker exec -it f22 /bin/bash

root@f229b05498dc:/# cd /var/www/madeup

root@f229b05498dc:/var/www/madeup# touch my-file

root@f229b05498dc:/var/www/madeup# ls -la

total 8
drwxr-xr-x 2 root root 4096 Sep  2 20:21 .
drwxr-xr-x 3 root root 4096 Sep  2 19:52 ..
-rw-r--r-- 1 root root    0 Sep  2 20:21 my-file

root@f229b05498dc:/var/www/madeup# echo "hello world" >> my-file

root@f229b05498dc:/var/www/madeup# cat my-file
hello world
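
The interactive session above can also be done in one step from the host, without opening a shell inside the container. This is only a sketch: it assumes the image ships with sh (the nginx image does), and append_in_container is a made-up helper name:

```shell
# append_in_container CONTAINER PATH TEXT
# Appends TEXT as a new line to PATH inside the container.
# TEXT and PATH are passed as positional arguments to the inner shell,
# so spaces and quotes in them need no extra escaping.
append_in_container() {
    container=$1
    path=$2
    text=$3
    docker exec "$container" sh -c 'printf "%s\n" "$1" >> "$2"' sh "$text" "$path"
}
```

For example, append_in_container f22 /var/www/madeup/my-file "hello world" should leave the file in the same state as the bash session did.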

Then, taking the information we gleaned from docker inspect f22, or docker volume inspect crv, we can check the real location of the data is as expected:

$ sudo ls -la /var/lib/docker/volumes/crv/_data

total 12
drwxr-xr-x 2 root root 4096 Sep  2 21:21 .
drwxr-xr-x 3 root root 4096 Sep  2 20:52 ..
-rw-r--r-- 1 root root   12 Sep  2 21:23 my-file

$ sudo cat /var/lib/docker/volumes/crv/_data/my-file
hello world

Perfect.

This might not go as swimmingly on OSX. I have no experience with Windows either. Please do let me know if you encounter issues.

The Harder Way

I said earlier we would approach this problem from two directions.

First, the easier way - which we just saw - where we create a new volume, mount it to the container, and then create new data inside that container.

And secondly - now - we want to look at the more real world situation:

What if I already have some data, and I want to move it into a volume?

Let's reset, and tackle this problem:

$ docker stop f22
f22

$ docker rm f22
f22

$ docker volume rm crv
crv

$ sudo cat /var/lib/docker/volumes/crv/_data/my-file
cat: /var/lib/docker/volumes/crv/_data/my-file: No such file or directory

$ sudo ls -la /var/lib/docker/volumes
total 740
drwx------ 56 root root 131072 Sep  2 21:29 .
drwx--x--x 12 root root   4096 Sep  2 09:06 ..

That container, and any data we might have created, is gone.
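
If you find yourself resetting like this a lot, the steps can be wrapped up. A sketch only, with a hypothetical function name. Note that it uses docker rm -f, which kills a running container rather than stopping it gracefully:

```shell
# reset_demo CONTAINER VOLUME
# Force-removes the container (even if it is still running), then removes
# the volume. `docker volume rm` refuses to remove a volume that a
# container still uses, so the order matters.
reset_demo() {
    docker rm -f "$1" && docker volume rm "$2"
}
```

Here that would be reset_demo f22 crv.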

On our host, let's create a new folder and file:

$ pwd
/tmp/docker-test

$ mkdir my-folder
$ touch my-folder/my-file
$ echo "docker is great" >> my-folder/my-file

$ cat /tmp/docker-test/my-folder/my-file
docker is great

We still need a volume to store our data in. I'm going to create another volume named crv:

$ docker volume create crv
crv

And spin up a new container, with this volume mounted in the usual place:

# note this uses slightly different syntax to that we used previously
# the only reason for this is to show some allowable variations
# more: https://docs.docker.com/engine/admin/volumes/bind-mounts/#choosing-the--v-or-mount-flag

$ docker run -d \
  --mount src=crv,destination=/var/www/madeup \
  nginx

9fe85eee2f09b52b68c7576a23d6be26f014a21defebba5b6999b112108f3a59

$ docker ps -a

CONTAINER ID    IMAGE   COMMAND                  CREATED          STATUS          PORTS    NAMES
9fe85eee2f09    nginx   "nginx -g 'daemon ..."   20 minutes ago   Up 5 seconds    80/tcp   pedantic_thompson

Pedantic Thompson :) I do love some of these wacky names it spews out.

Ok, now we need to get the data from my-folder/my-file into the running container.

How to do this?

Firstly, there are a bunch of rules as to what must and must not exist if the source is a file or directory, and in what condition, and so on, etc.

My advice is simply this:

Make sure the folder you are trying to cp to exists on the running container.

$ docker exec -it 9fe85eee2f09 /bin/bash

root@9fe85eee2f09:/# mkdir -p /var/www/madeup/my-folder

Then from your host:

$ docker cp my-folder/my-file pedantic_thompson:/var/www/madeup/my-folder/my-file

And back on the container:

root@9fe85eee2f09:/var/www/madeup/my-folder# cat my-file 

docker is great

Likewise, we can check from the host that the volume itself contains the data:

$ sudo cat /var/lib/docker/volumes/crv/_data/my-folder/my-file

docker is great
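
One of those rules is worth knowing: if the source is a directory and the destination directory already exists in the container, docker cp copies the whole directory into it. Here's a small sketch of that case, with a made-up helper name:

```shell
# copy_dir_into_container SRC_DIR CONTAINER DEST_DIR
# Copies SRC_DIR itself (not just its contents) into DEST_DIR inside the
# container. DEST_DIR must already exist there. If SRC_DIR ended in "/."
# docker cp would copy only the directory's contents instead.
copy_dir_into_container() {
    docker cp "$1" "$2:$3"
}
```

So copy_dir_into_container my-folder pedantic_thompson /var/www/madeup should produce /var/www/madeup/my-folder/my-file without the separate mkdir step.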

You may also be thinking - heck, surely I don't need to even mount a volume, right?

I could just cp my files to the underlying directory, e.g. /var/lib/docker/volumes/crv/_data/.

As far as I know, you can. I have done this myself once. The only reason I am unsure about this is the potential for permissions problems - a classic Docker happenstance.
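
A way to populate a volume without touching /var/lib/docker at all is to let a throwaway container do the copying, with the host folder bind mounted alongside the volume. This is a sketch under assumptions: the alpine image, the /from and /to paths, and the function name are all arbitrary choices of mine:

```shell
# populate_volume HOST_DIR VOLUME
# Copies the contents of HOST_DIR into VOLUME using a throwaway container.
# HOST_DIR must be an absolute path so that -v treats it as a bind mount.
# The source is mounted read-only; --rm deletes the container afterwards.
populate_volume() {
    docker run --rm \
        -v "$1":/from:ro \
        -v "$2":/to \
        alpine cp -a /from/. /to/
}
```

For our example that would be populate_volume /tmp/docker-test/my-folder crv. The copy runs as whatever user the helper container uses, so this does not make permissions questions disappear, but it keeps everything inside Docker's own tooling.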

In Summary

That just about wraps up our introduction to Docker volumes.

We've seen how bind mounts appear easier, but the extra time investment in working with Volumes is worthwhile.

We will be using Volumes over bind mounts throughout this series as they are the suggested approach from Docker.

You likely have questions. I would encourage you to play around with the command line and experiment for yourself. Please leave a comment and share anything you find that may be interesting to me and others.

Don't worry if you still feel overwhelmed or confused at this stage. The further we progress with Docker, the more accustomed you will become to the terminology and methodology.

Also, as hinted at in this video, docker-compose quite heavily masks this whole issue from us. But that's not to say you don't need to know this stuff.
