Persistent data in Docker volumes
Docker containers are supposed to be small, single-process and easily replaceable instances, so it’s not particularly clear how persistent data fits into that picture. Imagine you have a MySQL container that you’ve decided to upgrade. What will you do with its database files? In the container world, “upgrade” means “nuke the old one, start a new one”, and your data will turn into radioactive ashes along with the rest of the container’s file system.
However, along with the problem Docker also provides a solution: Docker volumes.
How can we manage Docker data?
Generally speaking, a Docker volume is just a host directory mounted into a container’s file system. Because it no longer belongs to the container’s FS, it’s not a problem to delete one container, create another one and mount the existing data volume to it. There are several approaches to using Docker volumes, and today we’ll take a look at three of them.
Docker has two options for containers to store files on the host machine, so that the files are persisted even after the container stops: volumes and bind mounts. If you’re running Docker on Linux, you can also use a tmpfs mount.
Keep reading for more information about these two ways of persisting data.
Choose the right type of mount
No matter which type of mount you choose to use, the data looks the same from within the container. It is exposed as either a directory or an individual file in the container’s filesystem.
An easy way to visualize the difference among volumes, bind mounts, and tmpfs mounts is to think about where the data lives on the Docker host.
- Volumes (since Docker version 1.9) are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
- Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
- tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
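As a rough sketch, each of the three mount types can be requested explicitly with the --mount flag of docker run. The helper function names, the app-data volume name and the ubuntu image are assumptions made for illustration. The commands are wrapped in functions so that nothing runs until one of them is called.

```shell
# Hypothetical helpers: the function names, the volume name "app-data"
# and the ubuntu image are placeholders, not part of any Docker API.

# Volume: Docker manages the storage in its own data directory.
run_with_volume() {
  docker run --rm --mount type=volume,source=app-data,target=/data ubuntu ls /data
}

# Bind mount: $1 is an absolute path to a host directory that already exists.
run_with_bind_mount() {
  docker run --rm --mount type=bind,source="$1",target=/data ubuntu ls /data
}

# tmpfs mount: kept in host memory only (Linux hosts).
run_with_tmpfs() {
  docker run --rm --mount type=tmpfs,target=/data ubuntu ls /data
}
```

For example, run_with_bind_mount /home/docker/www lists that host directory from inside the container. In all three cases the container just sees a /data directory.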
1. Simple directory mounts
The simplest approach is mounting an arbitrary host directory to the container’s FS. Imagine you’re running a mysql container and want to preserve its data files during upgrades, or just perform occasional backups. We can map a host directory to the container’s data directory, so anything mysql writes to e.g. /var/lib/mysql will end up in the relative safety of the host FS:
$ docker run -d \
-e MYSQL_ROOT_PASSWORD=my-secret-pw \
-v /home/docker/mysql-data:/var/lib/mysql \
--name mysqlserver \
mysql
When we destroy mysqlserver, its data will survive.
$ docker stop mysqlserver
$ docker rm mysqlserver
$ ls /home/docker/mysql-data
#auto.cnf client-cert.pem ib_logfile0 mysql/ public_key.pem sys/
#ca-key.pem client-key.pem ib_logfile1 performance_schema/ server-cert.pem
#ca.pem ib_buffer_pool ibdata1 private_key.pem server-key.pem
Now we can start a new mysql container, mount the same data directory to it and continue as if nothing had happened.
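Starting the replacement container could look like the sketch below. It repeats the earlier command and only changes the image tag. The function name and the tag argument are assumptions for illustration, and which tag to upgrade to is up to you.

```shell
# Hypothetical helper: start a fresh mysql container on top of the data
# directory that survived the old one. $1 is the image tag to run.
start_mysql_on_existing_data() {
  docker run -d \
    -e MYSQL_ROOT_PASSWORD=my-secret-pw \
    -v /home/docker/mysql-data:/var/lib/mysql \
    --name mysqlserver \
    "mysql:$1"
}
```

Calling start_mysql_on_existing_data latest reuses the files already sitting in /home/docker/mysql-data instead of initializing an empty database.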
Read-only mounts
If the container isn’t supposed to update the data in the mounted directory, the mount can be made read-only by simply adding the :ro suffix. Obviously, it doesn’t make much sense to do so for a database mount, but for a web server it’s quite logical:
$ docker run \
-v /home/docker/www:/usr/share/nginx/html:ro \
...
Listing existing mounts
Don’t even try to remember what exactly is mounted to which container: our brains don’t work like that. Instead, docker inspect %container% will tell us not only the network, host and container settings, but also which volumes and mounts the container uses at the moment:
$ docker inspect mysqlserver
# ...
# "Mounts": [
# {
# "Source": "/home/docker/mysql-data",
# "Destination": "/var/lib/mysql",
# "RW": true,
# ...
# }
# ],
# ...
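When only the mounts are of interest, the --format option of docker inspect can cut the output down to that section. The sketch below wraps two such calls in helper functions. The function names are made up for illustration, while the Go template syntax is what docker inspect accepts.

```shell
# Hypothetical helpers around docker inspect; $1 is a container name or ID.

# Print only the "Mounts" section as JSON.
show_mounts() {
  docker inspect --format '{{ json .Mounts }}' "$1"
}

# Print one "source -> destination" line per mount.
list_mount_paths() {
  docker inspect \
    --format '{{ range .Mounts }}{{ println .Source "->" .Destination }}{{ end }}' \
    "$1"
}
```

For the container above, show_mounts mysqlserver would print just the array that is buried in the middle of the full inspect output.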
2. Docker data volumes
Let’s try another thing: run the first mysql example one more time, but this time skip the -v (volume) argument. Then, if we inspect the container, it’ll be hard not to notice that it still has a volume attached to it!
$ docker run -d \
-e MYSQL_ROOT_PASSWORD=my-secret-pw \
--name mysqlserver \
mysql
$ docker inspect mysqlserver
#...
# "Mounts": [
# {
# "Name": "aac828f46e6ff00fedaf45bc21e89523bd57663d3d58bf0e3c73e8cb4092d768",
# "Source": "/mnt/sda1/var/lib/docker/volumes/aac828f46e6ff00fedaf45bc21e89523bd57663d3d58bf0e3c73e8cb4092d768/_data",
# "Destination": "/var/lib/mysql",
# ...
# }
# ],
#...
Yes, this time it has a weird name, and the mount source path is much longer, but it still points to the /var/lib/mysql data folder. How is that possible?
The answer lies at the bottom of the mysql image’s Dockerfile:
...
VOLUME /var/lib/mysql
...
VOLUME /var/lib/mysql creates a new volume attached to /var/lib/mysql. It behaves much like a regular directory mount, but it is not quite the same. Whenever Docker sees a volume declaration, it generates a unique 64-character name for it and creates a new mount directory (a volume) in the host FS: /var/lib/docker/volumes/%name%. When the container starts for the first time, unlike with regular host directory mounts, Docker copies whatever the container’s image had in /var/lib/mysql to the volume, and from then on uses the content from the volume, not the container’s FS. That has an important implication: when I create a container from a newer mysql image that also has newer content in /var/lib/mysql, if the volume already exists, that new content will be ignored.
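The difference in copy behavior can be seen with a small experiment. This is a sketch that uses the nginx image and its /usr/share/nginx/html directory as a stand-in for mysql, because it is quicker to list. The function names are hypothetical.

```shell
# Hypothetical helpers comparing a fresh volume with a host directory mount.

# Anonymous volume: a new, empty volume is created and pre-filled with
# whatever the image ships at that path, so ls shows the image's files.
list_html_with_volume() {
  docker run --rm -v /usr/share/nginx/html nginx ls /usr/share/nginx/html
}

# Host directory mount: $1 is an absolute host path. Its content hides the
# image's files and nothing is copied, so an empty directory stays empty.
list_html_with_bind_mount() {
  docker run --rm -v "$1":/usr/share/nginx/html nginx ls /usr/share/nginx/html
}
```

Running list_html_with_volume and then list_html_with_bind_mount with an empty host directory should show the image’s files in the first case and nothing in the second.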
Creating data volumes from a command line
Volume declarations don’t have to be in a Dockerfile; we can create them from the command line as well:
$ docker run \
-v /data \
ubuntu \
touch /data/README.md
This command creates a new volume mounted to /data, then touch /data/README.md creates an empty file in it, and after that the container immediately exits.
Unlike with regular host mounts, Docker keeps track of all the volumes it has created and provides docker volume ... commands for them. One of them is ls, which prints out existing volumes. Using it, we can find the volume we just created:
$ docker volume ls
#DRIVER VOLUME NAME
#local b45436c7f8bab37c0bfe998f962001226470cbc1dfe4ac59cc0287276e3d7a64
Docker volumes don’t belong exclusively to the containers that created them. Using the name, we can mount them to any number of other containers. A classic example of this feature is attaching a volume to a container that backs up its data.
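One way such a backup container might look is sketched below: a throwaway ubuntu container mounts the volume read-only, mounts the current host directory next to it, and packs the former into the latter with tar. The function name, the /backup mount point and the archive name are assumptions for illustration.

```shell
# Hypothetical helper: archive the content of a volume into the current
# host directory. $1 is the volume name, $2 is the archive file name.
backup_volume() {
  docker run --rm \
    -v "$1":/data:ro \
    -v "$(pwd)":/backup \
    ubuntu \
    tar czf "/backup/$2" -C /data .
}
```

For example, backup_volume followed by the volume name from docker volume ls and data-backup.tar.gz leaves the archive in the directory the command was run from, and the container removes itself afterwards thanks to --rm.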
Creating volumes without containers
You don’t even need a container to create a volume. We can use the docker volume create command to create a volume in advance and then mount it to any container that needs one. The beauty of this approach is that we can choose the volume name and get rid of that 64-character monstrosity once and for all. Compare that with mounting the volume we created earlier by its generated name:
$ docker run -ti \
-v b45436c7f8bab37c0bfe998f962001226470cbc1dfe4ac59cc0287276e3d7a64:/data \
ubuntu bash
root@21bd05dfa2dd:/# cd /data
root@21bd05dfa2dd:/data# ls
README.md
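Creating a volume under a name of our own choosing could look like the sketch below. The function name is hypothetical and the volume name is passed in as an argument. Recent Docker versions take the name as a plain argument of docker volume create, while older ones expected it after --name.

```shell
# Hypothetical helper: create a named volume in advance, write a file into
# it from one container and read it back from another. $1 is the volume name.
create_and_use_volume() {
  docker volume create "$1"
  docker run --rm -v "$1":/data ubuntu touch /data/README.md
  docker run --rm -v "$1":/data ubuntu ls /data
}
```

After create_and_use_volume my-data, the volume shows up in docker volume ls as my-data, and that is the only name we need to remember.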
Creating shared-storage data volumes
The volume create command also provides something much more powerful than just choosing a name. Until now we’ve been creating volumes that are hosted on the local file system, which is not particularly scalable. To address that, Docker supports volume plugins that enable storing volume data in other locations: Azure, DigitalOcean, and several others.
Installing and configuring such plugins can be tricky, but once that’s done, using a plugin is just a matter of adding one more argument to the volume create command:
$ docker volume create --driver dostorage --name my-volume
3. Volume containers
There’s also an old Docker pattern called the data-only container. As the name suggests, it’s a container with one or more attached volumes whose sole responsibility is to exist and provide those volumes to others. The container doesn’t even have to be running.
Once we’ve got such a container, we can attach all of its volumes at once to other containers with the --volumes-from argument:
$ docker run -d --volumes-from mydatacontainer mysql
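The data container itself has to come from somewhere. A minimal sketch, assuming it is created from the same mysql image and never started, could look like this. The function names are hypothetical, and mydatacontainer matches the name used above.

```shell
# Hypothetical helpers for the data-only container pattern.

# Create (but do not start) a container whose only job is to own the volume.
create_data_container() {
  docker create -v /var/lib/mysql --name mydatacontainer mysql
}

# Start the real mysql container with all volumes of the data container.
run_mysql_with_data_container() {
  docker run -d \
    -e MYSQL_ROOT_PASSWORD=my-secret-pw \
    --volumes-from mydatacontainer \
    mysql
}
```

The mysql container can then be removed and recreated freely, and the volume lives on as long as mydatacontainer exists.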
Honestly, I don’t see any benefit in having a dedicated volume container. The same functionality can be achieved with regular data volumes and zero overhead. Maybe this feature made more sense before Docker 1.8 introduced the Volumes API, but now it’s more confusing than useful.
Summary
Today we took a look at several ways to persist data in Docker containers: host directory mounts, data volumes and volume containers. While all three are supported by Docker, only the second one, the data volume, looks like the ‘true’ way to do the job. After all, directory mounts work only on the local host, volume containers don’t add much value compared to named data volumes, and only data volumes work well both on the local host and anywhere in the cloud, assuming you’ve installed a plugin for that.