Persistent data in Docker volumes

Radhouen Assakra

Published Oct 21, 2018

Docker containers are supposed to be small, single-process, easily replaceable instances, so it’s not particularly clear how persistent data fits into that picture. Imagine you have a MySQL container that you decide to upgrade. What will you do with its database files? In the container world, “upgrade” means “nuke the old one, start a new one”, and your data will turn into radioactive ashes along with the rest of the container’s file system.

However, along with the problem Docker also provides a solution: Docker volumes.

How can we manage Docker data?

Generally speaking, a Docker volume is just a host directory mounted into a container’s file system. As it no longer belongs to the container’s FS, it’s not a problem to delete one container, create another one and mount the existing data volume to it. There are several approaches to using Docker volumes, and today we’ll take a look at three of them.

Docker has two options for containers to store files in the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts. If you’re running Docker on Linux you can also use a tmpfs mount.

Keep reading for more information about these two ways of persisting data.

Choose the right type of mount

No matter which type of mount you choose to use, the data looks the same from within the container. It is exposed as either a directory or an individual file in the container’s filesystem.

An easy way to visualize the difference among volumes, bind mounts, and tmpfs mounts is to think about where the data lives on the Docker host.

Volumes (since Docker version 1.9) are stored in a part of the host filesystem which is managed by Docker (/var/lib/docker/volumes/ on Linux). Non-Docker processes should not modify this part of the filesystem. Volumes are the best way to persist data in Docker.
Bind mounts may be stored anywhere on the host system. They may even be important system files or directories. Non-Docker processes on the Docker host or a Docker container can modify them at any time.
tmpfs mounts are stored in the host system’s memory only, and are never written to the host system’s filesystem.
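
To make the three options concrete, here is a minimal sketch that starts the same throwaway container with each mount type. It assumes a Docker version that supports the --mount flag (17.06 or later); the volume name demo-volume, the target paths /data and /scratch, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: demo-volume, /data and /scratch are hypothetical names.

run_with_volume() {
  # Named volume, managed by Docker under /var/lib/docker/volumes/ on Linux.
  docker run --rm --mount type=volume,source=demo-volume,target=/data ubuntu ls /data
}

run_with_bind_mount() {
  # Bind mount: $1 must be an existing absolute directory on the host.
  docker run --rm --mount "type=bind,source=$1,target=/data" ubuntu ls /data
}

run_with_tmpfs() {
  # tmpfs mount (Linux only): lives in host memory, gone when the container stops.
  docker run --rm --mount type=tmpfs,target=/scratch ubuntu ls /scratch
}
```

For example, run_with_bind_mount /home/docker/www lists that host directory from inside the container.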

1. Simple directory mounts

The simplest approach is mounting an arbitrary host directory into the container’s FS. Imagine you’re running a mysql container and want to preserve its data files during upgrades, or just perform occasional backups. We can map a host directory to the container’s data directory, so anything mysql writes to e.g. /var/lib/mysql will end up in the relative safety of the host FS:

$ docker run -d \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  -v /home/docker/mysql-data:/var/lib/mysql \
  --name mysqlserver \
  mysql

When we destroy mysqlserver, its data will survive.

$ docker stop mysqlserver
$ docker rm mysqlserver
$ ls /home/docker/mysql-data
#auto.cnf            client-cert.pem     ib_logfile0         mysql/              public_key.pem      sys/
#ca-key.pem          client-key.pem      ib_logfile1         performance_schema/ server-cert.pem
#ca.pem              ib_buffer_pool      ibdata1             private_key.pem     server-key.pem

Now we can start a new mysql container, mount the same data directory to it and continue as if nothing had happened.
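
The whole “nuke and replace” cycle could be wrapped into a small helper. This is only a sketch: the function name upgrade_mysql is made up, and the image you pass in is your choice.

```shell
#!/bin/sh
# Sketch: replace the mysqlserver container with one created from another image
# while keeping the same host data directory.
# $1 = image to start (for example mysql:latest), $2 = host data directory.
upgrade_mysql() {
  image="$1"
  data_dir="$2"
  # Stop and remove the old container; the host directory is left untouched.
  docker stop mysqlserver &&
  docker rm mysqlserver &&
  # Start the replacement with the same directory mounted as its data dir.
  docker run -d \
    -e MYSQL_ROOT_PASSWORD=my-secret-pw \
    -v "${data_dir}:/var/lib/mysql" \
    --name mysqlserver \
    "$image"
}
```

Calling upgrade_mysql mysql:latest /home/docker/mysql-data would then reuse the data directory from the example above.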

Read-only mounts

If a container isn’t supposed to update the data in a mounted directory, the mount can be made read-only by simply adding the :ro suffix. Obviously, it doesn’t make much sense to do so for a database mount, but for a web server it’s quite logical:

$ docker run \
  -v /home/docker/www:/usr/share/nginx/html:ro \
  ...

Listing existing mounts

Don’t even try to remember what exactly is mounted to which container – our brains don’t work like that. Instead, docker inspect %container% will tell you not only the network, host and container settings, but also which volumes and mounts the container uses at the moment:

$ docker inspect mysqlserver
#    ...
#    "Mounts": [
#            {
#                "Source": "/home/docker/mysql-data",
#                "Destination": "/var/lib/mysql",
#                "RW": true,
#                ...
#            }
#        ],
#    ...
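
If the full inspect output is too noisy, the --format option of docker inspect can narrow it down to the mounts. A minimal sketch, with a made-up helper name:

```shell
#!/bin/sh
# Sketch: print only the Mounts section of a container as JSON.
# $1 = container name or ID, for example mysqlserver.
list_mounts() {
  docker inspect --format '{{ json .Mounts }}' "$1"
}
```

list_mounts mysqlserver prints the same "Mounts" array as above, without the rest of the settings.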

2. Docker data volumes

Let’s try another thing: run the first mysql example one more time, but this time skip the -v (volume) argument. Then, if we inspect the container, it’ll be hard not to notice that it still has a volume attached to it!

$ docker run -d \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  --name mysqlserver \
  mysql

$ docker inspect mysqlserver
#...
#       "Mounts": [
#            {
#                "Name": "aac828f46e6ff00fedaf45bc21e89523bd57663d3d58bf0e3c73e8cb4092d768",
#                "Source": "/mnt/sda1/var/lib/docker/volumes/aac828f46e6ff00fedaf45bc21e89523bd57663d3d58bf0e3c73e8cb4092d768/_data",
#                "Destination": "/var/lib/mysql",
#                ...
#            }
#        ],
#...

Yes, this time it has a weird name, and the mount source path is much longer, but it still points to the /var/lib/mysql data folder. How is that possible?

The answer lies at the bottom of the mysql image’s Dockerfile:

...
VOLUME /var/lib/mysql
...

VOLUME /var/lib/mysql creates a new volume attached to /var/lib/mysql. It behaves somewhat like a regular directory mount, but is not quite the same. Whenever Docker sees a volume declaration, it generates a unique 64-character name for it and creates a new mount directory (a volume) in the host FS – /var/lib/docker/volumes/%name%. When the container starts for the first time, unlike with regular host directory mounts, Docker copies whatever the image had in /var/lib/mysql into the volume, and from then on uses the content of the volume, not the container’s FS. That has an important implication: when I create a container from a newer mysql image that also has newer content in /var/lib/mysql and attach an already existing volume to it, that new content will be ignored.
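
The copy-on-first-use behaviour is easy to observe with any image that ships files in the mounted directory. The sketch below uses the nginx image and its /usr/share/nginx/html directory instead of mysql, simply because it starts faster; the function name, the marker file and the volume name you pass in are made up.

```shell
#!/bin/sh
# Sketch: watch Docker populate an empty volume, then reuse it as-is.
# $1 = name of a volume that does not exist yet, for example demo-html.
show_volume_copy() {
  volume="$1"
  # First use: the volume is empty, so Docker copies the image's files into it.
  docker run --rm -v "${volume}:/usr/share/nginx/html" nginx ls /usr/share/nginx/html
  # Add a marker file; it is written to the volume, not to the container's FS.
  docker run --rm -v "${volume}:/usr/share/nginx/html" nginx touch /usr/share/nginx/html/marker
  # A brand new container sees the volume's content, marker included.
  docker run --rm -v "${volume}:/usr/share/nginx/html" nginx ls /usr/share/nginx/html
}
```

The second listing should contain the marker file next to the files that came from the image, because the volume, once populated, is used as it is.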

Creating data volumes from a command line

Volume declarations don’t have to be in a Dockerfile; we can create them from the command line as well:

$ docker run \
  -v /data \
  ubuntu \
  touch /data/README.md

This command creates a new volume mounted to /data, then touch /data/README.md creates an empty file in it, and after that the container immediately exits.

Unlike with regular host mounts, Docker keeps track of all the volumes it creates and provides a set of docker volume ... commands for them. One of them is ls, which prints out existing volumes. By using it we can find the volume we just created:

$ docker volume ls
#DRIVER              VOLUME NAME
#local               b45436c7f8bab37c0bfe998f962001226470cbc1dfe4ac59cc0287276e3d7a64
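
A few more docker volume subcommands are handy for housekeeping. The helper names below are made up; the subcommands and options themselves are part of the Docker CLI.

```shell
#!/bin/sh
# Sketch: small wrappers around docker volume subcommands.

# Show where a volume's data lives on the host. $1 = volume name.
volume_path() {
  docker volume inspect --format '{{ .Mountpoint }}' "$1"
}

# List volumes that no container references any more.
list_unused_volumes() {
  docker volume ls --filter dangling=true
}

# Delete a volume by name; Docker refuses while a container still uses it.
remove_volume() {
  docker volume rm "$1"
}
```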

Docker volumes don’t belong exclusively to the containers that created them. Using the name, we can mount them to any number of other containers:

$ docker run -ti \
  -v b45436c7f8bab37c0bfe998f962001226470cbc1dfe4ac59cc0287276e3d7a64:/data \
  ubuntu bash

root@21bd05dfa2dd:/# cd /data
root@21bd05dfa2dd:/data# ls
README.md

A classic example of using this feature is connecting a volume to a container that is going to make a data backup.

Creating volumes without containers

You don’t even need a container to create a volume. We could use the docker volume create command to create a volume in advance and then mount it to any container that needs one. The beauty of this approach is that we can choose the volume name and once and for all get rid of that 64-character monstrosity.
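
Creating a volume with a readable name ahead of time, and the backup use case mentioned earlier, might look like the sketch below. The function names, the /backup mount point and the archive name backup.tar.gz are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: a named volume plus a throwaway container that archives it.

# Create a volume with a name of our choosing. $1 = volume name, e.g. mysql-data.
create_named_volume() {
  docker volume create "$1"
}

# Archive a volume's content into backup.tar.gz inside a host directory.
# $1 = volume name, $2 = absolute host directory that receives the archive.
backup_volume() {
  # The volume is mounted read-only; only the host directory is written to.
  docker run --rm \
    -v "$1:/data:ro" \
    -v "$2:/backup" \
    ubuntu \
    tar czf /backup/backup.tar.gz -C /data .
}
```

After create_named_volume mysql-data, the volume can be mounted with -v mysql-data:/var/lib/mysql, and backup_volume mysql-data /home/docker/backups would write the archive to that host directory.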

Creating shared-storage data volumes

The volume create command also provides something much more powerful than just selecting a name. Until now we’ve been creating volumes that are hosted in the local file system, which is not particularly scalable. To address that, Docker supports volume plugins that enable storing volume data in other locations: Azure, DigitalOcean, and several others.

Installing and configuring such plugins might be tricky sometimes, but once that’s done, using a plugin is just a matter of adding one more argument to the volume create command:

$ docker volume create --driver dostorage --name my-volume

3. Volume containers

There’s also an old Docker pattern called the data-only container. As the name suggests, it’s a container with one or more attached volumes, whose sole responsibility is to exist and provide those volumes to others. The container doesn’t even have to be running.

Once we’ve got such a container, we can attach its volumes all at once to other containers with the --volumes-from argument:

$ docker run -d --volumes-from mydatacontainer mysql
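
For completeness, here is a sketch of how such a data-only container could be set up in the first place. The function names are made up, and the root password is the placeholder used earlier in this article.

```shell
#!/bin/sh
# Sketch: a data-only container and a MySQL container that borrows its volumes.

# Create (but never start) a container that owns a volume at /var/lib/mysql.
# $1 = container name, for example mydatacontainer.
create_data_container() {
  docker create -v /var/lib/mysql --name "$1" mysql
}

# Start MySQL with every volume of the data container attached.
# $1 = data container name.
run_mysql_with_volumes_from() {
  docker run -d \
    -e MYSQL_ROOT_PASSWORD=my-secret-pw \
    --volumes-from "$1" \
    mysql
}
```

docker create only prepares the container, which is enough here: its volumes exist and can be shared even though it never runs.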

Honestly, I don’t see any benefit in having a dedicated volume container. The same functionality can be achieved with regular data volumes with zero overhead. Maybe this feature made more sense before Docker 1.9 introduced the docker volume commands, but now it’s more confusing than useful.

Summary

Today we took a look at several ways to persist data for Docker containers: host directory mounts, data volumes and volume containers. While all three are supported by Docker, only the second one – the data volume – looks like the ‘true’ way to do the job. After all, directory mounts work only on the local host, volume containers don’t add much value compared to named data volumes, and only data volumes work well both on the local host and anywhere in the cloud, assuming you have installed a plugin for that.
