Quickstart: Docker + GPU + Deep Learning
Hello Network, after a few articles on Medium about popular NLP papers, I decided to switch to LinkedIn articles and begin my stride here. I am just an amateur and am really excited to share a thing or two that I learn which "may" help you. Wish me luck!
So earlier this quarter, I decided to do as many Deep Learning related projects as possible as part of my coursework. While the theory seemed intuitive and natural, it was tough to get started with the environment. Running on CPU using a Jupyter notebook is a cakewalk if you use Keras or PyTorch. But as I tried to do multiple projects, each with a different set of requirements, the environment setup got messy. I got access to one of our research lab's servers, which has GPUs available. Even using the servers was difficult, as I didn't have sudo permissions and had to install every package I wanted to use myself. Arggh! That was painful. Initially I spent most of my time, if not all of it, figuring out the setup itself. Have you also faced such situations? If yes, I have a simple and powerful solution for you. Docker!
Yes, I know, I know it's popular and developers have been using it all the time recently. But I never quite felt the need to use it myself. As always, it's high time I leveraged it. I spent a day understanding it and realised how powerful it is. With that said, I will conclude the intro section of this blog and walk you through the steps to easily train your deep learning models on GPUs.
Wait, What is Docker anyway?
It's not a VM. It's a lightweight container. All the containers share the host OS instead of each having its own operating system.
Docker containers can be created in a split second using a simple command. They are isolated from other containers, so you can have a different environment in each container.
How do you start a container?
Just follow these 4 steps.
Step 1: Select a base image, e.g. ubuntu or nvidia's pytorch image. You don't need to set up the OS yourself; you just choose the base image on top of which you want to create the environment you like. It's like the crust of a pizza, and you can choose your toppings.
Step 2: Once you have selected a base image (the crust), you need to specify how you are going to modify it, i.e. what new libraries you want on top of this base image. These can be any packages, environment variables, or any bash command you want to run. Below is a sample Dockerfile I wrote. The FROM instruction specifies the base image and each RUN instruction is a command you want to run on top of it. A Dockerfile basically acts as a table of contents for the Docker image.
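The original Dockerfile screenshot didn't survive here, so the following is a hypothetical sketch of what such a file might look like. The base image tag and the specific packages are illustrative placeholders, not necessarily the exact ones from my setup:

```dockerfile
# Illustrative example; base image tag and packages are placeholders
FROM nvcr.io/nvidia/pytorch:latest

RUN apt-get update
RUN apt-get install -y git vim
RUN pip install --upgrade pip
RUN pip install numpy pandas
RUN pip install scikit-learn
RUN pip install nltk
RUN pip install tensorboard
RUN mkdir -p /workspace/projects
```

Each FROM/RUN line becomes one layer of the final image, which is what makes the caching described below possible.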
Step 3:
- docker build -t <image_name> <path/to/directory/containing/Dockerfile>
Just run the command and name your image. It generates an image which resides on the system. It's like any other image and occupies some space. One great thing about Docker is that it caches the intermediate images generated. Here there are 8 RUN commands, so there will be 7 intermediate images cached. So if you create a new Dockerfile similar to this one but with the last 3 RUN statements changed, Docker will take the 5th cached image and run only the changed commands on top of it.
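You can see this caching directly in the build output when you rebuild: steps that were already built are marked "Using cache" instead of being executed again. The transcript below is abridged and illustrative:

```shell
# Rebuild after changing only the later RUN statements (illustrative output)
docker build -t my_image .
# Step 2/9 : RUN apt-get update
#  ---> Using cache
# Step 3/9 : RUN apt-get install -y git vim
#  ---> Using cache
# ...only the changed steps actually execute...
```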
You can list the images on the system by simply running "docker images".
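The listing looks roughly like this; the repository name, image ID, and size below are made up for illustration:

```shell
docker images
# REPOSITORY    TAG       IMAGE ID       CREATED        SIZE
# my_image      latest    1a2b3c4d5e6f   2 hours ago    5.1GB
```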
Step 4: The final step is creating a running instance of the image, called a container. Just like a program's running instance is a process, an image's running instance is a container. There can be multiple containers of the same image and they will still be isolated from each other.
- docker run --rm -ti <image_name>
There are multiple options for the run command. https://github.com/docker/labs has all the information you need to know.
Now, our container is ready with an isolated environment. You can do anything inside it without affecting the other containers or the host machine.
How can I use GPU?
This is the simplest part. If you have the Nvidia GPU drivers installed, you can install nvidia-docker using the instructions at https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-1.0)
So instead of "docker run", we just need to use "nvidia-docker run" to start a container with access to GPUs. But of course, you need to specify which GPUs you are allocating to the container. On my machine, I have 8 GPUs available and I am allocating GPUs 5 and 6 to my container.
- NV_GPU=5,6 nvidia-docker run --rm -ti <image_name>
Note: Inside the container the GPU device ids will still be seen as 0,1.
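You can check this from inside the container. Assuming a PyTorch base image, both nvidia-smi and PyTorch should report only the two allocated devices, numbered from 0:

```shell
# Run inside a container started with NV_GPU=5,6
nvidia-smi   # lists only 2 GPUs, shown as GPU 0 and GPU 1
python -c "import torch; print(torch.cuda.device_count())"   # should print 2
```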
I also wanted to mention that if you are a member of the Nvidia DGX registry, you will have access to a number of Docker images that have all the required libraries and a GPU compatible environment set up in them. The base image I am using is a PyTorch image from the DGX registry.
I have added a short demo of how to run nvidia-docker in case you want to see it.