A Manager’s Guide to Deep Learning
What the **** is deep learning?
This question is being asked around the globe, in every industry, whether you work with cars, wind turbines, or people. Everybody seems to be talking about deep learning. The whole board of your company is going crazy for it because it fits so nicely with their other buzzwords, e.g., digitalization and big data.
We will get you to join in on the conversation today. In this post, I will give an executive summary of deep learning: enough background to either dominate the manager small talk on the topic or at least grasp what the engineers and data scientists on your team are talking about.
Let’s get the first important question out of the way.
Is deep learning all hype and no substance?
Yes and no. Honestly, mostly no. Yes, it is a fancy name for neural networks with more than one hidden layer (don’t panic, we’ll come back to that). Yes, most of the concepts have been around for decades.
However, most people agree that three recent developments have pushed deep learning from a concept known to a handful of people to board rooms worldwide. These three developments are (1) more data, (2) more computational power*, and (3) Geoffrey Hinton.
(This video is almost 9 years old. Goes to show how time flies...)
Obviously, I am exaggerating with (3) (or maybe not). What it stands for is a number of important breakthroughs in the way deep networks are trained (and much more technical stuff reserved for another article).
Personally, I would add (4) to that list: the number of open-source deep learning toolkits, which are either community-driven, e.g., Theano, or developed by the likes of Google and Microsoft. All four developments have pushed deep learning into the mainstream and are now unlocking the power of these networks for a broad range of companies and startups.
Onto the core question...
What the **** is deep learning?
At its core, deep learning is one of many machine learning methods. In the simplest use case (called supervised learning), these methods take input data, e.g., an image, and tell you the label of that input, e.g., “image of a dog wearing a hat”.
Example of multiple objects classified in a given image (Original)
In this scenario, all machine learning methods are expected to get better when you show them more pictures of dogs or hats or both. To do that, machine learning methods build models that map input data to labels.
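To make the idea of a model mapping inputs to labels concrete, here is a minimal sketch of a nearest-neighbor classifier in Python. The numeric features and the training examples are made up for illustration; they stand in for the much richer representations extracted from real images.

```python
import math

# Hypothetical training set: each example is (features, label).
# In reality, features would be derived from image pixels.
training_data = [
    ((0.9, 0.1), "dog"),
    ((0.8, 0.2), "dog"),
    ((0.1, 0.9), "hat"),
    ((0.2, 0.8), "hat"),
]

def predict(features):
    """Label a new input with the label of its nearest training example."""
    nearest = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return nearest[1]

print(predict((0.85, 0.15)))  # close to the "dog" examples -> "dog"
```

Showing this toy model more labeled examples (more entries in `training_data`) directly improves its predictions, which is the sense in which machine learning methods “get better with more data”.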
Ok, so how is deep learning different?
There is one main practical difference that is easy to explain if we take the object classification task as an example.
In the past, sophisticated processing pipelines were necessary to transform the raw pixels of an image into so-called features. These features are high-level representations of the image. One of the most commonly used features for object detection is the histogram of oriented gradients (HOG). An MIT project from 2013 offers a glimpse into what the machine "sees" when using HOG features.
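To give a feel for what such a hand-engineered feature computes, here is a bare-bones sketch of the core HOG idea in Python: estimate the gradient at each pixel and accumulate gradient magnitudes into orientation bins. Real HOG implementations add block normalization, interpolation, and overlapping cells, which are omitted here.

```python
import math

def hog_cell(cell, n_bins=9):
    """Histogram of oriented gradients for one image cell
    (a 2D list of grayscale values). A simplified sketch."""
    h, w = len(cell), len(cell[0])
    histogram = [0.0] * n_bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            dy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(dx, dy)
            # Orientation folded into [0, 180) degrees, as in standard HOG.
            angle = math.degrees(math.atan2(dy, dx)) % 180
            histogram[int(angle // (180 / n_bins)) % n_bins] += magnitude
    return histogram

# A 4x4 patch with a vertical edge: all gradient energy lands in the
# bin for horizontal gradients (angle 0).
patch = [[0, 0, 10, 10]] * 4
print(hog_cell(patch))
```

The point is not the details but the effort: someone had to decide that gradient orientations are what matters for object detection, which is exactly the feature engineering deep learning largely replaces.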
HOG representation (above) for a picture of a hummer (below) (Original)
Finding these features is a complex process called feature engineering, which involves a significant amount of domain knowledge. Feature engineering has often been the hardest part of solving a given learning problem.
Deep learning pipelines are different in that each layer of the network can implicitly represent learned features (more on these layers later). Hence, we can work with the raw input data (e.g., the image pixels) without any explicit feature engineering**.
This has two main implications: applicability and scalability. Applicability, since you can concentrate on applying (almost) the same set of methods to different domains with almost no involvement from domain experts. Scalability, as you can scale the number of labels your model can detect without explicitly engineering the features relevant to each new label.
The latter is especially important if you think of any task where you have a potentially infinite number of things to detect and cannot possibly engineer features for all of them. A good example is object recognition and mapping for automated driving.
Autonomous car mapping its environment (Original)
What is even more astonishing is that deep learning outperforms (often by a wide margin) methods that rely on these expert features, and even solves perception problems that, until now, only humans could solve.
The deep learning revolution has been fueled by these "simple tasks" revolving around sensory input, e.g., image classification, text recognition, speech, and translation. These tasks are the reason deep learning is so popular right now. But due to the outstanding results and the popularity, these methods are (in combination with other techniques) successfully applied to ever more complex tasks, e.g., decision-making for robots.
That's awesome, right? We move away from simple sensory tasks in the direction of something resembling intelligence.
So what does the knowledge stored in these models look like?
This is the current Achilles’ heel of deep learning. The models are basically black boxes. But let’s take a step back here and ask: how do neural networks work? Neural networks were first described in the 1940s and take their inspiration from a simple idea of how human brains work. The networks are composed of ‘neurons’, and two ‘neurons’ are connected by a ‘synapse’. An easy example is depicted in the picture below.
We have two green neurons, each with one outgoing synapse to the red neuron. When a green neuron fires (outputs some value), that value is multiplied by a weight (e.g., w1) and then acts as input to the red neuron.
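That single red neuron can be sketched in a few lines of Python: a weighted sum of its inputs, passed through an activation function. The weights and inputs below are made-up numbers, and the sigmoid is just one common choice of activation.

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One artificial neuron: a weighted sum of inputs passed through
    an activation function (a sigmoid here; other choices exist)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))  # sigmoid squashes output into (0, 1)

# Two green neurons firing values 1.0 and 0.5, with weights
# w1 = 0.8 and w2 = -0.4 (made-up numbers) feeding the red neuron:
print(neuron([1.0, 0.5], [0.8, -0.4]))
```

Training a network is essentially the process of adjusting these weights so that the outputs match the desired labels.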
Still here? Thank you and congratulations! Let’s take a look at an example of a “deep” network in the picture below.
Again, we have an input layer in green, e.g., the pixels of an image, and an output layer in red, e.g., the label of that image. In the middle, there are so-called hidden layers. All the neurons in one layer are connected to all neurons in the next layer. The input to a neuron thus depends on all neurons in the previous layer and the respective weights. Every neuron contains a so-called activation function, which determines how strongly it fires given this input.
Again, deep means that these networks have more than one hidden layer. The layers are called hidden because what happens inside them is basically a black box.
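Stacking the neuron from before into fully connected layers gives the whole network. Here is a minimal sketch of a forward pass through such a "deep" network; the layer sizes and the random weights are purely illustrative.

```python
import math
import random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def layer(inputs, weights):
    """One fully connected layer: every output neuron sees every input.
    weights[j][i] connects input neuron i to output neuron j."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def forward(inputs, all_weights):
    """Pass the input through each layer in turn. With more than one
    hidden layer, this counts as a 'deep' network."""
    activations = inputs
    for weights in all_weights:
        activations = layer(activations, weights)
    return activations

# Made-up architecture: 3 inputs -> 4 hidden -> 4 hidden -> 2 outputs.
random.seed(0)
shapes = [(4, 3), (4, 4), (2, 4)]
weights = [[[random.uniform(-1, 1) for _ in range(n_in)]
            for _ in range(n_out)] for n_out, n_in in shapes]
print(forward([0.2, 0.7, 0.1], weights))
```

Everything a trained network "knows" lives in those weight matrices, which is why the intermediate activations are so hard to interpret.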
Is this still a manager’s guide?
Yes, sorry. I got carried away. Let’s look at some pretty pictures and give you some small talk items to reward you for reading this far. First, the pretty pictures. People try to visualize the complexity of the hidden layers as something humans can understand (e.g., two-dimensional plots, yeah!). One such technique is shown below: a 2D representation of a network trying to separate seven classes. In this picture, the seven classes are basically mapped onto two neurons. The colors are given by the class labels, and what you get is a nice star-shaped object. It shows which of the classes are easy to separate given the input data and which might need additional data to be separated.
Awesome, right? Ok, I have something even cooler and the perfect item for your next deep learning small talk. Enter the Deep Dream project by Google that explores (among other things) how deep networks “dream”!
You basically turn the network around, give it some input, tell it what to look for, e.g., a building, and get an output image. You can then see what the network ‘imagines’ this object to look like. Awesome stuff, better explained by the Google engineers themselves!
What’s in it for you?
In his excellent 2014 Wired article on artificial intelligence, Kevin Kelly stated:
The business plans of the next 10,000 startups are easy to forecast: Take X and add AI. This is a big deal, and now it’s here.
If you are building this X today, and this X could be enhanced by anything that involves image or object recognition, speech or text recognition, translation, or automatic user interface adaptation (anything, really, that involves automatically classifying large amounts of “sensory” input into possibly large numbers of labels and classes), then you are either already working with some of these deep learning methods or you need to start now.
It might even be worth looking at some of the industrial use cases that currently still benefit from a lot of domain knowledge, e.g., predictive maintenance. Just make sure deep learning is not the only tool you are looking at and that you understand its limitations. You need large amounts of data, some computational power, and machine learning expertise to build really good models.
Summary
Deep learning is much more than hype. It probably stands for the biggest jump in artificial intelligence capabilities since the inception of the field. The rapid success across different industries is a good sign that deep learning is here to stay. It is, obviously, not the ultimate tool in solving all problems and it will not be achieving human-like intelligence any time soon.
But it has triggered a new artificial intelligence gold rush in both industry and academia, and it has put us on a trajectory towards real machine intelligence. It can be considered one of the driving forces of the future of computing, and it might even help us understand our own intelligence better.
Notes and Acknowledgments
* Mostly the step from CPU to GPU architectures, which allows for a much higher level of parallelism.
** Obviously, you can still do feature engineering
Thanks to Timo Nolle (https://www.garudax.id/in/timo-nolle-8a3a2a8a) for letting me use some images from his Master’s Thesis.