What is so “deep” about Deep Learning?
Recently my interest in AI and deep learning was rekindled when Andrew Ng announced his new specialization on Coursera. After completing the first course of the sequence (which I enjoyed very much), I’d like to share some thoughts. Some 15 years ago, during one of our lunch breaks, my PhD supervisor, Prof. Winfried Pohlmeier, exclaimed: “These artificial neural networks are just a fancy name for non-linear econometric models” – and he was very right about this, at least when it comes to supervised learning.
In an interview with Andrew Ng, Geoffrey Hinton mentioned that backpropagation and the whole “learning” bit is basically not much more than taking first derivatives of various functions in order to apply gradient descent – ideas that have been around for a very long time (gradient descent itself is usually traced back at least to Cauchy’s work in the 1840s). Next time someone claims “But I will never use linear algebra and (matrix) calculus in my job” you can easily prove them wrong. 😊
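To make that concrete, here is a minimal sketch (in Python, with a toy one-parameter loss of my own choosing, not anything from the course) of what “learning” by gradient descent boils down to: compute a first derivative and take a small step against it, over and over.

```python
# Toy example: minimise f(w) = (w - 3)^2, whose derivative is f'(w) = 2 * (w - 3).
# The starting point and learning rate are arbitrary illustrative choices.

def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0              # initial guess
learning_rate = 0.1  # step size

for step in range(50):
    w -= learning_rate * grad(w)  # move against the gradient

print(round(w, 4))  # approaches 3.0, the minimiser of the loss
```

Backpropagation is “just” an efficient way of computing those first derivatives for every parameter in the network, after which the update rule is exactly this one line.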
Andrew Ng also jokingly reflected that deep learning became rather popular after it was rebranded as “deep” – before that it was just plain old “neural networks with many hidden layers” – yawn. We like “deep” – it’s just cool.
It’s also quite amazing to realize that one of the recent “breakthroughs” in deep learning was to replace sigmoid functions (e.g. the logistic function) with the so-called rectified linear unit (ReLU), which is simply f(x) = max(0, x). It has the nice property that its derivative is equal to one – and thus “not close” to zero – for any x > 0, as opposed to sigmoid functions like the logistic or tanh, whose derivatives tend to zero for very small or very large x. Having a function with this property makes learning (aka gradient descent) faster. It is interesting to reflect that such simple ideas can have a big impact.
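As a small illustration (a NumPy sketch I put together, not any particular framework’s implementation), compare the gradients of the two activations: the sigmoid’s derivative collapses towards zero once the input is far from zero, while ReLU’s stays at exactly one for any positive input, so the gradient signal does not shrink as it flows backwards.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # tends to 0 for large |x| (the "vanishing gradient")

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for any x > 0, else 0

xs = np.array([-10.0, -1.0, 0.5, 5.0, 10.0])
print(sigmoid_grad(xs))  # roughly [4.5e-05, 0.197, 0.235, 0.0066, 4.5e-05]
print(relu_grad(xs))     # [0., 0., 1., 1., 1.]
```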
Had it not been for the millions (maybe billions?) of pictures of cats out there and the rapid advance in computing power, we would still be doing old-fashioned logistic regressions and “estimating parameters” using “maximum likelihood” rather than “deep learning” using “back prop”. 😊
Great article, Valeri Voev! I very much agree with it! The real reason for the deep learning hype lies in the accelerated optimization/learning made possible by the significant improvement in hardware (GPUs), and not in the theory itself.
I should also mention that the above cat image hasn't been published before and I'd like to donate it to the image recognition community 🙂