Thoughts on the maturity of Computer Vision

I have been throwing a lot of effort into computer vision over the last six months, largely because it is the subject of the next course I will be running at Digital Jersey.

However, up until recently, I was never really drawn to computer vision over any other area, because I wanted to focus on the mathematical elements of training models. I have always been more interested in going beyond the models that are now becoming standard within the current deep learning frameworks, and that involves living in good old NumPy.

While building course collateral, what I have learned about the field of computer vision is that we are in its very early stages, and that is largely down to the immaturity of pre- and post-processing. Outlined below is why I think this is the case.

So, to be clear, I have been using the following general definition for computer vision:

Computer Vision is a field of study that enables machines to see, identify and process images like humans

The way I prefer to look at it is in my illustration below, as we have to consider that real-world scenarios are greater than our visual input sensors. While there is no full agreement on how many chemo-, photo-, mechano- and thermoreceptors the human body has, we are confident that we have approximately a quarter of a billion photoreceptors:

[Illustration: real-world scenarios are greater than our visual input sensors]

The reason for my excitement is as follows:

  • The input sensors that have been invented over time make a mockery of our eyes from a resolution perspective. Yes, depth is analogue in the eye, but on pixel versus photoreceptor density, engineering wins hands down. Our eyes offer only around 120 million monochrome photoreceptors (rods), roughly an 8K TV, and a paltry 6 million colour photoreceptors (cones), about ¾ of a 4K TV. We have smartphones with greater colour resolution.
  • We largely rely on artificial neural network models where we have loads of data, and these have been around for decades, with more recent optimisation algorithms. This has served us well so far: for structured models, human error is often a lot higher than the Bayes error.
  • The really interesting part, which I have found fascinating, is the human's pre- and post-processing ability. I have spent most of my time using OpenCV for this part. What impresses me here is that, while libraries such as OpenCV have plenty of features to cover the shortfall in the deep learning frameworks, it appears we have hardly scratched the surface when comparing computer vision frameworks with the pre- and post-processing mechanisms built into our own brains.

Consider this: for typical object recognition, even a fully trained CNN does not achieve great accuracy without pre-processing. That is because the input images have not been pre-processed to address elements such as image locality, viewpoint variation, clutter, scale, light intensity and intra-class variation. We can get around some of these challenges by using pre- and post-processing with frameworks such as OpenCV. However, if we compare this with what we can do as humans, we are miles off. Even setting aside the accompanying receptors beyond the ¼ billion photoreceptors, our human ability to take in electrical signals and pre-process them is a world away from anything we have built.

I have three more subjects that I want to teach, beyond computer vision. Once done, I am all in on pre-processing for computer vision. I cannot wait to see how this field develops.
