Thoughts on the maturity of Computer Vision

I have been throwing a lot of effort into computer vision over the last six months, largely because it is the subject of the next course I will be running at Digital Jersey.

However, up until recently, I was never really drawn to computer vision over any other area, because I wanted to focus on the mathematical elements of training models. I have always been more interested in going beyond the models that are now becoming standard within the current deep learning frameworks, and that involves living in good old NumPy.

While building course collateral, what I have learned about the field of computer vision is that we are in its very early stages, and that is largely down to the immaturity of pre- and post-processing. Outlined below is why I think this is the case.

So, to be clear, I have been using the following general definition for computer vision:

Computer Vision is a field of study that enables machines to see, identify and process images like humans

The way I prefer to look at it is in my illustration below, as we have to consider that real-world scenarios are greater than our visual input sensors. While there is no full agreement on how many chemo-, photo-, mechano- and thermoreceptors the human body has, we are confident that we have approximately a quarter of a billion photoreceptors:

[Illustration: real-world scenarios are greater than our visual input sensors]

The reason for my excitement is as follows:

  • The input sensors that have been invented over time make a mockery of our eyes from a resolution perspective. Yes, depth is analogue in the eye, but on pixel versus photoreceptor density, engineering wins hands down. Our eyes offer only around 120 million monochrome photoreceptors (rods), roughly an 8K TV, and a paltry 6 million colour photoreceptors (cones), about ¾ of a 4K TV. We have smartphones with greater colour resolution.
  • We largely rely on artificial neural network models where we have loads of data, and these have been around for decades, with more recent optimisation algorithms. This has served us well so far: for structured models, human error is often a lot higher than the Bayes error.
  • The really interesting part, which I have found fascinating, is the human's pre- and post-processing ability. I have spent most of my time using OpenCV for this part. What impresses me here is that, while libraries such as OpenCV have plenty of features to cover the shortfall in the deep learning frameworks, it appears we have hardly scratched the surface when comparing computer vision frameworks with the pre- and post-processing mechanisms built into our own brains.

Consider this: for typical object recognition, even a fully trained CNN does not achieve great accuracy without pre-processing. That is because the input images have not been pre-processed to address elements such as image locality, viewpoint variation, clutter, scale, light intensity and intra-class variation. We can get around some of these challenges by using pre- and post-processing with frameworks such as OpenCV. However, if we compare this with what we can do as humans, we are miles off. Even setting aside the accompanying receptors beyond the ¼ billion photoreceptors, our human ability to take in electrical signals and pre-process them is a world away from anything we have built.

I have three more subjects that I want to teach, beyond computer vision. Once done, I am all in on pre-processing for computer vision. I cannot wait to see how this field develops.
