Large-Scale Video Classification

Sateesh Singh

Published Jul 11, 2022

Images and videos are now ubiquitous on the Internet, and many algorithms have been developed to analyse their semantic content for applications such as search and summarisation. Convolutional neural networks have proven to be a powerful class of models for image recognition, so a natural research objective is to evaluate them for large-scale video classification. The evaluation discussed here uses a new data set of 1 million YouTube videos belonging to over 400 classes.

Previous work in this field followed a standard three-stage approach to video classification. In the first stage, local visual features describing a region of the video are extracted. In the second stage, these features are combined into a fixed-size video-level description, typically a bag of words. In the third stage, a classifier such as a support vector machine is trained on the resulting bag-of-words representation to distinguish between the relevant visual classes. Convolutional neural networks are a biologically inspired class of deep learning models that replace all three stages with a single neural network trained end to end.
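The three stages can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 2-D "local features", the three-entry codebook, the class names, and the class centroids are all made up, and a nearest-centroid rule stands in for the support vector machine to keep the example dependency-free.

```python
from collections import Counter
from math import dist

def nearest_word(feature, codebook):
    """Return the index of the codebook entry closest to the feature."""
    return min(range(len(codebook)), key=lambda i: dist(feature, codebook[i]))

def video_descriptor(local_features, codebook):
    """Stage 2: quantize local features and pool them into a normalized histogram."""
    counts = Counter(nearest_word(f, codebook) for f in local_features)
    total = len(local_features)
    return [counts[i] / total for i in range(len(codebook))]

def classify(descriptor, class_centroids):
    """Stage 3 stand-in: nearest-centroid rule (an assumption, not an SVM)."""
    return min(class_centroids, key=lambda label: dist(descriptor, class_centroids[label]))

# Stage 1 stand-in: pretend these 2-D vectors were extracted from video regions.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.1, 0.9)]

descriptor = video_descriptor(features, codebook)

# Hypothetical per-class average descriptors, as if learned from training videos.
class_centroids = {"class_a": [0.2, 0.6, 0.2], "class_b": [0.7, 0.1, 0.2]}
label = classify(descriptor, class_centroids)
```

Whatever the number of local features, the descriptor always has one entry per codebook word, which is what makes the video-level description fixed-size.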

Until this point, however, there had been relatively little work on using convolutional neural networks for video classification. In the models used for this research, each video is treated as a collection of short, fixed-size clips. There are several options for extending a network's connectivity across the time dimension of a clip, and the research models use three general connectivity patterns: early, late, and slow fusion of information.
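One way to picture the three patterns is to list which frames of a clip each first-layer unit gets to see. The sketch below does only that bookkeeping; it builds no network. The clip length of 10 frames and the slow-fusion window of 4 frames with stride 2 are illustrative assumptions, not values stated in this article, and all function names are hypothetical.

```python
def split_into_clips(num_frames, clip_len):
    """Split a video's frame indices into consecutive fixed-size clips.

    Trailing frames that do not fill a whole clip are dropped.
    """
    return [list(range(start, start + clip_len))
            for start in range(0, num_frames - clip_len + 1, clip_len)]

def early_fusion(clip):
    """A single first-layer window spanning every frame of the clip."""
    return [clip]

def late_fusion(clip):
    """Two single-frame towers at the ends of the clip, merged only near the top."""
    return [[clip[0]], [clip[-1]]]

def slow_fusion(clip, window=4, stride=2):
    """Overlapping short windows; deeper layers would merge neighbours step by step."""
    return [clip[start:start + window]
            for start in range(0, len(clip) - window + 1, stride)]

clips = split_into_clips(num_frames=25, clip_len=10)
first_clip = clips[0]

early = early_fusion(first_clip)
late = late_fusion(first_clip)
slow = slow_fusion(first_clip)
```

Early fusion combines temporal information immediately, late fusion defers it to the top of the network, and slow fusion sits in between, so that higher layers progressively gain access to more of the clip.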

Multiresolution convolutional neural networks achieve a compromise between training time and performance by using two streams of processing: a fovea stream and a context stream. The fovea stream, named after the part of the retina responsible for sharp central vision, processes a high-resolution view of the centre of each frame, while the context stream processes the whole frame at low resolution. This keeps the learning problem manageable, and data augmentation and preprocessing are used to reduce the effect of overfitting.
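The input preparation for the two streams can be sketched as follows. This is a minimal illustration, assuming square grayscale frames with even dimensions stored as nested lists, a 2x downsampling factor for the context stream, and a fovea crop of half the frame's width and height; none of these specifics come from this article, and the function names are hypothetical.

```python
def downsample_2x(frame):
    """Average each 2x2 block, halving height and width (assumes even dimensions)."""
    height, width = len(frame), len(frame[0])
    return [
        [(frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]

def center_crop(frame, size):
    """Cut a size x size window out of the middle of the frame at full resolution."""
    height, width = len(frame), len(frame[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]

def two_stream_inputs(frame):
    """Return (context, fovea): the whole frame at low resolution, the centre at full."""
    half = len(frame) // 2
    return downsample_2x(frame), center_crop(frame, half)

# A toy 4x4 frame whose pixel values are 0..15 in row-major order.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
context, fovea = two_stream_inputs(frame)

pixels_in = sum(len(row) for row in frame)
pixels_out = sum(len(row) for row in context) + sum(len(row) for row in fovea)
```

Under these assumptions the two streams together receive half as many input values as the original frame, which is where the speed-up comes from.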

The results showed that convolutional neural network architectures can learn powerful features from weakly labelled data, and that these features greatly exceed the performance of typical feature-based methods. The slow fusion model was also found to perform best consistently.

Finally, the results demonstrate that a mixed-resolution architecture, consisting of a low-resolution context stream and a high-resolution fovea stream, effectively speeds up convolutional neural networks without sacrificing accuracy.
