Large-Scale Video Classification

Images and videos are now prevalent on the Internet, and many algorithms have been developed to analyse their semantic content for applications such as search and summarization. Convolutional neural networks are a powerful class of models for image recognition, which motivates the research objective summarized here: evaluating convolutional neural networks for large-scale video classification on a new data set of 1 million YouTube videos belonging to over 400 classes.

Previous work in this field took a standard three-stage approach to video classification. In the first stage, local visual features describing a region of the video are extracted. In the second stage, these features are combined into a fixed-size, video-level description. In the third stage, a classifier such as a support vector machine is trained on the resulting bag of words to distinguish between the relevant visual classes. Convolutional neural networks are a biologically inspired class of deep learning models that replace all three stages with a single neural network.
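The second stage of that pipeline can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular paper: the codebook, the 2-D descriptors, and the function names `quantize` and `bag_of_words` are hypothetical, and a real system would learn the codebook (for example by clustering) and use much higher-dimensional features.

```python
from math import dist

def quantize(descriptor, codebook):
    """Return the index of the codebook entry nearest to the descriptor."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def bag_of_words(descriptors, codebook):
    """Pool a variable number of local descriptors into one fixed-size,
    L1-normalized histogram over the codebook (the video-level description)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[quantize(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical toy data: a 3-word codebook and two videos with
# different numbers of local descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
video_a = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
video_b = [(0.0, 0.1), (0.1, 0.9)]

hist_a = bag_of_words(video_a, codebook)
hist_b = bag_of_words(video_b, codebook)
print(hist_a, hist_b)
```

Both videos end up with a histogram of the same length even though they contribute different numbers of descriptors, which is what lets a third-stage classifier such as a support vector machine be trained on them.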

Until this point, however, there had been relatively little work on using convolutional neural networks for video classification. In the models studied, each video is treated as a collection of short, fixed-size clips. Because each clip spans several frames, a network's connectivity can be extended in time in several ways, and the research models use three general connectivity patterns: early, late, and slow fusion of information.
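One way to picture the difference between the three patterns is to list which frames of a clip each first-layer unit gets to see. The sketch below does only that bookkeeping. The clip length of 10, the slow-fusion window of 4 with stride 2, and the choice of the first and last frame for late fusion are illustrative assumptions, not values given in the text, and the function names are hypothetical.

```python
def split_into_clips(num_frames, clip_len):
    """Cut a video into non-overlapping fixed-size clips of frame indices.
    Trailing frames that do not fill a whole clip are dropped."""
    return [range(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]

def first_layer_windows(clip_len, pattern, window=4, stride=2):
    """Frame indices (within one clip) seen by each first-layer unit."""
    if pattern == "early":
        # All frames are combined immediately at the input.
        return [list(range(clip_len))]
    if pattern == "late":
        # Two single-frame towers (assumed: first and last frame),
        # merged only in the upper layers.
        return [[0], [clip_len - 1]]
    if pattern == "slow":
        # Short overlapping windows; higher layers merge them gradually.
        return [list(range(s, s + window))
                for s in range(0, clip_len - window + 1, stride)]
    raise ValueError(f"unknown fusion pattern: {pattern}")

clips = split_into_clips(num_frames=25, clip_len=10)
early = first_layer_windows(10, "early")
late = first_layer_windows(10, "late")
slow = first_layer_windows(10, "slow")
print(len(clips), early, late, slow)
```

Early fusion sees the whole clip at once, late fusion sees isolated frames and compares them only near the output, and slow fusion sits in between by widening its temporal view layer by layer.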

Multiresolution convolutional neural networks achieve a compromise between training time and performance by using two streams of processing: a fovea stream and a context stream. The fovea stream, named after the part of the retina responsible for visual acuity and color discrimination, processes a high-resolution view of the frame, while the context stream processes a low-resolution view that supplies the surrounding context. This keeps learning and optimization manageable, while data augmentation and pre-processing reduce the effect of overfitting.
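The two inputs can be sketched as simple frame pre-processing. This is a minimal sketch under stated assumptions: the fovea is taken to be the central region of the frame, the context is the whole frame downsampled by a factor of 2 using 2x2 averaging, and the frame is a tiny grayscale grid stored as nested lists. None of these specifics come from the text, and the function names are hypothetical.

```python
def context_stream(frame):
    """Whole frame at half resolution, averaging each 2x2 block of pixels."""
    h, w = len(frame), len(frame[0])
    return [[(frame[r][c] + frame[r][c + 1]
              + frame[r + 1][c] + frame[r + 1][c + 1]) / 4
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]

def fovea_stream(frame):
    """Central half-height, half-width region at the original resolution."""
    h, w = len(frame), len(frame[0])
    top, left = h // 4, w // 4
    return [row[left:left + w // 2] for row in frame[top:top + h // 2]]

# Hypothetical 8x8 grayscale frame whose pixel value encodes its position.
frame = [[r * 8 + c for c in range(8)] for r in range(8)]

context = context_stream(frame)  # 4x4: coarse view of the whole frame
fovea = fovea_stream(frame)      # 4x4: sharp view of the centre only
print(context[0], fovea[0])
```

Each stream receives a quarter of the original pixels, so together they process half the input of a full-resolution network, which is where the saving in training time would come from in this sketch.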

The results showed that convolutional neural network architectures can learn powerful features from weakly labelled data, and that they greatly outperform typical feature-based methods. The slow fusion model was also found to perform best consistently.

Finally, the results demonstrate that a mixed-resolution architecture, consisting of a low-resolution context stream and a high-resolution fovea stream, effectively sped up convolutional neural networks without sacrificing accuracy.

Looks very good, Sateesh. Do you do frame-by-frame object identification, combine it with audio-to-text conversion, and then follow up with subject metadata creation? If so, this seems to combine both picture and text ML using Natural Language Processing algorithms.
