Perform Pose Estimation using Computer Vision

What is Computer Vision?

Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From an engineering perspective, it seeks to understand and automate tasks that the human visual system can do.

Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that make sense to thought processes and can elicit appropriate action. This image understanding can be seen as the disentangling of symbolic information from image data using models constructed with the aid of geometry, physics, statistics, and learning theory.

What is Pose Estimation?


Human pose estimation and tracking is a computer vision task that involves detecting, associating, and tracking semantic keypoints. Examples of semantic keypoints are "right shoulder", "left knee", or the "left brake light of a vehicle".

Tracking semantic keypoints in live video footage requires substantial computational resources, which has limited the accuracy of pose estimation. With the latest advances, new applications with real-time requirements are becoming possible, such as self-driving cars and last-mile delivery robots.

Today, the most powerful image processing models are based on convolutional neural networks (CNNs). Hence, state-of-the-art methods are typically based on designing the CNN architecture tailored particularly for human pose inference.
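CNN-based pose estimators are commonly trained to predict one heatmap per joint rather than raw coordinates. As a minimal illustration of this idea (not tied to any specific model), the following NumPy sketch renders the Gaussian target heatmap that a network would be trained to match for a single keypoint:

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma=2.0):
    """Render a 2D Gaussian centered on a keypoint location.

    Heatmap-based CNN pose estimators are typically trained so
    that each output channel matches such a map for one joint.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Target for a keypoint at x=20, y=30 on a 64x48 grid.
heatmap = gaussian_heatmap(64, 48, center=(20, 30))
# The peak of the heatmap sits exactly on the keypoint.
print(np.unravel_index(heatmap.argmax(), heatmap.shape))  # (30, 20)
```

During training, the network's predicted heatmaps are regressed against such targets; at inference time, the argmax of each predicted channel recovers the joint location.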

Bottom-up vs. Top-down methods

All approaches for pose estimation can be grouped into bottom-up and top-down methods.

  • Bottom-up methods estimate each body joint first and then group them to form a unique pose. Bottom-up methods were pioneered with DeepCut (a method we will cover later in more detail).
  • Top-down methods run a person detector first and estimate body joints within the detected bounding boxes.
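The two strategies can be contrasted as pipelines. In this sketch, `detect_people`, `estimate_joints_in_box`, `detect_all_joints`, and `group_joints` are hypothetical stand-in stubs for trained models, used only to show the order of operations:

```python
# Hypothetical stubs standing in for trained models.
def detect_people(image):
    """Person detector: returns bounding boxes (x, y, w, h)."""
    return [(0, 0, 50, 100), (60, 0, 50, 100)]

def estimate_joints_in_box(image, box):
    """Single-person pose model applied inside one box."""
    x, y, w, h = box
    return {"left_knee": (x + w // 2, y + 3 * h // 4)}

def detect_all_joints(image):
    """Joint detector over the whole image (no person boxes)."""
    return [("left_knee", (25, 75)), ("left_knee", (85, 75))]

def group_joints(joints):
    """Group detected joints into per-person poses (trivially here)."""
    return [{name: pos} for name, pos in joints]

def top_down(image):
    # 1) detect people, 2) estimate joints inside each box
    return [estimate_joints_in_box(image, b) for b in detect_people(image)]

def bottom_up(image):
    # 1) detect all joints, 2) group them into individual poses
    return group_joints(detect_all_joints(image))

print(len(top_down(None)), len(bottom_up(None)))  # 2 2
```

Note the trade-off this structure implies: top-down cost grows with the number of detected people, while bottom-up runs the joint detector once and pays instead for the grouping step.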

Pose Estimation with Deep Learning

With the rapid development of deep learning in recent years, deep learning has been shown to outperform classical computer vision methods in various tasks, such as image segmentation and object detection. Accordingly, deep learning techniques have brought significant advances and performance gains to pose estimation as well.

Next, we will list and review the popular pose estimation methods.

The Most Popular Pose Estimation Methods


  • High-Resolution Net (HRNet)
  • OpenPose
  • DeepCut
  • AlphaPose

Deep Learning-based Pose Estimation Methods

  • High-Resolution Net (HRNet): a neural network for human pose estimation, used to locate keypoints (joints) of a specific person or object in an image. Its advantage over other architectures is that, whereas most existing methods recover high-resolution representations of postures from the low-resolution output of a high-to-low resolution network, HRNet maintains high-resolution representations throughout the estimation process.
  • OpenPose: one of the most popular bottom-up approaches for multi-person human pose estimation. OpenPose is an open-source, real-time multi-person detection system with high accuracy in detecting body, foot, hand, and facial keypoints. A further advantage is its API, which gives users the flexibility to select source images from cameras, webcams, and other inputs, which is especially important for embedded system applications.
  • DeepCut: another popular bottom-up approach for multi-person human pose estimation. The model works by detecting the number of people in an image and then predicting the joint locations for each person. DeepCut can be applied to images or videos containing multiple people or objects, for example, in football or basketball footage.
  • AlphaPose: a popular top-down method of pose estimation. It is useful for detecting poses in the presence of inaccurate human bounding boxes, i.e., it is designed to estimate human poses well even when the detected bounding boxes are imperfect. AlphaPose is applicable to both single- and multi-person pose estimation in images or video frames.
  • DeepPose: a human pose estimator that leverages deep neural networks. Its DNN regresses the locations of all body joints directly, built from convolutional, pooling, and fully connected layers.
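Heatmap-based models such as HRNet and OpenPose output one heatmap per joint, and keypoint coordinates are read off by taking each channel's peak. A minimal NumPy decoding sketch (the heatmaps here are synthetic stand-ins, not real model output):

```python
import numpy as np

def decode_heatmaps(heatmaps):
    """Convert (num_joints, H, W) heatmaps into (x, y) keypoint
    coordinates by taking the argmax of each channel."""
    coords = []
    for hm in heatmaps:
        row, col = np.unravel_index(hm.argmax(), hm.shape)
        coords.append((int(col), int(row)))  # (x, y) order
    return coords

# Two synthetic 4x4 heatmaps with known peaks.
hms = np.zeros((2, 4, 4))
hms[0, 1, 2] = 1.0  # joint 0 peaks at x=2, y=1
hms[1, 3, 0] = 1.0  # joint 1 peaks at x=0, y=3
print(decode_heatmaps(hms))  # [(2, 1), (0, 3)]
```

Real decoders usually add sub-pixel refinement around the peak, but the argmax step above is the core of how coordinates are recovered from heatmaps.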

Use Cases and Applications of Pose Estimation

  1. Human Activity Estimation
  2. Motion Transfer and Augmented Reality
  3. Training Robots
  4. Motion Tracking for Consoles
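Human activity estimation often starts from simple geometric features computed over the estimated keypoints, such as joint angles. A small sketch of that step (the hip/knee/ankle coordinates are made up for illustration):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by points a-b-c,
    e.g. the knee angle from hip, knee, and ankle keypoints."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Hypothetical hip / knee / ankle positions of a straight leg.
hip, knee, ankle = (0, 0), (0, 50), (0, 100)
print(round(joint_angle(hip, knee, ankle)))  # 180
```

Tracking how such angles change over consecutive frames is one common basis for classifying activities like sitting, walking, or squatting.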



