Scrolling a web browser by hand gesture using the cv2 module
I built this project for learning purposes. Using this program, anyone can scroll through any website. The program detects any red-coloured object and, by tracking that object, scrolls the page. You need some red-coloured object to scroll.
For this project I used three modules. The first is cv2, the Python binding for OpenCV, which is widely used in computer vision, machine learning and image processing, and is common in real-time projects. The second is the NumPy library, which provides fast n-dimensional arrays (far faster than plain Python lists) and routines for linear algebra, Fourier transforms and matrices. The last one is pyautogui, which can control the mouse and keyboard.
import cv2
import numpy as np
import pyautogui
Define a range of red colour in HSV format (the frame is converted from BGR to HSV before masking, so these bounds are hue, saturation and value).
lower_red = np.array([0, 80, 80])
upper_red = np.array([10, 255, 255])
prev_y = 0
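One caveat worth knowing: in OpenCV's 8-bit HSV space the hue axis runs 0-179, and red sits at both ends of it, so a single low range like the one above misses reds near hue 179. A more robust detector combines a low and a high hue range. Below is a NumPy-only sketch of that idea (the 170 cutoff is my assumption, not from the original):

```python
import numpy as np

def red_mask(hsv):
    # Red wraps around the hue axis (0-179 in OpenCV), so accept
    # both the low band (<= 10) and the high band (>= 170).
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    in_hue = (h <= 10) | (h >= 170)
    in_sat_val = (s >= 80) & (v >= 80)
    return np.where(in_hue & in_sat_val, 255, 0).astype(np.uint8)

# A 1x2 "image": one reddish pixel near hue 178, one green pixel near hue 60.
hsv = np.array([[[178, 200, 200], [60, 200, 200]]], dtype=np.uint8)
print(red_mask(hsv))  # the red pixel becomes 255, the green pixel 0
```

In the real script you would build two masks with cv2.inRange and combine them with cv2.bitwise_or; the sketch above just shows the per-pixel logic.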
VideoCapture creates a capture object; we pass the device index (0 or 1) as an argument. You can then capture the video frame by frame. At the end, don't forget to release the capture.
cap = cv2.VideoCapture(0)
The read function returns a boolean value: True if the frame was read correctly, otherwise False.
ret, frame = cap.read()
The resize function scales the image. The first parameter is the source image, the second the target size in pixels, and the last the interpolation method; INTER_AREA is the recommended choice for shrinking an image.
frame = cv2.resize(frame, (342, 192), interpolation=cv2.INTER_AREA)
The cvtColor function converts an image from one colour space to another. OpenCV provides more than 150 colour-space conversion codes.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
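To get a feel for what the HSV values look like, note that for 8-bit images OpenCV stores hue as degrees halved (0-179) and saturation and value scaled to 0-255. A pure-Python sketch using the standard colorsys module (which works on RGB in [0, 1]) shows the mapping for a pure red pixel:

```python
import colorsys

def bgr_to_opencv_hsv(b, g, r):
    # colorsys expects RGB in [0, 1]; OpenCV's 8-bit HSV stores
    # H in [0, 179] (degrees / 2) and S, V in [0, 255].
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 180) % 180, round(s * 255), round(v * 255)

print(bgr_to_opencv_hsv(0, 0, 255))  # pure red in BGR -> (0, 255, 255)
```

That is why lower_red starts at hue 0: fully saturated, bright red lands right at the bottom of OpenCV's hue axis.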
The inRange function selects the pixels that fall within a colour range. Here hsv is the image source, and lower_red and upper_red define the range of colour to keep.
mask = cv2.inRange(hsv, lower_red, upper_red)
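The semantics of inRange are simple: a pixel becomes 255 when every one of its channels lies inside [lower, upper], and 0 otherwise. A NumPy-only sketch of that behaviour (not the OpenCV implementation itself):

```python
import numpy as np

def in_range(img, lower, upper):
    # A pixel passes only if ALL of its channels are inside the bounds;
    # the result is a single-channel 0/255 mask like cv2.inRange returns.
    inside = np.all((img >= lower) & (img <= upper), axis=-1)
    return (inside * 255).astype(np.uint8)

lower = np.array([0, 80, 80])
upper = np.array([10, 255, 255])
# Two HSV pixels: one inside the red range, one with hue 40 (outside).
img = np.array([[[5, 200, 150], [40, 200, 150]]], dtype=np.uint8)
print(in_range(img, lower, upper))  # [[255   0]]
```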
The findContours function is widely used in shape analysis, object detection and recognition. It takes three parameters: the first is the source image, the second the contour retrieval mode, and the last the contour approximation method.
contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
The contourArea function calculates the area of a contour returned by findContours.
area = cv2.contourArea(c)
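Under the hood, the area of a closed contour can be computed with the classic shoelace formula (OpenCV's docs describe contourArea in terms of Green's theorem). A pure-Python sketch of the same calculation:

```python
def contour_area(points):
    # Shoelace formula: sum the cross products of consecutive vertices
    # of the closed polygon, then take half the absolute value.
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(contour_area(square))  # 100.0
```

This is why the script can use a simple threshold like area > 400 to ignore small specks of red noise.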
This function returns the straight (upright) bounding rectangle of a contour: x and y are the top-left corner, and w and h are the width and height.
x, y, w, h = cv2.boundingRect(c)
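Conceptually, the bounding rectangle is just the min/max of the contour's coordinates. A pure-Python sketch of that idea (the inclusive +1 on width and height mirrors how OpenCV counts pixels, to the best of my knowledge):

```python
def bounding_rect(points):
    # x, y are the smallest coordinates; w, h span the points
    # (pixels are counted inclusively, hence the +1).
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x + 1, max(ys) - y + 1

print(bounding_rect([(4, 2), (9, 7), (6, 3)]))  # (4, 2, 6, 6)
```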
Draw a rectangle with this function.
cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
Pretty simple logic to scroll with the space key: if the object has moved up since the previous frame (its y coordinate decreased), press space to scroll the page down.
if y < prev_y:
    pyautogui.press('space')
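One practical refinement: comparing raw y values fires on every pixel of camera jitter, so space can be pressed far too often. A hedged sketch of steadier gesture logic, where a scroll only triggers when the object moves up by more than a threshold (the 15-pixel value is my assumption to tune, not from the original):

```python
class GestureScroller:
    """Track the object's y position and report when it moved up enough."""

    def __init__(self, threshold=15):
        self.prev_y = None          # no position seen yet
        self.threshold = threshold  # minimum upward movement in pixels

    def update(self, y):
        # Trigger only when the object moved UP (y decreased) by more
        # than the threshold since the last frame.
        moved_up = self.prev_y is not None and self.prev_y - y > self.threshold
        self.prev_y = y
        return moved_up  # True -> the caller presses the space key

s = GestureScroller()
print(s.update(100))  # False (first frame, nothing to compare against)
print(s.update(80))   # True  (moved up 20 px, beyond the threshold)
print(s.update(78))   # False (only 2 px, treated as jitter)
```

Inside the loop you would call `if s.update(y): pyautogui.press('space')` instead of comparing against prev_y directly.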
Create two windows: one for the binary mask built from the HSV image, and another for the normal frame.
cv2.imshow("HSV Frame", mask)
cv2.imshow('Video Frame', frame)
Wait for a key press, and quit when "q" is pressed.
if cv2.waitKey(10) == ord("q"):
    break
All of this code runs inside a while loop.
while True:
    ret, frame = cap.read()
    if not ret:  # stop if no frame could be read from the camera
        break
    frame = cv2.resize(frame, (342, 192), interpolation=cv2.INTER_AREA)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_red, upper_red)
    contours, hierarchy = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        area = cv2.contourArea(c)
        if area > 400:
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            if y < prev_y:
                pyautogui.press('space')
            prev_y = y
    cv2.imshow("HSV Frame", mask)
    cv2.imshow('Video Frame', frame)
    if cv2.waitKey(10) == ord("q"):
        break
Finally, release releases the capture device so the camera becomes available to other programs, and destroyAllWindows closes all of the windows we created.
cap.release()
cv2.destroyAllWindows()
To get source code: https://bit.ly/3rxIKUc
Keep going..