A Computer Vision Processor in one line of bash

Imagine you’re tasked with building a near real-time computer vision processor for work targeting NVIDIA Jetson or dGPU hardware. The requirements are as follows:

  • 🎥 Must support consuming from either MP4 files or H264-encoded RTSP streams
  • 🕒 Must synchronize to wall-clock time using RTP NTP timestamps (if available)
  • ⚡ Must use hardware-accelerated decoders
  • 📦 Must batch frames for inference
  • 🔁 Must minimize device↔host memory copies
  • 🧠 Must perform inference using a single-shot detector
  • 👣 Must track objects across frames
  • 📤 Must publish results to Kafka
  • 🧵 Must utilize threading for efficient parallelization
  • 📍 Must annotate frames with detection results
  • 📺 Must stream a preview feed for operator monitoring

Before scrolling on, take a moment to think about what approach you might use to build this yourself.

You might reach for Python, leveraging OpenCV or FFmpeg libraries, a deep learning framework like TensorFlow or PyTorch, Kafka clients, a web streaming service, and lots of glue code. The code would need to be carefully audited to ensure allocations are happening in the right places and memory isn't unnecessarily copied across thread boundaries or between the device and host unexpectedly.

After you've taken a moment to think through what your solution's architecture would look like, take a look below to see this accomplished in a single bash command with GStreamer.


Just add GStreamer

Here's a simplified example of what this might look like with GStreamer's test tool, gst-launch-1.0.

gst-launch-1.0 \
    rtspsrc location=rtsp://camera \
    ! rtph264depay \
    ! h264parse \
    ! queue \
    ! nvv4l2decoder \
    ! nvstreammux batch-size=16 attach-sys-ts=False \
    ! nvinfer config-file-path=ssd_config.txt \
    ! nvtracker tracker-config-file=tracker_config.txt \
    ! nvdsosd \
    ! tee name=t \
        t. ! queue \
           ! nvvideoconvert \
           ! nvh264enc \
           ! rtspclientsink location=rtsp://preview_sink \
        t. ! queue \
           ! nvmsgconv \
           ! nvmsgbroker proto-lib=kafka_proto.so conn-str="kafka:9092" topic=vision_output        

Running this simple (OK, not that simple) bash one-liner with a few GStreamer plugins creates a processor that:

  • 🎥 Receives the video stream via RTSP
  • ⚡ Decodes H264 video using hardware acceleration
  • 📦 Batches our frames together
  • 🕒 Attaches NTP timestamps derived from RTP headers to our frames
  • 🤖 Runs inference via TensorRT
  • 👣 Tracks objects across frames
  • 📍 Overlays bounding boxes showing our tracks
  • 🔀 Branches to both publish our tracks to Kafka and create a live preview stream
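
Because the pipeline only starts when every element it names is installed, a quick preflight check can save some head-scratching. The sketch below uses gst-inspect-1.0 with its --exists flag to list elements that cannot be found. The helper name missing_elements is hypothetical, and the element list is simply copied from the command above.

```shell
# Hypothetical helper: print each element name that gst-inspect-1.0 cannot find.
# An element is also reported as missing if gst-inspect-1.0 itself is not installed.
missing_elements() {
    for element in "$@"; do
        gst-inspect-1.0 --exists "$element" >/dev/null 2>&1 || printf '%s\n' "$element"
    done
}

# Usage: check the elements named in the pipeline before launching it.
#   missing_elements rtspsrc rtph264depay h264parse queue nvv4l2decoder \
#       nvstreammux nvinfer nvtracker nvdsosd tee nvvideoconvert nvh264enc \
#       rtspclientsink nvmsgconv nvmsgbroker
```

An empty result means every element resolved; anything printed points at a plugin that still needs installing.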

We've also offloaded all threading to GStreamer. Each queue in the pipeline marks a thread boundary, so receiving the video, performing inference, producing our preview feed, and publishing our tracks each run in their own dedicated thread. This maximizes resource utilization and prevents these activities from blocking one another.
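
One refinement worth considering: a default queue blocks its upstream element once it fills up, so a stalled preview consumer could eventually hold up the tee. As a sketch, assuming dropped preview frames are acceptable, the preview branch's queue can be made leaky using the queue element's standard max-size-* and leaky properties. The helper name preview_queue and the default depth of 4 are illustrative choices, not something the original pipeline specifies.

```shell
# Hypothetical helper: emit a queue description that holds at most $1 buffers
# (default 4) and drops the oldest buffers instead of blocking when full.
# Setting max-size-bytes and max-size-time to 0 disables those two limits,
# so only the buffer count applies.
preview_queue() {
    depth="${1:-4}"
    printf 'queue max-size-buffers=%s max-size-bytes=0 max-size-time=0 leaky=downstream' "$depth"
}

# Usage inside the pipeline (left unquoted on purpose so it splits into arguments):
#   ... ! tee name=t t. ! $(preview_queue 4) ! nvvideoconvert ! ...
```

The Kafka branch would usually keep a regular, non-leaky queue, since dropping buffers there means dropping results.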

Furthermore, GStreamer buffers use reference counting and copy-on-write, which minimizes copying as data moves through our pipeline and between device and host. For example, both the Kafka publishing path and the preview video feed path will reference the same buffer until one of them modifies it.

And if we want to read from a local MP4 file instead of an RTSP stream, we can swap rtspsrc for filesrc and replace the RTP depayloader (rtph264depay) with an MP4 demuxer such as qtdemux.
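
A file source needs a demuxer where the RTSP source needs a depayloader, so the front of the pipeline differs slightly between the two inputs. Here is a minimal sketch of choosing between them, assuming the MP4 holds a single H264 video track. The helper name source_chain is hypothetical.

```shell
# Hypothetical helper: emit the source half of the pipeline for a given input.
# RTSP URLs get rtspsrc plus the RTP depayloader; anything else is treated as
# a local MP4 file and gets filesrc plus the qtdemux demuxer.
source_chain() {
    case "$1" in
        rtsp://*) printf 'rtspsrc location=%s ! rtph264depay' "$1" ;;
        *)        printf 'filesrc location=%s ! qtdemux' "$1" ;;
    esac
}

# Usage (left unquoted on purpose so it splits into arguments; this simple
# form does not handle paths containing spaces):
#   gst-launch-1.0 $(source_chain "$INPUT") ! h264parse ! queue ! nvv4l2decoder ! ...
```

Everything from h264parse onward can then be shared between the two cases, although wall-clock synchronization from NTP timestamps only applies to the RTSP input.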

I hope this bash one-liner demonstrates how GStreamer can be used to tackle complicated computer vision projects quickly and efficiently. Have fun out there!

Oh dear! I need to try that! :)

"And to deliver something on time, we must make decisions which we pray at night never see the light of day." - I may not write my computer vision processors in python (because I do not write computer vision processors), but I recognize a killer line when I see one 😂 🫡
