A Computer Vision Processor in one line of bash
Imagine you’re tasked with building a near real-time computer vision processor for work targeting NVIDIA Jetson or dGPU hardware. The requirements are as follows:
Before scrolling on, take a moment to think about what approach you might use to build this yourself.
You might reach for Python, leveraging OpenCV or FFmpeg libraries, a deep learning framework like TensorFlow or PyTorch, Kafka clients, a web streaming service, and lots of glue code. The code would need to be carefully audited to ensure allocations happen in the right places and memory isn't copied unnecessarily across thread boundaries or between the device and host.
After you've thought through what your solution's architecture would look like, take a look below to see this accomplished in a single bash command with GStreamer.
Just add GStreamer
Here's a simplified example of what this might look like with GStreamer's command-line test tool, gst-launch-1.0.
gst-launch-1.0 \
rtspsrc location=rtsp://camera \
! rtph264depay \
! h264parse \
! queue \
! nvv4l2decoder \
! nvstreammux batch-size=16 attach-sys-ts=False \
! nvinfer config-file-path=ssd_config.txt \
! nvtracker tracker-config-file=tracker_config.txt \
! nvdsosd \
! tee name=t \
t. ! queue \
! nvvideoconvert \
! nvh264enc \
! rtspclientsink location=rtsp://preview_sink \
t. ! queue \
! nvmsgconv \
! nvmsgbroker proto-lib=kafka_proto.so conn-str="kafka:9092" topic=vision_output
Running this simple (OK, not that simple) bash one-liner with a few GStreamer plugins creates a processor that:
We've also offloaded all threading to GStreamer. Each queue in the pipeline marks a thread boundary, so receiving the video, performing inference, producing our preview feed, and publishing our tracks each happen in a dedicated thread, which maximizes resource utilization and keeps these activities from blocking one another.
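To see the queue-as-thread-boundary idea in isolation, here is a minimal sketch that uses only stock elements (videotestsrc, tee, queue, fakesink) instead of the NVIDIA plugins. The 120-buffer limit and the two fakesink branches are assumptions chosen for illustration; they stand in for the preview and publishing branches above.

```bash
#!/usr/bin/env bash
# Sketch only: one test source fanned out to two branches.
# Each queue starts a new streaming thread for the elements downstream of it.
PIPELINE=(
  videotestsrc num-buffers=120   # assumed: stop after 120 frames so the run ends
  ! tee name=t
  t. ! queue ! fakesink          # branch 1 runs in its own thread
  t. ! queue ! fakesink          # branch 2 runs in its own thread
)

# Count the queue elements, i.e. the thread boundaries in this description.
queue_count=0
for element in "${PIPELINE[@]}"; do
  if [ "$element" = "queue" ]; then
    queue_count=$((queue_count + 1))
  fi
done
echo "thread boundaries (queues): $queue_count"

# Run the pipeline only if GStreamer is installed on this machine.
if command -v gst-launch-1.0 >/dev/null 2>&1; then
  gst-launch-1.0 -q "${PIPELINE[@]}"
fi
```

Dropping either queue would make that branch run in the same thread as the tee, so a slow branch could stall the other one.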
Furthermore, GStreamer buffers use reference counting and copy-on-write to minimize the copying of data as it moves through our pipeline and between device and host. For example, the Kafka publishing path and the preview video feed path both reference the same buffer until one of them modifies it.
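And if we want to read from a local file instead of an RTSP stream, we can swap rtspsrc for filesrc. Because filesrc delivers raw file bytes rather than RTP packets, rtph264depay would also need to give way to a demuxer that matches the file's container (qtdemux for MP4, for example).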
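As a rough sketch of that swap, the helper below prints the front of the pipeline for either kind of input. The function name source_segment and the sample path are hypothetical, and the file branch assumes an MP4 container holding H.264 video; other containers would need a different demuxer.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GStreamer): print the source end of the
# pipeline for an RTSP URL or a local file.
source_segment() {
  local input="$1"
  case "$input" in
    rtsp://*|rtsps://*)
      # Network camera: receive RTP and unwrap the H.264 payload.
      echo "rtspsrc location=$input ! rtph264depay"
      ;;
    *)
      # Local file: read bytes and demux the container (assumes MP4/H.264).
      echo "filesrc location=$input ! qtdemux"
      ;;
  esac
}

source_segment "rtsp://camera"
source_segment "/tmp/sample.mp4"   # assumed example path
```

Everything from h264parse onward in the pipeline above would stay the same in both cases.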
I hope this bash one-liner demonstrates how GStreamer can be used to tackle complicated computer vision projects quickly and efficiently. Have fun out there!