The current modular stack of robotic perception (separate blocks for detection, depth, and tracking) is a temporary artifact of our computational limits. Unified architectures eliminate the latency and the error that accumulates at every hand-off between blocks. By 2030, we won't be fusing distinct outputs; we will be querying a single, holistic scene representation that encodes geometry, semantics, and affordance simultaneously. We are moving from "pipelines" to "foundation world models," and this simplification of the stack will be the greatest driver of reliability. Here is my prediction for the next decade of perception, with a toy sketch below. 👇 #SpatialAI #python #3d #research
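
To make the contrast concrete, here is a minimal Python sketch of what "querying a scene model" could look like. Everything in it is hypothetical: SceneModel, SceneQueryResult, and the query interface are illustrative names I made up for this post, not a real library.

from dataclasses import dataclass

@dataclass
class SceneQueryResult:
    occupancy: float   # geometry: probability that the point is occupied
    label: str         # semantics: predicted class at the point
    graspable: bool    # affordance: can a gripper act here?

class SceneModel:
    # Stand-in for a unified world model: one representation, one query
    # interface, instead of separate detector / depth / tracker outputs.
    def query(self, x: float, y: float, z: float) -> SceneQueryResult:
        # A real model would decode a learned latent field here;
        # this stub only illustrates the interface.
        return SceneQueryResult(occupancy=0.9, label="mug", graspable=True)

scene = SceneModel()
result = scene.query(0.4, 0.1, 0.8)
print(result.label, result.occupancy, result.graspable)

The point of the sketch: geometry, semantics, and affordance come back from one call, so there is no fusion step where latency and error can pile up.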

Preparing for this future means understanding the fundamental blocks today. I break down the core components in "3D Data Science with Python." 👉 https://www.oreilly.com/library/view/3d-data-science/9781098161323/

This move towards unified world models in robotics is a familiar narrative. For human control, the real test is how the holistic scene representation accounts for physics in the feedback loop, not just perception. How much of its internal predictive power needs to come from action-consequence learning rather than from vision alone? (A toy contrast is sketched below.)
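
A toy Python sketch of that split, under loud assumptions: encode and predict_next are made-up stand-ins, and the linear dynamics are purely illustrative, not any real model.

import numpy as np

rng = np.random.default_rng(0)

def encode(observation: np.ndarray) -> np.ndarray:
    # Stand-in vision encoder: raw observation -> latent state.
    return observation.mean(axis=-1)

def predict_next(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    # Stand-in dynamics model: toy linear action-conditioned update.
    # The action term is exactly what a vision-only representation
    # cannot supply on its own.
    a, b = 0.95, 0.1
    return a * state + b * action

obs = rng.random((8, 3))     # fake image features
state = encode(obs)          # latent state from vision
action = rng.random(8)       # fake motor command
print(predict_next(state, action))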
