💥 A 450M-parameter model just beat bigger VLAs on real robot tasks, and it's 100% open source! [📍 bookmark for later]

Came across SmolVLA, a new vision-language-action model for robotics that's compact, fast, and trained entirely on open community datasets from LeRobot via Hugging Face. What stood out to me is how it matches or outperforms much larger models like ACT using noisy, real-world community data instead of giant private datasets.

Why it's worth a look
✅ 26% performance boost from pretraining on open-source data
✅ Runs on consumer hardware, even a MacBook
✅ 30% faster responses with async inference and smart architecture tweaks
✅ Strong results across Meta-World, LIBERO, SO100, and SO101
✅ Fully open source: weights, code, training pipeline, eval stack

They also introduced smart efficiency tricks, like using fewer visual tokens, pulling outputs from an intermediate layer of the backbone, and separating perception from action, to make it all run fast.

Useful links
📘 Blog: https://lnkd.in/dnZSHdqU
📦 Model: https://lnkd.in/dUZMzTDN
📄 Paper: arxiv.org/abs/2506.01844

SmolVLA is a strong case for what can happen when the robotics community shares data and builds in the open. Definitely worth keeping an eye on.
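To make the async-inference idea concrete, here is a minimal sketch of prediction decoupled from execution: the robot keeps consuming actions from a buffer while the next chunk is computed in the background. This illustrates the pattern only, not the LeRobot API; `predict_chunk`, `get_observation`, and `execute` are hypothetical stand-ins.

```python
import queue
import threading
import time

def predict_chunk(observation):
    """Hypothetical stand-in for a policy forward pass that returns a
    chunk of future actions (in SmolVLA, the flow-matching action expert)."""
    time.sleep(0.05)  # simulated inference latency
    return [("move", observation, i) for i in range(10)]

def async_control_loop(get_observation, execute, steps=50, refill_below=3):
    """Consume actions at the control rate while the next chunk is
    predicted in a background thread, so inference never stalls the robot."""
    actions = queue.Queue()
    worker = None

    def refill():
        for a in predict_chunk(get_observation()):
            actions.put(a)

    refill()  # prime the buffer with a first chunk (blocking once)
    for _ in range(steps):
        # Start predicting the next chunk *before* the buffer drains.
        if actions.qsize() < refill_below and (worker is None or not worker.is_alive()):
            worker = threading.Thread(target=refill)
            worker.start()
        execute(actions.get())  # one action per control tick

# Hypothetical callbacks; a real loop would read sensors and send motor commands.
async_control_loop(get_observation=lambda: "obs", execute=print, steps=20)
```

The design point is that the control loop never blocks on inference as long as the buffer refills before it drains, which is where the reported latency savings come from.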
Advances in Compact Neural Networks for Robotics
Summary
Compact neural networks let robots run smaller, faster, and more energy-efficient AI models, so they can perceive, understand, and act in the real world without bulky hardware. By combining vision, language, and action in one streamlined model, they make robotics more accessible for everyday and industrial tasks.
- Explore open-source options: Try using community-trained vision-language-action models that run smoothly on standard computers without expensive equipment.
- Streamline training workflows: Speed up dataset labeling and model fine-tuning by using compact AI models that adapt quickly to new tasks and environments.
- Adopt power-saving hardware: Consider neuromorphic chips inspired by biology that allow robots to sense and process information quickly while using less energy.
Today, Science Robotics has published our work on the first drone performing fully #neuromorphic vision and control for autonomous flight! 🥳

Deep neural networks have led to amazing progress in Artificial Intelligence and promise to be a game-changer as well for autonomous robots 🤖. A major challenge is that the computing hardware for running deep neural networks can still be quite heavy and power consuming. This is particularly problematic for small robots like lightweight drones, for which most deep nets are currently out of reach.

A new type of neuromorphic hardware draws inspiration from the efficiency of animal eyes 👁 and brains 🧠. Neuromorphic cameras do not record images at a fixed frame rate; instead, each pixel tracks brightness over time and sends a signal only when the brightness changes. These signals can then be sent to a neuromorphic processor, in which the neurons communicate with each other via binary spikes, simplifying calculations. The resulting asynchronous, sparse sensing and processing promises to be both quick and energy efficient! 🔋

In our article, we investigated how a spiking neural network (#SNN) can be trained and deployed on a neuromorphic processor for perceiving and controlling drone flight 🚁. Specifically, we split the network in two. First, we trained an SNN to transform the signals from a downward-looking neuromorphic camera into estimates of the drone's own motion. This network was trained with self-supervised learning on data from our drone itself. Second, we used artificial evolution 🦠🐒🚶♂️ to train another SNN for controlling a simulated drone. This network transformed the simulated drone's motion into motor commands, such as the drone's desired orientation.

We then merged the two SNNs 👩🏻‍🤝‍👩🏻 and deployed the resulting network on Intel Labs' neuromorphic research chip "Loihi". The merged network immediately worked on the drone, successfully bridging the reality gap. Moreover, the results highlight the promise of neuromorphic sensing and processing: the network ran 10-64x faster 🏎💨 than a comparable network on a traditional embedded GPU and used 3x less energy.

I want to first congratulate all co-authors at TU Delft | Aerospace Engineering: Federico Paredes Vallés, Jesse Hagenaars, Julien Dupeyroux, Stein Stroobants, and Yingfu Xu 🎉 Moreover, I would like to thank the Intel Labs' Neuromorphic Computing Lab and the Intel Neuromorphic Research Community (#INRC) for their support with Loihi (among others Mike Davies and Yulia Sandamirskaya). Finally, I would like to thank NWO (Dutch Research Council), the Air Force Office of Scientific Research (AFOSR) and Office of Naval Research Global (ONR Global) for funding this project. All relevant links can be found below.

Delft University of Technology, Science Magazine

#neuromorphic #spiking #SNN #spikingneuralnetworks #drones #AI #robotics #robot #opticalflow #control #realitygap
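For readers new to spiking networks, here is a toy leaky integrate-and-fire (LIF) layer showing the spike-based computation the post describes: neurons integrate sparse binary events and fire only when a threshold is crossed. This is a generic illustration, not the TU Delft network; all sizes and weights are made up.

```python
import numpy as np

def lif_step(v, spikes_in, weights, leak=0.9, threshold=1.0):
    """One step of a leaky integrate-and-fire layer.

    v         : membrane potentials, shape (n_out,)
    spikes_in : binary input spikes, shape (n_in,)
    weights   : synaptic weights, shape (n_out, n_in)
    Returns updated potentials and binary output spikes.
    """
    v = leak * v + weights @ spikes_in       # leak, then integrate weighted spikes
    spikes_out = (v >= threshold).astype(float)
    v = v * (1.0 - spikes_out)               # reset the neurons that fired
    return v, spikes_out

# Events from a neuromorphic camera arrive as sparse binary vectors;
# the network only does meaningful work when spikes are present.
rng = np.random.default_rng(0)
v = np.zeros(8)
w = rng.normal(0, 0.5, size=(8, 16))
total_spikes = 0
for _ in range(100):
    events = (rng.random(16) < 0.1).astype(float)  # sparse input spikes
    v, out = lif_step(v, events, w)
    total_spikes += out.sum()
print(total_spikes)
```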
-
I've started a series of short experiments using advanced Vision-Language Models (#VLM) to improve #robot #perception. In the first article, I showed how simple prompt engineering can steer Grounded SAM 2 to produce impressive detection and segmentation results. However, the major challenge remains: most #robotic systems, including mine, lack GPUs powerful enough to run these large models in real time. In my latest experiment, I tackled this issue by using Grounded SAM 2 to auto-label a dataset and then fine-tuning a compact #YOLO v8 model. The result? A small, efficient model that detects and segments my SHL-1 robot in real time on its onboard #NVIDIA #Jetson computer! If you're working in #robotics or #computervision and want to skip the tedious process of manually labeling datasets, check out my article (code included). I explain how I fine-tuned a YOLO model in just a couple of hours instead of days. Thanks to Roboflow and its amazing #opensource tools for making all of this more straightforward. #AI #MachineLearning #DeepLearning
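For anyone who wants to try the second step, here is a minimal sketch of the fine-tuning stage using the Ultralytics API. It assumes Grounded SAM 2 has already auto-labeled the images and exported them in YOLO format; the dataset path, file names, and hyperparameters below are placeholders.

```python
from ultralytics import YOLO

# Start from a pretrained segmentation checkpoint and fine-tune it on the
# auto-labeled dataset (exported in YOLO format by the labeling pipeline).
model = YOLO("yolov8n-seg.pt")
model.train(
    data="shl1_autolabeled/data.yaml",  # placeholder path to the exported dataset
    epochs=50,
    imgsz=640,
)

# Run the fine-tuned model on a frame, e.g. on an onboard Jetson.
results = model("frame.jpg")
```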
-
Drawing insights from biological signal processing, neuromorphic computing promises a substantially lower-power solution to improve the energy efficiency of visual odometry (VO) in robotics. Published in Nature Machine Intelligence, this novel approach develops a VO algorithm built from neuromorphic building blocks called resonator networks.

Demonstrated on Intel's Loihi neuromorphic chip, the network generates and stores a working memory of the visual environment while simultaneously estimating the changing location and orientation of the camera. The system outperforms deep learning approaches on standard VO benchmarks in both precision and efficiency, relying on fewer than 100,000 neurons without any training. This work is a key step toward using neuromorphic computing hardware for fast and power-efficient VO and the related task of simultaneous localization and mapping (SLAM), enabling robots to navigate reliably.

A companion paper explores how the neuromorphic resonator network can be applied to visual scene understanding. By formulating a generative model based on vector symbolic architectures (VSA), a scene can be described as a sum of vector products, which can then be efficiently factorized by a resonator network to infer objects and their poses. The work demonstrates a new path for solving problems of perception, and many other complex inference problems, using energy-efficient neuromorphic algorithms and Intel hardware.

Congratulations to researchers from the Institute of Neuroinformatics, University of Zurich and ETH Zurich, Accenture Labs, Redwood Center for Theoretical Neuroscience at UC Berkeley, and Intel Labs.

Learn more about neuromorphic VO: https://lnkd.in/gJCVVMCz
Learn how the VSA framework was developed for neuromorphic visual scene understanding based on a generative model (companion paper): https://lnkd.in/gjAENfpp

#iamintel #Neuromorphic #Robotics
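To illustrate the factorization idea, here is a toy bipolar-VSA resonator in NumPy: a scene vector is composed by binding (elementwise product) one entry from each codebook, and the resonator recovers the factors by repeatedly unbinding its current estimate of the other factor and cleaning up against the codebooks. The dimensions, codebooks, and two-factor setup are made up for illustration and are far simpler than the published networks.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 2048, 10                          # vector dimension, entries per codebook
X = rng.choice([-1, 1], size=(K, D))     # codebook for factor 1 (e.g. object identity)
Y = rng.choice([-1, 1], size=(K, D))     # codebook for factor 2 (e.g. pose)

# Compose a "scene": bind one entry from each codebook by elementwise product.
s = X[3] * Y[7]

# Initialize each estimate as the superposition of its whole codebook.
x_hat = np.where(X.sum(axis=0) >= 0, 1, -1)
y_hat = np.where(Y.sum(axis=0) >= 0, 1, -1)

for _ in range(20):
    # Unbind the other factor's estimate, then clean up against the codebook
    # (project onto the codebook's span and re-binarize).
    x_hat = np.sign(X.T @ (X @ (s * y_hat)))
    y_hat = np.sign(Y.T @ (Y @ (s * x_hat)))

print(np.argmax(X @ x_hat), np.argmax(Y @ y_hat))  # recovers indices 3 and 7
```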
-
The robots are getting a new brain architecture. It's called VLA: Vision-Language-Action.

Traditional robots work in steps. See. Think. Act. Each module separate. VLAs fuse all three into one model. The robot sees the environment, understands a language command, and outputs motor actions in a single pass.

Figure's Helix is the first VLA to control a full humanoid upper body. Arms, hands, torso, head, individual fingers. Two robots working together on tasks they've never seen before.

NVIDIA's Groot N1 uses a dual-system architecture. System 2 (a VLM) handles high-level reasoning. System 1 (a diffusion policy) handles fast motor control at 10ms latency.

Google's Gemini Robotics extends Gemini 2.0 to the physical world. Dexterous enough to fold origami.

Hugging Face released SmolVLA in June. 450 million parameters. Trained entirely on community datasets from LeRobot. Runs on consumer hardware. The architecture uses a truncated vision-language backbone with a flow-matching transformer for action prediction. Asynchronous inference decouples prediction from execution. 30% faster response time.

The key insight is that VLMs already understand the world. They know what a cup is. They know what "put it on the table" means. The challenge was translating that knowledge into motion. VLAs solve the translation problem.

The training data is interesting too. Hundreds of hours of robot teleoperation. Human videos. Synthetic environments. Figure trained Helix on 1,800+ task environments. SmolVLA trained on 30,000 episodes from 487 community datasets spanning labs and living rooms.

VLAs compress vision, language, and proprioceptive state into a shared latent representation. The action decoder samples from this space. For coarse manipulation, this works. For fine-grained tasks like grasping or precision assembly, the latent space doesn't capture enough detail. Increasing latent dimensionality helps but increases compute requirements.

Cross-embodiment transfer remains a challenge. A policy trained on one robot arm doesn't transfer to another with different kinematics. Sim-to-real gap persists. Policies trained in simulation fail in the real world due to differences in physics and visual appearance. Viewpoint changes and lighting differences degrade performance.

UMA launched last week. Ex-Tesla, Google DeepMind, and Hugging Face team building general-purpose robots in Europe. Mobile industrial robots and compact humanoids. First pilots in logistics and manufacturing target 2026.

We're still early. These systems struggle with novel environments and long-horizon tasks. But the architecture is converging. Vision, language, and action in one model. Humanoid robots that learn by watching humans work. That's the trajectory.
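To make the "one model, single pass" structure concrete, here is a schematic PyTorch module: image, instruction, and proprioceptive features are projected into a shared token space, fused by a small transformer, and decoded into a chunk of continuous actions. Every layer and size here is a placeholder, not the internals of Helix, Groot, or SmolVLA.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Schematic VLA: fuse image and instruction features into one latent,
    then decode a chunk of continuous actions. All sizes are illustrative."""

    def __init__(self, d_model=256, action_dim=7, chunk=8):
        super().__init__()
        self.vision = nn.Linear(512, d_model)    # stand-in for a vision backbone
        self.language = nn.Linear(384, d_model)  # stand-in for a text encoder
        self.proprio = nn.Linear(14, d_model)    # joint positions/velocities
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.action_head = nn.Linear(d_model, action_dim * chunk)
        self.chunk, self.action_dim = chunk, action_dim

    def forward(self, img_feat, text_feat, state):
        # Project all three modalities into one shared token space.
        tokens = torch.stack(
            [self.vision(img_feat), self.language(text_feat), self.proprio(state)],
            dim=1,
        )                                        # (B, 3, d_model)
        latent = self.fuse(tokens).mean(dim=1)   # pooled shared latent
        return self.action_head(latent).view(-1, self.chunk, self.action_dim)

policy = TinyVLA()
actions = policy(torch.randn(1, 512), torch.randn(1, 384), torch.randn(1, 14))
print(actions.shape)  # torch.Size([1, 8, 7]) -> a chunk of 8 seven-DoF actions
```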
-
Google just entered the race for local physical intelligence with the launch of its on-device vision-language-action (VLA) model.

Google DeepMind has introduced Gemini Robotics On-Device, a compact VLA model that brings generative AI directly onto physical robots.

Why does this matter for industry?
🔹 The model runs entirely on-device, enabling real-time responses and autonomy even in offline or latency-sensitive environments.
🔹 It performs complex, two-handed tasks right out of the box, from folding clothes to advanced tool use.
🔹 It learns new behaviors from demonstrations, requiring just 50–100 examples to adapt on the fly, even in dynamic production settings.

The rising competition in local VLAs marks an important shift: moving from cloud-based to embedded AI unlocks smarter, more responsive robotic systems tailored for manufacturing, logistics, and remote operations. As expected, the first wave of general-purpose LLMs is now giving way to specialized models with strong momentum toward edge intelligence, a critical enabler for the next generation of physical AI.

Maria Danninger, Andrew Smith, Dietmar Guhe, Julius Bockamp, Christian Souche, Nino Scheidler