We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, and fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet.

We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.

Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.

Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, then mid-train with only 4 hours (!) of robot play data with Sharpa hands: 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands: 30%+ gains over training on G1 data alone.

The scalable path to robot dexterity was never more robots. It was always us.

- Website: https://lnkd.in/gxzgeP-2
- Paper: https://lnkd.in/g7PJdz_8
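The scaling-law claim is concrete enough to sketch. Below is a minimal curve-fitting example; the (hours, loss) pairs are made up for illustration and are not the paper's data. The point is that a log-linear law is just a straight line once the x-axis is log hours:

```python
# Minimal sketch of a log-linear scaling fit: loss = a + b * ln(hours).
# The data points below are hypothetical, not from the EgoScale paper.
import numpy as np

hours = np.array([100, 500, 2_000, 8_000, 20_000], dtype=float)  # hypothetical
loss = np.array([0.92, 0.78, 0.66, 0.54, 0.46])                  # hypothetical

# A log-linear law is linear in ln(hours), so an ordinary line fit suffices.
b, a = np.polyfit(np.log(hours), loss, deg=1)

pred = a + b * np.log(hours)
ss_res = np.sum((loss - pred) ** 2)
ss_tot = np.sum((loss - loss.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"loss ≈ {a:.3f} + {b:.3f}·ln(hours), R² = {r2:.4f}")

# Extrapolate the fitted law to 100K hours of human video.
print(a + b * np.log(100_000))
```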
Training Robots Without Pre-Programmed Models
Summary
Training robots without pre-programmed models means teaching machines to learn from their own experiences or from human demonstrations, instead of relying on detailed instructions or digital blueprints set in advance. This approach allows robots to adapt to new tasks, environments, and challenges in real time, making them more flexible and resilient.
- Encourage exploration: Let robots try different movements and actions so they can learn to correct their mistakes and build practical skills through trial and error.
- Use real-world data: Feed robots human demonstrations or allow them to observe themselves via cameras so they can mimic complex behaviors without needing perfect, pre-collected datasets.
- Prioritize resilience: Teach robots to recover from unpredictable situations instead of focusing only on precision, so they can handle messy environments and unexpected challenges.
-
Massachusetts Institute of Technology researchers just dropped something wild: a system that lets robots learn how to control themselves just by watching their own movements with a camera. No fancy sensors. No hand-coded models. Just vision.

Think about that for a second. Right now, most robots rely on precise digital models to function, like a blueprint telling them exactly how their joints should bend, how much force to apply, etc. But what if the robot could just... figure it out by experimenting, like a baby flailing its arms until it learns to grab things?

That's what Neural Jacobian Fields (NJF) does. It lets a robot wiggle around randomly, observe itself through a camera, and build its own internal "sense" of how its body responds to commands.

The implications?
1) Cheaper, more adaptable robots: no need for expensive embedded sensors or rigid designs.
2) Soft robotics gets real: ever tried to model a squishy, deformable robot? It's a nightmare. Now, they can just learn their own physics.
3) Robots that teach themselves: instead of painstakingly programming every movement, we could just show them what to do and let them work out the "how."

The demo videos are mind-blowing: a pneumatic hand with zero sensors learning to pinch objects, a 3D-printed arm scribbling with a pencil, all controlled purely by vision.

But here's the kicker: what if this is how all robots learn in the future? No more pre-loaded models. Just point a camera, let them experiment, and they'll develop their own "muscle memory." Sure, there are still limitations (like needing multiple cameras for training), but the direction is huge. This could finally make robotics flexible enough for messy, real-world tasks: agriculture, construction, even disaster response.

#AI #MachineLearning #Innovation #ArtificialIntelligence #SoftRobotics #ComputerVision #Industry40 #DisruptiveTech #MIT #Engineering #MITCSAIL #RoboticsResearch #DeepLearning
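To make the idea concrete, here is a deliberately simplified sketch. A hidden linear map stands in for the robot's true visuomotor response; the real NJF method learns a neural Jacobian field from camera images, not a single matrix. The structure is the same, though: wiggle randomly, fit the command-to-motion map from observation alone, then invert it to act.

```python
# Toy stand-in for the NJF idea (NOT MIT's implementation): learn how
# commands map to visually observed motion, then invert that map to act.
import numpy as np

rng = np.random.default_rng(1)
n_cmd, n_pix = 4, 6
J_true = rng.normal(size=(n_pix, n_cmd))   # the robot's unknown body response

# 1) Random wiggling: send random commands, watch the resulting pixel motion.
commands = rng.normal(size=(200, n_cmd))
motions = commands @ J_true.T + 0.01 * rng.normal(size=(200, n_pix))

# 2) Fit the Jacobian from observation alone (least squares, no sensors).
X, *_ = np.linalg.lstsq(commands, motions, rcond=None)
J_hat = X.T                                # estimated (n_pix, n_cmd) Jacobian

# 3) Control: pick the command whose predicted motion matches a goal motion.
goal_cmd = rng.normal(size=n_cmd)          # hidden; used only to construct
desired = J_true @ goal_cmd                # a reachable goal for the demo
cmd = np.linalg.pinv(J_hat) @ desired
print(np.round(J_true @ cmd - desired, 3))  # residual ≈ 0: the learned map works
```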
-
The craziest thing you will see all weekend.

A humanoid robot just learned to play tennis well enough to sustain multi-shot rallies with real humans.

This is LATENT, out of Tsinghua, Peking, and Galbot. They trained a Unitree G1 to play actual tennis using nothing but imperfect fragments of human motion data. Not full match recordings. Not teleoperation. Fragments.

The clever bit is the approach. Instead of needing perfect motion capture data from real tennis matches, they used short clips of individual skills like swings, footwork patterns, and recovery steps. The policy learns to correct and compose those fragments into fluid sequences that hold up under real match conditions. Reactive footwork, targeted returns, the lot.

What makes this more than a demo reel is the sim-to-real transfer actually working. The robot adjusts to different human players with different speeds and styles. It's not running a fixed script. It's reading the ball and responding.

We went from robots that could barely walk on flat ground to ones rallying tennis balls in about 18 months. The data efficiency angle here is the real story though. You don't need perfect data anymore. You need the right primitives and a system smart enough to stitch them together.

This is what athletic intelligence looks like when you stop waiting for perfect training data and start working with what you've got.
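No LATENT code appears in the post, but the fragment-composition idea it describes can be sketched. Everything below (the fragment library, the selector, the residual correction) is hypothetical and purely illustrative:

```python
# Conceptual sketch of composing motion fragments with a learned correction,
# in the spirit the post describes. This is NOT the LATENT method's code.
import numpy as np

fragments = {
    "forehand": np.random.randn(30, 23),  # 30 timesteps x 23 joint targets
    "backhand": np.random.randn(30, 23),
    "recover": np.random.randn(20, 23),
}

def select_fragment(ball_state):
    # Toy selector: forehand if the ball is on the right side, else backhand.
    return "forehand" if ball_state[0] > 0 else "backhand"

def correction_policy(obs):
    # Stand-in for the learned residual that repairs imperfect fragments.
    return 0.05 * np.tanh(obs[:23])

def compose(ball_state, obs):
    base = fragments[select_fragment(ball_state)]
    # Per-step residual corrections blend a raw clip into a usable motion.
    return base + correction_policy(obs)

traj = compose(ball_state=np.array([0.8, 1.2]), obs=np.random.randn(23))
print(traj.shape)  # (30, 23) corrected joint-target sequence
```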
-
What a Self-Driving Bike Just Revealed About the Future of AI

A team at the Robotics and AI Institute (RAI) just built a bike that rides itself. No joystick. No remote. No pre-programmed routes. Just reinforcement learning in motion. It learns balance through trial and error, the same way humans do. Every wobble becomes feedback, every near-fall becomes data, every correction becomes memory.

Why it matters: Most AI systems fail when reality gets messy. This one doesn't. It adapts. It treats unpredictability not as a bug to fix, but as a teacher to learn from. That's a quiet but radical shift in how intelligence forms.

What this enables:
→ Delivery robots that stay upright in crowded streets
→ Mobility aids that self-stabilize for elderly or disabled users
→ Rescue robots that recover in rough terrain
→ Industrial systems that keep moving safely under pressure

The deeper insight: We've spent years training AI for perfect control. But real intelligence, human or artificial, isn't about control. It's about correction. The ability to recover when the world stops behaving as expected. Maybe the next era of AI won't be about prediction at all. Maybe it will be about recovery.

So here's my question: Should the next generation of AI be trained for resilience before accuracy?

#AI #Robotics #MachineLearning #Resilience #Innovation #FutureOfWork
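As a toy illustration of balance learned purely by correction (and emphatically not RAI's actual system), here is a 1-D inverted-pendulum-style sketch where random-search trial and error finds a stabilizing feedback gain:

```python
# Toy "balance by trial and error" sketch; all dynamics are made up.
# State: (lean angle theta, angular velocity omega). The agent learns a
# linear feedback gain by keeping whichever random tweak wobbles less.
import numpy as np

rng = np.random.default_rng(0)

def rollout(gains, steps=200, dt=0.02):
    theta, omega = 0.05, 0.0          # start with a small lean
    total = 0.0
    for _ in range(steps):
        action = -(gains[0] * theta + gains[1] * omega)  # corrective torque
        omega += (9.8 * np.sin(theta) + action) * dt     # pendulum-like fall
        theta += omega * dt
        total += -abs(theta)          # every wobble is negative feedback
        if abs(theta) > 0.5:          # "near-fall": big penalty, episode ends
            total -= 10.0
            break
    return total

gains = np.zeros(2)
best = rollout(gains)
for _ in range(500):                  # crude trial-and-error hill climbing
    cand = gains + rng.normal(scale=0.5, size=2)
    score = rollout(cand)
    if score > best:                  # keep corrections that reduce wobble
        gains, best = cand, score
print(gains, best)
```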
-
What if AI could not only learn from data but also reflect on its own reasoning to continuously improve, without relying on static datasets, teaching itself to solve problems in a structured manner?

We're excited to introduce PRefLexOR 🚀: a philosophically inspired AI framework for recursive scientific reasoning and optimization. The concept does not rely on conventional datasets and instead produces its own learning curriculum in situ, with multiple training and inference stages that facilitate scaling the approach to iteratively improve performance.

PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking) is a method that combines philosophical principles with advanced machine learning. Inspired by the reflective thinking proposed in Hermann Hesse's Glass Bead Game, PRefLexOR leverages recursive learning cycles to refine its reasoning capabilities over time. Unlike traditional models that depend on static datasets, PRefLexOR generates tasks and data in situ, dynamically adapting to new challenges. This approach mimics how scientists refine hypotheses through continuous experimentation and reflection. The model is designed to self-teach, using recursive reasoning and preference optimization to navigate complex, interdisciplinary problems, particularly in fields like materiomics or biological materials science.

🌟 PRefLexOR incorporates 🧠 metacognition, enabling the model to reflect on its own thought processes, refine answers, and adapt in real time, much like how humans evaluate and improve their problem-solving strategies. The framework unfolds in three distinct phases: Structured Thought Integration Training, Independent Reasoning Development, and Recursive Reasoning Algorithm (Inference).

Key Features:
➡️ Philosophically guided recursive reasoning: The model iteratively refines its thought processes, mirroring the cycles of reflection and adjustment found in philosophical and scientific inquiry.
➡️ In-situ task generation: PRefLexOR eliminates the need for large pre-generated datasets, allowing the model to learn on the fly by generating tasks that push its reasoning capabilities in real time. In-situ datasets are generated through dynamic knowledge graphs that connect disparate concepts.
➡️ Challenging tasks & preference optimization: The framework continuously presents the model with increasingly difficult tasks, forcing it to navigate ambiguous or complex scenarios. Using preference optimization, PRefLexOR refines its responses by learning from feedback on preferred and rejected outputs. This challenges the model to discover novel solutions, making each cycle progressively harder.
➡️ Dynamic feedback loops: Inspired by reinforcement learning, PRefLexOR uses feedback from each iteration to improve its decision-making processes, allowing it to continuously refine and optimize its output.

Paper: https://lnkd.in/eU-yuEPU
Code: https://lnkd.in/eWxJiWWu
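The released code is linked above; as a hedged stand-in, here is a toy sketch of the recursive loop the post describes: generate a task in situ, sample candidate traces, form a (preferred, rejected) pair, and apply a preference update. All helpers below are stubs, not the PRefLexOR API.

```python
# Toy sketch of a recursive preference loop in the spirit described above;
# NOT the released PRefLexOR implementation (see the code link). Every
# helper is a stand-in so the control flow runs end to end.
import random

random.seed(0)

def generate_task(difficulty):
    # Stand-in for in-situ task generation from a dynamic knowledge graph.
    return f"task(difficulty={difficulty})"

def generate(model, task):
    # Stand-in for sampling a reasoning trace from the current model.
    return {"task": task, "trace": random.random()}

def score(candidate):
    # Stand-in for self-reflection / critic scoring of a trace.
    return candidate["trace"]

def preference_update(model, chosen, rejected):
    # Stand-in for a DPO-style update from a (preferred, rejected) pair.
    return model + [(chosen["task"], score(chosen) - score(rejected))]

model = []                        # toy "model": a log of preference updates
for round_ in range(3):           # each cycle gets progressively harder
    task = generate_task(difficulty=round_)
    candidates = [generate(model, task) for _ in range(4)]
    candidates.sort(key=score)
    model = preference_update(model, candidates[-1], candidates[0])
print(model)
```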
-
The robots are getting a new brain architecture. It's called VLA: Vision-Language-Action.

Traditional robots work in steps. See. Think. Act. Each module separate. VLAs fuse all three into one model. The robot sees the environment, understands a language command, and outputs motor actions in a single pass.

Figure's Helix is the first VLA to control a full humanoid upper body. Arms, hands, torso, head, individual fingers. Two robots working together on tasks they've never seen before.

NVIDIA's GR00T N1 uses a dual-system architecture. System 2 (a VLM) handles high-level reasoning. System 1 (a diffusion policy) handles fast motor control at 10 ms latency.

Google's Gemini Robotics extends Gemini 2.0 to the physical world. Dexterous enough to fold origami.

Hugging Face released SmolVLA in June. 450 million parameters. Trained entirely on community datasets from LeRobot. Runs on consumer hardware. The architecture uses a truncated vision-language backbone with a flow-matching transformer for action prediction. Asynchronous inference decouples prediction from execution. 30% faster response time.

The key insight is that VLMs already understand the world. They know what a cup is. They know what "put it on the table" means. The challenge was translating that knowledge into motion. VLAs solve the translation problem.

The training data is interesting too. Hundreds of hours of robot teleoperation. Human videos. Synthetic environments. Figure trained Helix on 1,800+ task environments. SmolVLA trained on 30,000 episodes from 487 community datasets spanning labs and living rooms.

VLAs compress vision, language, and proprioceptive state into a shared latent representation. The action decoder samples from this space. For coarse manipulation, this works. For fine-grained tasks like grasping or precision assembly, the latent space doesn't capture enough detail. Increasing latent dimensionality helps but increases compute requirements.

Cross-embodiment transfer remains a challenge. A policy trained on one robot arm doesn't transfer to another with different kinematics. The sim-to-real gap persists. Policies trained in simulation fail in the real world due to differences in physics and visual appearance. Viewpoint changes and lighting differences degrade performance.

UMA launched last week. Ex-Tesla, Google DeepMind, and Hugging Face team building general-purpose robots in Europe. Mobile industrial robots and compact humanoids. First pilots in logistics and manufacturing target 2026.

We're still early. These systems struggle with novel environments and long-horizon tasks. But the architecture is converging. Vision, language, and action in one model. Humanoid robots that learn by watching humans work. That's the trajectory.
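To show the shared-latent pattern the post describes, and nothing vendor-specific (this is not Helix, GR00T N1, or SmolVLA code), here is a tiny PyTorch sketch that fuses vision, language, and proprioception tokens into one latent and decodes a chunk of joint actions. All dimensions are arbitrary:

```python
# Minimal conceptual VLA: shared vision-language-proprio latent -> actions.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, d=256, n_joints=22, chunk=8):
        super().__init__()
        self.vision = nn.Linear(512, d)        # stand-in for a ViT encoder
        self.language = nn.Embedding(1000, d)  # stand-in for a VLM backbone
        self.proprio = nn.Linear(n_joints, d)
        self.fuse = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
            num_layers=2,
        )
        # Action decoder: maps the fused latent to a chunk of joint targets.
        self.decode = nn.Linear(d, n_joints * chunk)
        self.n_joints, self.chunk = n_joints, chunk

    def forward(self, image_feat, text_ids, joint_state):
        tokens = torch.cat(
            [
                self.vision(image_feat).unsqueeze(1),    # 1 vision token
                self.language(text_ids),                 # language tokens
                self.proprio(joint_state).unsqueeze(1),  # 1 proprio token
            ],
            dim=1,
        )
        latent = self.fuse(tokens).mean(dim=1)           # shared latent
        return self.decode(latent).view(-1, self.chunk, self.n_joints)

vla = TinyVLA()
actions = vla(torch.randn(1, 512), torch.randint(0, 1000, (1, 6)), torch.randn(1, 22))
print(actions.shape)  # torch.Size([1, 8, 22]): an 8-step action chunk
```

Real systems replace each stand-in with a pretrained encoder and typically use a diffusion or flow-matching head instead of the plain linear decoder, but the fuse-then-decode shape of the computation is the part the post is describing.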