Methods for Replicating Robot Skills


Summary

Methods for replicating robot skills focus on teaching robots to learn and perform tasks by observing humans, learning from their own experiences, or adapting to new situations over time. These methods combine techniques from imitation learning, continual adaptation, and smart data collection to help robots handle complex, real-world activities without constant human guidance.

  • Encourage continual learning: Set up robots to keep learning new tasks without forgetting previously acquired abilities, allowing them to grow their skill set over time without constant resets or retraining.
  • Use demonstration-based training: Teach robots by showing them how tasks are performed through videos or in-person examples, making it possible for them to replicate actions even with limited information.
  • Align robot and human perspectives: Adjust the learning process so that robots can imitate human actions using only the data available from their own sensors, bridging the gap between what the teacher knows and what the robot can see or sense.
Summarized by AI based on LinkedIn member posts

  • View profile for Ilir Aliu

    AI & Robotics | 150k+ | 22Astronauts

    106,502 followers

    Robot models get better only when humans feed them more demos. This one improves by learning from its own mistakes. pi*0.6 is a new VLA from Physical Intelligence that can refine its skills through real-world RL, not just teleop data. The team calls the method Recap, and from what I can see, the gains are not small. A quick summary:
    ✅ Learns from its own rollouts using a value function trained across all data
    ✅ Humans only step in when the robot is about to drift too far
    ✅ Every correction updates the model and improves future rollouts
    ✅ Works across real tasks like espresso prep, laundry, and box assembly
    ✅ Throughput more than doubles on hard tasks, with far fewer failure cases
    What stands out is the structure: a general policy, a shared value function, and a loop where the robot collects data, improves the critic, then improves itself again. No huge fleets of teleoperators. No massive manual resets. If VLAs can reliably self-improve in the real world, the bottleneck shifts. Data becomes cheaper. Deployment becomes the real test bench. Full paper, videos, and method details here: https://lnkd.in/dgCeZdjT
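
A minimal sketch of the self-improvement loop described in the post, assuming a generic policy/critic interface: the robot collects its own rollouts, a human steps in only when the critic predicts drift, a shared value function is refit on all data, and the policy is updated from it. All classes, thresholds, and the toy environment below are illustrative placeholders, not Physical Intelligence's Recap implementation.

```python
# Illustrative placeholders only; not the Recap / pi*0.6 implementation.
import random

class Policy:
    def act(self, obs):            # pick an action for the current observation
        return random.choice([-1, 0, 1])
    def update(self, episodes):    # e.g. advantage-weighted update on all collected data
        pass

class ValueFunction:
    def estimate(self, obs):       # predicted return from this observation
        return 0.0
    def fit(self, episodes):       # regress returns over ALL data, old and new
        pass

def human_correction(obs):
    return 0                       # stand-in for a sparse teleoperated correction

def step_env(obs, act):
    return obs + 0.1 * act, -abs(obs)   # toy dynamics and reward

def rollout(policy, value_fn, horizon=50, drift_threshold=-0.5):
    """Collect one episode; a human intervenes only when the critic predicts drift."""
    episode, obs = [], 0.0
    for _ in range(horizon):
        if value_fn.estimate(obs) < drift_threshold:
            act = human_correction(obs)      # sparse expert intervention
        else:
            act = policy.act(obs)            # autonomous action
        next_obs, reward = step_env(obs, act)
        episode.append((obs, act, reward))
        obs = next_obs
    return episode

policy, value_fn, replay = Policy(), ValueFunction(), []
for iteration in range(3):
    replay.extend(rollout(policy, value_fn) for _ in range(5))
    value_fn.fit(replay)                     # improve the shared critic
    policy.update(replay)                    # improve the policy from its own rollouts
```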

  • View profile for Davide Scaramuzza

    Professor of Robotics and Perception at the University of Zurich

    52,009 followers

    We are excited to share our #ICLR2025 spotlight paper "Student-Informed Teacher Training," which addresses the teacher-student asymmetry in imitation learning, i.e., when the teacher has access to privileged information about the environment and task while the student only receives partial information, such as images. We show applications to drones and manipulators. Code released! PDF: https://lnkd.in/dU69N2BP

    Imitation learning with a privileged teacher has proven effective for learning complex control behaviors from high-dimensional inputs, such as images. In this framework, a teacher is trained with privileged information, while a student tries to predict the actions of the teacher from limited observations; e.g., in a robot navigation task, the teacher might have access to the robot state and distances to all nearby obstacles, while the student only receives images of the scene. However, privileged imitation learning faces a key challenge: the student might be unable to imitate the teacher's behavior due to the discrepancy between the different observations. This problem arises because the teacher is trained without considering whether the student is capable of imitating the learned behavior. As a consequence of the information discrepancy (i.e., asymmetry), the teacher tends to over-rely on its full observability of the environment without considering the limited observation space of the student. This causes the teacher to provide target actions that the student cannot infer from its observations, since the student lacks access to the same level of environmental information.

    Consider, for example, a robot navigating an obstacle-filled environment. An information asymmetry in the observation space easily arises if the teacher policy receives the relative distances to all of the surrounding obstacles, while the student, limited by its forward-facing camera, can only perceive obstacles within the camera's field of view.

    To address this teacher-student asymmetry, we propose a framework for joint training of the teacher and student policies, encouraging the teacher to learn behaviors that can be imitated by the student despite the student's limited access to information and its partial observability. Based on the performance bound in imitation learning, we add (i) the approximated action difference between teacher and student as a penalty term to the reward function of the teacher, and (ii) a supervised teacher-student alignment step. We demonstrate our method on complex vision-based quadrotor flight and manipulation tasks.

    Kudos to Nico Messikommer, Jiaxu Xing, Elie Aljalbout!

    Reference: Student-Informed Teacher Training. International Conference on Learning Representations (ICLR), 2025. Spotlight Presentation.
    PDF: https://lnkd.in/dU69N2BP
    Code: https://lnkd.in/dJBQv5uC
    Video: https://lnkd.in/dh-8RArg
    University of Zurich, European Research Council (ERC), UZH Department of Informatics
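
A toy sketch of the two ingredients described above, under generic assumptions: (i) the teacher's reward is penalized by the approximated teacher-student action gap, and (ii) a supervised alignment step pulls the student toward the teacher. The networks, shapes, and penalty weight are illustrative stand-ins, not the paper's released code.

```python
# Illustrative stand-ins only; see the linked code release for the actual method.
import torch
import torch.nn as nn

teacher = nn.Linear(8, 2)    # sees privileged state (e.g. distances to all obstacles)
student = nn.Linear(4, 2)    # sees only partial observations (e.g. image features)
penalty_weight = 0.1

def shaped_teacher_reward(env_reward, priv_obs, partial_obs):
    """Penalize teacher actions the student cannot reproduce from its own inputs."""
    with torch.no_grad():
        action_gap = (teacher(priv_obs) - student(partial_obs)).pow(2).sum(-1)
    return env_reward - penalty_weight * action_gap

def alignment_step(priv_obs, partial_obs, optimizer):
    """Supervised teacher-student alignment: regress student actions onto the teacher's."""
    loss = (student(partial_obs) - teacher(priv_obs).detach()).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# usage with random data
priv, partial = torch.randn(32, 8), torch.randn(32, 4)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
shaped = shaped_teacher_reward(torch.zeros(32), priv, partial)
alignment_step(priv, partial, opt)
```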

  • View profile for Ralf Römer

    Robot Learning PhD Student @TUM | ETH | EPFL | Bosch | Open to Research Internships

    2,800 followers

    🤖 How can we teach robots to continually learn new skills without forgetting the old ones? 👉 CLARE ensures that as your robot gets smarter, it doesn't lose the skills it (and you as teleoperator 😅) already worked hard to master.

    Fine-tuning pre-trained vision-language-action models (#VLAs) on a new task has become the standard recipe for robotic manipulation. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments without forgetting the knowledge they have already acquired. Existing continual learning methods for robotics require storing previous data (exemplars), struggle with long task sequences, or rely on oracle task identifiers at deployment.

    We present CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion. CLARE is a parameter-efficient, exemplar-free framework that allows robots to continuously adapt to new tasks and environments:
    🚫 No Exemplars Needed: We don't need to store past data, which is often impossible due to privacy and storage constraints.
    🧠 Autonomous Routing: Our autoencoder-based mechanism dynamically selects the right adapter for the current task, with no task labels required during deployment.
    📉 Efficient Dynamic Expansion: The model autonomously decides when to expand its capacity, increasing parameter counts by only ~2% per task.
    🏆 SOTA Results: We achieve significantly higher continual learning performance on the LIBERO benchmark compared to baselines, including methods that replay past data.
    📄 Paper: https://lnkd.in/dskhxphh
    🌐 Project Website: https://lnkd.in/dRDk63dP
    💻 Code: https://lnkd.in/d--udZja
    🤗 Hugging Face: https://lnkd.in/dswqWWUr
    This work has been a great collaboration with Yi Zhang, who is currently on the job market :) Angela Schoellig, Technical University of Munich, Learning Systems and Robotics Lab, Munich Institute of Robotics and Machine Intelligence (MIRMI) at the Technical University of Munich, Robotics Institute Germany. #Robotics #AI #MachineLearning #ContinualLearning
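
An illustrative sketch of the autoencoder-based routing and expansion idea described above, assuming one small autoencoder per task: capacity expands only when no existing autoencoder reconstructs the current data well, and at deployment the adapter whose autoencoder best explains the observation is selected. The shapes, threshold, and stand-in adapters below are made up, not CLARE's actual implementation.

```python
# Made-up shapes and threshold; see the linked code release for CLARE itself.
import torch
import torch.nn as nn

class TaskAutoencoder(nn.Module):
    def __init__(self, dim=16, bottleneck=4):
        super().__init__()
        self.enc = nn.Linear(dim, bottleneck)
        self.dec = nn.Linear(bottleneck, dim)
    def reconstruction_error(self, x):
        return (self.dec(torch.relu(self.enc(x))) - x).pow(2).mean(-1)

autoencoders = []          # one per learned task, grown over time
adapters = []              # matching parameter-efficient adapters (stand-ins here)

def maybe_expand(observation, threshold=0.5):
    """Expand capacity only when no existing autoencoder explains the new data."""
    errors = [ae.reconstruction_error(observation).item() for ae in autoencoders]
    if not errors or min(errors) > threshold:
        autoencoders.append(TaskAutoencoder())
        adapters.append(nn.Linear(16, 16))   # stand-in for a new adapter

def route(observation):
    """Deployment-time routing: pick the adapter whose autoencoder fits the input best."""
    errors = [ae.reconstruction_error(observation).item() for ae in autoencoders]
    return adapters[min(range(len(errors)), key=errors.__getitem__)]

# usage: no task label is needed at deployment
obs = torch.randn(16)
maybe_expand(obs)          # during continual training
adapter = route(obs)       # during deployment
```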

  • View profile for Ahsen Khaliq

    ML @ Hugging Face

    36,028 followers

    Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

    Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface for specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average success rate of 87%, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models -- without any task-specific training, fine-tuning, dataset collection, or annotation.
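
A generic analysis-by-synthesis sketch of the idea in the abstract: per-part pose trajectories are optimized by differentiating a rendering loss, with a smoothness regularizer over time. The "renderer" here is a fixed random feature projection used purely as a stand-in, so this only illustrates the optimization pattern, not 4D-DPM itself.

```python
# Stand-in renderer and features; illustrates analysis-by-synthesis, not 4D-DPM.
import torch

num_parts, num_frames = 3, 10
part_poses = torch.zeros(num_frames, num_parts, 6, requires_grad=True)  # xyz + axis-angle per part
observed_features = torch.randn(num_frames, 32)                          # per-frame observed features

W = torch.randn(num_parts * 6, 32)      # fixed stand-in for a differentiable renderer
def render_features(poses_t):
    """Stub: maps the part poses of one frame to image-space features."""
    return torch.tanh(poses_t.reshape(-1) @ W)

opt = torch.optim.Adam([part_poses], lr=1e-2)
for step in range(100):
    loss = sum((render_features(part_poses[t]) - observed_features[t]).pow(2).mean()
               for t in range(num_frames))
    # geometric/temporal regularizer: neighbouring frames should have similar part poses
    loss = loss + 1e-3 * (part_poses[1:] - part_poses[:-1]).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```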

  • View profile for Pablo Vela

    Computer Vision Researcher and Engineer

    5,193 followers

    Trying to wrap my head around fwd/bwd kinematics for imitation learning, so I built a fully differentiable kinematic hand skeleton in JAX. I visualized it with Rerun's new callback system in a Jupyter Notebook. This shows each joint angle and how it impacts the kinematic skeleton.

    Goal: create a robot "training ground" for dexterous manipulation. Assume I have both exocentric (3rd-person) and egocentric (1st-person) views from calibrated, time-synced cameras watching a human perform a skilled task. From those videos, I need finger joint angles (axis-angle) + 6-DoF wrist poses to retarget motion to a robot hand.

    Capture-to-Angles pipeline:
    1. Input: time-synced, calibrated RGB frames.
    2. 2-D keypoints: detect pixel joints (MediaPipe / RTM).
    3. 3-D joints: triangulate or PnP → metric joints (fixes bone lengths).
    4. Inverse kinematics: optimize 3-D joints → 48 joint θ (15×3 DoF) + 3 wrist DoF, plus xyz translation. Uses Levenberg–Marquardt for lightning-fast convergence.
    5. Forward kinematics: reconstruct 3-D joints from θ as a sanity check.

    Because every layer is differentiable, I can project the 21×3 metric keypoints back into each camera, measure reprojection error against the raw 2-D detections, and close the loop. Result: real-time optimization that converts multiview 2-D detections straight into robot-ready joint angles.
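
A minimal JAX sketch of the differentiable FK-to-IK loop described above, reduced to a single two-segment planar finger. The real pipeline optimizes the full 48-DoF hand with Levenberg-Marquardt; plain gradient descent and made-up bone lengths are used here to keep the example short.

```python
# Toy two-joint planar chain; not the full hand skeleton from the post.
import jax
import jax.numpy as jnp

BONE_LENGTHS = jnp.array([1.0, 0.8])     # illustrative link lengths

def forward_kinematics(thetas):
    """Joint angles -> 2-D joint positions for a planar two-joint finger."""
    joints = [jnp.zeros(2)]
    angle = 0.0
    for theta, length in zip(thetas, BONE_LENGTHS):
        angle = angle + theta
        joints.append(joints[-1] + length * jnp.array([jnp.cos(angle), jnp.sin(angle)]))
    return jnp.stack(joints)             # (3, 2): base, middle joint, fingertip

def ik_loss(thetas, target_joints):
    """Squared error between FK joint positions and the triangulated keypoints."""
    return jnp.sum((forward_kinematics(thetas) - target_joints) ** 2)

@jax.jit
def ik_step(thetas, target_joints, lr=0.1):
    """One gradient step of inverse kinematics (the post uses Levenberg-Marquardt)."""
    return thetas - lr * jax.grad(ik_loss)(thetas, target_joints)

# Pretend these keypoints came from triangulating the 2-D detections.
target = forward_kinematics(jnp.array([0.4, 0.6]))
thetas = jnp.zeros(2)
for _ in range(400):
    thetas = ik_step(thetas, target)
print(thetas)   # should approach [0.4, 0.6]
```

Because the loss is built from the same differentiable FK, the recovered angles can be pushed back through the forward model (step 5 above) as a sanity check.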

  • View profile for Aaron Prather

    Director, Robotics & Autonomous Systems Program at ASTM International

    85,047 followers

    Training general-purpose robots—the kind that could fold laundry or tidy up like Rosie from The Jetsons—is notoriously difficult because they need vast amounts of real-world data. Traditionally, this data comes from carefully arranged external cameras, but NYU’s General-purpose Robotics and AI Lab, led by Lerrel Pinto, is testing a more scalable approach: EgoZero. EgoZero uses Meta’s research-only smart glasses to record tasks from a human’s point of view. This “egocentric” data captures exactly what a person sees while performing actions, making it both portable and highly relevant. In tests, robots trained only on this human data (no robot data required) achieved a 70% success rate on seven manipulation tasks, such as placing bread on a plate. Instead of relying on full images—which don’t translate well between human hands and robot arms—EgoZero maps hand movements as 3D points in space. This allows robots to generalize: if trained to pick up a roll, they can adapt to handle ciabatta in a new setting. The NYU team is also developing open-source robot designs, touch sensors, and smartphone-based data collection tools. Their ultimate aim is scalability: while large language models train on the Internet, robots lack an equivalent dataset for the physical world. EgoZero and similar methods could begin to close that gap by turning everyday human actions into training fuel for general-purpose robots. 📝 Research Paper: https://lnkd.in/e5m25bSA 💻 Project Page: https://lnkd.in/ewM3VPdH
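
An illustrative sketch of what "hand movements as 3D points" can look like in practice: lift a detected fingertip pixel to a 3D point using depth and camera intrinsics, then express it in a fixed world frame via the glasses' pose. The intrinsics, depth, and pose below are placeholders, not EgoZero's actual pipeline.

```python
# Placeholder camera values; illustrates pixel -> 3-D world point, not EgoZero's code.
import numpy as np

K = np.array([[600.0, 0.0, 320.0],       # camera intrinsics (fx, skew, cx / fy, cy)
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

def pixel_to_world(pixel_uv, depth_m, T_world_cam):
    """Back-project a pixel at known depth, then map camera frame -> world frame."""
    uv1 = np.array([pixel_uv[0], pixel_uv[1], 1.0])
    p_cam = depth_m * (np.linalg.inv(K) @ uv1)          # 3-D point in the camera frame
    p_cam_h = np.append(p_cam, 1.0)                     # homogeneous coordinates
    return (T_world_cam @ p_cam_h)[:3]                  # 3-D point in the world frame

T_world_cam = np.eye(4)                                 # glasses pose (placeholder for SLAM output)
fingertip_world = pixel_to_world((350, 260), 0.45, T_world_cam)
print(fingertip_world)
```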

  • View profile for Mike Kalil

    10M+ Annual Reach | Covering the Rise of the Machines Without an Agenda | mikekalil.com

    4,573 followers

    Chinese researchers say a humanoid robot has taken a surprising step toward learning athletic skills. The robotic tennis player recently pulled off a shocking rally against a human opponent after teaching itself to play in a way similar to how human athletes train: through drills and repetition. Researchers say the achievement could mark a milestone that opens the door for robots to master complex physical skills far beyond sports.

    The work comes from a team at Beijing's Tsinghua University and Peking University, working with the fast-rising robotics startup Galbot. Their training framework, called LATENT (short for Learning Athletic Humanoid Tennis Skills from Imperfect Motion Data), teaches robots athletic movements using incomplete examples of human motion. Instead of recording full tennis matches, the team fed the robot's artificial intelligence incomplete snippets of motion data captured from motion-tracking recordings of amateur tennis players. The snippets included basic tennis techniques such as forehand swings, backhands and footwork patterns. Though imperfect, the researchers said the data still contained valuable information about how athletes move. Using machine learning, the system corrected and recombined these fragments into complete tennis actions the robot could learn. A digital twin of the humanoid then practiced thousands of variations in simulation, where conditions constantly changed to prepare it for the chaotic realities of live tennis. By experimenting with countless scenarios, the virtual robot learned which movements worked best.

    For the real-world tests, the team customized a Unitree G1 humanoid robot. They attached a tennis racket to the robot's right arm using a 3D-printed adapter, replacing the robot's hand so it could strike the ball. Reflective markers were also added so cameras in an optical motion-capture system could track the robot's position and movement during rallies. In real-world experiments, the robot returned incoming tennis balls with a success rate of more than 90 percent for forehand shots and about 78 percent for backhands. While its skills still pale in comparison to a trained human player, the researchers say future systems could improve further through robot-versus-robot training.

  • View profile for Nicholas Nouri

    Founder | Author

    132,620 followers

    If you’ve seen Figure’s latest demo, you watched the same AI model that folded laundry and triaged packages now place plates and glasses into a dishwasher. The noteworthy part isn’t the chore itself - it’s how the system learned it: by feeding the model more data, not by hard-coding a new routine.

    What’s actually hard about “just putting dishes away”? Household scenes hide several classic robotics problems:
    - Singulation: pulling a single plate off a stacked set without dragging others along. (Researchers have treated plate/part singulation as a distinct manipulation subproblem.)
    - Bimanual coordination: two hands passing or reorienting a slippery glass while the body keeps balanced - much harder than one-arm pick-and-place. (See bimanual benchmarks like BiGym.)
    - Tight tolerances: racks allow only centimeter-scale error; small misalignments cause snags and drops. TRI flagged dishwasher loading years ago as a “hard problem” because it mixes planning and dexterity.

    Figure’s Helix is a Vision Language Action model: it takes camera streams and simple instructions and produces continuous motor commands for the robot’s torso, arms, wrists, and fingers. Add diverse demonstrations, and the same policy acquires new behaviors without bespoke code paths. A single model picking up multiple skills from data is a strong signal for scalability: every new task expands the robot’s repertoire without rebuilding the stack.

    But polished demos aren’t the same as reliability across messy, unfamiliar homes. We still need better evaluation in varied kitchens, lighting, and dishware; stronger recovery from failures; safe contact with people; and a path to cost-effective hardware. #innovation #technology #future #management #startups

  • View profile for Chris Paxton

    AI + Robotics Research Scientist

    8,965 followers

    Just collecting manipulation data isn’t enough for robots - they need to be able to move around in the world, which has a whole different set of challenges from pure manipulation. And bringing navigation and manipulation together in a single framework is even more challenging.

    Enter HERMES, from Zhecheng Yuan and Tianming Wei. This is a four-stage process in which human videos are used to set up an RL sim-to-real training pipeline that overcomes the differences between robot and human kinematics, and are combined with a navigation foundation model to move around in a variety of environments.

    To learn more, join us as Zhecheng Yuan and Tianming Wei tell us how they built their system to perform mobile dexterous manipulation from human videos in a variety of environments. Watch Episode #45 of RoboPapers today, hosted by Michael Cho and Chris Paxton!

    Abstract: Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation. Nevertheless, translating multi-source human hand motions into feasible robot behaviors remains challenging, particularly for robots equipped with multi-fingered dexterous hands characterized by complex, high-dimensional action spaces. Moreover, existing approaches often struggle to produce policies capable of adapting to diverse environmental conditions. In this paper, we introduce HERMES, a human-to-robot learning framework for mobile bimanual dexterous manipulation. First, HERMES formulates a unified reinforcement learning approach capable of seamlessly transforming heterogeneous human hand motions from multiple sources into physically plausible robotic behaviors. Subsequently, to mitigate the sim2real gap, we devise an end-to-end, depth image-based sim2real transfer method for improved generalization to real-world scenarios. Furthermore, to enable autonomous operation in varied and unstructured environments, we augment the navigation foundation model with a closed-loop Perspective-n-Point (PnP) localization mechanism, ensuring precise alignment of visual goals and effectively bridging autonomous navigation and dexterous manipulation. Extensive experimental results demonstrate that HERMES consistently exhibits generalizable behaviors across diverse, in-the-wild scenarios, successfully performing numerous complex mobile bimanual dexterous manipulation tasks.

    Project Page: https://lnkd.in/e-aEbQzn
    arXiv: https://lnkd.in/eemU6Pwa
    Watch/listen:
    YouTube: https://lnkd.in/erzbkYjz
    Substack: https://lnkd.in/e3ea76Q8

    Ep#45: HERMES: Human-to-Robot Embodied Learning From Multi-Source Motion Data for Mobile Dexterous Manipulation

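A generic sketch of the closed-loop PnP localization idea mentioned in the abstract, using OpenCV's solvePnP: known 3D landmarks on the goal object are matched to their 2D detections in the current image to estimate the goal's pose in the camera frame, from which a base correction can be derived. The landmark coordinates, intrinsics, and correction rule are illustrative, not HERMES' implementation.

```python
# Illustrative landmark/intrinsics values; not HERMES' actual localization module.
import numpy as np
import cv2

object_points = np.array([[0.0, 0.0, 0.0],     # 3-D landmarks on the goal, object frame (m)
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float64)
image_points = np.array([[310.0, 240.0],       # their 2-D detections in the current frame (px)
                         [370.0, 242.0],
                         [368.0, 300.0],
                         [308.0, 298.0]], dtype=np.float64)
K = np.array([[600.0, 0.0, 320.0],             # camera intrinsics
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
if ok:
    # tvec is the goal's position in the camera frame; steer until it matches the
    # pose the manipulation policy expects (here: 0.5 m straight ahead).
    desired = np.array([[0.0], [0.0], [0.5]])
    correction = (desired - tvec).ravel()
    print("base correction (m):", correction)
```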

  • View profile for Sina Pourghodrat (PhD)

    Surgical Robotics Engineer

    9,466 followers

    🚀 Amazon FAR (Frontier AI & Robotics) introduces OmniRetarget: teaching humanoids to interact with objects and their environment, just like humans do.

    Here's the main idea (simplified): in robotics, teaching humanoids complex skills means showing them how humans move and interact, but simply copying human motions (or using them as kinematic references) doesn't work cleanly. Human body vs. robot body: not the same shape, not the same joints, not the same kinematics. On top of that, interactions (touching objects, walking on surfaces) are often lost or distorted during retargeting (the process of adapting human motions to robot bodies). OmniRetarget fixes this.

    What is OmniRetarget? A system that converts human motion + human scenes into robot-compatible motion while preserving interactions (contacts, spatial relations) with objects and terrain. It uses an interaction mesh to model where contacts happen (hand touching box, feet on ground) and keeps them consistent when mapping to a robot. From one demonstration (a recording of a human performing the task), it can generate many variations: different robots, object positions, terrains.

    Why is it better than older approaches? Older methods often ignore interaction preservation, leading to artifacts like foot sliding or unrealistic motions. OmniRetarget enforces both robot limits (joints, geometry) and real interactions (which part touches what) at the same time. It produces 8+ hours of high-quality trajectories, beating baselines in realism and consistency. Reinforcement learning (RL) policies trained on these trajectories can now perform long, complex tasks (up to 30 seconds) on a physical humanoid (Unitree G1).

    📖 Open-source contribution: they are releasing the OmniRetarget Dataset, over 8 hours of humanoid loco-manipulation and interaction data, freely available on Hugging Face: https://lnkd.in/eYBn2hfe

    Why does this matter? Robots don't just need to move, they must interact with the world. High-quality, interaction-aware data has been a major bottleneck. OmniRetarget makes this data available to the community, helping researchers and companies build humanoids that can operate in cluttered, object-rich environments.

    📖 Full paper: https://lnkd.in/ej2But4W
    👩‍💻 GitHub: https://lnkd.in/ejmUahtr
    👩‍🔬 Authors: Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

    Thank you, Lujie Yang, for giving permission to use the video. Video: Unitree G1 humanoid carries a chair, climbs, leaps, and rolls, all in real time, using only its own body senses (no vision or LiDAR). A big step toward agile, human-like loco-manipulation.
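
An illustrative sketch of contact-preserving retargeting in the spirit described above: solve for robot joint targets that stay close to the naively retargeted human pose while keeping a declared contact point (e.g. the hand on a box) where it was in the demonstration. The toy forward-kinematics model, weights, and values are made up, not OmniRetarget's interaction-mesh formulation.

```python
# Toy FK model and weights; illustrates contact-preserving retargeting, not OmniRetarget.
import torch

def fk_hand_position(q):
    """Toy forward kinematics: three joint angles -> 3-D hand position."""
    return torch.stack([torch.cos(q[0]) + torch.cos(q[0] + q[1]),
                        torch.sin(q[0]) + torch.sin(q[0] + q[1]),
                        q[2]])

q_reference = torch.tensor([0.3, 0.5, 0.1])      # naively retargeted human pose
contact_point = torch.tensor([1.2, 0.9, 0.1])    # where the hand touched the box in the demo
contact_weight = 10.0                            # contacts matter more than exact pose

q = q_reference.clone().requires_grad_(True)
opt = torch.optim.Adam([q], lr=0.05)
for _ in range(300):
    pose_term = (q - q_reference).pow(2).sum()                          # stay near the reference
    contact_term = (fk_hand_position(q) - contact_point).pow(2).sum()   # preserve the contact
    loss = pose_term + contact_weight * contact_term
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the same spirit, joint limits and terrain contacts would enter as additional constraints or penalty terms on top of the pose-tracking objective.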
