Robots that walk and run are nothing new. What's new is one that can hold a single-leg stand, throw a high kick, and stay upright under a soccer-style strike.

Researchers at Tsinghua University, Shanghai Qi Zhi Institute, University of California, Berkeley, and UC San Diego have built a unified framework named HuB, presented recently at #CoRL 2025. It tackles three entrenched problems in humanoid control: reference motion errors (the motion you copy is one the hardware can't match), the human-to-robot morphology gap, and sim-to-real transfer (sensor noise, unmodeled dynamics, etc.).

Here's how HuB approaches each of them:
- Before the robot even learns a policy, reference motions are optimized so the robot's hardware and morphology can physically execute them.
- The policy doesn't just track the motion; it explicitly learns stability (balance) as a core objective.
- The final step injects disturbances, dynamics mismatch, and sensor noise so that the learned policy holds up in the real world.

On the physical robot (a Unitree Robotics G1 humanoid), HuB handles extreme balance tasks like the deep squat, single-leg stand, and Bruce Lee-style high kick, and stays stable even under strong push perturbations where baseline methods fail.

So we can say we're now at a point where a humanoid has learned how to stay balanced and can kick, squat, and recover ✌

Paper: HuB: Learning Extreme Humanoid Balance
Link: https://lnkd.in/guyTz-7g

Kudos to the team behind it: Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath and Yang Gao

#CoRL2025
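To make that three-step recipe concrete, here is a minimal Python sketch of the general pattern the post describes, not the authors' code: clamp the reference motion to the hardware, reward balance alongside tracking, and randomize dynamics and pushes during training. All function names, weights, and ranges are invented for illustration.

```python
import numpy as np

# Hypothetical sketch of the three-part recipe above, not HuB's actual code:
# (1) pre-filter the reference motion to the robot's joint limits,
# (2) reward balance alongside motion tracking,
# (3) randomize dynamics and inject pushes during training.

def retarget_reference(q_ref, q_min, q_max):
    """Step 1: clamp a human reference motion into the robot's joint range
    so the policy is never asked to track something the hardware can't do."""
    return np.clip(q_ref, q_min, q_max)

def reward(q, q_ref, com_xy, support_center_xy, w_track=0.7, w_balance=0.3):
    """Step 2: tracking term plus an explicit balance term.
    Balance here = keep the center of mass over the support region."""
    track = np.exp(-np.sum((q - q_ref) ** 2))
    balance = np.exp(-np.sum((com_xy - support_center_xy) ** 2))
    return w_track * track + w_balance * balance

def randomize_episode(env, rng):
    """Step 3: domain randomization so the policy survives sim-to-real.
    The `env` fields are placeholders for whatever the simulator exposes."""
    env.friction = rng.uniform(0.5, 1.25)
    env.mass_scale = rng.uniform(0.9, 1.1)
    env.sensor_noise_std = rng.uniform(0.0, 0.02)
    env.push_force = rng.uniform(0.0, 100.0)  # soccer-style strike, in newtons
```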
Addressing Morphology Challenges in Robot Training
Summary
Addressing morphology challenges in robot training means teaching robots to perform tasks across different body designs, sizes, and capabilities. This involves ensuring that control systems and learning methods work well even when the robot’s hardware or structure changes, allowing for wider adaptability and fewer retraining cycles.
- Design with diversity: Include a wide range of robot shapes and configurations during training to help the model handle new and unfamiliar morphologies without starting from scratch each time.
- Balance sensors and control: Combine information from both vision and movement sensors so robots can adjust to changing or noisy environments, switching between strategies as needed for stable movement.
- Unify learning approaches: Build frameworks that align language, vision, and action data so a single policy can guide many types of robots, making it easier to transfer new skills between different hardware setups.
Building robots that can generalize across different embodiments and handle long-horizon tasks remains one of the biggest challenges in robotics. Most Vision-Language-Action models hit a wall: they either overfit to specific robot morphologies or struggle with complex multi-step instructions in the real world.

Green-VLA from Sber Robotics Center takes a different approach. Instead of just scaling up models, they introduce a staged curriculum that gradually builds from foundational vision-language understanding to reinforcement learning alignment. The result is a single policy that controls humanoids, mobile manipulators, and fixed-base arms.

The key is a unified 64-dimensional action space paired with embodiment-aware prompting. This means no spurious gradients from unused degrees of freedom, and true zero-shot transfer between robot types (a sketch of the masking idea follows below).

Their data pipeline is equally thoughtful:
- 3,000 hours of demonstrations
- Optical-flow-based temporal resampling to normalize speeds across datasets
- Quality filtering to remove bad trajectories

The training follows a clear progression: R0 for multi-embodiment pretraining, R1 for embodiment-specific fine-tuning, and R2 for RL-based policy alignment that targets long-horizon consistency.

On real robots, this translates to measurable gains:
- ALOHA bimanual table cleaning: 69.5% first-item success vs. 35.6% baseline
- Simpler benchmarks: 71.8% success after RL alignment
- 2x faster execution (95 seconds vs. 179 seconds)

What I find most compelling is the episode-end prediction head that reduces "post-success fidgeting", a practical detail that shows they've actually deployed this on real hardware.

The full pipeline is open and documented on Hugging Face, following the trend of making robotics research more accessible. Check out the paper for details on their JPM guidance module, which boosts out-of-distribution performance from 10% to 72% on unseen objects.

Paper: https://lnkd.in/eJWiy37j
Project page: https://lnkd.in/e7zpTZKw
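As a rough illustration of the unified-action-space idea, here is a hedged PyTorch sketch of per-embodiment action masking, so that unused degrees of freedom contribute no gradient. The 64-slot layout, embodiment names, and loss are assumptions for illustration, not Green-VLA's actual specification.

```python
import torch

# Hypothetical illustration of a unified action space with per-embodiment
# masking, in the spirit of "no spurious gradients from unused degrees of
# freedom" -- the dimension layout and names below are invented.

ACTION_DIM = 64  # shared action space covering all embodiments

# Which slots of the 64-dim vector each robot actually uses (made up).
EMBODIMENT_MASKS = {
    "fixed_arm":    torch.arange(ACTION_DIM) < 7,    # 7-DoF arm
    "mobile_manip": torch.arange(ACTION_DIM) < 10,   # base (3) + arm (7)
    "humanoid":     torch.arange(ACTION_DIM) < 29,   # whole body
}

def masked_action_loss(pred, target, embodiment):
    """MSE only over the DoF this embodiment owns; unused slots
    contribute zero loss and therefore zero gradient."""
    mask = EMBODIMENT_MASKS[embodiment].float()
    return ((pred - target) ** 2 * mask).sum() / mask.sum()

# Usage: gradients flow only through the first 7 dims for the fixed arm.
pred = torch.randn(ACTION_DIM, requires_grad=True)
target = torch.zeros(ACTION_DIM)
loss = masked_action_loss(pred, target, "fixed_arm")
loss.backward()
```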
-
X-Embodiment: Cross-Embodiment Generalization of Multimodal Large Language Models in Robotic Manipulation
Arxiv: https://lnkd.in/e_66SH5E
Project: https://lnkd.in/enedJpmd

Can a single multimodal large language model (MLLM) control different robot embodiments, from arms to humanoids, without retraining? X-Embodiment explores how vision-language-action (VLA) models can generalize across robot morphologies, enabling cross-embodiment learning through a unified multimodal framework. By aligning sensorimotor spaces across embodiments, it bridges the gap between language understanding and physical adaptability.

🔁 At a Glance
💡 Goal: Achieve scalable generalization of multimodal LLMs across diverse robotic embodiments and manipulation tasks.
⚙️ Approach:
- Cross-Embodiment Alignment: map visual and proprioceptive features into a shared latent space.
- Multimodal Pretraining: unify RGB, depth, proprioception, and text for multi-robot learning.
- Task-Agnostic Representation: train on diverse datasets from different robot morphologies.
- Action Decoding Head: conditioned on embodiment embeddings for consistent policy transfer.

📈 Impact (Key Metrics)
- Benchmarks: RoboSet, RT-X, BridgeData, and RealRobotSuite.
- +26% avg. improvement in success rate across unseen robot embodiments.
- Demonstrates zero-shot task transfer from 7-DoF manipulators to mobile arms and humanoid platforms.
- Maintains consistent visuomotor grounding despite mechanical differences.

🤖 Cross-Hardware Validation
Trained on data from Franka, Sawyer, and XArm; tested on UR5, Allegro Hand, and mobile manipulators. Outperforms prior VLA baselines on both simulated and real-world generalization.

🔬 Experiments
🧪 Benchmarks: RoboSet + RT-X + RealRobotSuite.
🎯 Tasks: Pick-and-place, open drawer, stack blocks, and deformable manipulation.
🦾 Robots: 6+ embodiments (Franka, Sawyer, UR5, Allegro Hand, MobileBase).
📐 Inputs: RGB, proprioception, natural language, and embodiment descriptors.

🛠 How to Implement (step 3 is sketched in code below)
1️⃣ Collect multimodal data across robot embodiments.
2️⃣ Train shared encoders for vision, language, and proprioception.
3️⃣ Align representations via contrastive embodiment projection.
4️⃣ Deploy on new robot embodiments without fine-tuning.

📦 Deployment Benefits
✅ Cross-embodiment: works on unseen morphologies.
✅ Multimodal: integrates vision, language, and proprioception.
✅ Scalable: trained on heterogeneous robotic datasets.
✅ Efficient: enables zero-shot policy transfer to real robots.

Takeaway
X-Embodiment shows that a single multimodal LLM can think, see, and act across different robot bodies. It's a step toward universal embodied intelligence, where one model can command any robot, anywhere.

Follow me to know more about AI, ML and Robotics!
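Here is a minimal PyTorch sketch of what "contrastive embodiment projection" could look like: per-embodiment projectors into a shared latent space, trained with an InfoNCE-style loss. Class names, dimensions, and the pairing scheme are illustrative guesses, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of contrastive embodiment alignment: project each
# embodiment's observation features into a shared latent space and pull
# together pairs that correspond to the same scene. Sizes are illustrative.

class EmbodimentProjector(nn.Module):
    def __init__(self, obs_dim, latent_dim=256):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.ReLU(), nn.Linear(512, latent_dim)
        )

    def forward(self, obs):
        # L2-normalize so similarities live on the unit sphere
        return F.normalize(self.proj(obs), dim=-1)

def contrastive_alignment_loss(z_a, z_b, temperature=0.07):
    """InfoNCE across a batch: row i of embodiment A should match
    row i of embodiment B (same scene observed by a different body)."""
    logits = z_a @ z_b.t() / temperature
    labels = torch.arange(z_a.size(0))
    return F.cross_entropy(logits, labels)

# Usage: features from a 64-dim arm encoder and a 128-dim humanoid encoder.
arm = EmbodimentProjector(obs_dim=64)
humanoid = EmbodimentProjector(obs_dim=128)
loss = contrastive_alignment_loss(
    arm(torch.randn(8, 64)), humanoid(torch.randn(8, 128))
)
```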
-
One policy to control all humanoid robots! 💍

Researchers from Shanghai AI Lab and Shanghai Jiao Tong University just released XHugWBC, a single whole-body control policy that generalizes across seven different humanoid robots with zero-shot transfer. One policy, trained once, controls them all.

Today, most whole-body controllers require robot-specific training. Every time you have a new robot with different joints, dimensions, and dynamics, you train from scratch. This is expensive, time-consuming, and doesn't scale.

How they approached it:
→ Physics-consistent morphological randomization during training exposes the policy to a broad distribution of robot designs.
→ Semantically aligned observation and action spaces allow the same policy to interface with robots that have completely different hardware (see the sketch below).
→ A graph-based policy architecture explicitly models the morphological and dynamical properties of each robot.

The output is a generalist policy that achieves approximately 85% of the performance of specialist policies trained specifically for each robot. After fine-tuning, the generalist policy improves another 10% beyond specialist performance.

The result: cross-embodiment locomotion across seven humanoids with diverse degrees of freedom and morphological structures, and real-time teleoperation of diverse robots using a single policy driven by one human operator.

Instead of training specialist models for every robot, you train one generalist model that works across all of them. The same shift that happened in language models is now happening in robotic locomotion.

Here's the paper: https://xhugwbc.github.io/

~~
♻️ Join the weekly robotics newsletter, and never miss any news → ziegler.substack.com
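A toy sketch of what "semantically aligned observation spaces" might mean in practice: every robot fills the same named joint slots, with a validity mask for joints it lacks, so one policy can read any of the humanoids. Slot names and the layout below are invented; the paper's actual interface may differ.

```python
import numpy as np

# Hypothetical semantic alignment: each robot reports the same named slots,
# with zeros plus a validity mask for joints it doesn't have, yielding a
# fixed-size observation for a single shared policy.

SLOTS = ["hip_l", "knee_l", "ankle_l", "hip_r", "knee_r", "ankle_r",
         "waist", "shoulder_l", "elbow_l", "shoulder_r", "elbow_r"]

def align_observation(joint_angles: dict) -> np.ndarray:
    """Map a robot's named joints into the shared slot layout;
    missing joints stay 0 and the mask marks which slots are real."""
    obs = np.zeros(len(SLOTS))
    mask = np.zeros(len(SLOTS))
    for i, name in enumerate(SLOTS):
        if name in joint_angles:
            obs[i] = joint_angles[name]
            mask[i] = 1.0
    return np.concatenate([obs, mask])

# A robot with no waist or arm joints still yields the same-size vector:
print(align_observation({"hip_l": 0.1, "knee_l": -0.4, "ankle_l": 0.2,
                         "hip_r": 0.1, "knee_r": -0.4, "ankle_r": 0.2}))
```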
-
VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception
Junli Ren, Tao Huang, Huayi Wang, Zirui Wang, Qingwei Ben, Jiangmiao Pang, Ping Luo

The performance of legged locomotion is closely tied to the accuracy and comprehensiveness of state observations. Blind policies, which rely solely on proprioception, are considered highly robust due to the reliability of proprioceptive observations. However, these policies significantly limit locomotion speed and often require collisions with the terrain to adapt. In contrast, vision policies allow the robot to plan motions in advance and respond proactively to unstructured terrains with an online perception module. However, perception is often compromised by noisy real-world environments, potential sensor failures, and the limitations of current simulations in representing dynamic or deformable terrains. Humanoid robots, with high degrees of freedom and an inherently unstable morphology, are particularly susceptible to misguidance from deficient perception, which can result in falls or termination on challenging dynamic terrains. To leverage the advantages of both vision and blind policies, we propose VB-Com, a composite framework that enables humanoid robots to determine when to rely on the vision policy and when to switch to the blind policy under perceptual deficiency. We demonstrate that VB-Com effectively enables humanoid robots to traverse challenging terrains and obstacles despite perception deficiencies caused by dynamic terrains or perceptual noise.

Subjects: Robotics (cs.RO)
Cite as: arXiv:2502.14814 [cs.RO]
https://lnkd.in/g-tB4-cb

Tips: Based on the #Unitree #G1 #Humanoid #Robot #Platform #Embodied #Obstacle #Avoidance
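A toy version of the composite idea, assuming a simple hand-rolled confidence check: trust the vision policy while perception looks consistent with contact feedback, otherwise fall back to the blind policy. The `perception_ok` criterion and field names are stand-ins; VB-Com learns when to switch rather than using a fixed threshold.

```python
# Hypothetical sketch of vision/blind policy composition, not VB-Com's
# actual mechanism: run the vision policy while perception looks
# trustworthy, fall back to the proprioception-only policy otherwise.

def perception_ok(depth_info, residual_threshold=0.15):
    """Placeholder check: compare the perceived terrain map against what
    the feet actually touched; a large residual suggests deficient
    perception (dynamic terrain, sensor noise, or failure)."""
    return depth_info["map_vs_contact_residual"] < residual_threshold

def composite_step(obs, vision_policy, blind_policy):
    """One control step of the composite framework sketch."""
    if obs.get("depth") is not None and perception_ok(obs["depth"]):
        return vision_policy(obs)  # plan ahead using terrain perception
    return blind_policy(obs)       # robust proprioception-only fallback
```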
-
The robot behaviors shown below are trained without any teleop, sim2real, genAI, or motion planning. Simply show the robot a few examples of doing the task yourself, and our new method, called Point Policy, spits out a robot-compatible policy!

Point Policy uses sparse key points to represent both human demonstrators and robots, bridging the morphology gap. The scene is thus encoded through semantically meaningful key points derived from minimal human annotations.

The overall algorithm is simple:
1. Extract key points from human videos.
2. Train a transformer policy to predict future robot key points.
3. Convert predicted key points to robot actions.

This project was an almost solo effort from Siddhant Haldar. And as always, this project is fully open-sourced.

Project page: https://lnkd.in/e32RtQK9
Paper: https://lnkd.in/emQpENTy
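For intuition, here is a hedged PyTorch skeleton of steps 2 and 3: a small transformer that consumes a history of 2D key points and predicts the next set, which would then be converted to robot actions (e.g., via inverse kinematics). All sizes and architecture details are illustrative, not the released code.

```python
import torch
import torch.nn as nn

# Hypothetical skeleton of the three-step recipe above: key points in,
# future key points out, then an inverse-kinematics-style conversion to
# robot actions (not shown). Dimensions are illustrative.

class PointPolicySketch(nn.Module):
    def __init__(self, n_points=8, d_model=128, history=10):
        super().__init__()
        self.embed = nn.Linear(n_points * 2, d_model)  # 2D key points
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(d_model, n_points * 2)   # next-step key points

    def forward(self, keypoint_history):
        # keypoint_history: (batch, history, n_points * 2)
        h = self.encoder(self.embed(keypoint_history))
        return self.head(h[:, -1])  # predicted future key points

model = PointPolicySketch()
future = model(torch.randn(1, 10, 16))  # step 2: predict robot key points
# Step 3 would map `future` to joint commands, e.g. via IK (not shown).
```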