New Datasets for Robotics Skill Training

Summary

New datasets for robotics skill training are large collections of data—often generated through simulation or by capturing human demonstrations—that help robots learn to perform tasks like picking up objects or navigating environments. These datasets allow robots to practice and adapt to a wide variety of situations, even those that are rare or difficult to replicate in the real world.

  • Harness synthetic data: Use simulation tools and synthetic data generation to create diverse training scenarios for robots that go far beyond what real-world data collection can offer.
  • Embrace structured annotation: Build specialized datasets with detailed annotations targeting specific skills, such as motion understanding or grasping, to boost robots’ ability to learn complex tasks.
  • Scale with human demonstrations: Capture human actions using tools like smart glasses or VR, then multiply this data through simulation, giving robots access to millions of practice scenarios based on real-life behavior.
Summarized by AI based on LinkedIn member posts
  • View profile for Jim Fan

    NVIDIA Director of AI & Distinguished Scientist. Co-Lead of Project GR00T (Humanoid Robotics) & GEAR Lab. Stanford Ph.D. OpenAI's first intern. Solving Physical AGI, one motor at a time.

    238,090 followers

    Exciting updates on Project GR00T! We've discovered a systematic way to scale up robot data, tackling the most painful bottleneck in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:

    1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.

    2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.

    3. Finally, we apply MimicGen, a technique to multiply the above data even further by varying the *motion* of the robot. MimicGen generates a vast number of new action trajectories based on the original human data and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

    To sum up: given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is how we trade compute for expensive human data with GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics!

    We are creating tools to enable everyone in the ecosystem to scale up with us:
    - RoboCasa: our generative simulation framework (Yuke Zhu). It's fully open-source. Here you go: http://robocasa.ai
    - MimicGen: our generative action framework (Ajay Mandlekar). The code is open-source for robot arms, and another version is coming for humanoids and 5-finger hands: https://lnkd.in/gsRArQXy
    - We are building a state-of-the-art Apple Vision Pro -> humanoid robot "Avatar" stack. The open-source libraries from Xiaolong Wang's group laid the foundation: https://lnkd.in/gUYye7yt
    - Watch Jensen's keynote yesterday. He cannot hide his excitement about Project GR00T and robot foundation models! https://lnkd.in/g3hZteCG

    Finally, the GEAR Lab is hiring! We want the best roboticists in the world to join us on this moon-landing mission to solve physical AGI: https://lnkd.in/gTancpNK
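    To make the multiplication concrete, here is a minimal sketch of the 1 -> N -> N x M scaling loop described above, with failed rollouts filtered out. The helper functions (randomize_scene, perturb_motion, rollout_succeeds) are illustrative placeholders, not the actual RoboCasa or MimicGen APIs.

```python
# Hypothetical sketch of the demo-multiplication pipeline:
# 1 teleoperated trajectory -> N visual variants -> N x M motion variants.
import random

def randomize_scene(demo, seed):
    """Stand-in for RoboCasa-style scene randomization (textures, layout)."""
    return {**demo, "scene_seed": seed}

def perturb_motion(demo, seed):
    """Stand-in for MimicGen-style trajectory variation."""
    return {**demo, "motion_seed": seed}

def rollout_succeeds(demo):
    """Stand-in for the simulator success check (e.g. cup not dropped)."""
    return random.random() > 0.3   # placeholder success rate

human_demo = {"task": "place_cup", "source": "vision_pro_teleop"}

N, M = 100, 50                     # scene variants x motion variants
dataset = []
for i in range(N):
    scene_variant = randomize_scene(human_demo, seed=i)
    for j in range(M):
        candidate = perturb_motion(scene_variant, seed=j)
        if rollout_succeeds(candidate):   # keep only successful rollouts
            dataset.append(candidate)

print(f"1 human demo -> up to {N * M} candidates -> {len(dataset)} kept")
```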

  • View profile for Asad Ansari

    Founder | Data & AI Transformation Leader | Driving Digital & Technology Innovation across UK Government and Financial Services | Board Member | Commercial Partnerships | Proven success in Data, AI, and IT Strategy

    29,653 followers

    You cannot train AI on reality alone anymore. There is not enough of it. Jensen Huang explains why NVIDIA built Cosmos, an AI world model that generates synthetic training data grounded in physics.

    The problem is simple. Teaching physical AI like robotics requires vast amounts of diverse interaction data. Videos exist, but not nearly enough to capture the variety of situations robots will encounter. So NVIDIA transformed compute into data. Using synthetic data generation grounded in the laws of physics, they can selectively generate training scenarios that would be impossible to capture otherwise.

    The example Huang shows is remarkable. A basic traffic simulator output gets fed into Cosmos. What emerges is physically plausible surround video that AI can learn from. This solves a fundamental limitation. You cannot train autonomous systems on every possible scenario by recording reality. There are not enough cameras or time. But you can simulate physics accurately enough that AI trained on synthetic data generalises to real environments.

    This applies beyond robotics. Any AI learning physical interactions, from manufacturing to logistics to infrastructure monitoring, faces the same data scarcity problem. Synthetic data generation grounded in the laws of physics is how you create training sets reality cannot provide. The organisations building AI for physical systems will either master synthetic data generation or be limited by whatever reality they can record.

    Watch the full presentation to hear Huang explain how Cosmos generates training data for physical AI. What physical AI application needs synthetic data because reality cannot provide enough examples? #AI #SyntheticData #Robotics #NVIDIA #MachineLearning

  • View profile for Karyna Naminas

    CEO of Label Your Data. Helping AI teams deploy their ML models faster.

    6,591 followers

    A 7B model just beat Gemini and a 72B model on motion understanding. New research from the Massachusetts Institute of Technology, NVIDIA, the University of Michigan, the University of California, Berkeley, and Stanford University demonstrates something counterintuitive:
    👉 smaller models with better training data beat larger models with generic data

    Their results:
    ✅ NVILA-Video-15B: 91.5% on the AV-Car benchmark
    ✅ Gemini-2.5-Flash: 84.1%
    ✅ Qwen-2.5-VL-72B: 83.3%
    The smaller Qwen-2.5-VL-7B saw +11.7% gains on daily activity tasks.

    🔍 Why? Data architecture. The FoundationMotion dataset used five structured QA types targeting different error modes:
    • motion recognition
    • temporal ordering
    • object-action association
    • location-based motion
    • repetition counting
    Standard pre-training data (467K samples) caused performance drops on the same models. FoundationMotion's structured annotations provided consistent gains.

    🎯 Implications: Motion understanding is the bottleneck for robotics and AV applications. VLMs recognize objects well but struggle with spatial reasoning — understanding how movements happen, not just what they are. Teams investing in specialized data architectures will see disproportionate returns compared to those just scaling model parameters.

    📌 Worth noting: their "automated" pipeline still required manual benchmark curation and builds on SAM2 (190K human annotations). Automation scales pattern detection; humans design the annotation schema that determines what patterns matter.

    Research by: Yulu G., Ligeng Zhu, Dandan Shan, Baifeng Shi, Hongxu (Danny) Y., Boris Ivanovic, Song Han, Trevor Darrell, Jitendra Malik, Marco Pavone, and Boyi Li
    Paper: https://lnkd.in/dTBn7gnX
    Video source: https://lnkd.in/d8NctwtV
    #MotionUnderstanding #VisionLanguageModels #SpatialReasoning #VideoAnnotation #AutonomousVehicles
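    For illustration, here is a minimal sketch of what one record in a structured QA dataset along the five categories above might look like. The field names and example values are assumptions made for this sketch, not the actual FoundationMotion schema.

```python
# Sketch of a structured motion-QA record; each qa_type targets one error mode.
from dataclasses import dataclass
from typing import List

QA_TYPES = [
    "motion_recognition",
    "temporal_ordering",
    "object_action_association",
    "location_based_motion",
    "repetition_counting",
]

@dataclass
class MotionQA:
    clip_id: str
    qa_type: str          # one of QA_TYPES
    question: str
    options: List[str]    # distractors chosen to probe a specific error mode
    answer: str

sample = MotionQA(
    clip_id="clip_000123",
    qa_type="repetition_counting",
    question="How many times does the person open the drawer?",
    options=["1", "2", "3", "4"],
    answer="2",
)
assert sample.qa_type in QA_TYPES
```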

  • View profile for Adithya Murali

    Staff Research Scientist at NVIDIA | MIT TR35, Prev CMU PhD, Berkeley AI Research

    3,182 followers

    I’m super excited to release a multi-year project we have been cooking at NVIDIA Robotics. Grasping is a foundational challenge in robotics 🤖 — whether for industrial picking or general-purpose humanoids. VLA + real data collection is all the rage now, but it is expensive and scales poorly for this task: in that paradigm, the dataset has to be recollected for every new embodiment and/or scene to get the best performance.

    Key idea: since grasping is a well-defined task in physics simulation, why can't we just scale synthetic data generation and train a GenAI model for grasping? By embracing modularity and standardized grasp formats, we can make this a turnkey technology that works zero-shot in multiple settings.

    Introducing… 🚀 GraspGen: A Diffusion-Based Framework for 6-DOF Grasping

    GraspGen is a modular framework for diffusion-based 6-DOF grasp generation that scales across embodiment types, observability conditions, clutter, and task complexity.

    Key features:
    ✅ Multi-embodiment support: suction, antipodal pinch, and underactuated pinch grippers
    ✅ Generalization to both partial and complete 3D point clouds
    ✅ Generalization to both single objects and cluttered scenes
    ✅ Modular design that relies on other robotics packages and foundation models (SAM2, cuRobo, FoundationStereo, FoundationPose), allowing GraspGen to focus on only one thing: grasp generation
    ✅ Training recipe: the grasp discriminator is trained on On-Generator data from the diffusion model, so it learns to correct the diffusion generator's mistakes
    ✅ Real-time performance (~20 Hz) before any GPU acceleration; low memory footprint

    📊 Results:
    • SOTA on the FetchBench [Han et al., CoRL 2024] benchmark
    • Zero-shot sim-to-real transfer on unknown objects and cluttered scenes
    • Dataset of 53M simulated grasps across 8K objects from Objaverse

    We're also releasing:
    🔹 Simulation-based grasp data generation workflows
    🔹 Standardized formats and gripper definitions
    🔹 Full training infrastructure

    📄 arXiv: https://lnkd.in/gaYmcfz4
    🌐 Website: https://lnkd.in/gGiKRCMX
    💻 Code: https://lnkd.in/gYR77bEh

    A huge thank you to everyone involved in this journey — excited to hear the feedback from the community! Joint work with Clemens Eppner, Balakumar Sundaralingam, Yu-Wei Chao, Mark T. Carlson, Jun Yamada and other collaborators. Many thanks to Yichao Pan, Shri Sundaram, Spencer Huang, Buck Babich, Amit Goel for product management and feedback. #robotics #grasping #physicalAI #simtoreal
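    As a rough illustration of the generate-then-filter idea (a generative model proposes 6-DOF grasp poses from a point cloud; a discriminator trained on the generator's own outputs scores and filters them), here is a toy sketch. All function bodies are placeholders, not the GraspGen implementation.

```python
# Toy generate-then-score loop for 6-DOF grasping on a point cloud.
import numpy as np

def propose_grasps(point_cloud: np.ndarray, num_grasps: int) -> np.ndarray:
    """Placeholder for the diffusion sampler: returns (num_grasps, 7) poses
    as [x, y, z, qx, qy, qz, qw]."""
    centers = point_cloud[np.random.choice(len(point_cloud), num_grasps)]
    quats = np.tile([0.0, 0.0, 0.0, 1.0], (num_grasps, 1))
    return np.hstack([centers, quats])

def score_grasps(point_cloud: np.ndarray, grasps: np.ndarray) -> np.ndarray:
    """Placeholder for the learned discriminator: higher = more likely to succeed."""
    dists = np.linalg.norm(grasps[:, :3] - point_cloud.mean(axis=0), axis=1)
    return -dists  # toy heuristic: prefer grasps near the object centroid

point_cloud = np.random.rand(2048, 3)          # partial object point cloud (meters)
candidates = propose_grasps(point_cloud, 256)  # generator proposals
scores = score_grasps(point_cloud, candidates) # discriminator scores
top_k = candidates[np.argsort(scores)[-10:]]   # keep the 10 best-ranked grasps
print(top_k.shape)                             # (10, 7)
```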

  • View profile for Aaron Prather

    Director, Robotics & Autonomous Systems Program at ASTM International

    84,969 followers

    Training general-purpose robots—the kind that could fold laundry or tidy up like Rosie from The Jetsons—is notoriously difficult because they need vast amounts of real-world data. Traditionally, this data comes from carefully arranged external cameras, but NYU’s General-purpose Robotics and AI Lab, led by Lerrel Pinto, is testing a more scalable approach: EgoZero.

    EgoZero uses Meta’s research-only smart glasses to record tasks from a human’s point of view. This “egocentric” data captures exactly what a person sees while performing actions, making it both portable and highly relevant. In tests, robots trained only on this human data (no robot data required) achieved a 70% success rate on seven manipulation tasks, such as placing bread on a plate.

    Instead of relying on full images—which don’t translate well between human hands and robot arms—EgoZero maps hand movements as 3D points in space. This allows robots to generalize: if trained to pick up a roll, they can adapt to handle ciabatta in a new setting.

    The NYU team is also developing open-source robot designs, touch sensors, and smartphone-based data collection tools. Their ultimate aim is scalability: while large language models train on the Internet, robots lack an equivalent dataset for the physical world. EgoZero and similar methods could begin to close that gap by turning everyday human actions into training fuel for general-purpose robots.

    📝 Research Paper: https://lnkd.in/e5m25bSA
    💻 Project Page: https://lnkd.in/ewM3VPdH
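    A minimal sketch of the "hand movements as 3D points" idea: each egocentric frame is reduced to a few 3D hand keypoints lifted into world coordinates, rather than a full image. The keypoint layout and camera-to-world transform below are illustrative assumptions, not EgoZero's actual representation.

```python
# Reduce an egocentric frame to embodiment-agnostic 3D hand keypoints.
import numpy as np

def hand_keypoints_to_world(keypoints_cam: np.ndarray,
                            cam_to_world: np.ndarray) -> np.ndarray:
    """Lift per-frame hand keypoints (N, 3) from camera to world coordinates."""
    homo = np.hstack([keypoints_cam, np.ones((len(keypoints_cam), 1))])
    return (cam_to_world @ homo.T).T[:, :3]

# One frame: e.g. index fingertip, thumb tip, wrist, in camera coordinates (meters)
keypoints_cam = np.array([[0.02, -0.01, 0.35],
                          [0.04,  0.00, 0.34],
                          [0.00,  0.03, 0.40]])
cam_to_world = np.eye(4)          # placeholder glasses pose for this frame

action_points = hand_keypoints_to_world(keypoints_cam, cam_to_world)
print(action_points.shape)        # (3, 3) -> compact state a robot policy can consume
```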

  • View profile for Naomi Kaduwela

    ⚡️🤖 C Suite AI Advisor | Head of Kavi Labs @ Kavi Global | Go To Partner for MS AWS GCP Databricks Snowflake Tableau SAS SAP | RCM Automation AI & Outsourcing | Author | Multi-Patented Innovator | International Speaker

    10,875 followers

    𝗧𝗵𝗲 𝘄𝗼𝗿𝗸𝗲𝗿𝘀 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 𝘁𝗵𝗲 𝗳𝘂𝘁𝘂𝗿𝗲 𝗼𝗳 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝘀 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲𝗶𝗿 𝗵𝗼𝗺𝗲𝘀.

    In Nigeria, a medical student straps an iPhone to his forehead after hospital shifts. In India, an engineering student records himself on a cramped balcony. They're folding laundry. Washing dishes. Ironing clothes. And they're getting paid $15/hour to teach humanoid robots how to move.

    𝗛𝗲𝗿𝗲'𝘀 𝘄𝗵𝗮𝘁'𝘀 𝗵𝗮𝗽𝗽𝗲𝗻𝗶𝗻𝗴: Companies like Tesla, Figure AI, and Agility Robotics are racing to build humanoid robots for factories and homes. But robots are notoriously hard to train. Virtual simulations can't model real-world physics perfectly.

    𝗧𝗵𝗲 𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻? Real humans. Real homes. Real chores.

    𝗧𝗵𝗲 𝗻𝗲𝘄 𝗴𝗹𝗼𝗯𝗮𝗹 𝗴𝗶𝗴 𝗲𝗰𝗼𝗻𝗼𝗺𝘆:
    🔹 Micro1 has hired thousands across 50+ countries
    🔹 Workers mount iPhones on their heads
    🔹 They record themselves doing everyday tasks
    🔹 AI + humans review and label the footage
    🔹 Robotics companies buy the data for $100M+ annually

    𝗪𝗵𝗮𝘁 𝘁𝗵𝗲 𝘄𝗼𝗿𝗸 𝗹𝗼𝗼𝗸𝘀 𝗹𝗶𝗸𝗲:
    → Keep hands visible to the camera
    → Move at natural speed
    → Create variety in daily tasks
    → Submit weekly videos for review
    → Get paid when footage is accepted

    For workers, it's complicated:
    ✅ Good income in struggling economies
    ❌ Repetitive and boring
    ❌ Hard to create variety in small homes
    ❌ Privacy concerns about recording intimate home life
    ❌ Family members wandering into frame
    ❌ No clarity on how data is used or stored

    𝗧𝗵𝗲 𝗯𝗶𝗴𝗴𝗲𝗿 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀: How much data do we actually need? Large language models trained on 100,000 years of human text. Humanoid robots may need even more.
    𝗦𝗰𝗮𝗹𝗲 𝗔𝗜: 100,000+ hours collected
    𝗠𝗶𝗰𝗿𝗼𝟭: Tens of thousands of hours

    𝗕𝘂𝘁 𝗿𝗼𝗯𝗼𝘁𝗶𝗰𝗶𝘀𝘁𝘀 𝘄𝗼𝗻𝗱𝗲𝗿:
    → Is the data reliable enough?
    → Are workers teaching robots bad habits?
    → Can we review this much footage for quality?

    𝗪𝗵𝗮𝘁 𝘁𝗵𝗶𝘀 𝗺𝗲𝗮𝗻𝘀: Investors poured $6 billion into humanoid robots in 2025. The rise of ChatGPT inspired this shift: just as LLMs learned from massive text data, robots might learn movement from massive video data. But the humans behind this data deserve more than anonymity.

    𝗧𝗵𝗲𝘆 𝗱𝗲𝘀𝗲𝗿𝘃𝗲:
    ✓ Transparency about how their data is used
    ✓ Control over deletion requests
    ✓ Understanding of long-term implications
    ✓ Fair compensation for intimate footage

    𝗔𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗻𝗴 𝗵𝗼𝘂𝘀𝗲𝗵𝗼𝗹𝗱 𝗿𝗼𝘂𝘁𝗶𝗻𝗲 𝘁𝗮𝘀𝗸𝘀 𝗶𝘀 𝘁𝗲𝗺𝗽𝘁𝗶𝗻𝗴, 𝗯𝘂𝘁 𝗮𝘁 𝘄𝗵𝗮𝘁 𝗰𝗼𝘀𝘁?

    👉 𝗙𝗼𝗹𝗹𝗼𝘄 Naomi Kaduwela 𝗳𝗼𝗿 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀 𝗼𝗻 𝘁𝗵𝗲 𝗵𝘂𝗺𝗮𝗻 𝘀𝗶𝗱𝗲 𝗼𝗳 𝘁𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆
    🔁 𝗦𝗵𝗮𝗿𝗲 𝘁𝗵𝗶𝘀 𝗶𝗳 𝘆𝗼𝘂 𝗯𝗲𝗹𝗶𝗲𝘃𝗲 𝗴𝗶𝗴 𝘄𝗼𝗿𝗸𝗲𝗿𝘀 𝗱𝗲𝘀𝗲𝗿𝘃𝗲 𝘁𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆
    #HumanoidRobots #GigEconomy #AITraining #Robotics #DataLabeling #FutureOfWork #AIEthics #TechInnovation #GlobalWorkforce #ResponsibleAI

  • View profile for Thomas Wolf

    Co-founder at 🤗 Hugging Face – Angel

    183,180 followers

    Impressive work by the new Amazon Frontier AI & Robotics team (from the Covariant acquisition) and collaborators! This research enables mapping long sequences of human motion (>30 sec) onto robots of various shapes, as well as robots interacting with objects (box, table, etc.) of different sizes and, in particular, sizes different from those in the training data. This enables easier in-simulation data augmentation and zero-shot transfer. It is impressive, and a huge potential step toward reducing the need for human teleoperation data (which is hard to gather for humanoids).

    The dataset trajectories are available on Hugging Face at: https://lnkd.in/eygXVVHx
    The full code framework is coming soon. Check out the project page, which has some pretty nice three.js interactive demos: https://lnkd.in/e2S-6K2T

    And kudos to the authors for open-sourcing the data, releasing the paper and (hopefully soon) the code. This kind of open-science project is a game changer in robotics.
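    The dataset link above is shortened, so the repo id in the snippet below is a placeholder rather than the real one, but pulling such a dataset locally typically looks like this with huggingface_hub:

```python
# Download a trajectory dataset from the Hugging Face Hub to a local directory.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="org-name/humanoid-motion-trajectories",  # placeholder: use the repo id from the post's link
    repo_type="dataset",
)
print(f"Trajectories downloaded to: {local_dir}")
```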

  • View profile for Mukundan Govindaraj

    Global Developer Relations | Physical AI | Digital Twin | Robotics

    18,718 followers

    Came across Egocentric-10K, a new dataset collected entirely in real factory environments. 10,000 egocentric clips, all captured from actual shop floors. What’s cool is that it focuses on hand visibility and active manipulation, which most “in-the-wild” datasets completely miss. For anyone working on robotics, industrial AI, or sim2real learning, this feels like a big step forward — finally, data that looks like what happens in real factories, not lab setups. What’s interesting here is how datasets like this could pair with tools like NVIDIA Cosmos to extrapolate and augment real-world data — generating photoreal synthetic variations that help close the sim-to-real gap even faster. 👉 https://lnkd.in/gjyHTHmt Cosmos: https://lnkd.in/gTnS5TvG #PhysicalAI #Sim2Real #IndustrialAI #Robotics #ComputerVision #DigitalTwins #Omniverse NVIDIA
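    Purely as an illustration of why the hand-visibility focus matters downstream: a training pipeline would plausibly filter clips on such metadata before use. The field names below are assumptions for this sketch, not Egocentric-10K's actual schema.

```python
# Toy metadata filter: keep clips with mostly visible hands and active manipulation.
clips = [
    {"clip_id": "factory_0001", "hand_visible_frac": 0.92, "active_manipulation": True},
    {"clip_id": "factory_0002", "hand_visible_frac": 0.40, "active_manipulation": False},
    {"clip_id": "factory_0003", "hand_visible_frac": 0.81, "active_manipulation": True},
]

usable = [c for c in clips
          if c["hand_visible_frac"] >= 0.8 and c["active_manipulation"]]
print([c["clip_id"] for c in usable])  # ['factory_0001', 'factory_0003']
```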

  • View profile for DEBASHIS DAS

    Founder BRS CENTRAL AI | Creating Robots For People | 6 PATENTS | Architecting the World’s First Truly working EMBODIED AI Stack for Industrial Autonomy | RAAS GTM Leader| DeepTech Entrepreneur

    31,650 followers

    $5B+ has been invested in building #foundationmodels for #Robotics. Most of it is training on the wrong data.

    #Simulation can't reproduce monsoon humidity warping a factory floor. Internet video doesn't carry sensor telemetry. A million repetitions of the same weld is one data point repeated a million times. #Volume ≠ #Diversity. And foundation models need diversity.

    The richest data for Physical AI comes from machines operating in environments that refuse to cooperate — where forklifts appear from blind spots, floors change condition hourly, and no two shifts look the same. We've collected 12,000+ of these edge cases across numerous deployed machines, 35+ OEM brands, and 24 months of production operations. Our customers pay us to collect the data that foundation model companies spend billions trying to simulate.

    Full analysis below 👇

    #SLAgradeRobotics #PhysicalAI #FoundationModels #Robotics #AutonomousRobots #Humanoids #GTC2026 #NVIDIA #EmbodiedAI #FutureOfWork #ValueEngineering #AI #AGI #GTM #EdgeCases #ConstructionRobotics #CleaningRobotics #MiningRobotics #MaterialHandlingRobotics #autonomous #EmbodiedIntelligence #artificialintelligence #computervision #machinelearning #deeplearning #deeptech #roboticist #humanoid #HumanoidRobotics #venturecapital #vc
