Researchers from NVIDIA, the University of Michigan, and Ohio State University are pushing the boundaries of what Vision Language Models (VLMs) can do in robotics with their latest work: SpaceTools. VLMs excel at qualitative visual understanding but often struggle with the metrically precise spatial reasoning essential for embodied AI and real-world robot manipulation. SpaceTools introduces an exciting solution to this challenge.

At its core is Double Interactive Reinforcement Learning (DIRL), a novel two-phase training framework. This approach enables VLMs to learn and coordinate a wide variety of vision and robotic tools, going beyond fixed pipelines and discovering optimal tool-use patterns. This multi-tool coordination is underpinned by "Toolshed," a scalable infrastructure designed to efficiently deploy compute-heavy tools like depth estimators, segmentation models, and pose estimators during both training and inference.

SpaceTools achieves state-of-the-art performance on spatial understanding benchmarks and demonstrates reliable real-world manipulation with a 7-DOF robot. It's a significant step towards VLMs that can not only see, but also act with precision in complex physical environments.

We are thrilled to see papers like SpaceTools shared on Hugging Face. The code, models, and data are planned for release soon, and we encourage researchers to use the Hugging Face Hub to share their artifacts and foster community collaboration.

Learn more about SpaceTools and DIRL:
- Paper: https://lnkd.in/eV69kwMU
- Project Page: https://lnkd.in/evr7acgJ
- Code (coming soon): https://lnkd.in/em9y4zVn
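The post describes a VLM coordinating compute-heavy vision tools. A minimal sketch of what such a tool-dispatch loop could look like, assuming a simple registry-and-policy design; all names and the keyword heuristic are illustrative stand-ins, not the paper's actual API (DIRL would learn the tool-use policy via reinforcement rather than hand-coded rules):

```python
from typing import Callable, Dict, List, Tuple

# Tool registry: name -> callable taking a query and returning an observation.
# Real tools (depth estimators, segmentation models, pose estimators) are
# compute-heavy networks served by infrastructure like "Toolshed";
# here they are stand-in stubs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "depth": lambda q: f"depth_map({q})",
    "segment": lambda q: f"masks({q})",
    "pose": lambda q: f"6dof_pose({q})",
}

def plan_tool_calls(question: str) -> List[Tuple[str, str]]:
    """Stand-in for the VLM's learned tool-use policy: decide which tools
    a spatial question needs (keyword heuristic here, purely illustrative)."""
    calls: List[Tuple[str, str]] = []
    if "far" in question or "distance" in question:
        calls.append(("depth", question))
    if "which object" in question or "grasp" in question:
        calls.append(("segment", question))
        calls.append(("pose", question))
    return calls

def answer(question: str) -> Dict[str, str]:
    """Dispatch the planned tool calls and collect the observations the
    VLM would condition its final, metrically grounded answer on."""
    return {name: TOOLS[name](arg) for name, arg in plan_tool_calls(question)}
```

The design point this illustrates is the separation between the policy (which tools to call, learned in DIRL's two phases) and the tool infrastructure (how they are executed and scaled).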
Two-Stage Tool Grounding Techniques in Robotics
Summary
Two-stage tool grounding techniques in robotics are specialized training methods that help robots understand and use tools in the real world by breaking the learning process into two distinct phases. This approach makes it easier for robots to translate visual and motion data into precise actions, improving their ability to manipulate objects and perform complex tasks.
- Build strong foundations: Start by teaching robots to recognize and encode visual or motion information before moving on to action-based training.
- Combine learning methods: Use a mix of supervised training and reinforcement learning to ensure robots can select the right tools and perform tasks reliably.
- Test in real settings: Always validate robot performance in realistic environments to confirm their skills and adaptability beyond simulations.
Another robotics masterpiece from our friends at Disney Research! Recent progress in physics-based character control has improved learning from unstructured motion data, but it is still hard to create a single control policy that handles diverse, unseen motions and works on real robots.

To solve this, the team at Disney proposes a new two-stage technique. In the first stage, an autoencoder learns a latent-space encoding from short motion clips. In the second stage, this encoding is used to train a policy that maps kinematic input to dynamic output, ensuring accurate and adaptable movements. By keeping the stages separate, the method benefits from better motion encoding and avoids common issues like mode collapse.

The technique has been shown to be effective in simulation and has successfully brought dynamic motions to a real bipedal robot, marking an important step forward in robot control.

You can find the full paper here: https://lnkd.in/d-kzexdJ

What Markus Gross, Moritz Baecher and the rest of the gang are bringing to life is unbelievable!
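The two-stage separation described above can be sketched with deliberately toy stand-ins: a linear autoencoder (via SVD) for the stage-1 motion encoding, and a least-squares "policy head" fit on the frozen latents for stage 2. Everything here — the data, the dimensions, the linear models — is an illustrative assumption, not Disney's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: learn a latent encoding of short motion clips ---
# Toy "motion clips": 200 windows of 12-dim joint trajectories that lie
# on a 3-dim subspace, so a linear autoencoder can represent them.
basis = rng.normal(size=(3, 12))
clips = rng.normal(size=(200, 3)) @ basis

# Linear autoencoder via SVD (the optimal linear encoder/decoder pair).
U, S, Vt = np.linalg.svd(clips, full_matrices=False)
encoder = Vt[:3].T            # 12 -> 3 latent
decoder = Vt[:3]              # 3 -> 12 reconstruction
latents = clips @ encoder
recon = latents @ decoder
stage1_err = np.mean((clips - recon) ** 2)   # near zero: encoding learned

# --- Stage 2: freeze the encoder, train a policy on top of it ---
# Toy "dynamics": target torques are a fixed linear function of the latent
# state plus noise; the policy maps kinematic input -> latent -> torques.
true_map = rng.normal(size=(3, 6))
torques = latents @ true_map + 0.01 * rng.normal(size=(200, 6))

# Least-squares fit of the policy head on the frozen stage-1 latents.
policy_head, *_ = np.linalg.lstsq(latents, torques, rcond=None)
pred = latents @ policy_head
stage2_err = np.mean((torques - pred) ** 2)  # bounded by the noise floor
```

The point mirrored from the post: because the encoder is trained first and then held fixed, the stage-2 policy fit cannot degrade the motion representation, which is one way the separation avoids issues like mode collapse.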
Training reliable tool-using agents is notoriously difficult. It often presents a trade-off: rely on expensive manual human intervention, or settle for "simulated" environments where one LLM judges another (often unverifiable). A new paper, "ASTRA" (Automated Synthesis of agentic Trajectories and Reinforcement Arenas), proposes a fully automated solution to close this gap. 🤖

Here is the breakdown of how it works:

1. Verifiable Environments over Simulation
Instead of relying on LLM-based simulators for feedback, ASTRA synthesizes executable environments. It converts question-answer traces into independent, code-executable Python environments. This allows the reinforcement learning (RL) process to receive deterministic, rule-based rewards rather than "vibes-based" feedback.

2. Two-Stage Training Pipeline
The framework uses a complementary approach:
- SFT (Supervised Fine-Tuning): Uses synthesized trajectories based on tool-call graphs to give the model a strong "cold start" in tool usage.
- Online Multi-Turn RL: The agent interacts with the synthesized environments. Crucially, the training mixes in "irrelevant tools" (distractors). This forces the agent to learn tool discrimination rather than just memorizing which tool to pick.

3. Performance
The results are significant for the open-source community. On agentic benchmarks like BFCL v3 and ACEBench, ASTRA-trained models (14B and 32B) achieve state-of-the-art performance for their size, approaching the capabilities of closed-source systems while preserving their core reasoning abilities.

Limitations: While the automated environment synthesis is scalable, generating these verifiable sandboxes is computationally expensive. The current framework also focuses on goal-oriented tasks and has not yet fully integrated complex, multi-turn human-user interactions during training.

The full pipeline and models have been open-sourced. 🛠️

#MachineLearning #AI #LLM #ToolCalling #AgenticAI
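The "verifiable environment" idea above can be sketched as a tiny Python class: a QA trace is turned into an executable environment whose reward is a deterministic rule check, with distractor tools mixed in so the agent must discriminate. All class and tool names here are illustrative assumptions, not ASTRA's actual interfaces:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class ToolEnv:
    tools: Dict[str, Callable[[str], str]]   # relevant + distractor tools
    expected_answer: str                     # ground truth from the QA trace
    trace: List[Tuple[str, str]] = field(default_factory=list)

    def step(self, tool: str, arg: str) -> str:
        """Execute a tool call; unknown tools return an error observation."""
        obs = self.tools[tool](arg) if tool in self.tools else "error: no such tool"
        self.trace.append((tool, obs))
        return obs

    def reward(self, final_answer: str) -> float:
        """Deterministic, rule-based reward instead of an LLM judge."""
        return 1.0 if final_answer.strip() == self.expected_answer else 0.0

def make_env_from_qa(question: str, answer: str) -> ToolEnv:
    """Synthesize an environment from one QA pair: one tool that can
    actually resolve the question, plus distractors the agent must learn
    to ignore (tool discrimination)."""
    return ToolEnv(
        tools={
            "lookup": lambda q: answer if q == question else "not found",
            "weather": lambda q: "sunny",     # distractor
            "calculator": lambda q: "42",     # distractor
        },
        expected_answer=answer,
    )
```

Because the reward is computed by code rather than by a judge model, every rollout against such an environment yields a verifiable training signal, which is the property the post contrasts with "vibes-based" feedback.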