Will Embodied AI Create Robotic Coworkers?

June 30, 2025 | Article

A pragmatic look at what general-purpose robots can—and can't yet—do in the workplace.

From C-3PO's polished diplomacy to R2-D2's battlefield heroics, robots have long captured our imagination. Today, what was once confined to science fiction is inching toward industrial reality. General-purpose robots, powered by increasingly capable embodied AI, are being tested in warehouses, factories, hospitals, and fields.1 And unlike previous generations of robots, they're not just performing a single preprogrammed task but adapting to dynamic environments, learning new motions, and even following verbal commands.

Much of the current buzz centers on humanoids—robots that resemble people—whose recent exploits include running marathons and performing backflips. General-purpose robots also come in many other forms, however, including those that rely on four legs or wheels for movement (Exhibit 1). But as executives weigh automation road maps and workforce evolution, their focus should not be on whether their robots look human but on whether these robots can flex across tasks in environments designed for humans. This issue is both urgent and intriguing because general-purpose robots, including those in the multipurpose subcategory, may become part of the workplace team: trained to pack, pick, lift, inspect, move, and collaborate with people in real time.2

Surge in investment and innovation

The sector has seen an explosion in activity. General-purpose robotics funding grew fivefold from 2022 to 2024, surpassing $1 billion in annual investment, with leading start-ups such as Figure AI, Skild AI, and Agility Robotics raising hundreds of millions of dollars. Patent filings have also surged, with a 40 percent CAGR in volume since 2022. Governments are taking notice, too. China has designated embodied AI a national priority, anchoring a $138 billion innovation fund.
McKinsey Global Institute's recent research report, The next big arenas of competition, identifies embodied AI and robotics as one of five emerging frontiers shaping future global productivity and digital infrastructure.

AI foundation models as robotics brainpower

Just as large language models unlocked natural conversation for chatbots, vision-language-action (VLA) foundation models enable robots to interpret visual cues, follow spoken instructions, and execute complex sequences. These foundation models support key robotic functions, including perception, reasoning, and decision-making. When paired with multimodal sensors—those that can ingest and act on multiple inputs, such as touch and force—they create systems that can learn by observing humans, without being manually programmed step by step.
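To make the perception-reasoning-action pattern concrete, here is a minimal, hypothetical sketch of a VLA-style control loop. The `VLAPolicy` class is a stand-in stub, not a real foundation model: it simply fuses a multimodal observation (camera frame, force-sensor reading, and verbal instruction) and emits a low-level gripper action each step. The names, target values, and the toy force-tracking logic are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    image: list         # camera frame (placeholder for a pixel array)
    force_n: float      # gripper force-sensor reading, in newtons
    instruction: str    # verbal command, e.g. "pick up the red box"

@dataclass
class Action:
    gripper_delta_n: float  # commanded change in grip force
    done: bool              # whether the policy considers the task finished

class VLAPolicy:
    """Stub standing in for a VLA model, which would map
    (vision, language, touch) to actions end to end."""
    def __init__(self, target_force_n: float = 5.0):
        self.target_force_n = target_force_n

    def act(self, obs: Observation) -> Action:
        # Toy "reasoning": close the gap to a target grip force.
        error = self.target_force_n - obs.force_n
        return Action(gripper_delta_n=0.5 * error, done=abs(error) < 0.1)

def control_loop(policy: VLAPolicy, obs: Observation, max_steps: int = 100) -> float:
    """Run perception -> reasoning -> action until the policy reports done."""
    for _ in range(max_steps):
        action = policy.act(obs)
        if action.done:
            break
        # Simulated actuation: apply the commanded force change.
        obs.force_n += action.gripper_delta_n
    return obs.force_n
```

Running `control_loop(VLAPolicy(), Observation(image=[], force_n=0.0, instruction="grip the box"))` converges to roughly 5 N of grip force; in a real system, the hand-coded proportional step would be replaced by the model's learned action output.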

Reminds me of the gap between "impressive demo" and "works reliably on Tuesday afternoon in Building C." The VLA models are interesting but the multimodal sensor fusion part is where things still break. Force feedback especially — demos look smooth until you need consistent grip pressure across different object textures.
