Trends in AI Task Completion


Summary

Trends in AI task completion describe the rapid advances in how artificial intelligence systems are able to take on longer, more complex tasks without human intervention. In just a few years, AI agents have evolved from handling short assignments to autonomously managing hour-long and even multi-day projects.

  • Rethink workflows: Build flexible processes and systems that can quickly adapt as AI agents take on increasingly complex and time-consuming tasks.
  • Embrace adaptive planning: Move away from rigid, long-term strategies and focus on ongoing experimentation, feedback, and modular designs that can keep pace with AI's exponential progress.
  • Prioritize upskilling: Equip teams with new skills and knowledge to collaborate effectively with AI, ensuring that human workers stay relevant as task automation expands.
Summarized by AI based on LinkedIn member posts
  • Gajen Kandiah

    Chief Executive Officer, Rackspace Technology

    23,625 followers

    One fiscal quarter is the equivalent of a full year of AI agent advancement. An 18-month roadmap equates to about a decade of capability shifts. And systems designed for today’s AI may already be outdated by the time they go live.

    These are some of the insights extrapolated from a new research paper from METR, “Measuring AI Ability to Complete Long Tasks,” which may provide the clearest explanation yet of how rapidly AI agents are transforming the nature of knowledge work. https://lnkd.in/eB8KPNtb

    Instead of narrow benchmarks, the study asks a broader and more useful question: How long can an AI system work on a real-world task before it breaks down? The answer: the “task completion horizon” has been doubling every 3 to 7 months since 2019. If your planning frameworks are built around human time scales, it’s worth recognizing that AI is evolving in dog years, or faster.

    This presents a strategic challenge most enterprise leaders haven’t encountered before: we are being asked to plan for something that is unknowable in its specifics, but inevitable in its trajectory. There’s no steady state to optimize around. No predictable plateau. There’s just an exponential curve that is already reshaping what’s possible in software development, cybersecurity, reasoning, and long-horizon task automation.

    The temptation to wait for maturity is understandable, but with this rate of change, waiting creates risk, and inaction becomes a liability. So what’s the alternative? Enterprises that thrive in this environment will embrace adaptive strategy, grounded in action today and built for flexibility tomorrow:
    • Design workflows and systems that can scale with rising agent capabilities
    • Rethink governance as a dynamic, living framework
    • Embed feedback loops, experimentation, and modularity
    • Focus on readiness, not perfection

    The METR team is careful not to overstate the trend’s longevity, but the current data is clear: AI agents are now reliably completing hour-long tasks. Full-day or week-long task automation is no longer speculative; it’s within reach.

    We may not know the precise timeline. But we know where we’re headed. And we know the speed is unfamiliar. AI’s future is unknowable. Its impact is inevitable. And it’s unfolding on a clock most organizations weren’t built to manage. The question isn’t whether to act. It’s whether your organization can learn, adapt, and lead at the speed of change.

    #AI #GenAI #EnterpriseAI #AIAgents #DigitalTransformation #Leadership #Strategy #AIReadiness #FutureOfWork #MooresLaw #UnknowableButInevitable

  • TL;DR: Per McKinsey, the success of AI depends primarily on CEO-level sponsorship and the ability to rewire an organization’s workflows (versus just deploying intelligent chatbots). Interestingly, per METR, AI performance, measured by the length of tasks AI agents can complete, has been increasing exponentially over the past 6 years, with a doubling time of around 7 months. This will have a huge impact on business rewiring and faster time to outcomes.

    Some key points from the McKinsey & Company State of AI report (https://mck.co/4hMale0):
    • 78% of organizations now use AI in at least one business function, up from 55% last year.
    • Large companies lead AI adoption through workflow redesigns and dedicated implementation teams.
    • CEO oversight of AI governance shows the strongest correlation with positive financial impact.
    • Organizations increasingly mitigate AI risks around accuracy, security, and IP infringement.
    • Companies are both hiring AI specialists and reskilling existing employees.
    • Over 80% of organizations still see no material enterprise-level EBIT impact from AI.

    On a related topic to workflow redesign, METR did some great work (https://bit.ly/4hCk2LQ) showing that AI’s ability to complete tasks (measured by the equivalent human time required) has been doubling approximately every 7 months for the past 6 years, which means that within 2–4 years, AI agents could autonomously complete week-long projects done by humans. (Hat tip to Ethan Mollick for the METR link.)

    Organizations that strategically reimagine their operations around increasingly capable AI agents—centralizing risk and data governance while distributing tech talent in hybrid models, as the McKinsey survey suggests—will capture greater value.

    Action for CEOs and CAIOs: Rather than waiting for AI to demonstrate enterprise-wide EBIT impact, forward-thinking companies should be mapping out which increasingly complex tasks AI will handle in the coming months and years, allowing them to proactively restructure roles, retrain employees, and redesign processes to leverage this exponential growth in AI task completion capabilities.

  • Mikaël Wornoo🐺

    Studying the collision of AI & Work | Founder @ TechWolf🐺

    9,138 followers

    Are we underestimating exponential growth again, just like during COVID? An Anthropic researcher thinks so.

    The study from METR reveals AI task completion is doubling every 7 months. Two years ago, GPT-3.5 managed 15-second tasks. Today’s models handle 1-hour assignments at a 50% success rate.

    Julian Schrittwieser argues that underestimating exponential AI progress mirrors our early COVID-19 blindness. Despite clear exponential curves, leaders dismissed pandemic risks until disruption hit. The same pattern emerges with AI capability debates: we see models struggle with complex code and conclude the field is plateauing. Yet the trajectory remains exponential.

    I made this mistake myself when evaluating early LLMs. Watching GPT-4 fail at basic tasks, I assumed “fundamental” limits. Looking back, that assessment aged poorly within 18 months.

    The METR index tracks autonomous software engineering specifically: not demos or cherry-picked examples, but measurable task completion across difficulty levels. Current frontier models approach hour-long engineering problems. By mid-2026, the data suggests 4-hour task mastery. What happens when AI handles full-day engineering sprints? The productivity implications for tech organizations are staggering.

    The challenge for people leaders: exponential change breaks linear planning models. Traditional hiring cycles assume stable skill requirements over 12–24 months. But doubling every 7 months means an 8x capability improvement in 21 months. Organizations treating AI adoption as optional might face the same rude awakening as February 2020 pandemic skeptics...
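The "8x in 21 months" figure follows directly from the doubling time, as this quick check shows:

```python
# 21 months at a 7-month doubling time is three doublings,
# i.e. an 8x capability improvement.
doublings = 21 / 7
improvement = 2 ** doublings
print(improvement)  # -> 8.0
```

The same compounding explains why linear planning models break: each planning cycle, the capability baseline has shifted by a multiple, not an increment.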

  • Waseem Alshikh

    Co-founder and CTO of Writer

    15,763 followers

    The way AI agents work has fundamentally changed in just two years—and most people haven’t noticed yet.

    The Evolution Story

    2023: The “Thinking” Era. Remember when we were amazed that AI could show its reasoning? Chain-of-Thought agents walked us through their logic step-by-step. They were thorough, methodical... and painfully slow. Great for complex problems, but you wouldn’t want one handling your customer support tickets.

    2024: The “Doing” Revolution. Then came the tool-masters. These agents didn’t overthink—they acted. Need data? Call an API. Need to calculate? Use a calculator. They were fast, efficient, and cost-effective. But ask them to handle a nuanced ethical dilemma? Not their strength.

    2025: The “Adaptive” Breakthrough. This year changed everything. Hybrid agents emerged that can do BOTH—and, more importantly, they know WHEN to do what. Facing a simple data lookup? Quick API call. Complex legal reasoning required? Deep thinking mode activated. Multi-step workflow with dependencies? Seamless switching between strategies. It’s like having a team member who knows when to send a quick Slack message versus scheduling a deep-dive meeting.

    What I’m Seeing in Production

    The difference is dramatic:
    • Agents that used to need human intervention every 10 steps? Now running 90+ steps autonomously.
    • Tasks that took 4 minutes? Down to under 3.
    • Most importantly: they’re actually completing what we ask them to do, not just trying their best.

    My 2026 Predictions

    1. Goodbye Generic Chatbots, Hello Specialized Operators. We’ll stop deploying one-size-fits-all solutions. Instead: domain-specific agents trained on your industry’s context, compliance requirements, and workflows.
    2. Self-Correction Becomes Expected. Hit an error? The agent should figure it out—not escalate to you. The bar for “good enough” is rising fast.
    3. ROI Transparency Gets Real. CFOs will demand clear answers: “How much value per dollar spent?” Vendors who can’t quantify business impact won’t survive.
    4. Agent Orchestration Goes Mainstream. One agent for research, another for writing, a third for fact-checking—all coordinated by a conductor agent that knows how to leverage each specialist.

    The Paradigm Shift

    Here’s what’s really changing: we’re moving from “Can AI do this task?” to “How efficiently can AI decide HOW to do this task?” The intelligence isn’t just in the execution anymore—it’s in the meta-cognition: the ability to self-assess, choose strategies, and adapt on the fly.

    What This Means for You

    • If you’re building AI products: flexibility > raw power.
    • If you’re implementing AI: measure outcomes, not speed.
    • If you’re investing in AI: look for adaptive systems, not specialists.

    The agents that win in 2026 won’t be the fastest or the smartest—they’ll be the ones that know when to be which.

  • Bill Faruki

    Founder & CEO, MindHYVE.ai | Building the World’s First Federated Agentic Intelligence Ecosystem | Creator of Eve-Genesis™

    12,954 followers

    December 2025 – January 2026 will be remembered as the period when AI-assisted software development crossed an irreversible threshold. Not because of one launch. Because three independent trendlines converged — and the compounding effect broke our mental models.

    1. Reasoning displaced inference as the dominant mode. For years, LLMs operated on a single forward pass: pattern-match the input, produce the most likely output. Useful, but brittle. That changed when reasoning models moved from curiosity to default. Per OpenRouter’s 100-trillion-token study, reasoning-optimized models went from negligible usage in early Q1 2025 to over 50% of all tokens processed by Q4. When a model spends tokens exploring solution paths before committing, the failure modes change. It doesn’t just guess better — it plans. And planning is the prerequisite for autonomy.

    2. The token economics inverted. Reasoning models use 10–20x more tokens per task. Sounds expensive. But developers voted with their wallets: they chose slower, costlier models that actually complete complex tasks over fast, cheap ones requiring constant human correction. If a model independently resolves a multi-file bug that would take a senior engineer 4 hours, $15 in reasoning tokens is a rounding error on a $200/hr loaded cost.

    3. Task autonomy underwent a step-function change. METR has tracked the “task-completion time horizon” — the duration of tasks AI agents complete with 50% reliability. It’s been doubling every ~7 months for 6 years. In 2024–2025, that accelerated to every ~4 months. Concretely: Opus 4.5 hit a 50% time horizon of ~4h49m. Sonnet 4.5 handles 30+ hour autonomous sessions. CTOs I talk to report multi-day runs on specialized codebases. SWE-bench scores went from ~50% to 80%+ in one year. Scale AI’s harder SWE-Bench Pro still stumps the best models at under 25%. That gap is the roadmap.

    Why this matters for every founder: we’ve entered the era of AI-delegated development. The question isn’t “can AI help my team code faster?” It’s “what work can I delegate entirely, and what’s the supervision model?” If METR’s trend holds, agents will complete week-long tasks within 2 years. The orgs that thrive will build the muscle memory now: what to delegate, how to decompose work for agents, how to validate autonomous output, and how to restructure teams around human-agent collaboration.

    What’s the longest autonomous task you’ve successfully delegated to an AI agent?

  • Patrick Salyer

    Partner at Mayfield (AI & Enterprise); Previous CEO at Gigya

    9,618 followers

    As we kick off the (fiscal) year, here are the “Top 10” patterns I’m seeing in application-layer startups. Some are becoming obvious, but all are worth reflecting on:

    1. Software as Labor: AI apps are evolving from productivity tools to work replacements. Success is no longer measured by feature usage, but by measurable throughput.
    2. The “Good Enough” Era: The fear of model obsolescence is fading. Frontier models have crossed the threshold of commercial utility. Differentiation now moves upstream to workflow integration, context management, and UX.
    3. The Rise of “World Models”: Advances in multi-modality are unlocking agentic capabilities in blue-collar, physical, and complex vertical workflows.
    4. Long-Term Reasoning: Reliable long-horizon planning, expanded context windows, and Chain of Thought (CoT) are enabling entirely new classes of autonomous workflows.
    5. Browser as the Integration Layer: “Computer use” allows agents to operate legacy UIs via the browser. This creates a “wedge” into SAP, Oracle, and EHRs without waiting for clean APIs.
    6. Workflow Logic > Data Moats: Controversial take: proprietary data is overrated. The real moat is deep domain workflow logic—the playbooks, compliance rules, and nuances of how a job actually gets done.
    7. The “Context Graph” Opportunity: Agents can now capture the “missing layer” of enterprise execution: the decisions and exceptions currently trapped in Slack or human heads. This data will eventually augment or replace traditional Systems of Record.
    8. Vertical Autonomy: Vertical AI is moving from augmentation to end-to-end task completion (the “AI Employee”), powered by reinforcement learning and better orchestration.
    9. The Agentic Customer Journey: Agents are reshaping discovery and procurement. From machine-to-machine selling to automated RFP responses, agents are becoming the new marketing funnel.
    10. The Service-First Wedge: Startups are winning by leading with AI-powered services, then “automating the human out” over time to transition into high-margin software platforms.

    What did I miss?

  • Azeem Azhar

    Making sense of the Exponential Age

    430,787 followers

    AI models are now reaching the ability to complete tasks that take skilled humans nearly an hour, and this capability is doubling every 7 months.

    A new paper, “Measuring AI Ability to Complete Long Tasks,” measures the “time horizon”: the human-time equivalent of tasks AI systems can complete with 50% reliability. Using this approach, researchers found that:
    ▪️ Current frontier models like Claude 3.7 Sonnet have a 50% time horizon of ~50 minutes
    ▪️ Time horizons have been doubling approximately every 7 months since 2019
    ▪️ This progress is driven by better reasoning, tool use, and adaptation to mistakes
    ▪️ At 80% reliability, horizons are much shorter (15 minutes vs. 59 minutes for Claude 3.7)

    What makes this fascinating is how predictable the growth appears to be. And while these benchmarks aren’t perfect representations of real-world tasks, the trend holds even when controlling for task “messiness” factors. If this trajectory continues, we may see AI agents capable of handling month-long software development tasks before 2031.
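The "before 2031" extrapolation can be sanity-checked with back-of-envelope arithmetic. The figures below are assumptions for illustration: a ~50-minute horizon as of early 2025, a steady 7-month doubling time, and "month-long" read as roughly 167 working hours (one month of full-time work).

```python
import math

horizon_minutes = 50        # assumed 50% time horizon, early 2025
doubling_months = 7         # assumed steady doubling time
target_minutes = 167 * 60   # ~one month of full-time working hours

# Number of doublings needed, then convert to calendar time.
doublings = math.log2(target_minutes / horizon_minutes)
months_needed = doublings * doubling_months
print(f"{doublings:.1f} doublings, ~{months_needed / 12:.1f} years")
# Starting from early 2025, ~4.5 years lands around mid-2029, i.e. before 2031.
```

Because the growth is exponential, the answer is sensitive to the doubling time: stretch it to 10 months and the same 7.6 doublings take over 6 years instead.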

  • Adam Hofmann

    AI Transformation Partner for CEOs & Executives | I Help Organizations See the Future, Design for It, and Make AI Actually Work 🚀 Vision, Operating Models, Enterprise-Scale Impact ✨

    5,685 followers

    This is insane… AI can now do tasks that would take a human engineer 12 hours, 50% of the time.

    Look closely at this chart (it was updated this month). It measures how long an AI model can autonomously work on a software task with a 50% success rate. In 2019–2020, we were talking minutes: fixing small bugs, training simple classifiers. Now? The frontier models are handling tasks equivalent to 6–7 hours of expert-level engineering work. That’s not a marginal improvement. That’s a shift in the time horizon.

    This data shows a consistent exponential trend over ~6 years. The doubling time is roughly 7 months. That means the “autonomy window” (how long AI can stay coherent and productive on a single task) keeps expanding at a predictable rate. If that curve holds:
    → By 2028: multi-day engineering tasks, end-to-end
    → Soon after: week-long autonomous builds
    → Not copilots. Not assistants. Autonomous contributors.

    This is the real unlock. Short tasks automate skills. Long tasks automate roles. When AI can hold context, reason across hours of work, recover from errors, and ship something meaningful, the structure of teams changes. Planning cycles change. What we consider “entry-level” changes.

    The curve is bending upward. Most org charts are not. Companies are still budgeting assuming AI saves 20–30% in productivity. But what happens when one model can execute what used to require a small team for a full day? What does hiring look like then? What does training look like?

    I teach and work with teams deploying these systems. The biggest gap I see isn’t technical. It’s psychological. People anchor to what AI was 18 months ago, not what it is now. And this chart suggests the next 18 months will matter more than the last five years combined.

    The safest move right now is simple: use these systems for real work. Push their limits. See where they break. Because the boundary is moving… fast.

    ↓ Read and subscribe to my AI newsletter via the link in my bio.

  • Robert Kelly

    VP of Innovation | AI CTO - Leading Enterprise Innovation & AI-Driven Transformation

    3,933 followers

    The ability of AI models to complete tasks, relative to human effort, is now doubling roughly every 4 months.

    METR has continued its research measuring the success rate of frontier models completing tasks of increasing complexity. The results evaluate a 50% likelihood of success on tasks that would take an experienced engineer a given amount of time. Until recently, the doubling was occurring every 7 months. This year, now ending with Claude Opus 4.5 leading the pack, the rate of improvement has been accelerating, doubling every 4 months. The improvements are exponential. If the accelerated trend continues, we could see AI completing month-long tasks by the end of 2027.

    This rate of change is amazing. With the competition and investment focus moving from consumer to enterprise productivity, I’m sure 2026 will bring more improvement. But even if progress slows, the fact is that we’ve already seen massive improvements in the ability of frontier models to accomplish real-world tasks. I think this data shows we should keep our eyes open to innovation in the space and take advantage of the latest improvements with each new model release.

    The real win here is that we see measurable improvement at no added cost to us. When new models become available, they’re getting faster AND less expensive on average. It’s often at most just a checkbox for an admin to enable access to the new models.

    How have you or your teams kept up with the pace of change so far? Are you seeing these improvements show up in real-world value? (Links to the AI Digest article on the METR research are in the comments.)
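The "end of 2027" claim is roughly consistent with the faster doubling rate. The inputs below are assumptions for illustration: a ~5-hour frontier horizon as of late 2025 (in line with the ~4h49m figure cited elsewhere on this page), a 4-month doubling time, and "month-long" as ~167 working hours.

```python
import math

horizon_hours = 5       # assumed frontier 50% horizon, late 2025
doubling_months = 4     # assumed accelerated doubling time
target_hours = 167      # ~one month of full-time working hours

months_needed = math.log2(target_hours / horizon_hours) * doubling_months
print(f"~{months_needed:.0f} months")
# About 20 months from late 2025 lands in mid-to-late 2027.
```

At the older 7-month doubling time, the same gap would take about 35 months instead, pushing the milestone toward the end of 2028.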

  • John Bailey

    Strategic Advisor | Investor | Board Member

    18,493 followers

    Researchers at METR just published a new paper showing that the length of tasks AI agents can complete autonomously has been doubling every 7 months since 2019, essentially revealing a “Moore’s Law” of sorts that can help us better understand the trajectory of AI capabilities.

    Key takeaways:
    - To measure AI progress in a way that compares to humans, the study introduces a new metric: the 50%-task-completion time horizon. This represents the longest task an AI agent can complete correctly half the time, based on how long it usually takes a human expert to finish the same task.
    - AI’s ability to complete long, complex tasks has been doubling every 7 months since 2019.
    - If this trend continues, AI agents could independently handle tasks that take humans a month by 2028–2031.
    - The biggest drivers of improvement: better reasoning, tool use, and adaptability—not just bigger models. It will be interesting to see how approaches like OpenAI’s and Google’s Deep Research impact this.
    - AI still struggles with messy, real-world tasks that require intuition, judgment, and seeking out missing information.

    Paper: https://lnkd.in/d7bW6RTV
    METR thread: https://lnkd.in/dp4-Y64v
    Great thread on the background of the paper from Elizabeth (Beth) Barnes: https://lnkd.in/dfuvi6ZY
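The 50%-task-completion time horizon can be illustrated with a simplified sketch of the idea (not METR's actual data or fitting code, which is more involved): fit a logistic curve of success probability against the log of human task length, then read off where predicted success crosses 0.5. The observations below are made up for illustration.

```python
import math

# (human_minutes, agent_succeeded) -- hypothetical observations
results = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (30, 1),
           (60, 0), (60, 1), (120, 0), (240, 0), (480, 0)]

def fit_logistic(data, lr=0.1, steps=20000):
    """Gradient descent on p(success) = sigmoid(a + b * log2(minutes))."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for minutes, y in data:
            x = math.log2(minutes)
            p = 1 / (1 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / len(data)
        b -= lr * grad_b / len(data)
    return a, b

a, b = fit_logistic(results)
# p = 0.5 where a + b * log2(m) = 0, i.e. m = 2 ** (-a / b)
horizon_minutes = 2 ** (-a / b)
print(f"50% time horizon ≈ {horizon_minutes:.0f} human-minutes")
```

With these toy observations the fitted horizon lands around the hour mark, where the agent's successes and failures start to mix, which is exactly the intuition behind the metric.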
