
    New Benchmark Shows AI Agents Perform Poorly When Automating Real Jobs

    By ProfitlyAI, November 5, 2025


    A new paper from the Center for AI Safety and Scale AI has introduced the Remote Labor Index (RLI), the first benchmark designed to measure how well AI agents can perform paid, remote jobs.

    The RLI benchmark includes real-world projects from freelance platforms, spanning complex fields such as game development, architecture, data analysis, and video production. These aren’t simple tasks: the projects represented over 6,000 hours of human work valued at more than $140,000.

    The results? Current AI agents performed poorly.

    Manus, the top-performing agent, could automate only 2.5 percent of the work. Other top models, such as Grok 4 and Sonnet 4.5, managed just 2.1 percent, while GPT-5 hit 1.7 percent and Gemini 2.5 Pro came in below 1 percent. The researchers noted that failures stemmed from incomplete deliverables, broken files, and low-quality work that would not meet professional standards.

    While these low numbers might seem reassuring to human workers, they don't tell the whole story. To understand what these findings really mean for the future of AI in the workforce, I discussed them with SmarterX and Marketing AI Institute founder and CEO Paul Roetzer on Episode 178 of The Artificial Intelligence Show.

    Why General Agents Are the Wrong Measuring Stick

     

    Roetzer wasn’t surprised by the low automation rates, noting that the benchmark tests general agents that aren't specifically trained for these complex jobs.

    The real, and much faster, progress is happening with specialized agents. He points to examples including OpenAI reportedly hiring Goldman Sachs bankers to train models to do the job of an investment banker.

    “My guess is OpenAI’s is way further along than 2.5 percent for that specific thing,” he says.

    This highlights a crucial distinction in how we should think about AI’s capabilities. The RLI provides a valuable baseline for general models, but the true economic impact will likely come from models intensely focused on a specific job.

    Good at Tasks, Not Yet at Jobs

    Roetzer explains this using a simple framework: tasks, projects, and jobs.

    Right now, AI is very good at the task level, which covers the small, discrete actions that make up a larger project.

    “It’s good at the tasks,” he says. “It's not good at doing the whole thing.”

    An agent can't replace a CEO, for example, but it might help with 25 different tasks that a CEO does every month. Humans, however, are still essential for setting goals, planning, connecting data sources, integrating tools, and, most importantly, overseeing and verifying the AI output.

    The Economic Turing Test

    The key metric to watch, according to Roetzer, is how long an agent can work without a human needing to intervene, a concept he calls “actions per disengagement,” similar to how Tesla measures self-driving.

    We haven't yet reached what he calls the “economic Turing test,” where the economic labor of AI is indistinguishable from that of a human.

    “Is it to the point where I would hire an agent or a symphony of agents instead of a human?” he asks. “In every instance I can think of, the answer is still no.”

    Still, agents are slowly but surely getting better, more autonomous, and more reliable within specific jobs. And even augmenting people with AI agents could lead to a reduction in the number of people needed, says Roetzer.

    “As the agents get more autonomous, as they get more reliable, as more companies understand how to build and integrate them into workflows, you don't need as many people doing the work that you previously did.”




