
    New Benchmark Shows AI Agents Perform Poorly When Automating Real Jobs

    By ProfitlyAI | November 5, 2025 | 4 Mins Read


    A new paper from the Center for AI Safety and Scale AI has introduced the Remote Labor Index (RLI), the first benchmark designed to measure how well AI agents can perform paid, remote jobs.

    The RLI benchmark includes real-world projects from freelance platforms, spanning complex fields such as game development, architecture, data analysis, and video production. These aren’t simple tasks: the projects represent over 6,000 hours of human work valued at more than $140,000.

    The results? Current AI agents performed poorly.

    Manus, the top-performing agent, could automate only 2.5 percent of the work. Other top models, such as Grok 4 and Sonnet 4.5, managed just 2.1 percent, while GPT-5 hit 1.7 percent and Gemini 2.5 Pro came in under 1 percent. The researchers noted that failures stemmed from incomplete deliverables, broken files, and low-quality work that would not meet professional standards.

    While these low numbers might seem reassuring to human workers, they don’t tell the whole story. To understand what these findings really mean for the future of AI in the workforce, I discussed them with SmarterX and Marketing AI Institute founder and CEO Paul Roetzer on Episode 178 of The Artificial Intelligence Show.

    Why General Agents Are the Wrong Measuring Stick

    Roetzer wasn’t surprised by the low automation rates, noting that the benchmark tests general agents that aren’t specifically trained for these complex jobs.

    The real, and much faster, progress is happening with specialized agents. He points to examples including OpenAI reportedly hiring Goldman Sachs bankers to train models to do the job of an investment banker.

    “My guess is OpenAI’s is way further along than 2.5 percent for that specific thing,” he says.

    This highlights a critical distinction in how we should think about AI’s capabilities. The RLI provides a valuable baseline for general models, but the real economic impact will likely come from models intensely focused on a specific job.

    Good at Tasks, Not Yet at Jobs

    Roetzer explains this using a simple framework: tasks, projects, and jobs.

    Right now, AI is very good at the task level, which covers the small, discrete actions that make up a larger project.

    “It’s good at the tasks,” he says. “It isn’t good at doing the whole thing.”

    An agent can’t replace a CEO, for example, but it might help with 25 different tasks that a CEO does every month. Humans, however, are still essential for setting goals, planning, connecting data sources, integrating tools, and, most importantly, overseeing and verifying the AI output.

    The Economic Turing Test

    The key metric to watch, according to Roetzer, is how long an agent can work without a human needing to intervene, a concept he calls “actions per disengagement,” similar to how Tesla measures self-driving.
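
    To make the idea concrete, here is a minimal sketch of how such a metric could be computed from an agent’s activity log. Roetzer does not give a formula, so the definition used here (autonomous actions divided by human interventions) is an assumption, as are the function name and the "action" / "intervention" event labels.

```python
from typing import Iterable


def actions_per_disengagement(events: Iterable[str]) -> float:
    """Average number of agent actions per human intervention.

    Assumed definition (not from the article): total "action" events
    divided by total "intervention" events in a hypothetical log.
    """
    actions = 0
    interventions = 0
    for event in events:
        if event == "action":
            actions += 1
        elif event == "intervention":
            interventions += 1
    if interventions == 0:
        # The agent was never interrupted, so the ratio is unbounded.
        return float("inf")
    return actions / interventions


# Hypothetical log: 6 actions, a human steps in, 4 more actions, a human steps in.
log = ["action"] * 6 + ["intervention"] + ["action"] * 4 + ["intervention"]
score = actions_per_disengagement(log)
print(score)
```

    Under this assumed definition, a higher score means the agent runs longer between human check-ins, which is the direction Roetzer says to watch.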

    We haven’t yet reached what he calls the “economic Turing test,” where the economic labor of AI is indistinguishable from that of a human.

    “Is it to the point where I’d hire an agent or a symphony of agents instead of a human?” he asks. “In every instance I can think of, the answer is still no.”

    Still, agents are slowly but surely getting better, more autonomous, and more reliable within specific jobs. And even augmenting people with AI agents could lead to a reduction in the number of people needed, says Roetzer.

    “As the agents get more autonomous, as they get more reliable, as more companies understand how to build and integrate them into workflows, you don’t need as many people doing the work that you previously did.”




