
    OpenAI’s New Benchmark Shows AI Does Knowledge Work 100X Faster and Cheaper Than Experts

    By ProfitlyAI, September 30, 2025


    For years, the gold standard for measuring AI progress has been difficult academic tests and abstract puzzles. But the real question has always been: Can AI do the actual work people get paid for?

    OpenAI is attempting to answer that question with the launch of its new evaluation framework, GDPval, and the results are a wake-up call for every knowledge worker and business leader.

    According to blind evaluations run by industry experts, today’s best models, like GPT-5 and Claude Opus 4.1, are already producing work rated as equal to or better than human output nearly half the time. This framework, which measures performance across 44 knowledge work occupations, is the kind of real-world assessment that AI has desperately needed.

    To unpack this new evaluation framework’s significance, I spoke to SmarterX and Marketing AI Institute founder and CEO Paul Roetzer on Episode 170 of The Artificial Intelligence Show.

    Why GDPval Is the Real-World Test That Matters

    At its core, GDPval functions as a real-world test of whether AI can do economically valuable knowledge work. Unlike traditional benchmarks that use simple text prompts or exam-style questions, the GDPval evaluation system is built on real-world deliverables and contexts:

    • The evaluation spans 1,320 specialized tasks, all based on real work products like legal briefs, engineering blueprints, customer support conversations, and nursing care plans.
    • Each task was meticulously crafted by subject matter experts with over a decade of experience, who then served as the blind graders. They compared the human- and AI-generated deliverables without knowing which was which, offering critiques and rankings.
    • The tasks aren’t simple text prompts; they include reference files and context, with expected deliverables spanning documents, slides, diagrams, spreadsheets, and multimedia.

    This focus on the reality of work is critical.

    “The thing we’ve talked about for a while is that the IQ tests [in traditional AI evaluations] have been saturated,” Roetzer says. “What we really needed to know was the implications on actual work. People do the tasks that are part of these jobs.”

    And, if GDPval is any indication, AI is getting very good at the tasks that people do as part of their jobs.
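To make the "equal to or better than human output" figure concrete, here is a minimal Python sketch of how a win-or-tie rate could be computed from blind pairwise verdicts. It is an illustration only: the article does not describe GDPval's actual scoring pipeline, and the verdict labels, the function name `win_or_tie_rate`, and the sample data are all hypothetical.

```python
from collections import Counter

# Hypothetical verdict labels: for each task, a blinded expert grader says whether
# the AI deliverable was better than, as good as, or worse than the human one.
VERDICTS = ("ai_better", "tie", "human_better")


def win_or_tie_rate(verdicts):
    """Return the share of comparisons where the AI deliverable was rated equal or better."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    unknown = set(counts) - set(VERDICTS)
    if unknown:
        raise ValueError(f"unknown verdict labels: {sorted(unknown)}")
    return (counts["ai_better"] + counts["tie"]) / len(verdicts)


# Made-up illustration data, NOT GDPval results.
sample = ["ai_better", "human_better", "tie", "human_better", "ai_better", "human_better"]
rate = win_or_tie_rate(sample)
print(f"AI rated equal or better in {rate:.0%} of comparisons")
```

Counting ties together with wins mirrors the article's "equal to or better than" phrasing; a stricter metric would count only outright wins.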

    100X Faster and 100X Cheaper

    OpenAI’s research found that frontier models can complete the GDPval tasks roughly 100 times faster and 100 times cheaper than human industry experts.

    Roetzer emphasized the significance of this finding, especially considering the comparison point: these are industry experts, not just average workers. We’re already at the point where it seems that giving some of these tasks to an AI model instead of a human would save both time and money.

    That’s going to have some disruptive effects on the economy as we know it. The occupations chosen for the study were those contributing most to total wages and compensation in the nine industries that contribute over 5% of US GDP.

    This deliberate focus parallels the strategy of AI labs and VCs looking at the “total addressable market of salaries” to determine which markets can be most disrupted by AI technology.

    In other words, GDPval is not only an evaluation framework, but also a roadmap that points to exactly which knowledge work jobs AI could disrupt.
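The selection rule described above (keep industries above a GDP-share threshold, then pick the occupations that account for the most wages in each) can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not OpenAI's method: the `Occupation` type, the `per_industry` cutoff, and every name and number in the sample data are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occupation:
    name: str
    industry: str
    total_wages: float  # arbitrary units; placeholder values only


def select_occupations(industry_gdp_share, occupations, min_share=0.05, per_industry=2):
    """Keep industries above min_share of GDP, then take the top-wage occupations in each.

    per_industry is an assumed cutoff; the article does not say how many
    occupations were taken from each industry.
    """
    selected = {}
    for industry, share in industry_gdp_share.items():
        if share <= min_share:
            continue  # industry contributes too little to GDP
        in_industry = [o for o in occupations if o.industry == industry]
        ranked = sorted(in_industry, key=lambda o: o.total_wages, reverse=True)
        selected[industry] = [o.name for o in ranked[:per_industry]]
    return selected


# Placeholder data, not real economic statistics.
shares = {"industry_a": 0.12, "industry_b": 0.03, "industry_c": 0.07}
jobs = [
    Occupation("a1", "industry_a", 50.0),
    Occupation("a2", "industry_a", 90.0),
    Occupation("a3", "industry_a", 20.0),
    Occupation("b1", "industry_b", 80.0),
    Occupation("c1", "industry_c", 40.0),
]
picked = select_occupations(shares, jobs)
print(picked)
```

Here `industry_b` is dropped because its share is below the 5% threshold, even though `b1` has a large wage total, which is the point of filtering by industry before ranking occupations.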

    2026 as the Year AI Begins to Overtake Humans

    The GDPval results are a current snapshot, but computer scientist and AI researcher Julian Schrittwieser, a key player in the development of Google’s AlphaGo and AlphaZero, issued a clear warning about the pace of future progress.

    In a widely shared post, Schrittwieser cautioned against the trap of concluding that AI is plateauing just because it makes occasional errors. Extrapolating the consistent trend of exponential performance improvement, he predicts that 2026 will be a pivotal year for widespread integration of AI into the economy:

    • By mid-2026, he says models will be able to work autonomously for full eight-hour work days.
    • By the end of 2026, at least one model will match the performance of human experts across many industries.
    • And by the end of 2027, models will frequently outperform experts on many tasks.

    This sober assessment, that “extrapolating straight lines on graphs is likely to give you a better model of the future than most experts,” is why economists are starting to sound the alarm.

    A new research paper from experts at Stanford is already recommending a research agenda to address the impact of “transformative AI” on economic growth, income distribution, and human wellbeing.

    Why You Can’t Afford to Have Blindspots

    This confluence of evidence (GDPval’s current proof of expert-level capability and the conservative timeline for AGI) means no one can afford to remain skeptical.

    The conversation is shifting from “AI doesn’t really do anything” to the realization that it’s getting really good at all the things you do. OpenAI says its goal is to keep everyone on the “up elevator” of AI by democratizing access and supporting workers through change.

    But the challenge is that the most direct evidence of AI’s impact is personal adoption.

    As Roetzer concluded, when you stop to look at the tasks that make up your job, you can see the change happening. The light bulb moment, where people realize how incredibly helpful and efficient the tools are when applied to their everyday work, is the moment the economy really starts to transform before all our eyes.

    But if you don’t use the tools enough to reach that point, you risk developing some serious blindspots when it comes to AI’s impact on your career.
