
    This is the most misunderstood graph in AI

    By ProfitlyAI, February 5, 2026


    That was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would have taken a human about 5 hours, a vast improvement over what even the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of those results; another employee at the company simply wrote, “mom come pick me up i’m scared.”

    Credit: METR.ORG

    But the truth is more complicated than those dramatic responses would suggest. For one thing, METR’s estimates of the abilities of specific models come with substantial error bars. As METR explicitly stated on X, Opus 4.5 might be able to regularly complete only tasks that take humans about two hours, or it might succeed on tasks that take humans as long as 20 hours. Given the uncertainties intrinsic to the method, it was impossible to know for sure.

    “There are a bunch of ways that people are reading too much into the graph,” says Sydney Von Arx, a member of METR’s technical staff.

    More fundamentally, the METR plot does not measure AI abilities writ large, nor does it claim to. To build the graph, METR tests the models primarily on coding tasks, evaluating the difficulty of each by measuring or estimating how long it takes humans to complete it, a metric that not everyone accepts. Claude Opus 4.5 might be able to complete certain tasks that take humans 5 hours, but that doesn’t mean it’s anywhere close to replacing a human worker.

    METR was founded to assess the risks posed by frontier AI systems. Though it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in greater detail and published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants might actually be slowing software engineers down.

    But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with that graph’s often breathless reception. In January, Thomas Kwa, one of the lead authors on the paper that introduced it, wrote a blog post responding to some criticisms and making clear its limitations, and METR is currently working on a more extensive FAQ document. But Kwa isn’t optimistic that these efforts will meaningfully shift the discourse. “I think the hype machine will basically, whatever we do, just strip out all the caveats,” he says.

    Nevertheless, the METR team does think that the plot has something meaningful to say about the trajectory of AI progress. “You should absolutely not tie your life to this graph,” says Von Arx. “But also,” she adds, “I bet that this trend is gonna hold.”


