Don’t let hype about AI agents get ahead of reality

Let’s start with the term “agent” itself. Right now, it’s being slapped on everything from simple scripts to sophisticated AI workflows. There’s no shared definition, which leaves plenty of room for companies to market basic automation as something much more advanced. That kind of “agentwashing” doesn’t just confuse customers; it invites disappointment. We don’t necessarily need a rigid standard, but we do need clearer expectations about what these systems are supposed to do, how autonomously they operate, and how reliably they perform.

And reliability is the next big challenge. Most of today’s agents are powered by large language models (LLMs), which generate probabilistic responses. These systems are powerful, but they’re also unpredictable. They can make things up, go off track, or fail in subtle ways, especially when they’re asked to complete multistep tasks that pull in external tools and chain LLM responses together. A recent example: Users of Cursor, a popular AI programming assistant, were told by an automated support agent that they couldn’t use the software on more than one device. There were widespread complaints and reports of users cancelling their subscriptions. But it turned out the policy didn’t exist. The AI had invented it.

In enterprise settings, this kind of mistake could create immense damage. We need to stop treating LLMs as standalone products and start building complete systems around them: systems that account for uncertainty, monitor outputs, manage costs, and layer in guardrails for safety and accuracy. These measures can help ensure that the output adheres to the requirements expressed by the user, obeys the company’s policies regarding access to information, respects privacy, and so on. Some companies, including AI21 (which I cofounded and which has received funding from Google), are already moving in that direction, wrapping language models in more deliberate, structured architectures. Our latest launch, Maestro, is designed for enterprise reliability, combining LLMs with company data, public information, and other tools to ensure dependable outputs.
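To make the guardrail idea concrete, here is a minimal sketch of one such check: before a drafted answer reaches the customer, the system confirms that any policy the draft relies on actually exists. Everything in it is an assumption made for illustration, including the `APPROVED_POLICIES` store, its sample entry, the `DraftAnswer` type, and the `review` function; it is not how Maestro or any particular product works.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy store. In a real system this would be loaded from
# company data rather than hard-coded; the entry below is a placeholder.
APPROVED_POLICIES = {
    "refunds": "Refunds are available within 30 days of purchase.",
}


@dataclass
class DraftAnswer:
    text: str                    # what the LLM wants to tell the customer
    cited_policy: Optional[str]  # key of the policy the draft claims to rely on


def review(draft: DraftAnswer) -> str:
    """Pass the draft through only if its cited policy is on record."""
    if draft.cited_policy is not None and draft.cited_policy not in APPROVED_POLICIES:
        # The model referred to a policy nobody wrote down: do not send it.
        return "ESCALATE: cited policy not found; routing to a human."
    return draft.text
```

A draft that cites an unknown key, such as a made-up one-device limit, is escalated instead of sent. The check is deliberately narrow: it catches invented policies, not every kind of wrong answer, which is why it would sit alongside monitoring and other guardrails rather than replace them.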

Still, even the smartest agent won’t be useful in a vacuum. For the agent model to work, different agents need to cooperate (booking your travel, checking the weather, submitting your expense report) without constant human supervision. That’s where Google’s A2A protocol comes in. It’s meant to be a universal language that lets agents share what they can do and divide up tasks. In principle, it’s a great idea.

In practice, A2A still falls short. It defines how agents talk to each other, but not what they actually mean. If one agent says it can provide “wind conditions,” another has to guess whether that’s useful for evaluating weather on a flight route. Without a shared vocabulary or context, coordination becomes brittle. We’ve seen this problem before in distributed computing. Solving it at scale is far from trivial.
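The “wind conditions” gap can be sketched in a few lines. The example below is illustrative only and does not use A2A’s actual message format: the `Capability` type, its `quantity` and `altitude_m` fields, the `wind_velocity` term, and the 9,000-meter threshold are all hypothetical stand-ins for what a shared vocabulary might supply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capability:
    name: str                         # free-text label an agent advertises
    quantity: Optional[str] = None    # hypothetical shared-vocabulary term
    altitude_m: Optional[int] = None  # hypothetical: height the reading applies to


def usable_for_flight_route(cap: Capability, min_altitude_m: int = 9000) -> Optional[bool]:
    """Return True or False when the description is precise enough to decide,
    and None when the requesting agent would have to guess."""
    if cap.quantity is None or cap.altitude_m is None:
        return None
    return cap.quantity == "wind_velocity" and cap.altitude_m >= min_altitude_m
```

With only the label, `usable_for_flight_route(Capability("wind conditions"))` returns `None`: the two agents can exchange the message, but the receiver cannot tell whether it has surface wind or wind at cruising altitude. Once both sides agree on what the extra fields mean, the same label becomes something an agent can actually decide on.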
