DeepSeek may have found a new way to improve AI’s ability to remember

Currently, most large language models break text down into thousands of tiny units called tokens. This turns the text into representations that models can understand. However, these tokens quickly become expensive to store and compute with as conversations with end users grow longer. When a user chats with an AI for lengthy periods, this challenge can cause the AI to forget things the user has already told it and to get information muddled, a problem some call “context rot.”

The new methods developed by DeepSeek (and published in its latest paper) could help to overcome this issue. Instead of storing words as tokens, its system packs written information into image form, almost as if it’s taking a picture of pages from a book. This allows the model to retain nearly the same information while using far fewer tokens, the researchers found.
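To make the token arithmetic concrete, here is a minimal Python sketch that compares a rough text-token count for a page with a vision-token count for an image of that same page. Everything in it is an assumption for illustration, not DeepSeek’s published configuration: the roughly 4-characters-per-token heuristic, the 16-pixel image patches, the hypothetical 16x token compressor, the 1024x1024 page image, and the helper names `text_token_estimate` and `vision_token_count`.

```python
import math


def text_token_estimate(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough text-token count.

    Assumes ~4 characters per token, a common rule of thumb for English
    text with subword tokenizers (an assumption, not DeepSeek's figure).
    """
    return math.ceil(num_chars / chars_per_token)


def vision_token_count(width: int, height: int, patch: int = 16, compress: int = 16) -> int:
    """Tokens for a rendered page image.

    The image is cut into patch x patch tiles (one token each), then a
    hypothetical compressor merges every `compress` tiles into one token.
    """
    patches = math.ceil(width / patch) * math.ceil(height / patch)
    return math.ceil(patches / compress)


if __name__ == "__main__":
    page_chars = 4000  # hypothetical dense page of text
    text_tokens = text_token_estimate(page_chars)
    image_tokens = vision_token_count(1024, 1024)
    print(f"text tokens:   {text_tokens}")
    print(f"vision tokens: {image_tokens}")
    print(f"compression:   {text_tokens / image_tokens:.1f}x")
```

With these made-up inputs, a 4,000-character page costs about 1,000 text tokens but only 256 vision tokens, roughly a 3.9x saving. The real ratio would depend on how dense the page is and on how accurately the model can read the image back into text.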

Essentially, the OCR (optical character recognition) model is a testbed for these new methods, which allow more information to be packed into AI models more efficiently.

Besides using visual tokens instead of just text ones, the model is built on a type of tiered compression that is not unlike how human memories fade: Older or less important content is stored in a slightly blurrier form in order to save space. Despite that, the paper’s authors argue that this compressed content can still remain accessible in the background while the system maintains a high level of efficiency.
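The “fading memory” idea can be sketched the same way. The Python below is a hypothetical illustration, not the scheme described in DeepSeek’s paper: it assumes older pages of context are re-rendered at lower resolution (halving the image’s side length for each step back in the conversation, down to a floor), so each keeps a blurrier copy that costs fewer tokens. The names `vision_tokens`, `tiered_resolution`, and `context_budget`, and all parameter values, are invented for this example.

```python
import math


def vision_tokens(side: int, patch: int = 16, compress: int = 16) -> int:
    """Tokens for a square page image `side` pixels wide.

    Hypothetical: one token per patch x patch tile, then `compress` tiles
    merged into a single token.
    """
    patches = math.ceil(side / patch) ** 2
    return max(1, math.ceil(patches / compress))


def tiered_resolution(age: int, base_side: int = 1024, min_side: int = 128) -> int:
    """Hypothetical fading rule: halve the side length per step of age.

    Never drops below `min_side`, so old content gets blurrier but is
    never discarded outright.
    """
    return max(min_side, base_side >> age)


def context_budget(num_pages: int) -> list[tuple[int, int, int]]:
    """Return (age, side, tokens) per page; age 0 is the most recent page."""
    rows = []
    for age in range(num_pages):
        side = tiered_resolution(age)
        rows.append((age, side, vision_tokens(side)))
    return rows


if __name__ == "__main__":
    rows = context_budget(5)
    for age, side, tokens in rows:
        print(f"age {age}: {side}x{side} px -> {tokens} tokens")
    tiered = sum(t for _, _, t in rows)
    uniform = len(rows) * vision_tokens(1024)
    print(f"tiered total: {tiered}, uniform total: {uniform}")
```

Under these assumptions, five pages cost 344 tokens instead of 1,280 at full resolution, and no page is dropped entirely; the oldest ones simply survive at a coarser resolution.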

Text tokens have long been the default building block in AI systems. Using visual tokens instead is unconventional, and as a result, DeepSeek’s model is quickly capturing researchers’ attention. Andrej Karpathy, the former Tesla AI chief and a founding member of OpenAI, praised the paper on X, saying that images may ultimately be better than text as inputs for LLMs. Text tokens might be “wasteful and just terrible at the input,” he wrote.

Manling Li, an assistant professor of computer science at Northwestern University, says the paper offers a new framework for addressing the existing challenges in AI memory. “While the idea of using image-based tokens for context storage isn’t entirely new, this is the first study I’ve seen that takes it this far and shows it might actually work,” Li says.
