In the three years since ChatGPT's explosive debut, OpenAI's technology has upended a remarkable range of everyday activities at home, at work, in schools: anywhere people have a browser open or a phone out, which is everywhere.
Now OpenAI is making an explicit play for scientists. In October, the firm announced that it had launched a whole new team, called OpenAI for Science, dedicated to exploring how its large language models might help scientists and tweaking its tools to support them.
The last couple of months have seen a slew of social media posts and academic publications in which mathematicians, physicists, biologists, and others have described how LLMs (and OpenAI's GPT-5 in particular) have helped them make a discovery or nudged them toward a solution they might otherwise have missed. In part, OpenAI for Science was set up to engage with this community.
And yet OpenAI is also late to the party. Google DeepMind, the rival firm behind groundbreaking scientific models such as AlphaFold and AlphaEvolve, has had an AI-for-science team for years. (When I spoke to Google DeepMind's CEO and cofounder Demis Hassabis about that team in 2023, he told me: "This is the reason I started DeepMind … In fact, it's why I've worked my whole career in AI.")
So why now? How does a push into science fit with OpenAI's wider mission? And what exactly is the firm hoping to achieve?
I put these questions to Kevin Weil, a vice president at OpenAI who leads the new OpenAI for Science team, in an exclusive interview last week.
On mission
Weil is a product guy. He joined OpenAI a couple of years ago as chief product officer after heading product at Twitter and Instagram. But he started out as a scientist. He got two-thirds of the way through a PhD in particle physics at Stanford University before ditching academia for the Silicon Valley dream. Weil is keen to highlight his pedigree: "I thought I was going to be a physics professor for the rest of my life," he says. "I still read math books on vacation."
Asked how OpenAI for Science fits with the firm's existing lineup of white-collar productivity tools or the viral video app Sora, Weil recites the company mantra: "The mission of OpenAI is to try to build artificial general intelligence and, you know, make it useful for all of humanity."
The impact on science of future versions of this technology could be amazing, he says: new medicines, new materials, new devices. "Think about it helping us understand the nature of reality, helping us think through open problems. Maybe the biggest, most positive impact we're going to see from AGI will actually be from its ability to accelerate science."
He adds, "With GPT-5, we saw that becoming possible."
As Weil tells it, LLMs are now good enough to be useful scientific collaborators, spitballing ideas, suggesting novel directions to explore, and finding fruitful parallels between a scientist's question and obscure research papers published decades ago or in foreign languages.
That wasn't the case a year or so ago. Since it introduced its first reasoning model, o1, in December 2024, OpenAI has been pushing the envelope of what the technology can do. "You go back a few years and we were all collectively mind-blown that the models could get an 800 on the SAT," says Weil.
But soon LLMs were acing math competitions and solving graduate-level physics problems. Last year, OpenAI and Google DeepMind both announced that their LLMs had achieved gold-medal-level performance in the International Math Olympiad, one of the hardest math contests in the world. "These models aren't just better than 90% of grad students," says Weil. "They're really at the frontier of human abilities."
That's a big claim, and it comes with caveats. Still, there's little doubt that GPT-5 is a big improvement on GPT-4 when it comes to difficult problem-solving. GPT-5 includes a so-called reasoning model, a type of LLM that can break a problem down into multiple steps and work through them one by one. This technique has made LLMs far better at solving math and logic problems than they used to be.
Measured against an industry benchmark called GPQA, which includes more than 400 multiple-choice questions that test PhD-level knowledge in biology, physics, and chemistry, GPT-4 scores 39%, well below the human-expert baseline of around 70%. According to OpenAI, GPT-5.2 (the latest update to the model, released in December) scores 92%.
Overhyped
The excitement is obvious, and perhaps excessive. In October, senior figures at OpenAI, including Weil, boasted on X that GPT-5 had found solutions to several unsolved math problems. Mathematicians were quick to point out that what GPT-5 actually appeared to have done was dig up existing solutions in old research papers, including at least one written in German. That was still useful, but it wasn't the achievement OpenAI seemed to have claimed. Weil and his colleagues deleted their posts.
Now Weil is more cautious. It's often enough to find answers that exist but have been forgotten, he says: "We collectively stand on the shoulders of giants, and if LLMs can sort of accumulate that knowledge so that we don't spend time struggling on a problem that's already solved, that's an acceleration all of its own."
He plays down the idea that LLMs are about to come up with a game-changing new discovery. "I don't think models are there yet," he says. "Maybe they'll get there. I'm optimistic that they will."
But, he insists, that's not the mission: "Our mission is to accelerate science. And I don't think the bar for the acceleration of science is, like, Einstein-level reimagining of an entire field."
For Weil, the question is this: "Does science actually happen faster because scientists plus models can do far more, and do it more quickly, than scientists alone? I think we're already seeing that."
In November, OpenAI published a series of anecdotal case studies contributed by scientists, both inside and outside the company, illustrating how they had used GPT-5 and how it had helped. "Most of the cases were scientists who were already using GPT-5 directly in their research and had come to us one way or another saying, 'Look at what I'm able to do with these tools,'" says Weil.
The key things GPT-5 seems to be good at are finding references and connections to existing work that scientists weren't aware of, which sometimes sparks new ideas; helping scientists sketch mathematical proofs; and suggesting ways for scientists to test hypotheses in the lab.
"GPT-5.2 has read essentially every paper written in the last 30 years," says Weil. "And it understands not just the field that a particular scientist is working in; it can bring together analogies from other, unrelated fields."
"That's incredibly powerful," he continues. "You can always find a human collaborator in an adjacent field, but it's hard to find, you know, a thousand collaborators in all thousand adjacent fields that might matter. And on top of that, I can work with the model late at night (it doesn't sleep), and I can ask it 10 things in parallel, which is kind of awkward to do to a human."
Solving problems
Most of the scientists OpenAI reached out to back up Weil's position.
Robert Scherrer, a professor of physics and astronomy at Vanderbilt University, had only played around with ChatGPT for fun ("I used it to rewrite the theme song for Gilligan's Island in the style of Beowulf, which it did very well," he tells me) until his Vanderbilt colleague Alex Lupsasca, a fellow physicist who now works at OpenAI, told him that GPT-5 had helped solve a problem he'd been working on.
Lupsasca gave Scherrer access to GPT-5 Pro, OpenAI's $200-a-month premium subscription. "It managed to solve a problem that I and my graduate student couldn't solve despite working on it for several months," says Scherrer.
It's not perfect, he says: "GPT-5 still makes dumb mistakes. Of course, I do too, but the mistakes GPT-5 makes are even dumber." And yet it keeps getting better, he says: "If current trends continue (and that's a big if), I suspect that all scientists will be using LLMs soon."
Derya Unutmaz, a professor of biology at the Jackson Laboratory, a nonprofit research institute, uses GPT-5 to brainstorm ideas, summarize papers, and plan experiments in his work studying the immune system. In the case study he shared with OpenAI, Unutmaz used GPT-5 to analyze an old data set that his team had previously examined. The model came up with fresh insights and interpretations.
"LLMs are already essential for scientists," he says. "When you can complete analysis of data sets that used to take months, not using them is no longer an option."
Nikita Zhivotovskiy, a statistician at the University of California, Berkeley, says he has been using LLMs in his research since the first version of ChatGPT came out.
Like Scherrer, he finds LLMs most useful when they highlight unexpected connections between his own work and existing results he didn't know about. "I believe that LLMs are becoming an essential technical tool for scientists, much as computers and the internet did before," he says. "I expect a long-term disadvantage for those who don't use them."
But he doesn't expect LLMs to make novel discoveries anytime soon. "I've seen very few genuinely fresh ideas or arguments that would be worth a publication on their own," he says. "So far, they seem to mostly combine existing results, sometimes incorrectly, rather than produce genuinely new approaches."
I also contacted a handful of scientists who aren't connected to OpenAI.
Andy Cooper, a professor of chemistry at the University of Liverpool and director of the Leverhulme Research Centre for Functional Materials Design, is less enthusiastic. "We have not found, yet, that LLMs are fundamentally changing the way that science is done," he says. "But our recent results suggest that they do have a place."
Cooper is leading a project to develop a so-called AI scientist that can fully automate parts of the scientific workflow. He says his team doesn't use LLMs to come up with ideas. But the tech is starting to prove useful as part of a wider automated system in which an LLM can help direct robots, for example.
"My guess is that LLMs might stick more in robotic workflows, at least initially, because I'm not sure that people are ready to be told what to do by an LLM," says Cooper. "I'm certainly not."
Making mistakes
LLMs may be becoming more and more useful, but caution is still advised. In December, Jonathan Oppenheim, a scientist who works on quantum mechanics, called out a mistake that made its way into a scientific journal. "OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea, possibly the first peer-reviewed paper where an LLM generated the core contribution," Oppenheim posted on X. "One small problem: GPT-5's idea tests the wrong thing."
He continued: "GPT-5 was asked for a test that detects nonlinear theories. It provided a test that detects nonlocal ones. Similar-sounding, but different. It's like asking for a COVID test, and the LLM cheerfully hands you a test for chickenpox."
It's clear that a lot of scientists are finding innovative and intuitive ways to engage with LLMs. It is also clear that the technology makes mistakes that can be so subtle even experts miss them.
Part of the problem is the way ChatGPT can flatter you into letting your guard down. As Oppenheim put it: "A core concern is that LLMs are being trained to validate the user, whereas science needs tools that challenge us." In an extreme case, one person (who was not a scientist) was persuaded by ChatGPT into thinking for months that he'd invented a new branch of mathematics.
Of course, Weil is well aware of the problem of hallucination. But he insists that newer models are hallucinating less and less. Even so, focusing on hallucination might be missing the point, he says.
"One of my teammates here, an ex math professor, said something that stuck with me," says Weil. "He said: 'When I'm doing research, if I'm bouncing ideas off a colleague, I'm wrong 90% of the time and that's kind of the point. We're both spitballing ideas and looking for something that works.'"
"That's actually a desirable place to be," says Weil. "You say enough wrong things and then somebody stumbles on a grain of truth and then the other person seizes on it and says, 'Oh, yeah, that's not quite right, but what if we…' You gradually sort of find your path through the woods."
This is Weil's core vision for OpenAI for Science. GPT-5 is good, but it isn't an oracle. The value of this technology is in pointing people in new directions, not coming up with definitive answers, he says.
In fact, one of the things OpenAI is now working on is making GPT-5 dial down its confidence when it delivers a response. Instead of saying Here's the answer, it will tell scientists: Here's something to consider.
"That's actually something that we're spending a bunch of time on," says Weil. "Trying to make sure that the model has some kind of epistemological humility."
Another thing OpenAI is exploring is using GPT-5 to fact-check GPT-5. It's often the case that if you feed one of GPT-5's answers back into the model, it will pick the answer apart and highlight errors.
"You can sort of hook the model up as its own critic," says Weil. "Then you can get a workflow where the model is thinking and then it goes to another model, and if that model finds things that it can improve, then it passes it back to the original model and says, 'Hey, wait a minute: this part wasn't right, but this part was interesting. Keep it.' It's almost like a couple of agents working together, and you only see the output once it passes the critic."
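The generator-critic loop Weil describes can be sketched in a few lines of code. This is a minimal illustration only, not OpenAI's implementation: the `ask_model` function is a hypothetical stand-in for a real LLM API call, stubbed here with canned responses so the control flow can run on its own.

```python
# Minimal sketch of a generator-critic loop, as described in the article.
# `ask_model` is a hypothetical placeholder for any real LLM API call;
# it returns canned responses here so the loop is runnable as-is.

def ask_model(role: str, prompt: str) -> str:
    """Stand-in for an LLM call; a real system would hit an API here."""
    canned = {
        "generator": "Draft answer to: " + prompt,
        "critic": "OK",  # a real critic would return objections, or "OK"
    }
    return canned[role]

def answer_with_critic(question: str, max_rounds: int = 3) -> str:
    """Generate a draft, pass it to a critic model, and revise
    until the critic approves or the round budget runs out."""
    draft = ask_model("generator", question)
    for _ in range(max_rounds):
        verdict = ask_model("critic", f"Find errors in: {draft}")
        if verdict == "OK":  # critic found nothing to fix
            return draft
        # Otherwise, feed the critique back to the generator for a revision.
        draft = ask_model("generator", f"{question}\nCritique: {verdict}")
    return draft  # best effort after max_rounds

print(answer_with_critic("What causes tides?"))
```

In a real deployment the two roles would be separately prompted model instances, and, as Weil notes, the user would only see the draft once it clears the critic.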
What Weil is describing also sounds a lot like what Google DeepMind did with AlphaEvolve, a tool that wrapped the LLM Gemini inside a wider system that filtered the good responses from the bad and fed them back in again to be improved on. Google DeepMind has used AlphaEvolve to solve several real-world problems.
OpenAI faces stiff competition from rival firms, whose own LLMs can do most, if not all, of the things it claims for its own models. If that's the case, why should scientists use GPT-5 instead of Gemini or Anthropic's Claude, families of models that are themselves improving every year? Ultimately, OpenAI for Science may be as much an effort to plant a flag in new territory as anything else. The real innovations are still to come.
"I think 2026 will be for science what 2025 was for software engineering," says Weil. "At the beginning of 2025, if you were using AI to write most of your code, you were an early adopter. Whereas a year later, if you're not using AI to write most of your code, you're probably falling behind. We're now seeing the same early flashes for science as we did for code."
He continues: "I think that in a year, if you're a scientist and you're not heavily using AI, you'll be missing an opportunity to increase the quality and pace of your thinking."
