⚙️ OpenAI releases ‘smartest and most capable models to date’

‘The magic is that, under the hood, it’s still just next-token prediction.’

Good morning. Well, yesterday turned itself into a busy day (thank you, OpenAI).

But we are on the verge of Friday. Sweet, blessed Friday.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

  • 🧠 AI for Good: Braining mice

  • 🚨 ASML and the first strike of tariff uncertainty 

  • 🎙️ IBM drops new speech-to-text model that aims to ‘fill in the middle’

  • 👁️‍🗨️ OpenAI releases ‘smartest and most capable models to date’

AI for Good: Braining mice

Source: Unsplash

Researchers at Stanford Medicine have constructed a “digital twin” of the part of the mouse brain that processes visual information, a feat made possible through the construction of a novel AI model. 

The details: The model was trained on data gathered from mice as they watched movies — specifically, action movies, since mice perceive movement far better than they perceive fine color or photographic detail. 

  • The team recorded more than 900 minutes of brain activity from a group of eight mice as they watched clips of movies like “Mad Max.” Cameras additionally tracked their eye movements and behavior during the screenings. 

  • That data was used to train an AI model that the researchers could then fine-tune into a digital twin of any individual mouse in the group. 

What is a digital twin? Digital twins are pretty much exactly what they sound like: digital representations of physical things, designed to accurately mirror the behavior of the physical thing. They enable experimentation and forecasting at a level and speed that would be difficult or impossible to achieve physically. 

What did they find? Importantly, the digital twins were able to “closely simulate” the neural activity of the biological mice in response to new visual stimuli (clips that weren’t included in the training data). 

  • One of these twins was able to predict the cell type and anatomical locations of thousands of neurons, plus the connections between those neurons, found in the visual cortex. 

  • This, through real-world verification, enabled the researchers to publish a map of the mouse visual cortex in “unprecedented detail.” 

Why it matters: As we’ve discussed often, very little is known about cognitive function. Accurate digital twins will enable a vast variety of neurological experimentation that will very likely boost our understanding of biological brains, something that has massive implications for our ability to, for instance, address neurological diseases and disorders.

This tech company grew 32,481%...

No, it's not Nvidia… It's Mode Mobile, 2023’s fastest-growing software company according to Deloitte.1

Their disruptive tech, the EarnPhone and EarnOS, has helped users earn and save an eye-popping $325M+, driving $60M+ in revenue and a massive 45M+ consumer base. And having secured partnerships with Walmart and Best Buy, Mode’s not stopping there…

Like Uber turned vehicles into income-generating assets, Mode is turning smartphones into an easy passive income source. The difference is that you have a chance to invest early in Mode’s pre-IPO offering3 at just $0.26/share.

They’ve just been granted the stock ticker $MODE by the Nasdaq2, but their share price is changing in under two weeks.

ASML and the first strike of tariff uncertainty 

Source: ASML

ASML, the world’s largest supplier of the equipment needed to produce computer chips, on Wednesday affirmed its guidance for the year, though acknowledged the potential impact of the uncertainty posed by U.S. tariffs. 

The details: The first in the semiconductor sector to report first-quarter earnings, ASML reported 7.74 billion euros ($8.7 billion) in net sales and 2.36 billion euros in net profit, roughly in line with analyst expectations.

  • But the company’s net bookings, an indicator of demand, landed at 3.94 billion euros, below analyst expectations of 4.89 billion euros. 

  • The company affirmed its second-quarter (7.2 to 7.7 billion euros) and full-year (30 to 35 billion euros) revenue guidance, saying that “artificial intelligence continues to be the primary growth driver in our industry. It has created a shift in the market dynamics that benefits some customers more than others, contributing to both upside potential and downside risks as reflected in our 2025 revenue range.”

However, CEO Christophe Fouquet said that “the recent tariff announcements have increased uncertainty in the macro environment and the situation will remain dynamic for a while.” Still, he expects 2025 and 2026 to be “growth years” for the company. 

Shares of ASML fell more than 7%.

The landscape: Though the Trump Administration’s major “reciprocal” tariffs have been paused for 90 days, and despite the fact that smartphones, computers and chips have been exempted from those reciprocal tariffs, the administration is expected to soon impose semiconductor-specific tariffs.

Even without those tariffs, the semiconductor industry has become increasingly sensitive to tense macroeconomic conditions; Nvidia reported Tuesday that it will record a quarterly charge of $5.5 billion related to export restrictions on sales of its H20 chip in China.

AMD, likewise, said it expects to record an $800 million hit related to its own chip sales in China.

Both stocks fell Wednesday.

  • No AG for Musk: California’s Attorney General has declined to join Elon Musk’s lawsuit against OpenAI and Sam Altman, with his office writing that it doesn’t see how Musk’s action serves the public interest.

  • OpenAI’s step away from safety: In an update to its preparedness framework, OpenAI said that if another model developer releases a high-risk system without comparable safeguards, “we may adjust our requirements.” After a rigorous assessment, of course. The adjustment comes after OpenAI has been bleeding safety researchers for months, most recently with Joaquin Quiñonero Candela quietly stepping down from his position leading the preparedness team. GPT-4.1 was released without a safety assessment.

  • Tech stocks drop as Nvidia, AMD warn of higher costs from China export controls (CNBC).

  • Why an overreliance on AI-driven modelling is bad for science (Nature).

  • OpenAI announces members of new nonprofit commission, affirms that the nonprofit ‘isn’t going anywhere’ (OpenAI).

  • Google used AI to suspend over 39M ad accounts suspected of fraud (TechCrunch).

  • OpenAI is in talks to acquire Windsurf, the coding tool formerly known as Codeium (Bloomberg).

IBM drops new speech-to-text model that aims to ‘fill in the middle’

Source: IBM

Large language models are fundamentally next-token prediction systems. 

Trained on massive bodies of content, LLMs are designed to generate, based on the input they receive, the most likely next word in a given sequence. 

And while language models tend to perform quite well when it comes to standard next-token text prediction, they struggle, according to IBM, with “predicting the correct tokens based on the tokens that come before and after. In other words, conventional autoregressive LLMs cannot ‘fill in the middle.’”

IBM designed Granite 3.3 to do just that. 
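
To make the idea concrete, fill-in-the-middle training is usually just a data-formatting trick: a passage is cut into a prefix, a middle and a suffix, then reordered with sentinel tokens so that a left-to-right model still learns to produce the missing middle via ordinary next-token prediction. A minimal, generic sketch of that reformatting (the sentinel token names here are illustrative, not Granite’s actual special tokens):

```python
import random

# Illustrative sentinel tokens; Granite's actual special tokens may differ.
PREFIX, SUFFIX, MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(text: str, rng: random.Random) -> str:
    """Rearrange a passage so the 'middle' becomes the last thing predicted."""
    i, j = sorted(rng.sample(range(1, len(text)), 2))   # two random cut points
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # The model sees prefix and suffix first, then generates the middle.
    return f"{PREFIX}{prefix}{SUFFIX}{suffix}{MIDDLE}{middle}"

rng = random.Random(0)
print(make_fim_example("def add(a, b):\n    return a + b\n", rng))
```

No architectural change is required; the reordering alone is what teaches an ordinary left-to-right predictor to fill in gaps.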

The details: IBM on Wednesday unveiled Granite 3.3, the latest iteration of its family of small language models. Unlike its predecessors, this version of Granite functions as IBM’s “first official” speech-to-text model. It was trained to provide automated translation for eight languages. 

  • It was also trained with “fill-in-the-middle capabilities,” something IBM achieved by redesigning its training tasks (splitting passages into prefix, suffix and middle) to “trick the LLM into predicting tokens in the middle using its intrinsic left-to-right prediction ability.” This makes the model notably more capable at code generation and review. 

  • The model also comes with a suite of low-rank adaptation (LoRA) adapters, some of which are specifically designed to mitigate hallucinations and unreliability. The RAG Hallucination Detection LoRA provides a “faithfulness score,” indicating how closely the output reflects the information contained within retrieved documents. 

The Uncertainty LoRA, somewhat similarly, enables the model to generate a certainty score for each output, quantifying the degree to which a given piece of output is supported by information in the model’s training data. 

This mirrors the idea of confidence scoring, which we’ve discussed before.
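
For readers who want to experiment, LoRA adapters like these are typically loaded on top of the base model with Hugging Face’s peft library. The sketch below shows only that general pattern: the adapter repository name is a placeholder, the base-model name should be checked against IBM’s model cards, and the exact prompt format each LoRA expects is documented there, not here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "ibm-granite/granite-3.3-8b-instruct"          # base model; verify against IBM's model card
ADAPTER = "your-org/granite-rag-hallucination-lora"   # placeholder adapter repo name

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)

# Attach the LoRA adapter; only the small adapter weights are added on top
# of the frozen base model.
model = PeftModel.from_pretrained(base_model, ADAPTER)

# Prompt format is adapter-specific; conceptually, you pass the retrieved
# documents plus the generated answer and ask for a faithfulness score.
prompt = "Documents: ...\nAnswer: ...\nHow faithful is the answer to the documents?"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```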

The model was trained on a mix of licensed data and synthetic data, according to its model card. The electrical cost of training and operating the model is unclear. 

OpenAI releases ‘smartest and most capable models to date’

Source: Jason Redmond / TED

OpenAI on Wednesday released two new ‘reasoning’ models: o3 and o4-mini. 

These are two models that came very close to never seeing the light of day, at least, as standalone models. In February, OpenAI CEO Sam Altman said that the company planned to release both models exclusively as components within GPT-5, which he said would function as a system designed to connect all of OpenAI’s different models. 

It’s not clear when GPT-5 will actually be released. 

But with competitive pressure mounting, OpenAI opted for a “change of plans,” releasing GPT-4.1 earlier this week, a launch that touched off this barrage of model releases. 

o3, o4-mini and o4-mini-high — replacing o1, o3-mini and o3-mini-high — are already available to ChatGPT Plus, Pro and Team users; Enterprise users will have to wait about a week to gain access. Free users, meanwhile, will get to try out o4-mini by toggling on “think” in ChatGPT.  

On a livestream heralding the launch, OpenAI co-founder Greg Brockman called the release a “qualitative step into the future,” adding: “these are the first models where top scientists say they produce legitimately good and useful, novel ideas.”

The details: Trained to perform Chain-of-Thought reasoning by scaling up reinforcement learning, the models can search the web, analyze files, process images, use tools and generate images. 

  • OpenAI said that o3 makes “20% fewer major errors than OpenAI o1 on difficult, real-world tasks — especially excelling in areas like programming, business/consulting and creative ideation.”

  • o4-mini, meanwhile, was optimized for “fast, cost-efficient reasoning.” 

Both models score highly on a range of benchmarks. 

o3 costs $10 per million input tokens and $40 per million output tokens, putting it on the more expensive end of OpenAI’s models. o4-mini costs $1.10 per million input tokens and $4.40 per million output tokens. GPT-4o, meanwhile, costs $3.75 and $15, respectively. 
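
For a rough sense of what those list prices mean per request, here is a small back-of-the-envelope calculation using the per-million-token figures above (actual bills can differ, for instance because hidden reasoning tokens are generally billed as output tokens):

```python
# Dollar cost per million tokens, from the list prices above.
PRICES = {
    "o3":      {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10,  "output": 4.40},
    "gpt-4o":  {"input": 3.75,  "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of a single request at list prices."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 5,000-token prompt that produces 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 5_000, 2_000):.4f}")
```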

The most significant difference between these and OpenAI’s other models is that both o3 and o4-mini were trained to incorporate images into their chains of thought (which remain obscured from users). OpenAI calls this “thinking with images.” Beyond that, the models score much higher on coding benchmarks and were trained specifically to use tools autonomously. 
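
To make “thinking with images” concrete, here is a rough sketch of sending an image to o4-mini through the OpenAI Python SDK, assuming the standard Chat Completions image-input format applies to these models; the image URL is a placeholder, and the chain of thought itself is not returned:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this whiteboard sketch describe?"},
                # Placeholder URL; any publicly reachable image works here.
                {"type": "image_url", "image_url": {"url": "https://example.com/whiteboard.png"}},
            ],
        }
    ],
)

# Only the final answer is returned; the model's intermediate reasoning
# (including whatever it did with the image) stays hidden.
print(response.choices[0].message.content)
```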

During the livestream, Brockman said that “the magic is that, under the hood, it’s still just next-token prediction.” 

Going deeper: Transluce, an independent AI research lab, conducted a series of tests and experiments on a pre-release version of o3, finding that “it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted.”

  • Across a thousand conversations, o3 regularly claimed that it had access to coding tools, despite having no such access, then staunchly defended its claims when challenged, according to Transluce. This phenomenon was present in o3-mini and o1, as well. 

  • “These behaviors are surprising,” Transluce said. “It seems that despite being incredibly powerful at solving math and coding tasks, o3 is not by default truthful about its capabilities.”

Indeed, the system card for both o3 and o4-mini makes some mention of this, though it declines to go into detail: “Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims.”

o3 has an accuracy rate, according to that system card, of 0.59, higher than o1’s 0.47. But it hallucinates at a higher rate, with a score of 0.33, compared to o1’s 0.16 (lower scores indicate fewer hallucinations). 

When it comes to safety, OpenAI’s team determined that neither model represents a “high” risk across biological, cybersecurity and self-improvement categories. As such, OpenAI is not releasing a safeguards report for either model. 

As per usual, training data, energy intensity and carbon emissions (for both training and operation) of both models remain unknown, though the energy use of ‘reasoning’ models is significantly higher than that of non-reasoning models. 

I’ve said this often, but demos, benchmarks and blog-post releases don’t indicate much. 

None of this has been peer-reviewed. None of this information has been independently verified. 

There is next to no transparency. 

When o3 was first unveiled last year, it was presented as a massive breakthrough. At the time, OpenAI opened up applications for external safety testing; the pace of improvement made it seem as though o3 might exceed the risk thresholds in OpenAI’s safety framework for deployment. 

But in the months since, OpenAI has changed its safety approach, maintained its lack of transparency and, unsurprisingly, opted for a release that speaks to the state of progress within the industry: targeted improvements bounded by persistent limitations.

I find it exceptionally revealing that, on the livestream, Brockman said that the model wasn’t doing anything more than next-token prediction. These are choreographed demos; I don’t think that statement was off the cuff, but it counteracts the claims of an impending digital god that have proliferated online (including from OpenAI’s own engineers).

At TED, Altman said: “there’s really no way to know if it is thinking … or if it just saw that a lot of times in the training set. If you can’t tell the difference, how much do you care?” 

I, for one, care quite a bit. We are very likely dealing with the illusion of intelligence, rather than the real thing. Given the tenor of the discourse, the nuance matters. (And there’s still no GPT-5.) 

Which image is real?

🤔 Your thought process:

Selected Image 1 (Left):

  • “The woman's legs are not symmetrical in the real image, which tipped me off.”

Selected Image 1 (Left):

  • “This one isn’t even close.”

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

P.S. Enjoyed reading? Take The Deep View with you on the go! We’ve got exclusive, in-depth interviews for you on The Deep View: Conversations podcast every Tuesday morning. Subscribe here!

Here’s your view on giving Anthropic access to Google Drive:

50% of you wouldn’t even consider allowing this.

23% said they might consider it, and 14% said they would do it.

Nope:

  • “Anthropic has nowhere near established the necessary trust. Good product but unfettered access to my files? An obscure quote from the movie, Risky Business comes to mind: ‘I don't think so, Joel.’”

Yep:

  • “Most people are already giving access to codebases with significant IP, for me it's about whether the model trains on it or not.”

If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.

*Mode Mobile Disclaimers

1 The rankings are based on submitted applications and public company database research, with winners selected based on their fiscal-year revenue growth percentage over a three-year period.

2 Mode Mobile recently received their ticker reservation with Nasdaq ($MODE), indicating an intent to IPO in the next 24 months. An intent to IPO is no guarantee that an actual IPO will occur.

3 A minimum investment of $1,950 is required to receive bonus shares. 100% bonus shares are offered on investments of $9,950+.