
⚙️ Radical compression and the road to on-device adoption

Good morning. In an effort to ensure AI data centers can keep the lights on, President Trump is expected to sign an executive order to boost the use of coal.

This comes as Trump’s tariffs have sent U.S.-based data center providers into a frenzy over massive potential increases in the cost of all the hardware needed to make data centers operational.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

  • 🎙️ Podcast: What do you get when you mix AI and quantum? 

  • ⚡️ IBM is rolling out an AI-enabled mainframe

  • 🔏 Report: The massive rift between the ‘experts’ and everyone else

  • 👁️‍🗨️ Radical compression and the road to on-device adoption

🎙️ Podcast: What do you get when you mix AI and quantum? 

In the latest episode of The Deep View: Conversations, I sat down with Dr. Stefan Leichenauer, SandboxAQ's VP of Engineering, to break down the ways in which SandboxAQ is bringing the two technologies together, and the impact that convergence might have.

“Some of the problems that we're trying to solve today, I think will be solved in the next few years,” he said. “There will be new medical diagnostic devices out there in hospitals saving lives, and it's going to be amazing. There will be some drugs getting through clinical development or clinical trials that came from these AI-based searches … the fruits of today's labors will have yields in the next few years.”

AI agents that understand your business

Reduce time to value with one enterprise-grade platform that lets you build no-code AI agents integrated with all your company's apps.

Turn manual processes into automated workflows in minutes

Create meeting assistants that capture insights and action items

Deploy agents that work seamlessly across your tools and systems

Search across apps and deploy agents for deep research

Extensible APIs. Enterprise-grade security. With unparalleled flexibility.

⚡️ IBM is rolling out an AI-enabled mainframe

Source: IBM

In 1991, InfoWorld analyst Stewart Alsop — somewhat famously — predicted that “the last mainframe will be unplugged on March 15, 1996.”

In the 34 years since, that prediction hasn’t really borne out. 

Even as the world has moved toward cloud computing, mainframes remain a core part of IBM’s infrastructure. This week, the company announced IBM z17, the result of a five-year-long development process that represents the next-generation version of its mainframe.  

The details: Where the previous incarnations of IBM’s mainframe were designed to process billions of simple calculations and transactions in real time — IBM’s mainframes process around 90% of all credit card transactions — the new upgrade, capable of handling more complex generative AI workloads, comes powered by AI-specific hardware and infused with AI capabilities.

  • Later this year, IBM customers will be able to run IBM chatbots and agents natively on z17, ensuring both efficiency and security. 

  • The system will become generally available in June of this year. 

“The industry is quickly learning that AI will only be as valuable as the infrastructure it runs on,” Ross Mauri, general manager of IBM Z and LinuxONE, said in a statement. “With z17, we’re bringing AI to the core of the enterprise with the software, processing power, and storage to make AI operational quickly.”

The reliable AI conference for leaders daring to disrupt with AI

AI Disrupt will bring together industry leaders and decision-makers to explore how AI integrates with existing systems — with a focus on accuracy and transparency.

Attendees can expect:

  • Engaging panel discussions

  • Real-world case studies

  • Hands-on workshops

All are led by experts at the forefront of AI innovation. Register now to secure your spot.

  • Benchmark gaming: The disappointment around Meta’s release of Llama 4 goes a step further than what we discussed earlier this week. In the fine print, Meta acknowledged that the version of Maverick that’s killing it on the LMArena benchmark isn’t the same one that’s available to the public; it was specifically optimized for “conversationality.”

  • Volatility, volatility: Markets tried hard to stage a comeback Tuesday. And for a while, it looked as though they were going to succeed. But in the last few hours of the session, the rally evaporated, and the major indices closed, once again, in the red. The S&P fell nearly 2%, and the Nasdaq fell more than 2%, dragged down by Big Tech names like Apple, Tesla and Nvidia.

  • Trump order seeks to tap coal power in quest to dominate AI (Bloomberg).

  • A nonprofit is using AI agents to raise money for charity (TechCrunch).

  • Trump’s tariffs are testing Nvidia’s chip supremacy. Can Jensen Huang weather the storm? (Rest of World).

  • DOGE is using AI to snoop on federal workers (Reuters).

  • Are AI Agents Sustainable? It depends (Hugging Face).

🔏 Report: The massive rift between the ‘experts’ and everyone else

Source: Unsplash

At a time of seemingly rapid advancement, and against a backdrop of wild hype that blurs genuine science with dramatic sci-fi visions, there has arisen a significant divide in perception between the people building AI and the people AI will impact. 

Last week, the Pew Research Center published the results of an extensive survey intended to identify the specific sources of this rift. To do so, Pew polled 1,000 “AI experts,” which it defines as people whose work or research relates to AI, alongside 5,000 American adults. 

The findings: In general, the “experts” are far more optimistic than everyone else. 56% of experts think AI will have a positive impact on the U.S. over the next two decades, 47% are more excited than concerned about the increased use of AI in daily life and 76% think they will benefit personally from AI. 

  • 51% of the public, meanwhile, is more concerned than excited about the increasing proliferation of AI; 43% think it will harm them personally and only 17% think it will have a positive impact on the country. 

  • 64% of the public believes that AI will lead to fewer jobs over the next 20 years, a view shared by only 39% of the experts. 

There is, however, some common ground between the two camps. More than half of each group wants more control over how AI is used in their lives. Relatedly, majorities of both groups are concerned the government won’t go far enough to effectively regulate the technology; both groups have little confidence in the developers to design and deploy AI responsibly. 

An interesting point of divergence relates to human connection: 57% of the public is concerned that the increasing proliferation of AI will degrade human relationships, a concern shared by only 37% of the experts Pew surveyed.

I am reminded of a comment Dr. Noah Giansiracusa, an author and math professor, shared with me last year, that many people “just want to live in a world of real things and real people.”

👁️‍🗨️ Radical compression and the road to on-device adoption

Source: Unsplash

Transformers have a bit of a size problem. 

It’s not exactly a surprising phenomenon. When Google researchers first unveiled the transformer architecture in their 2017 “Attention Is All You Need” paper, their model was small, even tiny, by today’s standards, featuring just six layers in both the encoder and decoder and a hidden dimension of 512.

It cost, according to Stanford estimates, $670 to train. 

That model touched off a new paradigm in AI that researchers and companies have been running with ever since; the most notable result is, of course, OpenAI’s ChatGPT, with GPT standing for “generative pre-trained transformer.”

  • GPT-1 came out in 2018 with 117 million parameters; GPT-2 followed in 2019, with 1.5 billion parameters; GPT-3 came out in 2020 with 175 billion parameters; GPT-4, which came out in 2023, is rumored to have roughly 1.8 trillion parameters, though that number has not been confirmed by OpenAI. 

  • And as model size has increased, so too has model performance and general capability, giving rise to this impression that “scale is all you need.” Of course, the reality here is more complicated (it always is!) but the observation has largely proven to be accurate thus far. 

The problem with this is that, when performance is reliant on exponential increases in scale, high performance becomes highly expensive. GPT-3, according to Stanford estimates, cost $4 million to train, while GPT-4 cost $80 million; Meta’s Llama 2 cost $3 million to train, while Llama 3 cost $170 million. 

And that cost of training doesn’t include the cost of inference, a cost that is continually rising as developers have turned their focus to so-called ‘reasoning’ models. 

In light of this, researchers have been working on coaxing greater capabilities out of smaller models. IBM, for instance, has been experimenting with Chain of Thought reasoning techniques within its Granite family of small language models, an approach that can make them “bigger on demand.” 

Model distillation, meanwhile, in which a small model is trained on output patterns from a larger (teacher) model, has become wildly popular, leveraged by DeepSeek and Meta, in addition to independent researchers. 
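The training signal at the heart of distillation can be sketched in a few lines. The following is a generic illustration in Python with NumPy — not DeepSeek’s or Meta’s actual recipe — and all names and numbers are illustrative:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T yields a softer distribution
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) over temperature-softened distributions:
    # the student is rewarded for matching the teacher's "soft targets"
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * np.log(p / q)))

teacher = [4.0, 1.0, 0.2]        # teacher strongly favors class 0
good_student = [3.8, 1.1, 0.3]   # similar preferences -> low loss
bad_student = [0.2, 1.0, 4.0]    # inverted preferences -> high loss
```

A real pipeline would typically combine this soft-target term with standard cross-entropy on ground-truth labels, weighting the two losses.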

A new approach: Multiverse Computing is taking a slightly different road to that same destination.

In 2024, the company unveiled a quantum-inspired compression approach called CompactifAI, which begins by identifying model layers that are suitable for compression, then replaces the weights behind those layers with specific tensor networks.

The approach adds credence to the idea that “deeper layers tend to be ineffective in the performance of LLM models,” according to the paper. 
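The parameter savings from swapping dense weights for tensor networks can be illustrated with the simplest possible case: a two-core low-rank factorization via truncated SVD. This is a toy analogue, not CompactifAI’s actual method (which uses richer tensor networks), and all names are illustrative:

```python
import numpy as np

def factorize_layer(W, rank):
    # Replace a dense weight matrix with two thin factors, A @ B ~= W.
    # Two factors form the simplest possible tensor network (two cores);
    # the parameter accounting works the same way for richer decompositions.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank)
    B = Vt[:rank, :]             # shape (rank, n)
    return A, B

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))
A, B = factorize_layer(W, rank=64)

dense_params = W.size              # 512 * 512 = 262,144
factored_params = A.size + B.size  # 2 * 512 * 64 = 65,536, a 75% reduction
```

In practice, only layers whose weights are well approximated by the chosen decomposition are compressed, and compressed models are usually fine-tuned briefly afterward to recover accuracy.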

On Tuesday, Multiverse unveiled compressed versions of Llama 3.1-8B and Llama 3.3-70B.

  • The models have been compressed by 80%. 

  • Both models have 60% fewer parameters than their original counterparts, a size reduction that led to 84% greater energy efficiency, 40% faster inference and a 50% reduction in cost. Multiverse achieved this all “without sacrificing accuracy.”

On benchmarks, the compressed models are largely on par with the originals. 

It is unclear, however, if the performance of these compressed models holds up over time. Multiverse did not respond to a request for comment on this point, or on whether the company has run tests to assess the performance of its models over time.

“We’re rapidly delivering compressed versions of the most powerful LLMs in the world,” Sam Mugel, CTO at Multiverse, said in a statement. “The advanced capabilities of these two massive models can now fit into smartphones, laptops and cars, or real-world machines like oil rigs and satellites.”

Multiverse intends to roll out “dozens of compressed, leading LLMs,” something Mugel thinks could “dramatically accelerate the impact of AI in the real world.” 

Which image is real?


🤔 Your thought process:

Selected Image 2 (Left):

  • “The fabric of the flannel shirt in the first one was wrong (not detailed enough, the pattern indistinct, a common issue AI images have with depicting fabrics of any kind), and the composition of the photograph was “too good”, focused so exclusively on the ax that there was little in the background besides the person holding it…”

Selected Image 1 (Right):

  • “Dude’s arm in Image 2 seemed weird.”

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

P.S. Enjoyed reading? Now listen! We’ve got exclusive, in-depth interviews for you on “The Deep View: Conversations” podcast every Tuesday morning. Subscribe here!

Here’s your view on Llama 4:

The bulk of you were disappointed by the Llama 4 release.

Only 8% think the model is great.

Well - Do you think AI will have a positive impact on the U.S. over the next 20 years?


If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.