⚙️ IBM and the road to achieving AI efficiency

Good morning. We missed the milestone at the time, but Friday’s edition was my 150th. That’s 150 days, 600 stories and tons and tons of artificial intelligence, all processed by this real-life human.

We all enormously appreciate your engagement and ongoing support.

Onward!

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

  • 👀 AI for Good: Reducing injury in pianists

  • 👁️‍🗨️ Meta releases (small) new AI model 

  • ⚡️ IBM and the road to achieving AI efficiency

AI for Good: Reducing injury in pianists

Source: Stanford

The piano is one of those instruments that is most accessible to people with big hands and long fingers; this doesn’t mean smaller-handed people cannot play, though. What it does mean is pianists with smaller hands often have to stretch repeatedly to perform certain pieces of music, sometimes resulting — according to several decades of research — in hand and forearm injuries. 

What happened: Stanford engineers built an AI model designed to recreate the hand movements of elite pianists, a baseline from which they can examine the sources of potential injury. 

  • The researchers recruited 15 pianists who — while being recorded from every imaginable angle — performed a total of 10 hours of music. Using computer vision, the researchers were able to digitally recreate the hand movements. 

  • Karen Liu, a Stanford professor of computer science, called the quality and diversity of this dataset “unprecedented.”  

Why it matters: “This isn’t a model that is replacing people — we’re working with musicians to help understand and solve problems. This project is about advancing people, and AI is just a tool for that,” Liu said in a statement. It’s a first step (of many) that, according to the researchers, could be used to identify and test injury-avoiding solutions (such as narrower keyboards). 

Elizabeth Schumann, Stanford’s director of keyboard studies, said that this work could make piano playing “more sustainable.”

Speaking as a pianist myself, I can’t think of a solution other than narrower keyboards that might help with this — and I don’t see narrower keyboards happening; it would be nearly impossible for musicians trained on normal keyboards to make the switch, and vice versa.

Meta releases (small) new AI model

Source: Meta

Meta last week released Llama 3.3, a new 70 billion parameter generative AI model that, according to the company, achieves the same performance as the much larger 405 billion parameter model “but is easier & more cost-efficient to run.”

The details: The model costs $0.10 per million input tokens and $0.40 per million output tokens; Meta’s 405B model, on the other hand, costs $1.00 per million input tokens and $1.80 per million output tokens.

  • In its system card, Meta did not describe the data used to train the model, saying only that it includes “a new mix of publicly available online data.” Between that omission (which is in line with all of Meta’s purportedly “open” releases) and its lack of transparency around the source code, neither this nor any other Meta model is truly “open,” at least by the latest accepted standards.

  • Meta did, however, say that the model was trained using (cumulatively) 39.3 million GPU hours that resulted in approximately 11,390 tons of carbon emissions — this is not the first time Meta has shared the carbon footprint of its model training efforts, though it is significant considering that most major developers do not share this information. 

What Meta did not share, however, involves the energy use, or expected energy use, of operating the model, a process that is far more energy-intensive than training.

Meta said it had extensively red-teamed the model before the release, though added that it’s impossible to cover every scenario in testing and that “the model may in some instances produce inaccurate, biased or other objectionable responses to user prompts.”
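To put the quoted prices in perspective, here is a minimal sketch of the cost arithmetic. The per-million-token prices come from the figures above; the workload size (50 million input tokens and 10 million output tokens) and the names `PRICE_PER_MILLION` and `workload_cost` are purely hypothetical, chosen for illustration.

```python
# Per-million-token prices (USD) as quoted above.
PRICE_PER_MILLION = {
    "70B": {"input": 0.10, "output": 0.40},
    "405B": {"input": 1.00, "output": 1.80},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-million-token prices."""
    price = PRICE_PER_MILLION[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


# Hypothetical workload (an assumption, not a figure from Meta).
INPUT_TOKENS = 50_000_000
OUTPUT_TOKENS = 10_000_000

cost_70b = workload_cost("70B", INPUT_TOKENS, OUTPUT_TOKENS)
cost_405b = workload_cost("405B", INPUT_TOKENS, OUTPUT_TOKENS)

print(f"70B:  ${cost_70b:,.2f}")
print(f"405B: ${cost_405b:,.2f}")
print(f"405B costs {cost_405b / cost_70b:.1f}x as much for this mix")
```

For this assumed mix, the smaller model comes to roughly $9 against roughly $68 for the 405B model. The exact ratio depends on the split between input and output tokens, since input is 10x cheaper on the smaller model while output is 4.5x cheaper.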

  • The U.S. government has approved Microsoft’s plans to export advanced AI chips to a Microsoft-operated facility in the United Arab Emirates, according to Axios, which cited two unnamed sources. The approval process was delayed due to concerns about China gaining access to the hardware.

  • Grok, xAI’s chatbot, is now available, for free, to all X/Twitter users, according to The Verge. The chatbot was previously only available to paying subscribers. And, in addition, the chatbot now has a new photorealistic photo generator, called Aurora, that, like its last one, doesn’t care about copyrights or public figures.

  • US court upholds law forcing TikTok to divest or face a ban (Semafor).

  • Uber will offer robotaxi rides in Abu Dhabi through partnership with WeRide (CNBC).

  • OpenAI announced reinforcement fine-tuning, which will launch early next year (OpenAI).

  • The ideology of ‘AI slop’ (Eryk Salvaggio).

  • Record-breaking diamond storage can save data for millions of years (New Scientist).

IBM and the road to achieving AI efficiency

Source: IBM

Last Friday, I set my alarm for 5:00 a.m., queued up a bunch of albums and drove north, up past New York City and into Yorktown, a small town that’s far enough beyond the city — roughly 38 miles — that the only skyscrapers around are the burgeoning, tree-studded mountains of the Hudson Valley, the East Coast iteration (and, in many ways, antithesis) of California’s Silicon Valley. 

My destination was Yorktown Heights, the headquarters of IBM Research. The facility is home to a working lab, at the center of which sits, according to IBM, the “most advanced quantum computer in the world.” Alongside the futuristic-looking, humming, hexagonal floor-to-ceiling computer is the facility’s own miniature data center, filled with racks of IBM’s next-generation — and fully operational — AIU chips. 

The intention of the space is simple: to “bring together the latest in quantum computing, AI processing and hybrid cloud technologies.” 

Over the course of a six-hour visit there, that’s exactly what I got. 

Though the takeaways are vast — especially when it comes to the implications of quantum computing — a key theme emerged: computational efficiency. 

The core premise of today’s AI — here, largely referring to language models, which really don’t live up to the term “AI” at all — is efficiency achieved through time and cost savings. What it really boils down to is automation, something that isn’t new at all; if algorithms can be strategically deployed, work that might once have taken hundreds of man-hours could be reduced to minutes. 

  • As much as that end goal of efficiency remains the core focus of enterprises looking to adopt generative AI, the technology itself is enormously inefficient.

  • This comes across in a few ways, but perhaps most important are the energy inefficiencies associated with it. The (largely Nvidia) GPUs used to train models are enormously energy-consumptive, and inference (the operation or deployment of a trained model) is even more energy-intensive in aggregate, simply because training is done only once, whereas inference occurs regularly.

This represents the crux of two major issues in this industry: cost and sustainability. 

Since both cost and sustainability largely have to do with energy consumption, the two are intimately tied together by … efficiency (or, in this case, inefficiency). 

And IBM is intent on identifying and deploying methods of making these systems vastly more efficient, something that has major implications for cost — and therefore enterprise adoption — as well as sustainability.

This begins with the firm’s Granite family of language models, a series of small language models designed to be combined with enterprise-specific data. Since the models are lightweight, they, according to IBM, “can be run on a single GPU.” IBM has said that this approach is “3x-23x less” costly than using larger models.

Nick Fuller, IBM’s VP of AI and Automation, told me that, between April and October, when IBM released an updated version of Granite, the company — through an increase in quality training data — was able to boost model performance without vastly increasing model size (referring to the number of parameters).

Doing more, with less. Efficiency. 

The hardware side of things: IBM said that while GPU chips are well-suited to training AI models, they are not well-suited to inference due to their high energy requirements.

  • So, in 2019, IBM started developing a new chip designed specifically to efficiently handle increasingly complex AI workloads; in 2022, the firm unveiled these “AIU” chips, and at the end of 2023, installed a cluster of them at Yorktown Heights. 

  • This cluster of chips — which sits in a specially air-conditioned room — currently runs HAP (hate, abuse and profanity) filtering for models that appear on IBM’s watsonx platform. According to IBM, “this workload currently … is using roughly eight times less power, with comparable throughput, to a cluster of GPUs optimized for training.”

The AIUs are starting to make their way out of Yorktown; just two weeks ago, IBM announced that the University of Alabama in Huntsville is installing a new compute cluster — complete with AIU chips — intended to deploy geospatial, climate and weather models from NASA and IBM. 

IBM said that this “AIU cluster saves 23 kW of power per second, the equivalent energy of 20 U.S. homes' use per year and 85 tons of carbon emissions.”
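IBM's phrasing mixes units (a kilowatt is already a rate, so "per second" adds nothing), but one reading that makes the numbers line up is a continuous 23 kW reduction sustained for a full year. The sketch below checks that reading. The 23 kW figure is from IBM's statement; the household figure of about 10,500 kWh per year is an approximate assumption that does not come from IBM or from this article.

```python
# Continuous power saving quoted by IBM, in kilowatts.
SAVED_KW = 23.0

# Hours in a (non-leap) year.
HOURS_PER_YEAR = 24 * 365

# Assumption: approximate annual electricity use of an average U.S. home, in kWh.
# This figure is NOT from the article; adjust it to taste.
HOME_KWH_PER_YEAR = 10_500

# Energy saved over one year if the 23 kW reduction holds around the clock.
saved_kwh_per_year = SAVED_KW * HOURS_PER_YEAR

# Number of average homes whose yearly use that energy would cover.
equivalent_homes = saved_kwh_per_year / HOME_KWH_PER_YEAR

print(f"Energy saved per year: {saved_kwh_per_year:,.0f} kWh")
print(f"Equivalent homes: {equivalent_homes:.1f}")
```

Under these assumptions the saving works out to about 201,000 kWh per year, or roughly 19 homes' worth of electricity, which is in the same range as the 20 homes IBM cites.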

I asked Shawn Holiday, an AIU product manager, if IBM Research had maxed out the capacity for energy efficiency represented by that 8x power differential; he told me that they were nowhere near the bottom.

Lightspeed: In keeping with this theme of efficiency, IBM today unveiled a breakthrough years in the making that will enable the company to equip data centers with advanced optics technologies. 

  • Fiber optics, which refers to the transmission of data using light instead of electrons, has been leveraged in communication technologies for decades. Today, though most data centers use fiber optics for external network communications, the internal racks still mainly run chip-to-chip communications through electrical wires. The problem, according to IBM, is that, due to the resulting poor communication, GPUs could spend as much as half their time idle, which leads to prolonged training times and tons of wasted electricity, carbon emissions and dollars.  

  • Co-packaged optics (CPO) enables researchers to overcome the inherent limitations of optics technology by manipulating the packaging, enabling a “co-optimization of electronics and photonics,” according to one study.

IBM researchers figured out a new process for CPO, which would enable chipmakers to add six times as many fibers to the edge of a silicon photonics chip compared to current CPO technology. 

A close-up of the new chip (IBM).

These chips, according to IBM, could enable a “5x power reduction in energy consumption compared to mid-range electrical interconnects.” IBM added that this breakthrough could enable developers to train large language models five times faster, something that has major implications for cost and energy savings.

“As generative AI demands more energy and processing power, the data center must evolve — and co-packaged optics can make these data centers future-proof,” Dario Gil, SVP and director of research at IBM, said in a statement. “With this breakthrough, tomorrow’s chips will communicate much like how fiber optics cables carry data in and out of data centers, ushering in a new era of faster, more sustainable communications that can handle the AI workloads of the future.”

The time between unveiling and mass production could be lengthy, but it’s a significant step in a long process to introduce greater energy efficiency into the data center at a time when the power consumption (and subsequent carbon emissions) of data centers has been on a steady rise.

Goldman Sachs has projected that data centers will use 3%-4% of the world’s power by the end of the decade. 

The energy requirements of generative AI have become such a problem for Big Tech companies that almost all of them — Meta, Microsoft, Google and Amazon — are pursuing a revitalization of nuclear power, specifically to power their investments in generative AI.

Which image is real?

🤔 Your thought process:

Selected Image 1 (Left):

  • “The book and the legs position with regards to the sitting position look not right in image 2.”

Selected Image 2 (Right):

  • “Thought image 1 looked too perfect.”

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

Here’s your view on paying $200 for ChatGPT:

40% of you said you would never pay that much (10% of you said you wouldn’t even pay $5/month for generative AI).

But 10% of you said you would absolutely pay the $200, and 30% said you would … but only if you were getting more than $200/month out of it.

If those numbers scale up, OpenAI is about to have a major new revenue stream.

Have you been using Grok? How do you like it?
