⚙️ Why LLMs struggle with math

Good morning. Today, we’re featuring research from the world’s first AI-only research institution, the Mohamed Bin Zayed University of Artificial Intelligence.

Today’s topic focuses on math problems in language models, but future topics will run the gamut from cultural equity in language models to the specifics of AI applications in healthcare.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

MBZUAI Research: LLMs struggle with math problems. Here’s why

Art by Lorena Spurio, created for The Deep View.

Even as large language models (LLMs) have become more powerful and more capable, they continue to struggle with math word problems. 

While a lot of research has focused on evaluating LLMs’ shoddy performance on such word problems, recent research from the Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) seeks to explain why LLMs struggle in this area in the first place. 

The details: Researchers tested a variety of open-source LLMs on more complex math word problems, specifically focusing on problems that require real-world knowledge and some level of multi-step complexity. 

  • The researchers identified several characteristics of these math word problems that posed specific difficulties for the LLMs they tested. 

  • They found that lengthy problems with “low readability scores” and problems requiring real-world knowledge were “seldom solved correctly” by the language models. The researchers also found that problems requiring a large number (and diversity) of mathematical functions were “particularly challenging” to solve. 
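
For a hypothetical illustration (not an example drawn from the paper): a question like “A recipe needs two dozen eggs, and a carton of 12 costs $3.50; how much will the eggs cost?” requires outside knowledge (a dozen is 12) plus chained arithmetic (24 ÷ 12 = 2 cartons, 2 × $3.50 = $7), and every extra piece of background knowledge or extra step is another chance for a model to slip.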

Why it matters: The research demonstrates some clear limitations of the LLMs that underlie popular chatbots, an important consideration as the idea of GenAI math tutors gains steam. 

To learn more about MBZUAI’s research, visit their website.

  • Autodesk: Empowering innovators across industries to design and make anything, from architecture to manufacturing. Trusted by millions for over 40 years to transform how the world is designed and made. *

  • Join us for an inside look at how Ada and OpenAI build trust in enterprise AI adoption for customer service, starting with minimizing risks such as hallucinations. Viewers will also be able to participate in a Q&A.

  • Middle Eastern funds are plowing billions of dollars into hottest AI start-ups (CNBC).

  • Exclusive: US to propose ban on Chinese software, hardware in connected vehicles, sources say (Reuters).

  • A social network where everyone’s a bot (The Verge).

If you want to get in front of an audience of 200,000+ developers, business leaders and tech enthusiasts, get in touch with us here.

Amazon’s new AI assistant

Source: Amazon

Last week, Amazon launched a new generative AI assistant designed to help sellers out. In keeping with its ‘A’ theme, the assistant is called Amelia.

The details: Amazon is pitching it as a general-purpose tool to help sellers grow and manage their businesses. 

  • Amelia covers three main areas: specific knowledge-based questions, detailed sales data and customer traffic information, and problem resolution. 

  • The project is currently available in beta and will continue to roll out over the coming weeks. 

Some context: Amazon did not say what content Amelia was trained on, or whether and how it has been safeguarded against potential biases and hallucinations. Such issues have yet to be reliably resolved in the technology, leaving the potential for confidently generated misinformation to mislead sellers. 

At the same time, generative AI is enormously energy-intensive; the cost, in terms of energy and water, of constructing and operating this assistant remains unclear.

Microsoft's thirst for energy restarts nuclear power plant

Source: Unsplash

On the morning of March 28, 1979, one of the units at Pennsylvania’s Three Mile Island nuclear power plant experienced a partial meltdown. Its water pumps malfunctioned, the reactor overheated and nuclear fuel began to burn through its container, melting half of the reactor core. 

Though it didn’t explode, the cleanup took 14 years and cost $1 billion.

The incident is credited with sparking America’s aversion to nuclear power. 

Microsoft plans to restart one of the plant’s units. 

The details: Microsoft signed a deal with Constellation Energy on Friday to bring one of the plant’s units back online. The financial terms of the deal were not disclosed. 

  • The unit is expected to come back online in 2028, and its revival will require a roughly $1.6 billion investment from Constellation. 

  • Under the terms of the deal, Microsoft will buy energy from the plant — which will generate 835 megawatts of electricity — for 20 years. 

This unit is not the one that melted down in 1979; it was shuttered for economic reasons in 2019. 

Some context: As we’ve regularly reported, AI has made Big Tech more energy-hungry than ever before, pushing the major players further from their sustainability goals due to the quantity of electricity and water required to run AI data centers. 

While nuclear power is a better option than most, since it is both reliable and largely carbon-free, this deal amounts to a pure allocation of additional sustainable resources to Microsoft. In other words, since Microsoft is taking all of this clean energy (enough to power around 700,000 homes), other consumers cannot.
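
As a rough back-of-envelope check on that figure (assuming the average U.S. household uses something like 10,500 kWh of electricity per year, or roughly 1.2 kilowatts of continuous draw): 835 megawatts of around-the-clock output is 835,000 kW, and 835,000 ÷ 1.2 comes out to roughly 700,000 homes.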

The deal just adds more supply to the grid; it doesn’t necessarily reduce anything, and reduction is what’s sorely needed right now. 

Clean energy is important, but, according to the World Economic Forum, energy efficiency is even more important. And right now, energy demand is spiking for the first time in decades because of AI. 

Which image is real?


🤔 Your thought process:

Selected Image 1 (Left):

  • “Split second decision looking at the reflections.”

Selected Image 2 (Right):

  • “The trees have a more natural look, natural symmetry, but the cabin looks ‘too perfect’ ... still, I think this is real.”

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

Here’s your view on deepfake detection:

45% of you said you would absolutely use a deepfake detection tool. A third said you’d use one if it was free.

10% of you said you don’t really care.

Yes:

  • “Companies that distribute media or content should be required to adhere to standards set by independent third-party organizations that develop detection engines. These detection systems should be integrated across social media and news platforms, operating through browsers and app interfaces.”

Something else:

  • “Datambits numbers look good but not infallible. The big question is who will qualify as a Deepfake candidate and will this require a voice and image/movie registration for the everyday person to benefit?”

Have you used AI for math? What have you found?
