• The Deep View
  • Posts
  • ⚙️ New research: AI guardrails aren't good enough

⚙️ New research: AI guardrails aren't good enough

Good morning. In case you missed it, D’Youville University in Buffalo recently held its graduation … and an AI-powered robot delivered the commencement address. The robot gave “inspirational advice that is common at all graduation ceremonies.”

Nearly 3,000 students signed a petition to change the speaker, but the school didn’t cave. Check out the clip (at the bottom), it’s really weird. 

In today’s newsletter: 

  • 🛜 Cognitive Resonance founder demonstrates an LLM reasoning test

  • 🏛️ Scarlett Johansson ‘shocked’ by GPT-4o voice

  • 💻 Google unveils new AI safety plans

  • 📄 New Research: AI safeguards aren’t good enough

Cognitive Resonance founder demonstrates an LLM reasoning test

Image Source: OpenAI

A core tenet of AI research (and the torch that could light the path to artificial general intelligence) revolves around reasoning. It’s something I’ve discussed more than a few times as scientists try to determine whether LLM output is largely representative of their training data, or if they are showcasing glimmers of reasoning capability (spoiler alert: they’re not). 

  • Benjamin Riley, founder of Cognitive Resonance, on Monday demonstrated a novel test designed to determine reasoning capabilities in LLMs. He said that most LLMs tend to get it wrong, so I tried it out with GPT-4o … and it did not do well. (Hint: the answer is 7).

A screenshot of a chat between TDV and GPT-4o.

  • Riley said it makes sense that LLMs struggle with the game because “(a) it’s the sort of novel task that is unlikely to be found in its training data, and (b) it requires some form of logical inference that goes beyond text prediction, which is what LLMs are in the business of doing.”

  • This is only one example of numerous tests that scientists are deploying to better understand AI models.

Claude 3 Sonnet and Pi failed the test, as well.

Screenshots of chats between TDV and Claude 3 Sonnet (left) and Pi (right).

Scarlett Johansson ‘shocked’ by GPT-4o voice

Image source: OpenAI

A week ago, OpenAI unveiled GPT-4o, a multimodal model that spoke to OpenAI employees in a voice that sounded remarkably familiar to fans of the movie Her. It was a sci-fi connection that CEO Sam Altman worked to emphasize, tweeting “Her” before saying that the model “feels like AI from the movies.”

  • Then, on Monday, OpenAI announced that it was pulling the ChatGPT voice (one of five) that sounded like Johansson. 

  • OpenAI said that the voice was “not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice.” 

In a statement released Monday night, Johansson said that last year, Altman asked her to be the voice of ChatGPT, an offer she declined

  • “When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference.” 

Two days before the demo, Altman contacted her agent asking her to reconsider. But before she responded, the demo went live; Johansson’s lawyers then sent Altman two letters asking them to detail the process behind the creation of the voice. That’s when OpenAI took the voice down. 

A screenshot of Johansson’s statement.

Altman said in a statement that the voice was “never intended” to resemble Johansson’s. He added that OpenAI cast the voice actor in question “before any outreach” to Johansson.

Together with Athyna

We need a front-end developer by Tuesday, but it’ll take months to find someone in the U.S.”

Ever feel like that’s you? Well, we have you covered with some exciting news. We at The Deep View just found the secret weapon for ambitious companies: the talent platform, Athyna.

From finance to creative, ops to engineering, Athyna has you covered. Oh, and did I mention they hire the best global talent so you’ll save up to 70-80% from hiring locally?

No search fees. No activation fees. Just incredible talent, matched with AI precision, at lightning speed. Don’t get left behind, hire with Athyna today.

Google unveils new safety plans

Image Source: Google

In the latest Big Tech entry on the cost-benefit analysis of AI, Google said last week that breakthroughs in AI tech might “eventually come with new risks” beyond those posed by current models (to Google, the advances are well worth the risks). Completely ignoring any mention of the litany of current harms being exacerbated by current models, Google published a (brief) Frontier Safety Framework

The Framework: 

  • Identifying critical capabilities

  • Evaluating models for these capabilities

  • Applying mitigation strategies when capabilities are detected

Google's understanding of critical capabilities is based on these four domains: Autonomy, biosecurity, cybersecurity and machine learning research and development. 

  • “We research the paths through which a model could cause severe harm in high-risk domains, and then determine the minimal level of capabilities a model must have to play a role in causing such harm.”

If it detects these early capabilities during regular evaluations, Google will deploy a plan focused on tightening model security and deployment. It does not say that it will destroy or remove public access to a model that displays any such capability (though it does say it will consider limiting access).

💰AI Jobs Board:

  • Senior Research Scientist: Google Research · United States · New York or California · Full-time · (Apply here)

  • Quantum Computing Scientist: IBM · United States · Yorktown Heights, NY · Full-time · (Apply here)

  • Lead, AI Advocacy: IBM · United States · Hybrid; New York, NY · Full-time · (Apply here)

  📊 Funding & New Arrivals:

🌎 The Broad View:

  • Indian voters are being bombarded with millions of deepfake calls (Wired).

  • An interview with Meredith Whittaker, president of Signal (The Innovator).

  • Microsoft debuts AI + PC (Reuters).

*Indicates a sponsored link

Together with Sana

Work faster and smarter with Sana AI

Meet your new AI assistant for work.

On hand to answer all your questions, summarize your meetings, and support you with tasks big and small.

Try Sana AI for free today.

New Research: AI safeguards aren’t good enough

Created with AI by The Deep View.

As much as we’ve heard about the harms of existing LLMs — the creation of nonconsensual deepfake porn, to name one — we’ve heard about safeguards against this kind of misuse. In March, for instance, Microsoft blocked certain prompts in its image generator after reports surfaced that the system was being used to create harmful content. 

But there are ways to get around such safeguards; new research from the AI Safety Institute found that LLMs from major labs are highly susceptible to such workarounds. 

  • In this evaluation, researchers first asked models harmful questions, then injected simple attacks (either inserting a question into a prompt template or following a step-by-step procedure) into those same questions to gauge levels of compliance. 

Image Source: The AI Safety Institute

Without an attack, model compliance ranged up to 28%. 

With an attack, three of the four models tested above 90% compliance. And when researchers made five attack attempts (rather than one) compliance was within a few points of 100% for every model

  • “We found that models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.”

My Thoughts: 

The word “safeguards” is a convenient way for Big Tech to say ‘well, at least we tried!’

The issue is that they don’t work. Or, at least, that they don’t work well. And to me, ‘tried’ isn’t good enough.

I’ve spoken with multiple cybersecurity experts who have said that these models are not designed with security in mind. It is also an unfortunate fact of human nature that if these systems can be abused, they will be. 

That said, I find the idea of safeguards — especially as it relates to the broader conversation about the cost-benefit analysis of generative AI — to be little more than a deflection. This kind of testing ought to be conducted before a model is made publicly accessible. And if a model demonstrates a capacity for harm, that model should be taken down immediately, revenue be damned. 

But companies would (largely) rather employ whack-a-mole-styled safeguard patches than build security into their systems or shut their systems down. And this — not some fictional singularity — marks the real threat of AI.

Image 1

Which image is real?

Login or Subscribe to participate in polls.

Image 2

  • Corexta: An all-in-one (AI-powered) business management platform.

  • Success.ai: A platform that connects you with hundreds of millions of professionals.

  • Movie Deep Search: An AI platform that turns prompts into movie recommendations.

Have cool resources or tools to share? Submit a tool or reach us by replying to this email (or DM us on Twitter).

*Indicates a sponsored link

SPONSOR THIS NEWSLETTER

The Deep View is currently one of the world’s fastest-growing newsletters, adding thousands of AI enthusiasts a week to our incredible family of over 200,000! Our readers work at top companies like Apple, Meta, OpenAI, Google, Microsoft and many more.

If you want to share your company or product with fellow AI enthusiasts before we’re fully booked, reserve an ad slot here.

One last thing👇

That's a wrap for now! We hope you enjoyed today’s newsletter :)

What did you think of today's email?

Login or Subscribe to participate in polls.

We appreciate your continued support! We'll catch you in the next edition 👋

-Ian Krietzberg, Editor-in-Chief, The Deep View