
⚙️ You can try jailbreaking Anthropic’s jailbreaking cure

Good morning. The California State University system just partnered with a slew of Big Tech giants, including Microsoft, Nvidia, Adobe and IBM to bring generative AI to its 500,000 students and staff across 23 campuses.

And now we’ll see what that mix of AI and higher education means for learning.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

  • 🩺 AI for Good: Breast cancer detection

  • 🏛️ California takes aim at the chatbot market

  • ⚡️ A generative quantum AI breakthrough

  • 👁️‍🗨️ You can try jailbreaking Anthropic’s jailbreaking cure

AI for Good: Breast cancer detection

Source: Unsplash

The News: The results of the largest randomized medical AI trial yet performed were published on Tuesday. The study, which examined more than 100,000 women in Sweden, sought to determine the efficacy of AI-assisted mammography screening for breast cancer. 

The Findings: Half of the participants were randomly assigned to the AI-supported intervention group, in which mammograms were scanned by the AI system Transpara in addition to a human expert. The other half were assigned to the standard care procedure, which calls for each scan to be read twice. The study found a 29% increase in cancer detection in the AI-supported group compared to the control. 

  • Without a significant increase in false positives or recalls, the AI-assisted group reduced the screen-reading workload by 44%, a notable figure given that, as the study mentions, there are not enough breast cancer screen readers available. 

  • That 29% increase refers specifically to the detection of small, lymph node-negative cancers, which suggests that early breast cancer detection with AI is possible. 

Britain, meanwhile, has launched its own trial, in which 700,000 women will participate, to assess the same question as the study above.

It’s worth noting that, though they share the umbrella term “AI,” the Transpara system on display here is not the same as common generative applications such as ChatGPT. It’s a machine learning system that was trained on more than a million mammograms, specifically to recognize “suspicious patterns learned in its large database.” Some of the standard ethical concerns apply, but with targeted training data, specific functionality and human experts heavily involved, the promise is simple: algorithms are really good at identifying patterns that humans miss.

Get everything you need in seconds to win target accounts, including company insights, pitches to connect with buyers’ business objectives, and messaging to engage them on any channel, at any stage of the funnel. 

Don’t burn your leads with AI. Bounti is your AI teammate to help you deliver the right message to buyers at every level—driving real engagement at a fraction of the cost of other solutions.

  • Know Your Prospects: Get a cheat sheet for every account that matters to you. In minutes, you’ll have everything you need to know about your customers and what your buyers care about.

  • Land Your Pitch: Get custom pitches for buyers at every level, showing how your solution ties to their individual business objectives, along with the content you need to win, like first-call decks and battle cards.

  • Tactfully Engage Anywhere: Generate personalized messaging tailored to each contact for outreach across email, LinkedIn, landing pages, and more.

California takes aim at the chatbot market 

Source: Unsplash

The News: California State Senator Steve Padilla last week introduced SB 243 to the state legislature, a bill that would require chatbot developers to introduce certain guardrails designed to protect kids. 

The details: The bill calls for four main requirements for chatbot developers/operators. 

  • The first would require such platforms to avoid “addictive engagement patterns,” reducing the likelihood of addiction among child users. The second would require a “periodic reminder” that chatbots are not human. 

  • The last two would require disclosures warning parents and their kids that chatbots might not be suitable for minors, as well as annual reporting on the connection between chatbot use and “suicidal ideation.” 

The landscape: The bill closely follows the early stages of two lawsuits against Character AI, brought by the mothers of boys who died by, or attempted, suicide, allegedly driven there by the company’s generative AI chatbots. 

Ed Howard, senior counsel to the Children’s Advocacy Institute, wrote in a statement that SB 243 “rightly seeks to prevent the two absolute worst aspects of the AI-chatbot menace: children being duped into thinking they are talking to a real person and children being manipulated by profit-at-all-costs Big Tech.”

This tech company grew 32,481%...

No, it's not Nvidia… It's Mode Mobile, 2023’s fastest-growing software company according to Deloitte.1

Their disruptive tech, the EarnPhone and EarnOS, has helped users earn and save an eye-popping $325M+, driving $60M+ in revenue and a massive 45M+ consumer base. And having secured partnerships with Walmart and Best Buy, Mode’s not stopping there…

Like Uber turned vehicles into income-generating assets, Mode is turning smartphones into an easy passive income source. One important difference? You have a chance to invest early in Mode’s pre-IPO offering3 at just $0.26/share.

They’ve just been granted the stock ticker $MODE by the Nasdaq2 and the time to invest at their current share price is running out.

  • Google shares fell as much as 6% in extended trading following its Tuesday night earnings report, likely due to a revenue miss in its Cloud unit. Shares of Nvidia, meanwhile, rose on a single statement from Google: capex will hit $75 billion in 2025, a 47% increase from last year.

  • Robotics company Figure AI has severed its partnership with OpenAI. CEO Brett Adcock said Figure “made a major breakthrough on fully end-to-end robot AI, built entirely in-house,” something that we’ll see within the next month.

  • Big Tech’s capex gush tops last oil spree (The Information).

  • Deepfake videos are getting shockingly good (TechCrunch).

  • ‘Things are going to get intense’: How a Musk ally plans to push AI on the government (404 media).

  • Palantir Stock Jumps to a New High on Earnings. What Got the Market So Excited (Barron’s).

  • Big tech critic to take key FCC role (Semafor).

A generative quantum AI breakthrough 

Source: IBM (The inside of an IBM quantum computer).

The News: Quantum firm Quantinuum on Tuesday launched a new framework that combines its quantum infrastructure with generative AI. 

The framework specifically uses quantum-generated data to train AI systems, something that Quantinuum says will enhance the “fidelity” of generative AI. 

But first, what is quantum? There are a couple of main differences between quantum and classical computing (both modern supercomputers and AI are in the ‘classical’ category). 

  • First, the hardware. Classical computers use silicon chips. Quantum computers use quantum processors. And those quantum processors need to be kept super cold (roughly one-hundredth of a degree above absolute zero). So, a bit more intense than your average computer fan. 

  • Now, normal computers use binary “bits” — zeroes and ones — to process and store data. But quantum computers use something called “qubits,” which can behave like a zero, a one, or a combination of the two until the end of a calculation, when they will output a single bit of information. 

Put simply: because quantum computers are built on the principles of quantum mechanics, which describe the behavior of subatomic particles, they can, as IBM puts it, “take better advantage of quantum mechanics to conduct calculations that even high-performance computers cannot.” 
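
The qubit idea above can be sketched in a few lines of plain Python. This is a toy simulation, not real quantum hardware (the class name and interface are mine for illustration): a qubit holds two amplitudes, and measurement collapses it to a single classical bit with probabilities given by the squared amplitudes.

```python
import math
import random

class Qubit:
    """Toy single-qubit simulator: two amplitudes (alpha, beta),
    normalized so that |alpha|^2 + |beta|^2 == 1."""

    def __init__(self, alpha: complex, beta: complex):
        norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
        self.alpha = alpha / norm
        self.beta = beta / norm

    def measure(self) -> int:
        """Collapse to a classical bit: 0 with probability |alpha|^2,
        otherwise 1 — the single bit a qubit outputs at the end."""
        return 0 if random.random() < abs(self.alpha) ** 2 else 1

# An equal superposition behaves like a 0, a 1, or a mix of the two
# until measured; over many measurements, roughly half come out 0.
samples = [Qubit(1, 1).measure() for _ in range(10_000)]
```

Real quantum advantage comes from entangling many qubits and interfering their amplitudes, which this single-qubit toy cannot show, but the collapse-on-measurement behavior is the core of the “combination of the two until the end of a calculation” description above.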

There’s a lot about quantum to get into that we don’t have time for; dive in here.

The Point: AI has a data problem. Every system under that umbrella is only as good as the data it was trained on; imperfect data (biased, incomplete or physically inaccurate sets) makes for imperfect results. We’ve already seen the results (from Writer) of using AI to generate specially cleaned and curated training data sets; Quantinuum’s idea is to use quantum computing to generate highly specific synthetic training data that, because it is produced by a quantum-mechanical process, is inherently in line with the laws of physics. 

The firm said it is collaborating with the automotive, pharmaceutical and materials science industries — for starters — on this advancement, and will share results soon. 

However, the reality of the work being undertaken here is still somewhat vague. Specific outcomes will be interesting to study. This is something I’ll be following up on. 

You can try jailbreaking Anthropic’s jailbreaking cure 

Source: Anthropic

One of the biggest obstacles to the enterprise adoption of generative AI involves something called jailbreaking. The concept is simple: most generative AI systems are designed with certain rules and safeguards in place. Jailbreaking refers to the act of prompting a model into generating output that violates those rules and guardrails. This is accomplished through a set of prompting techniques, such as tYpInG LiKe ThIs. 
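
That alternating-case trick is trivial to produce mechanically, which is part of why keyword-based filters struggle with it. Here's a minimal sketch (the function name is mine) of the kind of obfuscation jailbreak prompts rely on:

```python
def alternate_case(text: str) -> str:
    """Rewrite a prompt in aLtErNaTiNg case, a simple jailbreak
    style intended to slip past naive keyword filters."""
    out = []
    upper = False
    for ch in text:
        if ch.isalpha():
            out.append(ch.upper() if upper else ch.lower())
            upper = not upper  # flip case for the next letter only
        else:
            out.append(ch)
    return "".join(out)

print(alternate_case("typing like this"))  # -> tYpInG lIkE tHiS
```

The obfuscated text is still perfectly legible to a language model, which is exactly the problem: defenses have to understand meaning, not just match strings.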

Think, for instance, of an image-generation chatbot that’s not supposed to generate sexually explicit or violent deepfakes and images, but is made to do so anyway (this happened last year with Microsoft’s Copilot Designer). 

The News: Anthropic has been working on methods of preventing jailbreaking. The startup announced its approach — complete with a paper and a live demo — earlier this week. 

  • The system is based on what Anthropic calls “Constitutional Classifiers,” which are designed to simply filter out jailbreak attacks. 

  • This “constitution” includes a list of principles that define certain allowable classes of content. Anthropic researchers fed that constitution to Claude to generate synthetic prompts across all the different content classes; researchers then transformed those prompts by re-writing them in jailbreak styles and translating them into different languages. 
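
In rough outline, the deployed system the bullets describe wraps the model with classifiers on both sides. The sketch below is my own simplification, not Anthropic's code — the real system uses trained neural classifiers over streamed output, not boolean checks — but it shows the filtering structure:

```python
from typing import Callable

def make_guarded_model(
    model: Callable[[str], str],
    input_classifier: Callable[[str], bool],   # True -> prompt looks like a jailbreak
    output_classifier: Callable[[str], bool],  # True -> response violates the constitution
    refusal: str = "I can't help with that.",
) -> Callable[[str], str]:
    """Wrap a model with input/output safety classifiers, in the
    spirit of Constitutional Classifiers (hedged sketch only)."""
    def guarded(prompt: str) -> str:
        if input_classifier(prompt):
            return refusal            # block the prompt outright
        response = model(prompt)
        if output_classifier(response):
            return refusal            # block a harmful completion
        return response
    return guarded

# Toy demo: a stand-in "model" and a keyword "classifier."
model = lambda p: f"Answer to: {p}"
flag = lambda text: "nerve agent" in text.lower()
guarded = make_guarded_model(model, flag, flag)
print(guarded("How do clouds form?"))     # passes through
print(guarded("Describe a nerve agent"))  # refused
```

Running both classifiers on every request is also where the compute overhead discussed below comes from: every token generated is effectively paid for twice.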

To test the system, Anthropic sent nearly 200 jailbreakers on a simple quest: get Claude to answer a series of 10 “forbidden” queries. The jailbreakers spent a collective 3,000 hours trying to do it. None of them succeeded. 

In an additional automated test, Anthropic found that, without the classifiers, Claude’s jailbreak success rate was 86%. With the classifiers, that number dropped to 4.4%, meaning the system blocked more than 95% of jailbreak attempts. 

Anthropic highlighted two costs to all this: one, an increase in over-refusals (harmless queries incorrectly flagged as harmful), and two, high compute overhead. Specifically, Anthropic noted a 23.7% inference overhead; the energy intensity and associated carbon emissions of that overhead, however, remain unclear. 

I am by no means an expert prompter, but I wasn’t able to get Claude to answer the first harmful question in the live demo (after a good 20 minutes of finding and trying every alternative name for a given nerve agent). Anthropic’s Jan Leike said that, so far, no one has gotten past the third question. 

This is one of those things that would certainly seem to have the potential to significantly aid adoption, especially in the enterprise. And it’s something that no other developer seems to be doing. 

This still doesn’t resolve reliability issues, but it certainly appears to partially resolve the threat of malicious prompting, which isn’t a small deal. 

On the other hand, this is a good example of the cost-benefit analysis that surrounds AI. Preventing jailbreaking is good. Doing so with a nearly 24% inference overhead (which doesn’t count the inference cost of the model itself) is less great, especially when the recent trend points toward more efficient training but less efficient operation, driven by longer and more complex output generation. 

I’d love more details on the electrical cost of this approach, but Anthropic didn’t respond to a request for comment.

Which image is real?


🤔 Your thought process:

Selected Image 2 (Left):

  • “Just what is that crane in Image 1 supposed to be there for? Nothing, as far as I can see!”

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

Here’s your view on Deep Research:

40% of you said the jury’s still out on Deep Research. 20% said it’s absolutely worth the cost; 20% said it absolutely isn’t.

Remains to be seen:

  • “It sounds like it's roughly trying to do what apps like Scite are doing, but at a much higher price tag right now. I'm curious how it will beat specialized software; even if it is just for being a single price that handles everything loosely well.”

You think SB 243 will go anywhere?


If you want to get in front of an audience of 200,000+ developers, business leaders and tech enthusiasts, get in touch with us here.

*Mode Mobile Disclaimers:

1 The rankings are based on submitted applications and public company database research, with winners selected based on their fiscal-year revenue growth percentage over a three-year period.

2 Mode Mobile recently received their ticker reservation with Nasdaq ($MODE), indicating an intent to IPO in the next 24 months. An intent to IPO is no guarantee that an actual IPO will occur.

3 A minimum investment of $1,950 is required to receive bonus shares. 100% bonus shares are offered on investments of $9,950+.