
⚙️ Anthropic beats GPT-4o with Claude 3.5 … in another blogpost

Good morning, and happy Friday. Not to be outdone by OpenAI’s somewhat recent release of GPT-4o, Anthropic released Claude 3.5 Sonnet, a model it says outperforms the competition.

I messed around with it a bit and found it was marginally better at certain tests than Claude 3 Sonnet, but I still can’t trust its accuracy enough to actually use it.

Read on for the full story.

In today’s newsletter: 

  • 🏡 AI for Good: Promoting in-home energy conservation

  • 🚗 Study: Autonomous cars are better than humans — except at dawn, dusk and during turns

  • 🎨 Glaze creators respond to a new attack 

  • 💻 Anthropic beats GPT-4o with Claude 3.5 … in another blogpost

AI for Good: Promoting in-home energy conservation

Photo by Jason Mavrommatis (Unsplash).

Homes have been getting smarter and smarter for the past 20 years or so. And in the push for more automation and intelligence, AI has been quietly revolutionizing Heating, Ventilation and Air Conditioning (HVAC) systems. 

The details: 

  • Energy optimization — AI is great at analyzing data and finding trends. Algorithms can parse historical and real-time data (occupancy patterns, weather forecasts, etc.) to optimize how HVAC systems run. Some studies have shown at least 10% energy savings from this kind of integration; others report more than 25%. 

  • Predictive maintenance — Through this kind of integration, algorithms can also predict when elements of an HVAC system will need maintenance, reducing downtime.

  • Indoor air quality — AI-enabled systems can also regularly scan for pathogens and harmful particulate matter, and can control ventilation systems based on these analyses, resulting in cleaner air.
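As a rough illustration of the energy-optimization idea above, here is a minimal sketch of occupancy- and weather-aware setpoint scheduling. Every name and threshold here (`choose_setpoint_c`, the 21°C comfort level, the pre-heat rule) is an illustrative assumption, not any vendor's actual system:

```python
# Toy sketch: pick heating setpoints from occupancy predictions and a
# weather forecast. All thresholds are illustrative assumptions.

def choose_setpoint_c(occupied: bool, outdoor_temp_c: float,
                      comfort_c: float = 21.0, setback_c: float = 17.0) -> float:
    """Comfort setpoint when occupied, energy-saving setback when empty.

    A mild pre-heat rule raises the setpoint slightly in very cold
    weather so the system does not fall behind on heat loss.
    """
    if not occupied:
        return setback_c
    if outdoor_temp_c < -5.0:
        return comfort_c + 1.0  # pre-compensate for high heat loss
    return comfort_c

def schedule(occupancy: list, forecast_c: list) -> list:
    """Map hourly occupancy predictions and forecast temps to setpoints."""
    return [choose_setpoint_c(o, t) for o, t in zip(occupancy, forecast_c)]
```

Real systems learn these rules from historical data rather than hard-coding them, but the core loop is the same: predict occupancy and conditions, then adjust the setpoint to avoid conditioning empty rooms.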

Why it matters: Buildings account for 30% of global energy consumption, according to the International Energy Agency (IEA). And HVAC systems are responsible for a significant portion of that consumption. 

Energy efficiency has been called the “first fuel” in the transition to green energy. 

Study: Autonomous cars are better than humans — except at dawn, dusk and during turns

Tampa, Florida sunset

Photo by Jules PT (Unsplash).

Autonomous vehicles are one of the biggest promises of artificial intelligence. And, like much of the industry, their real-world progress has lagged behind the hype. 

Tesla’s self-driving tech has spent the past five years proving Elon Musk’s predictions of cross-country summons and robotaxi fleets wrong. Cruise only recently restarted operations after a months-long hiatus. 

  • And Tesla, Cruise, Waymo and Zoox are all under federal investigation for safety concerns related to their self-driving tech.

New findings: In the midst of this, new research compared nearly 40,000 accidents involving both AVs and human-driven vehicles. 

  • The report found that, in general, AVs are safer than human-driven vehicles. 

  • The major caveat here is that Level 4 AVs are around five times more likely to get in an accident at dawn/dusk than human drivers; AVs are also around two times more likely to get in an accident during turns than humans. 

But: The study combined data from Level 4 self-driving vehicles with data on Tesla’s self-driving tech, which operates at Level 2, a mix that could skew the study’s more general findings.

  • Missy Cummings, director of George Mason University’s Autonomy and Robotics Center and former safety advisor for the National Highway Traffic Safety Administration, told IEEE Spectrum that combining these two categories goes against the “ground rules” of research in this area. 

“We do not have enough information to make sweeping statements,” she said.

Together with Guidde

Magically create video documentation with AI.

Tired of explaining the same thing over and over again to your colleagues? It’s time to delegate that work to AI. guidde is a GPT-powered tool that helps you explain the most complex tasks in seconds with AI-generated documentation.

  • Turn boring documentation into stunning visual guides

  • Save valuable time by creating video documentation 11x faster

  • Share or embed your guide anywhere for your team to see

Simply click capture on our browser extension and the app will automatically generate step-by-step video guides complete with visuals, voiceover and calls to action.

The best part? Our extension is 100% free. Try it here

Glaze creators respond to new attack 

Image Source: Glaze

In today’s world, where digital art functions as fuel for the relentless engines of generative AI, artists have been looking for secure solutions. 

  • One of these is Glaze, a tool that uses machine learning to help prevent the style mimicry enabled by generative image models. But a new paper released this week describes a novel attack that at least partially strips the protection Glaze offers, particularly in Glaze’s first version. 

The push and pull of cybersecurity: The creators of Glaze in turn released Glaze 2.1, an update that addresses and mitigates the new attack. 

The creators added that artists must consistently Glaze their work to ensure it remains protected: “This disclosure serves as a reminder that once glazed does not mean protected forever. Unfortunately protecting one's art in this world we live in requires constant vigilance.”

💰AI Jobs Board:

  • Conversational AI App Engineer: Koko Home · United States · Los Angeles, CA · Full-time · (Apply here)

  • Analyst — Market Intelligence & AI: Obviant · United States · Washington D.C. - Baltimore Area · Full-time · (Apply here)

  • Applied Machine Learning Researcher: Latham & Watkins · United States · Menlo Park, CA · Full-time · (Apply here)


🌎 The Broad View:

  • Solar installations exceed expectations as costs drop (The Economist).

  • Amazon’s ditching the plastic air pillows in its boxes (CNBC).

  • Dell employees reject return to office push (Semafor).

  • New York moves to limit 'addictive' social media feeds for kids (AP).

  • Ready to supercharge your career?

    • Sidebar is a leadership program where you get matched to a small peer group, have access to a tech-enabled platform, and get an expert-led curriculum. Members say it’s like having their own Personal Board of Directors.*

  • Learn how Vanta’s automation can help your business reach vital security compliance at a special live session next month. *

*Indicates a sponsored link

Together with Sidebar

Accelerate your career in 2024 with Sidebar

It’s lonely at the top of the mountain. 

But if you want to become a better leader, the best way to get there is to get feedback from your peers. Sidebar helps you find them. 

  • "You're the average of the people you keep closest; Sidebar helps you raise that bar." - Global Director, Reddit

Sidebar is an exclusive leadership program that matches you with small circles of high-powered peers. It’s not a social club — this group provides personalized support, feedback and advice that will take you to the next level. 

You could spend decades working to build a robust network … or you could join thousands of leaders (from companies like Microsoft and Amazon) by applying for Sidebar today. And you’re sure to feel the impact; 93% of Sidebar’s customers say it has supercharged their career. 

Take the first step with Sidebar today. Request an invite to join here

Anthropic beats GPT-4o with Claude 3.5 … in another blogpost 

Another day, another model. 

Created with AI by The Deep View.

Anthropic on Thursday released Claude 3.5 Sonnet, the company’s first entry in its upcoming “3.5” model family. Anthropic said in a blogpost that the model — which is available for free — outperforms all the competition. 

Some details: Anthropic said that the model operates at “twice the speed” of Claude 3 Opus, the most powerful model in Anthropic’s previous model family. 

  • “In an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of problems, outperforming Claude 3 Opus which solved 38%.”

  • The model is also Anthropic’s “strongest vision model” to date. 

On the safety side: Anthropic said that despite the jump in capabilities, the model doesn’t present any higher risk than current LLMs. 

  • Anthropic shared the model with the U.K. AI Safety Institute for pre-deployment evaluation.

The problem with release-by-blogpost: A recent study that explored open-washing in AI took issue with the release-by-blogpost approach that has become the norm in the AI world.

The authors said that this approach allows “releases to retain the veneer of scientific work while at the same time avoiding the fine-grained accounting and the scrutiny of peer review that comes with actual scientific publication.”

  • The authors go on to say that the tables that compare model benchmarks — which are also present in Anthropic’s latest release — allow companies to cherry-pick results to show their models in the best possible light. 

  • “When generative AI follows the release-by-blogpost model, it is reaping the benefits of mimicking scientific communication — including associations of reproducibility and rigor — without actually doing the work.”

Cognitive scientist and AI researcher Gary Marcus said that Claude 3.5 Sonnet represents “yet another model, roughly in the same ballpark as many others.”

  • “When did AI stop being a science? You can’t conclude that Claude 3.5 is ‘better than 4o’ when there are no error bars, and GPT-4o actually did better than Claude in 2 of the 6 comparisons,” he added.
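Marcus's point about error bars can be made concrete with a little statistics: a benchmark score is an estimate with sampling noise, and its uncertainty shrinks only with the size of the test set. The sketch below uses a standard normal-approximation confidence interval; the function name and the 200-item test set are illustrative assumptions, not the actual benchmark sizes:

```python
# A benchmark accuracy measured on n test items is a noisy estimate.
# This computes a normal-approximation 95% confidence interval for it.
import math

def accuracy_ci(acc: float, n: int, z: float = 1.96):
    """95% CI for an accuracy `acc` measured on `n` independent items."""
    se = math.sqrt(acc * (1.0 - acc) / n)  # standard error of a proportion
    return (acc - z * se, acc + z * se)
```

For example, an accuracy of 0.64 on a hypothetical 200-item test set gives an interval of roughly 0.57 to 0.71, about ±6.6 points either way, so a lead of a point or two between two models can easily be noise.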

Marcus has said that the regular release of slightly more capable models indicates that deep learning is, in fact, hitting a wall. He has argued that scaling is not all you need, and that scientists will need to explore a new AI paradigm if they want to develop an AI system that is more than just a marketing term. 

“Children continuously grow, but they don’t continuously grow exponentially,” Marcus said. “AGI won’t come from a bunch of ‘if you squint, you can see it’ improvements; it will take real breakthroughs.” 

My view: This leap-frogging between OpenAI, Google and Anthropic might seem like a big deal within the industry, but I don’t know how much the average person cares about these slightly improved metrics.  

  • They have not solved hallucinations or cracked the black box of LLMs. They haven’t cured cancer or saved the climate or reduced the energy intensity of model usage/training by any significant margin. 

It’s just the same, but faster. And that’s great for the people who use these systems regularly. But for everyone else, this race feels like a distraction from true use cases and applications of AI. 


  • Brave Search API: An ethical, human-representative web dataset to train your AI models. *

  • Transcriptal: An AI-powered platform that ensures precise transcriptions, capturing every word with exceptional clarity.

  • Otter: AI-powered transcription tool.

Have cool resources or tools to share? Submit a tool or reach us by replying to this email (or DM us on Twitter).

*Indicates a sponsored link

SPONSOR THIS NEWSLETTER

The Deep View is currently one of the world’s fastest-growing newsletters, adding thousands of AI enthusiasts a week to our incredible family of over 200,000! Our readers work at top companies like Apple, Meta, OpenAI, Google, Microsoft and many more.

If you want to share your company or product with fellow AI enthusiasts before we’re fully booked, reserve an ad slot here.

One last thing👇

That's a wrap for now! We hope you enjoyed today’s newsletter :)

What did you think of today's email?


We appreciate your continued support! We'll catch you in the next edition 👋

-Ian Krietzberg, Editor-in-Chief, The Deep View