Agents, tasks, RAG and the double-edged sword
Good morning. 404 Media reported that the vast majority of traffic to a nonconsensual nudify app (around 90%) is coming from Meta; the app is buying ads on Instagram, and its user base is swelling as a direct result.
Maybe it’s not such a good morning …
— Ian Krietzberg, Editor-in-Chief, The Deep View
In today’s newsletter:
⚕️AI for Good: Cancer prognosis foundation model
💻 British AI video firm raises $180 million
📊 Report: The 2025 state of AI
📱 Agents, tasks, RAG and the double-edged sword
AI for Good: Cancer prognosis foundation model
Source: Unsplash
Increasingly, AI-based tools are being leveraged in clinical settings to aid in diagnostics. But the leap from diagnosis to prognosis (and the treatment plans associated with that) is a big one, largely owing to the quantity and type of training data required.
Researchers at Stanford Medicine and Harvard Medical School recently bridged that gap, unveiling a fine-tunable foundation model — nicknamed “MUSK” — that is capable of oncology prognoses that, in some cases, outperformed traditional prognostic methods.
The details: The model — trained on 50 million medical images and one billion pathology-related and follow-up texts for people with 16 types of cancer — is capable of processing both visual and textual information. This combination of, say, X-ray images with exam notes enables the model to actually output accurate prognoses.
In trials, MUSK was able to predict the disease-specific survival of a given patient 75% of the time, whereas traditional methods are correct 64% of the time, according to the researchers.
The model was able to predict whether a patient would benefit from immunotherapy with a 77% accuracy rate, above the 61% accuracy rate of the standard methodology, which is based on the expression of a single protein within a tumor. “That’s a biomarker made of just one protein. In contrast, if we can use artificial intelligence to assess hundreds or thousands of bits of many types of data, including tissue imaging, as well as patient demographics, medical history, past treatments, and laboratory tests gathered from clinical notes, we can much more accurately determine who might benefit,” Dr. Ruijiang Li, the study’s lead author, said.
Why it matters: “The biggest unmet clinical need is for models that physicians can use to guide patient treatment,” Li said. “We designed MUSK because, in clinical practice, physicians never rely on just one type of data to make clinical decisions.”
Entering a New Era of Business Automation
Embracing AI in your business isn't just a tech upgrade; it's an evolution that redefines problem-solving, boosts efficiency, and elevates customer engagement.
Dive into the guide, Transforming Process Orchestration with Artificial Intelligence, for an in-depth overview of how AI can revolutionize business automation, featuring real-world examples and best practices.
Here's what you'll discover:
The critical impacts of AI across various industries
Significant benefits and potential risks of AI
How to develop an AI automation strategy
The future of AI in process orchestration
How Camunda integrates advanced AI technologies
British AI video firm raises $180 million
Source: Synthesia
Synthesia, a U.K.-based enterprise AI video platform that enables users to create personal AI avatars, among other things, said Wednesday that it had secured a $180 million Series D funding round at a $2.1 billion valuation.
Last time the Nvidia-backed Synthesia raised money — $90 million in 2023 — it was valued at $1 billion.
The details: The round was led by NEA with participation from Atlassian Ventures and World Innovation Lab, among others.
The company, founded in 2017, now has more than 60,000 enterprise customers, according to its CEO, Victor Riparbelli. The funding will be used to expand Synthesia’s offerings in new markets, including Japan, Australia and Europe.
This latest round makes Synthesia the largest generative AI media company by valuation in the U.K.
Riparbelli told CNBC that, unlike other AI startups (OpenAI, for example) that are entirely reliant on venture funding, Synthesia isn’t dependent on venture capital. “Of course, the hype cycle is beneficial to us,” he said. “For us, what’s important is building an actually good business.”
British Tech Secretary Peter Kyle said in a statement that “the confidence investors have in British tech, especially following our newly announced blueprint for AI, highlights the global leadership of UK-based companies in pioneering generative AI innovations.”
The Washington Post reported that police, confident in their unproven facial recognition technology, are skipping investigative steps and ignoring standards. At least 8 Americans have been wrongfully arrested due to this so far.
TikTok is facing a potential ban in the U.S. as soon as this Sunday. If ByteDance decides to sell the app, analysts have estimated that it could be worth as much as $50 billion. (Mr. Musk paid $44 billion for Twitter, back in the day.)
Is humanity alone in the Universe? What scientists really think (Ars Technica).
FDA bans Red No. 3, artificial coloring used in beverages, candy and other foods (NBC News).
Qantas flights delayed due to falling space debris from SpaceX rocket launches (Semafor).
Singapore is turning to AI to care for its rapidly aging population (Rest of World).
Inside Meta’s race to beat OpenAI: ‘We need to learn how to build frontier and win this race’ (The Verge).
Report: The 2025 state of AI
Source: Unsplash
We’re talking about agents today. And we’ve been talking about agents for a while. But as the capabilities of generative AI seem to continuously expand, a separate question remains critical to the health of the business of AI: corporate adoption.
What happened: Vellum published its 2025 State of AI report, a survey of 1,250 AI developers at companies — mostly tech firms, with some healthcare, finance, legal and retail — around the world, aiming to outline the trends of adoption going into the new year.
25% of those surveyed have deployed AI projects; the remainder are stuck in a variety of pre-deployment phases, which Vellum thinks could be due either to a lack of proper tooling, or a simple lack of applicable use cases.
The most popular application under construction, cited by 59.7% of the 1,250 respondents, is simple document parsing; 51% are building customer service chatbots, 25.9% are using AI for content generation and only 23% are using AI for code generation.
Across those surveyed, OpenAI remains dominant, with 63.3% using ChatGPT, while 33.8% use Microsoft’s products and 32% use Anthropic.
The biggest challenge faced by those polled (some 57.4% of them) had to do with the management of AI hallucinations and prompts. 42.5% said that prioritizing AI use cases was their most significant challenge.
Only about half of those surveyed monitor their models, and fewer than 60% perform model evaluations. A third said that the biggest payoff from their AI integration thus far has involved competitive advantages.
Only 27.1% said their AI integration has afforded their company significant time and cost savings.
Agents, tasks, RAG and the double-edged sword
Source: Created with AI by The Deep View
We are *checks calendar* two weeks into the year, and already, 2025 is shaping up to fulfill our first prediction: it’s full of agents.
Let me preface this briefly by saying that, unsurprisingly for the world of AI, there isn’t a unified, agreed-upon definition of what, exactly, an “agent” is. The general idea is that, where a chatbot outputs textual or visual content, an agent (based on the same exact LLM architecture) is connected to platforms and tools, and so can complete tasks. But that broad definition features a pretty massive gradient when it comes to levels of autonomy.
Nvidia kicked the year off by unveiling a whole new slate of agentic “blueprints” designed to enable enterprises to build and deploy custom agents. OpenAI took its own trademarked dip into the agent pool earlier this week with its introduction of “tasks,” which is, well, kind of exactly what it sounds like.
With tasks, users can now delegate tasks to the chatbot, which will then perform them asynchronously. For example, you can ask ChatGPT to check Apple’s stock price — and send you a note with the latest data — every morning at 10 a.m.
It doesn’t seem agentic in the way people were probably expecting (by the measure of task reminders, Siri and Alexa have been agentic for years) but, nonetheless, it is a push toward greater automation, less direct human oversight and greater asynchronous activity. Still, early reports show that the feature is extremely brittle, mixing up time zones and failing to do what users are asking it to do (an inescapable flaw of the architecture).
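For the technically curious, the time zone problem is easy to picture. Here is a minimal Python sketch of how a recurring "every morning at 10 a.m." job could be pinned to the user's own clock. It is not how OpenAI implements tasks; the function name `next_daily_run` and the fixed five-hours-behind-UTC user are assumptions made up for illustration.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def next_daily_run(now_utc: datetime, local_time: time, user_tz: tzinfo) -> datetime:
    """Return the next UTC instant at which `local_time` occurs in `user_tz`."""
    # Work in the user's zone so "10 a.m." means 10 a.m. where the user is.
    now_local = now_utc.astimezone(user_tz)
    candidate = datetime.combine(now_local.date(), local_time, tzinfo=user_tz)
    # If today's slot has already passed, schedule tomorrow's instead.
    if candidate <= now_local:
        candidate += timedelta(days=1)
    return candidate.astimezone(timezone.utc)


# Hypothetical user five hours behind UTC (a fixed offset, for illustration only).
user_tz = timezone(timedelta(hours=-5))
now = datetime(2025, 1, 16, 16, 30, tzinfo=timezone.utc)  # 11:30 a.m. for the user
print(next_daily_run(now, time(10, 0), user_tz))  # 2025-01-17 15:00:00+00:00
```

A scheduler that skipped the conversion and treated "10 a.m." as 10 a.m. UTC would fire five hours early for this user, which is the kind of mix-up the early reports describe.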
At the same time, Microsoft released something called Copilot Chat, where users can create metered, pay-as-you-go “agents” directly in its chat experience on Microsoft 365.
And Contextual, the enterprise-focused startup building RAG 2.0, on Wednesday announced the general availability of its AI platform, which will enable enterprises to “build specialized RAG agents to support expert knowledge work.”
The platform is focused on the construction, evaluation and deployment of these specialized agents, which Contextual believes is far more important than the idea of general-purpose agents. The idea, achieved through RAG and Contextual’s latest observability tech, is to give domain experts “AI tools that match their level of expertise and can be trusted to address complex or technical problems with confidence.” The agentic output here comes complete with precise citations.
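To make the "RAG with citations" idea concrete, here is a toy Python sketch of the general pattern: retrieve relevant passages, then hand them to a model with IDs it can cite. This is not Contextual's system. The documents, their IDs, the word-overlap scoring and the function names are all invented for illustration, and a real retriever would be far more sophisticated.

```python
import re

# Hypothetical mini knowledge base; IDs and text are invented for illustration.
DOCS = {
    "policy-1": "Refunds are issued within 14 days of a return request.",
    "policy-2": "Enterprise support tickets receive a response within 4 hours.",
    "policy-3": "Annual plans renew automatically unless cancelled.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank documents by word overlap with the query (a stand-in for a real retriever)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(text)), doc_id) for doc_id, text in DOCS.items()]
    # Keep only documents that share at least one word, best match first.
    top = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))[:k]
    return [(doc_id, DOCS[doc_id]) for _, doc_id in top]


def build_prompt(query: str) -> str:
    """Assemble a prompt that asks the model to cite passages by ID."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return (
        "Answer using only the passages below and cite their IDs.\n"
        f"{context}\nQuestion: {query}"
    )


print(build_prompt("How fast are refunds issued?"))
```

Because each passage carries its ID into the prompt, a domain expert can trace any claim in the answer back to the source it came from.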
In a public set of benchmarks published by Contextual, the startup’s RAG agents — the result of a systems approach, rather than being model-specific — outperformed Claude 3.5 Sonnet, GPT-4o and Llama 3.1, sometimes by large margins.
“Enterprise AI has reached a critical turning point,” Douwe Kiela, CEO of Contextual AI, said in a statement (Kiela was one of the co-authors of the original RAG paper). “AI agents will soon be available to every employee at every company. However, the specialized work of subject-matter experts remains largely underserved. Specialized RAG agents built on the Contextual AI Platform bridge this gap, enabling SMEs to boost their productivity with AI that truly understands their domain.”
With great power … A team of Hugging Face researchers — Margaret Mitchell, Avijit Ghosh, Sasha Luccioni and Giada Pistilli — at the same time published a values-based analysis of agents, writing that agentic systems have plenty of potential to increase efficiency, accuracy, consistency and safety in certain settings.
However (I’m sure you saw this coming a mile away), increasing automation coupled with decreasing human attentiveness and control poses very specific risks: “risks to people increase with a system’s level of autonomy … The more control a user cedes, the more risks arise from the system,” the researchers wrote.
Agents offer the opportunity of greater accuracy, but they are based on language models that are “designed to construct text that looks like fluent language — meaning they often produce content that sounds right, but is very wrong.” The risk of actions based on incorrect information (AKA confabulations or hallucinations) is therefore real.
As such, according to the researchers, there is a risk of agents reducing efficiency, requiring human workers to double-check an agent’s work, or find specific, nuanced errors. There are also risks here of job loss (and therefore, economic disruption) and increased unsustainability, in addition to safety and security vulnerabilities tied to an agent’s interoperability with other platforms.
The effects of agents, they said, must be better understood. And the systems in general must be subject to higher standards of transparency and observability.
Transparency remains largely lacking in the industry. As we move into a world of agents, which are more complex than single models, I remain anxiously curious about the increase in associated energy use and carbon emissions.
If a single ChatGPT query uses 10x the electricity that a Google search does, how much energy does an asynchronous agent consume?
Which image is real?
🤔 Your thought process:
Selected Image 2 (Left):
“Mr. one-leg AI photo is missing a pedal and has a crooked bike.”
Selected Image 1 (Right):
“NO. What is the guy pulling?? Some rope on his tires? They say AI is weird? People are weird.” Wow, you really studied this one closely. That’s so fun.
💭 A poll before you go
Thanks for reading today’s edition of The Deep View!
We’ll see you in the next one.
Here’s your view on Britain’s AI approach:
Splits are almost exactly even on this one, guys. Very interesting.
43% like Britain’s approach; 44% do not.
But 51% of the British people among you approve of the plan, while 49% do not.
I don’t know that we’ve ever had a poll this evenly divided.
Yes - I’m British:
“While the concerns need addressing, the benefits to the economy are very much needed. Britain needs to get back to being the powerhouse it once was and attract lucrative businesses instead of scaring them away with the high taxes recessive economy.”
Nope - From somewhere else:
“This is just a national government being suckered into marketing hype. It is dangerously naive to assume a technology's benefits will outweigh its costs.”
Words to live by.
Is your company using agents? Do you like them?