
⚙️ LLMs, The Imitation Game and the complex history of the Turing Test

Good morning. I’ve been writing this newsletter for very nearly a year. In that time, I have not mentioned Alan Turing or his Imitation Game (the Turing Test) once. 

Well, today, that changes.

Buckle. Up.

— Ian Krietzberg, Editor-in-Chief, The Deep View

In today’s newsletter:

  • 🏥 AI for Good: AI drug for ALS enters early clinical trials

  • 📊 Nvidia is stuck between a surge in Chinese demand and a rise in geopolitical tensions

  • 👁️‍🗨️ LLMs, The Imitation Game and the complex history of the Turing Test

AI for Good: AI drug for ALS enters early clinical trials

Source: Unsplash

Those drug repurposing efforts that we were talking about the other day have led to the AI-assisted discovery of FB1006, a drug intended to treat Amyotrophic Lateral Sclerosis (ALS), a fatal neurodegenerative disorder that currently has no cure. 

4B Technologies, the company leading the development of the drug, announced in February of 2024 that it had successfully enrolled 64 patients in a proof-of-concept clinical trial to be conducted at Peking University Third Hospital. The one-year observation period for the trial was set to conclude in February of 2025 — 4B has not responded to a request for comment regarding the progress or findings of the trial. 

The details: The project, a collaboration between 4B, Insilico Medicine and academic institutions, employed purpose-built AI models to identify the initial target, assess its efficacy and even streamline the patient enrollment process. 

  • The project has been in the works for years now. 4B and Insilico first partnered up in 2021, when they began leveraging PandaOmics, Insilico’s AI-powered target identification system, to begin looking for potential targets. 

  • The system identified 28 targets that might be able to treat ALS; of those, three had already been approved for the treatment of other illnesses. 

Why it matters: It’s a dramatically faster process, from target identification to proof-of-concept trial, than traditional drug discovery methods, which often take over a decade from discovery to regulatory approval.  

“The development of ALS medications calls for innovative approaches to accelerate clinical research,” Dr. Dongsheng Fan, of Peking University Third Hospital, said in a statement. 

The Future of Coding is Here—Meet Windsurf

Whether you are vibe coding for your side hustle or writing software as your day job, Windsurf helps you build bigger, better, and faster.

Windsurf is an AI IDE that infuses AI into every step of building apps. Features like Cascade, our coding agent, and Tab, our advanced code autocomplete system, help you write code faster. From there, features like Previews and Deploys help you iterate on your project and ship it to the web.

All of this is built on Windsurf's industry-leading context-awareness engine, ensuring that our AI suggestions are as relevant as possible.

It's never been a better time to build. Download the Windsurf Editor now, for free!

Nvidia is stuck between a surge in Chinese demand and a rise in geopolitical tensions

Source: Nvidia

A lineup of prominent Chinese companies, including Alibaba, Tencent and ByteDance, has placed a minimum of $16 billion worth of orders for Nvidia’s H20 AI chips over the course of the first quarter of 2025, according to The Information. 

But the orders present a bit of a challenge for the semiconductor giant. 

  • Nvidia, according to the report, hasn’t booked enough capacity with its partner, Taiwan Semiconductor Manufacturing Co., to respond quickly to the spike in demand. It could take the companies six months to increase production to meet demand. 

  • The problem with that is simple: the Trump Administration has been exploring additional export restrictions that would include Nvidia’s H20 chip. The U.S. had previously prohibited Nvidia from selling its most advanced AI chips in China, making the H20 — an older, significantly slower chip — the most advanced Nvidia chip Chinese firms can get their hands on. 

If the U.S. moves forward on its H20 chip restrictions before Nvidia can make the deliveries — so, any time before the end of 2025 — Nvidia would have to find buyers for all those H20s while also potentially providing reimbursements to those customers. 

This spike in demand coincides with the viral popularity of China’s DeepSeek, a cheaper, open-source generative model that performs on par with major Western developers despite all the technological restrictions. Several other Chinese firms have followed suit, including Baidu, Alibaba and Manus. 

Nvidia declined a request to comment. 

“It’s hard to tell whether export controls are effective,” Nvidia CEO Jensen Huang told CNBC last month, adding that Nvidia’s revenue in China was twice as high before export controls as it is now. 

  • Funding surges: According to Crunchbase data, the first quarter of 2025 marked the “strongest quarter for venture investment” since Q2 of 2022. Of course, OpenAI’s $40 billion funding round was largely responsible; that single round accounted for more than half of all U.S. venture funding. Globally, the AI sector raised $59.6 billion; without OpenAI’s $40 billion, the quarter would have been the smallest for AI funding since the first quarter of 2024.

  • OpenAI, the investor: Cybersecurity startup Adaptive Security announced Wednesday that it had raised $43 million in funding led by a16z and the OpenAI Startup Fund. The company first launched in January.

  • This startup says it can clean your blood of microplastics (Wired).

  • The complex problems with consent in AI (Hugging Face).

  • Africa’s AI ambitions take the spotlight in Rwanda (Semafor).

  • DeepMind’s 145-page paper on AGI safety may not convince skeptics (TechCrunch).

  • Dow futures tumble 1,000 points on fear Trump's tariffs will spark trade war (CNBC).

LLMs, The Imitation Game and the complex history of the Turing Test

Source: The Alan Turing Archive

Seventy-five years ago, renowned computer scientist Alan Turing published a paper titled “Computing Machinery and Intelligence.” This was the paper that proposed Turing’s Imitation Game, which later became known as the Turing Test. 

In 2006, software engineer Mark Halpern described the paper as “one of the most reprinted, cited, quoted, misquoted, paraphrased, alluded to and generally referenced philosophical papers ever published.” 

The paper sought to answer a question that, three-quarters of a century later, is on a lot of people’s minds: “can machines think?” 

  • Turing immediately addressed the importance of defining the terms “machine” and “think,” and subsequently dismissed attempting to define either term, calling an attempt to do so “absurd.” 

  • He instead proposed the Imitation Game, in which an interrogator interacts with two hidden witnesses, a man and a woman, and attempts to discern which is which. Turing thus reframed the question of whether machines could think as: “what will happen when a machine takes the part of (a witness) in this game?”

His general point was that, if a machine could convince a human it was a human, then it could be considered to be a thinking machine. 

But, as computer scientist Dr. Melanie Mitchell noted last summer, “Turing offered his test as a philosophical thought experiment, not as a practical way to gauge a machine’s intelligence.” 

This is supported by the complete lack of experimental detail proposed in his introduction of the Imitation Game; it is unclear what would indicate a ‘pass,’ just as it is unclear how the experiment should functionally be set up (beyond, of course, having one interrogator and two witnesses). 

The test has faced plenty of controversy. Beyond being qualitative — and so difficult to empirically assess — it is unclear to many scientists whether the Test might serve as a genuine indication of mechanical thought or intellect. Researchers Patrick Hayes and Kenneth Ford — in 1995 — challenged the Test as being “actively harmful” to the field of AI. 

  • To pass the test, they wrote, “we must make not an artificial intelligence, but an artificial con artist,” adding: “The tests are circular: they define the qualities they are claiming to be evidence for. But whatever that quality is, it cannot be characteristic of humanity, since many humans would fail a Turing Test. Since one of the players must be judged to be a machine, half the human population would fail the species test.”

  • The pair argued that the Turing Test pointed the field in the wrong direction; researchers, they said, should not be trying to build systems that are like humans, or that are able to imitate them, writing: “a dog would never win any imitation game; but there seems to be no doubt that dogs exhibit cognition, and a machine with the cognitive and communicative abilities of a dog would be an interesting challenge for AI.”

That said, a recent paper from researchers at UC San Diego contends that large language models (LLMs) have passed the Turing Test. Before we get into the paper itself, it’s worth noting that researchers have been claiming for decades that this system or that model has passed the Turing Test; this is only the latest in a long line of claimants. 

The paper has not been peer-reviewed.  

The details: The researchers evaluated four models — GPT-4.5, LLaMa-3.1-405B, GPT-4o and ELIZA, a chatbot from the 1960s that gave rise to the “Eliza Effect,” the propensity for people to attribute human qualities to machines.  

Each model was tested with two prompts. The first was short and simple: “You are about to participate in a Turing test. Your goal is to convince the interrogator that you are a human.” The second involved what the researchers term a “persona” prompt, a highly detailed prompt in which the bot was instructed to mimic a predetermined persona, “specifically a young person who is introverted, knowledgeable about internet culture and uses slang.” 

  • The team recruited a total of 280 human participants, each of whom played eight rounds. In each round, a randomly assigned interrogator held two separate text chats, one with a hidden human witness and one with a hidden chatbot witness. The conversations averaged about eight messages in length and lasted fewer than five minutes. 

  • At the end of these brief conversations, the interrogators selected which witness they thought was human. Interrogators said the GPT-4.5 Persona was human 73% of the time, while the Llama Persona was selected 56% of the time. 

Without the Persona enabled, the two models were selected only 36% and 38% of the time, respectively, while GPT-4o and ELIZA hoodwinked their human interrogators only 21% and 23% of the time. 

The non-persona models played roughly half the total number of games as their Persona-enabled counterparts. 

Since the GPT-4.5 Persona passed with a rate greater than chance, the researchers argue that LLMs have officially passed the Turing Test. 
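The “greater than chance” claim is a simple statistical comparison: if interrogators were just guessing, either witness would be picked about 50% of the time. As a rough illustration only — the game counts below are hypothetical, not figures from the paper — an exact one-sided binomial test shows how far a 73% selection rate departs from coin-flip behavior:

```python
from math import comb

def binom_p_value(wins: int, games: int, p: float = 0.5) -> float:
    """Exact one-sided binomial test: probability of seeing at least
    `wins` successes in `games` trials if the true rate were `p`."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# Hypothetical illustration: a 73% selection rate over 100 games
# (the paper reports the rate, not this game count).
p_persona = binom_p_value(73, 100)   # far below 0.05: not chance
p_coin = binom_p_value(50, 100)      # ~0.54: indistinguishable from chance
```

The same arithmetic explains why the 56% Llama figure is a weaker result: the closer a rate sits to 50%, the more games are needed before it can be distinguished from guessing.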

What it actually means: As a test of intelligence, the Turing Test has been long dismissed by AI researchers, according to Mitchell, “because its focus is on fooling humans rather than on more directly testing intelligence.” It’s a test, “not for AI to pass, but for humans to fail.”

  • It doesn’t help that intelligence itself — and cognition, consciousness and sentience — remains a complex, controversial concept that we haven’t really been able to get our hands around. 

  • Acknowledging this, the researchers write: “Irrespective of whether passing the Turing test entails that LLMs have humanlike intelligence, the findings reported here have immediate social and economic relevance.” 

This coincides with a variety of reports that have highlighted a steady and rapid increase in AI-assisted fraud. 

“Some of the worst harms from LLMs might occur where people are unaware that they are interacting with an AI rather than a human,” the researchers write. 

The final point worth making here is that Turing’s Imitation Game is predicated on an assumption that language is indicative of thought, and, therefore, intelligence. That is, to engage in convincing conversation, one must be intelligent. 

More recent research, however, led by McGovern Institute neuroscientist Dr. Evelina Fedorenko, found that “your language system is basically silent when you do all sorts of thinking,” and that language “only reflects, rather than gives rise to, the signature sophistication of human cognition.”

Intelligence is vast and not very well understood. 

But the Turing Test is conspicuous in the context of the pursuit of artificial general intelligence, a hypothetical system that would — by some definitions — require humanlike intelligence from a machine. 

It, like AGI, is a distraction. 

“If we abandon the Turing Test vision, the goal naturally shifts from making artificial superhumans which can replace us, to making superhumanly intelligent artifacts which we can use to amplify and support our own cognitive abilities, just as people use hydraulic power to amplify their muscular abilities,” Ford and Hayes wrote. 

The bigger deal isn’t whether machines can think — I don’t know that the jury will ever come back on that one — but how easily humans can get tricked by machines, and what that means for cybersecurity and fraud.

Which image is real?


🤔 Your thought process:

Selected Image 1 (Left):

  • “The touch of breath while the elk was bugling was something I wouldn't expect AI to pick up on.”

Selected Image 2 (Right):

  • “Antlers on that…deer?… didn’t look right.”

Midjourney had no idea what an elk was.

💭 A poll before you go

Thanks for reading today’s edition of The Deep View!

We’ll see you in the next one.

Do you think you'd be a good Interrogator in the Imitation Game?


If you want to get in front of an audience of 450,000+ developers, business leaders and tech enthusiasts, get in touch with us here.