Concerned about AI evolving on its own, is Anthropic planning to stop training?

On May 4, 2026, Anthropic co-founder Jack Clark posted on the social media platform X: "I now believe there is a 60% probability that recursive self-improvement will occur by the end of 2028."

Within minutes of the post being published, Eliezer Yudkowsky, a longtime researcher in the field of AI safety, replied, "Then we will all perish together." He then invoked the design flaws of Chernobyl's RBMK reactor as an analogy, implying that no one really knows how to stop the system that is being started.

This brief exchange was like a match that lit up discussions previously buried in technical papers and internal evaluations. Recursive self-improvement (RSI) is the idea that AI systems not only optimize their outputs but also autonomously optimize the process that produces them, ultimately building successor systems more capable than themselves. Anthropic's co-founder had now put it on a countdown clock: a 60% probability by the end of 2028.

A month later, Anthropic released a lengthy article titled "When AI Builds Itself." Co-authored by Marina Favaro and Jack Clark and published by the Anthropic Institute, which had been established only in March, the article sent the outside world a precisely calibrated acceleration signal, built on a series of previously unreleased internal data and a carefully constructed narrative structure. The signal said both "We're not there yet" and "But it may arrive sooner than most institutions are ready."

Around the same time, at Google I/O, DeepMind CEO Demis Hassabis used a phrase he had never used in public before: humanity is standing at the "foothills of the singularity." In a subsequent interview, he shifted his timeline for Artificial General Intelligence (AGI) from "shortly after 2030" to "2029 is a real possibility," and admitted that his dramatic language was "deliberately provocative," intended to create a sense of urgency among governments, economists, and the public.

Two leading institutions, both known for their emphasis on safety and long regarded as a restraining force in the AI industry, adjusted the volume and tone of their public statements almost simultaneously. The timing itself deserves to be examined as an event in its own right.

A long article that has been meticulously calibrated

Anthropic's lengthy article, published on June 4th, clearly stated its narrative objective at the outset. It aimed to argue not just for a technological trend, but for a directional and accelerating process. To this end, it presented a set of previously unpublished internal data.

The first set of figures points to a structural change: as of May 2026, over 80% of the merged code in the Anthropic codebase was written by Claude. Two years ago, this number was in the low single digits. The same data also shows that in the second quarter of 2026, the typical Anthropic engineer merged eight times more code per day than in 2024.

One can imagine how anyone not deeply involved in the AI industry would react on first reading these two figures. However, Anthropic itself acknowledged several important limitations in a footnote: leadership had publicly estimated that Claude wrote over 90% of the code when scripts and experimental code are included, and 80% was a more conservative estimate covering merged code only; counting lines of code "is an imperfect metric" and may overestimate the actual productivity gains; and the code attribution pipeline itself "has gaps."

The way these footnotes are written is itself worth analyzing. Their existence, on the surface, is an honest concession, but in reality, it makes the figures in the main text appear to have undergone careful self-filtering, thus gaining greater credibility. This is a two-layered narrative structure: the main text provides the signals, and the footnotes provide the disclaimers.

The second set of figures concerns speed. In code optimization tasks, Claude Opus 4 achieved a speedup of approximately 3x in May 2025, a level that a skilled human researcher would need 4 to 8 hours to reach. By April 2026, Claude Mythos Preview had pushed this figure to approximately 52x. The length of task that AI could complete independently was also described as doubling every four months, from 4 minutes in March 2024 to 12 hours in March 2026. "Doubling every four months" is itself a memorable line that spreads easily and evokes a sense of geometric progression.
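The two endpoints and the stated doubling rate can be checked against each other with a few lines of arithmetic. The sketch below is not taken from Anthropic's article. It assumes steady exponential growth, uses only the figures quoted above, and the function names are illustrative.

```python
import math

# Endpoints as quoted in the text (length of autonomous task, in minutes).
start_minutes = 4.0          # March 2024
end_minutes = 12 * 60.0      # March 2026, i.e. 12 hours
elapsed_months = 24          # March 2024 -> March 2026


def implied_doubling_months(start, end, months):
    """Doubling time implied by two endpoints under steady exponential growth."""
    doublings = math.log2(end / start)
    return months / doublings


def projected_minutes(start, months, doubling_months):
    """Task length after `months` if it doubles every `doubling_months`."""
    return start * 2 ** (months / doubling_months)


implied = implied_doubling_months(start_minutes, end_minutes, elapsed_months)
strict = projected_minutes(start_minutes, elapsed_months, 4.0)

print(f"doublings between endpoints: {math.log2(end_minutes / start_minutes):.2f}")
print(f"implied doubling time: {implied:.1f} months")
print(f"strict 4-month doubling reaches: {strict / 60:.1f} hours")
```

On these numbers, the endpoints imply a doubling time closer to three months than four, while a strict four-month doubling from 4 minutes would reach a little over four hours by March 2026. "Every four months" is therefore best read as a rounded description, or the endpoints as approximate; the quoted figures alone do not say which.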

Another set of data comes from an internal survey of 130 Anthropic research team members in March 2026. The median respondent estimated that using Mythos Preview yielded approximately four times the output of working without AI. A footnote adds that earlier independent research by METR suggests developers' estimates of AI productivity gains may generally be overstated. The same two-layered structure reappears.

The third set of figures points to AI approaching the judgment of human researchers. In November 2025, Claude Opus 4.5 outperformed human researchers in 51% of research direction selections. By April 2026, this number had risen to 64%. The sample consisted of 129 cases, and Anthropic notes in a footnote that these cases were deliberately chosen by humans to highlight moments where human choices could have been improved.
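A sample of 129 cases also limits how precisely a figure like 64% can be read. The sketch below is a rough illustration, not part of Anthropic's analysis. It assumes the 129 cases apply to the April 2026 measurement and treats them as independent random draws, which the footnote says they are not.

```python
import math

n_cases = 129        # sample size quoted in the text
win_rate = 0.64      # April 2026 share of cases where the model's choice was rated better

# Normal-approximation (Wald) standard error and 95% interval for a proportion.
std_err = math.sqrt(win_rate * (1 - win_rate) / n_cases)
low = win_rate - 1.96 * std_err
high = win_rate + 1.96 * std_err

print(f"approx. cases won: {round(win_rate * n_cases)}")
print(f"standard error: {std_err:.3f}")
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Even under these generous assumptions the interval spans well over ten percentage points. Because the cases were hand-picked rather than sampled at random, the real uncertainty about how the figure generalizes is larger than this calculation suggests.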

Each individual number can be placed within different interpretive frameworks. But when put together, the direction is consistent: the speed is increasing, the gap is narrowing, and all of this is happening within Anthropic's own codebase and labs, not through theoretical deductions based on some external benchmark.

After listing these data, the lengthy article presented three future scenarios.

The first scenario is that the trend stalls and enters the plateau phase of an S-curve. Anthropic's statement is, "We don't believe this is very likely."

The second scenario is compounding efficiency gains, where AI continues to replace humans across a wider range of R&D processes, but humans still set the direction and define the criteria for success. Anthropic commented that "evidence suggests we are very likely heading towards this scenario."

The third scenario is full recursive self-improvement, where AI autonomously designs, trains, and deploys successor systems more powerful than itself, taking humans out of the loop. The wording here is "possible."

The arrangement and tone of these three scenarios form a complete narrative gradient. The first is treated lightly, serving to accommodate skeptics; the second is anchored on "evidence," giving the article a rational veneer; the third, through the word "possible" and the conditional "if technological trends continue," pushes the boldest assumption to the edge of the reader's imagination without requiring the authors to bear the burden of proof.

At the heart of the entire article, Anthropic's attitude is condensed into one sentence: "We're not there yet, and recursive self-improvement is not inevitable. But it may arrive sooner than most institutions are prepared."

From "willing to pause" to "unilateral pause will only allow reckless individuals to catch up"

If the long article on June 4th is a carefully crafted snapshot, then placing this snapshot on a timeline reveals an even longer trajectory.

In 2023, Anthropic released its Responsible Scaling Policy (RSP). The core commitment of this policy document is that if a model's capabilities exceed the company's safety controls, the company will pause training of more powerful models. This was not just a verbal statement but an internal governance document with an evaluation framework and triggering conditions. The AI safety community once regarded it as a workable model of "voluntary oversight."

In 2024, CEO Dario Amodei published a widely circulated article suggesting the possibility that "powerful AI" would arrive in 2027. At that time, Anthropic still presented itself as a safety-focused, independent company and maintained a restrained stance on scaling and on acceleration narratives.

On January 26, 2026, Amodei published a 38-page article titled "The Adolescence of Technology" on his personal website. In it, he made a judgment that has been repeatedly cited since: "Because AI is now writing most of the code inside Anthropic, it is already substantially accelerating our progress in building the next generation of AI systems. This feedback loop is building up its strength month by month, and it may be only 1 to 2 years away from the current generation of AI autonomously building the next generation of systems." In the same article, he described the upcoming "powerful AI" as "a country of geniuses in a datacenter."

This was roughly the point at which Anthropic began systematically signaling that a "self-improvement feedback loop is underway." The timing of the article also coincided with the company's leap from a $350 billion valuation to an even higher valuation range.

Less than a month later, the turning point came.

On February 25, 2026, CNN reported that Anthropic had revised its Responsible Scaling Policy, removing its core commitment to "pause training of stronger models if capabilities exceed safety controls" and replacing it with a non-binding "forward-looking safety roadmap." That same week, U.S. Defense Secretary Pete Hegseth issued an ultimatum to Dario Amodei: drop the safety red lines or lose a $200 million Department of Defense contract.

The report quoted a response that Anthropic's Chief Science Officer, Jared Kaplan, gave to Time magazine: "We believe that stopping training the model doesn't actually help anyone... if the competitors are sprinting at full speed." The wording of this response is particularly noteworthy. "It doesn't help anyone" is not a technical argument but a game-theoretic statement about competing players. "If the competitors are sprinting at full speed" is structurally identical to "a unilateral pause will only allow the least cautious player to catch up": it replaces the original pause logic, which took the company's own safety capabilities as its reference point, with a speed logic that takes the actions of competitors as its reference point.

In the CNN report, Anthropic emphasized that it was maintaining two red lines: its AI systems would not be used to control weapons systems, and they would not be used for large-scale domestic surveillance. This matters because it shows that Anthropic has not abandoned its safety stance entirely, but has instead conceded on some safety dimensions and held firm on others. This selectivity is itself a core clue for analyzing the narrative strategy: where has the company yielded, and where has it held the line? That boundary marks out how far safety has been recalibrated.

On March 11, the Anthropic Institute was officially established, led by Jack Clark, and positioned as a "public interest research institution." Less than two months later, on May 4, Clark posted the "60%" message.

Laid out on a single timeline, the density and cadence of these signals do not look random. From the personal article in January, to the policy change in February, to the founding of the institute in March, to the co-founder's probability forecast in May, and then to the official lengthy article in June, this is a narrative pipeline with a clear rhythm and progressively escalating wording. It cannot be concluded from this alone that "all of this was pre-planned," but the sequence does present an analyst with a question: does this rhythm indicate that Anthropic has incorporated an "acceleration narrative" into its management of public communication?

Hassabis's deliberate provocation

If Anthropic had been the only company adjusting its messaging in the first half of 2026, analysts would have had ample reason to focus on the company's internal decision-making logic. However, DeepMind CEO Demis Hassabis made a nearly simultaneous adjustment in the same direction, which makes it untenable to treat Anthropic as an isolated case.

On January 20th, at the Davos Forum, Hassabis maintained his long-held assessment: there is a 50% probability that AGI will arrive by 2030. About four weeks later, on February 18th, at the India AI Impact Summit, he softened his stance: "AGI could arrive within five years."

From May 20th to 22nd, at Google I/O, Hassabis stated in his keynote address that humanity is standing at "the foothills of the singularity." Around the same time, OpenAI released GPT-5.3-Codex, claiming that the model "played a crucial role in creating itself," specifically in assisting with training and debugging, managing deployment, and analyzing and evaluating results. Within this window, the gap in timing among the three leading labs shrank to a matter of weeks.

Following Google I/O, Hassabis gave an interview to Axios. The interview has been widely cited since, and its most crucial statement was his admission that language like "the foothills of the singularity" was "deliberately provocative," intended to make governments, economists, and the public aware of how urgently AI development is accelerating. He also revised his AGI timeline from "shortly after 2030" to "2029 is a real possibility," although the expected date remains around 2030, plus or minus one year.

Hassabis put it more directly to the Seoul Economic Daily: "Five to ten years from now, when we look back at 2026 and 2027, we will say, 'That was the moment we entered the AGI era.'"

The phrase "deliberately provocative" deserves careful consideration. It is a rare, candid admission of narrative intent by the person involved. It acknowledges that at least some of his wording was not a passive reflection of technological facts but an active choice of communication tools. The admission does not rule out that he has genuinely seen a technological inflection point, but it clearly pulls "narrative" out from the shadow of "facts" and makes it an object that can be examined on its own.

Hassabis's own explanation of his wording opens a side door for interpreting this round of synchronized signals. His "deliberate provocation" and the "footnote disclaimers" in Anthropic's lengthy data-driven argument display the same two-sided stance: on the one hand, pushing out signals that can shake public opinion, and on the other, retreating to the safe ground of "this is only one possibility."

The same set of data, completely different interpretations

While Anthropic and DeepMind jointly construct a narrative framework in which "AI is accelerating its self-evolution," independent researchers outside these labs offer alternative interpretations of the same data and phenomena. These interpretations matter not because either side possesses the ultimate truth, but because they expose how much room for interpretation the official narrative itself leaves open.

The most pointed response came from Eliezer Yudkowsky. He not only replied to Jack Clark but also continued to speak out on multiple subsequent occasions. MindStudio's blog documented his full stance: he used the Chernobyl RBMK reactor as an analogy for the safety design of current AI systems. The core argument of this analogy is that if the control lever and the accelerator are tied to the same system, the system will actually go out of control much faster when you try to decelerate.

Nathan Lambert of the Allen Institute for AI proposed the concept of "Lossy Self-Improvement" (LSI). His argument poses a direct challenge to the "acceleration flywheel" model: as a system becomes increasingly complex, each generation of improvements introduces friction and losses, much as signals attenuate over long distances. By this logic, the gains that make it possible for 80% or 90% of the code to be written by AI cannot be replicated indefinitely in the next generation of systems, because the next generation will face a more complex problem space, and the noise and errors inherent in the AI's output will be amplified across generations.
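The contrast between a lossless flywheel and a lossy one can be made concrete with a toy model. The sketch below is not Lambert's formulation. It assumes, purely for illustration, that each generation's gain shrinks by a fixed retention factor, and the parameter values are hypothetical.

```python
def capability_after(generations, base_gain, retention):
    """Cumulative capability multiplier after a number of generations.

    Generation k multiplies capability by (1 + base_gain * retention**k).
    retention = 1.0 is the lossless flywheel; retention < 1.0 means each
    generation keeps only part of the previous generation's gain.
    """
    capability = 1.0
    for k in range(generations):
        capability *= 1.0 + base_gain * retention ** k
    return capability


# Hypothetical parameters, chosen only for illustration.
BASE_GAIN = 0.5      # first generation improves capability by 50%
RETENTION = 0.7      # each later generation keeps 70% of the prior gain

for gens in (1, 5, 10, 20):
    lossless = capability_after(gens, BASE_GAIN, 1.0)
    lossy = capability_after(gens, BASE_GAIN, RETENTION)
    print(f"{gens:>2} generations: lossless x{lossless:,.1f}, lossy x{lossy:.2f}")
```

In this toy setup the lossless curve grows without bound, while the lossy curve keeps rising but levels off toward a finite ceiling. Which regime real systems are in depends entirely on the retention factor, which the public figures do not pin down.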

Dean Ball, a senior fellow at the Foundation for American Innovation, offered a more direct framework that cuts Anthropic's data down to size. He told IEEE Spectrum, "Maybe eventually they'll automate genius, but not next year. Next year they'll automate the drudgery." This distinction goes to the core ambiguity of the claim that "80% of code is written by AI." If what AI automates is the fixed-pattern parts of the codebase, the batch generation of parameters, and end-to-end pipeline configuration, then these tasks do correspond to "drudgery" in a software engineering context. The remaining 20% likely includes architectural design, directional judgment, and trade-offs made on incomplete information, which are the genius parts.

David Scott Krueger of the University of Montreal, founder of Evitable, an AI safety nonprofit, has proposed that the red line for triggering a pause is "99% of the code is written by AI." He told IEEE Spectrum, "I think we may be crossing that line right now." The tension between his framework and Anthropic's own loosened commitment to a pause is one of the most important structural contradictions in this narrative.

UBC computer scientist Jeff Clune, in an interview with IEEE Spectrum, took a different approach. He said, "We are at an inflection point in recursive self-improving systems." If his statement proves true, it means that Yudkowsky's warning was sounded at the right time.

Five voices, each pointing in its own direction, and even among those pointing the same way, the more radical ones disagree with each other. What they have in common is that none of them relies on an official narrative framework; each offers an independent judgment on the same set of phenomena based on its own methodology. The diversity and conflict among these judgments are themselves the most powerful rebuttal to the notion that "any single narrative is sufficient to cover the entire truth."

Coupling of valuation curve and narrative rhythm

In January 2026, Anthropic completed its funding round, valuing the company at $350 billion. Investors included Microsoft and Nvidia. This figure had already been hyped by some media outlets at the end of 2025, but its official release coincided with the publication of "The Adolescence of Technology" by Amodei.

In February, another $30 billion funding round was completed, maintaining a valuation in the range of approximately $350 billion. That same month, the safety policy was revised, removing the pause commitment. The threat of a $200 million Pentagon contract fell through.

In May, Reuters, The New York Times, and TechCrunch almost simultaneously reported that Anthropic had completed a $65 billion funding round, valuing the company at $965 billion. This figure not only surpassed its valuation two months prior but also exceeded OpenAI's $852 billion valuation in March 2026. The New York Times also quoted Dario Amodei at a developer conference, stating that the company's annualized revenue had reached $30 billion, with Amodei even jokingly saying, "I hope the 80-fold revenue growth this year doesn't continue, because that would be too crazy."

On June 4, the Anthropic Institute published a lengthy article titled "When AI Builds Itself."

Simply listing these dates does not establish a causal arrow between them. Anyone who claims a causal relationship between these events must provide direct evidence. Without an internal record of decision-making, no analyst can or should make such an assertion.

On the other hand, refusing to observe and record the correspondence between these dates would be equally unreasonable. The company's valuation nearly tripled from $350 billion to $965 billion in just five months, coinciding with a major shift in safety policy, the construction of an "acceleration signal" narrative pipeline led by an independent research institute, and a 60% probability forecast from its co-founder. When all these events are compressed into a mere six months, investors have at least the right to ask: do these signals serve, and to what extent, to convey the message "we are at the forefront of acceleration" to the market?
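The size of the jump can be stated more exactly than "nearly tripled." The short calculation below uses only the two valuations quoted above. Whether January to May counts as four or five months is a matter of convention, so both are shown; the helper name is illustrative.

```python
start_valuation = 350e9   # January 2026 valuation quoted in the text, USD
end_valuation = 965e9     # May 2026 valuation quoted in the text, USD

multiple = end_valuation / start_valuation


def implied_monthly_growth(multiple, months):
    """Constant monthly growth rate that compounds to `multiple` over `months`."""
    return multiple ** (1 / months) - 1


print(f"valuation multiple: {multiple:.2f}x")
for months in (4, 5):
    rate = implied_monthly_growth(multiple, months)
    print(f"over {months} months: about {rate:.0%} per month")
```

The multiple comes out a little under 2.8x, which corresponds to compound growth in the low-to-high twenties of percent per month under either counting convention. This says nothing about cause; it only quantifies the pace that the narrative signals ran alongside.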

The value of analysis lies precisely in this questioning itself. There may never be just one answer. But once a question is clearly posed, it cannot be easily withdrawn.

Global AI funding reached $297 billion in the first quarter of 2026, with the top five deals accounting for a significant share of this total. At this scale, all frontier labs face the same pressures: you need to convince investors that your technology curve will be steeper than your competitors'. Your risk warnings must be loud enough that your voice is already embedded in the policy framework when regulators finally step in to set the rules. Your narrative must also be compelling enough to attract top researchers to your lab, and alarming enough to retain whatever standing you still have within the safety community.

These demands are inherently contradictory. Anthropic's narrative adjustments in the first half of 2026 can be seen as a rebalancing of its language among these conflicting demands. The weakening of safety commitments, the strengthening of acceleration signals, and the repeated use of the argument that "we cannot unilaterally stop" together form a vector pointing in a single direction.

The signal was sent, and then

We need to get back to the core question: Are these signals more like a reflection of a technological inflection point, or a rhetorical upgrade geared towards capital and regulation?

The existing public evidence does not let us simply tick one of the two options, because both interpretations draw on the same set of data. An 80% code share, a 52x speedup, and a doubling of task length every four months can be used to support the idea that "an inflection point is approaching," or to support the reading that "we are conveying to the market a trend that our own technical staff have experienced firsthand." The boundary between the two is blurred.

But some facts are certain, and there is no need to choose sides between the two interpretations.

First, Anthropic's narrative shift in the first half of 2026 is not an isolated case. DeepMind's Hassabis made adjustments in almost the same quarter, differing in degree but essentially the same in direction. OpenAI's Sam Altman said at the India Summit that "the world is not ready," and in February 2026 OpenAI released GPT-5.3-Codex, claiming it "played a key role in creating itself." If Anthropic alone had been releasing these signals, they could be analyzed purely as corporate strategy. However, three leading labs raising their volume within the same few months constitutes an industry-wide narrative shift.

Second, there is a precisely traceable temporal correspondence between the rhythm of these signals and the pace of financing, policy adjustments, and institutional restructuring. This correspondence itself doesn't need to prove anything; it simply needs to be presented honestly. Once presented, each person's inherent methodology will determine their subsequent thinking.

Third, Anthropic itself labels the third scenario, namely "full recursive self-improvement," as "possible" rather than "very likely." This means that even within the judgment framework of the company that released the data, the acceleration narrative is not yet settled. The habits that lead its authors to add qualifiers in academic papers and blog posts are still holding the reins on their public language.

Fourth, Hassabis's admission of being "deliberately provocative" confirms a mechanism that, while widely suspected, has rarely been acknowledged by those involved: at least some leaders of frontier labs choose their wording with a clear communicative purpose. Any interpretation of their statements therefore needs two levels of analysis: the facts they claim, and the rhetorical choices behind those claims, treated as a behavioral event in their own right.

Those who carefully read through Anthropic's entire data set received completely different signal strengths compared to those who only remembered the numbers "80% of the code was written by AI" and "52x speedup." However, in this case, "how it's remembered" should perhaps be the more important subject of analysis than "what was actually said."

Anthropic's lengthy article is itself a precise example of the phenomenon it describes. It constructs a sense of imminent acceleration with data, while retaining room for retreat with footnotes and qualifiers; it calls for global coordination and a verifiable slowdown, yet the company had already withdrawn its commitment to a pause in an earlier policy revision. This is not hypocrisy, nor simply a matter of words versus deeds. It is an institution's narrative balancing act between technological uncertainty, commercial pressure, and public responsibility. Hassabis's "deliberately provocative" admission confirms, from another angle, that this balancing act has become a consciously used method within leading laboratories.