Two $20 billion deals: OpenAI and Nvidia are waging a "war over inference".

  • In December 2025, Nvidia acquired AI chip company Groq for $20 billion to boost its inference capabilities.
  • In April 2026, OpenAI announced a $20 billion chip purchase from AI chip company Cerebras, which filed for an IPO at a $35 billion valuation the same day.
  • The deals mark a shift in AI computing from training to inference, with inference expected to account for two-thirds of AI compute spending by 2026.
  • Nvidia's GPUs are optimized for training but less efficient at inference; the Groq acquisition is meant to close this gap.
  • OpenAI's deal with Cerebras includes equity warrants and funding, securing alternative inference compute while it develops its own ASIC chips for long-term autonomy.
  • Cerebras' IPO signals the battle for control over AI inference infrastructure, a fast-growing market with significant opportunities and risks.

Written by: xiaopi

In December 2025, Nvidia quietly spent $20 billion to acquire an AI chip company called Groq.

On April 17, 2026, OpenAI announced that it would purchase more than $20 billion worth of chips from another AI chip company, Cerebras. On the same day, Cerebras officially filed for an IPO on Nasdaq, targeting a valuation of $35 billion.

The two payments were almost identical in amount. One was for an acquisition, and the other for a purchase. One came from the world's largest AI chip seller, and the other from the world's largest AI buyer.

These are not two separate events; they are two symmetrical actions in the same war. The battlefield is called: AI inference.

Most people haven't noticed this war, because it is silent: it surfaces only as lines in financial announcements and in technical discussions among Silicon Valley engineers. But its impact may be more profound than any AI launch event of the past two years, because it is reshuffling control of a market that is almost certain to become the largest tech market in history.

What is inference, and why is "training" no longer the keyword for 2026?

Before discussing the two $20 billion figures, it's necessary to understand the backdrop: the focus of the AI chip battle is shifting.

Training and inference are two stages of AI computing power consumption. Training is about building a model—feeding massive amounts of data to a neural network so it learns a certain ability. This process usually happens only once, or is updated periodically. Inference is about using the model—each time a user asks a question and ChatGPT provides an answer, there is an inference request behind it.

In 2023, the majority of global AI computing power spending was on training, while inference played a secondary role.

But this ratio is rapidly reversing.

According to market research data from Deloitte and CES 2026, inference already accounted for 50% of all AI computing power spending in 2025; in 2026, this proportion will jump to two-thirds. Lenovo CEO Yang Yuanqing put it more bluntly at CES: the structure of AI spending will be completely reversed from "80% training + 20% inference" to "20% training + 80% inference".

The logic isn't complex. Training is a one-time cost, while inference is an ongoing cost. GPT-4 was trained once, but it has to answer questions from hundreds of millions of users every day, and each conversation is an inference request. After large-scale deployment, the cumulative cost of inference far exceeds that of training.
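
A back-of-the-envelope sketch makes this concrete. Every figure below is invented for illustration; none comes from OpenAI's actual cost structure:

```python
# Hypothetical illustration of why cumulative inference cost overtakes
# a one-time training cost. All numbers are assumptions.

TRAINING_COST = 100e6        # assume a single $100M training run
COST_PER_QUERY = 0.01        # assume $0.01 of compute per answered query
QUERIES_PER_DAY = 100e6      # assume 100M queries per day at scale

daily_inference_cost = COST_PER_QUERY * QUERIES_PER_DAY   # $1M per day
days_to_parity = TRAINING_COST / daily_inference_cost

print(f"Inference spend matches the training run after {days_to_parity:.0f} days")
# -> 100 days at these assumptions; every day after that, inference dominates.
```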

What does this mean? It means that the most profitable part of the AI industry is shifting from "training chips" to "inference chips." And these two types of chips require drastically different architectural designs.

Nvidia's problem: Chips designed for training are inherently poor at inference.

Nvidia's H100 and H200 are monsters designed for training. Their core advantage is their extremely high computational throughput—training requires a large number of multiplication operations on massive matrices, and GPUs excel at this kind of "multi-core parallel computing".

However, the bottleneck for inference is not computation, but memory bandwidth.

When a user submits a question, the chip needs to "move" the weights of the entire model from memory to the computing units before it can generate an answer. This "moving" process is the real source of inference latency. Nvidia's GPUs use external high-bandwidth memory (HBM), and this step inevitably introduces latency; for ChatGPT, which handles an enormous volume of requests, that latency, multiplied by the scale, becomes a real performance bottleneck.
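
The arithmetic behind this bottleneck is simple enough to sketch. The bandwidth figure below is Nvidia's published H100 SXM spec; the model size is an assumed example, not a description of any OpenAI model:

```python
# Roofline-style estimate: generating one token requires streaming roughly
# all model weights from memory once, so bandwidth caps single-stream speed.

params = 70e9                             # assume a 70B-parameter model
bytes_per_param = 2                       # FP16 weights
weights_bytes = params * bytes_per_param  # 140 GB of weights

hbm_bandwidth = 3.35e12                   # H100 SXM HBM3: ~3.35 TB/s (public spec)

# Upper bound on tokens/sec for a single request (batch size 1):
max_tokens_per_sec = hbm_bandwidth / weights_bytes
print(f"~{max_tokens_per_sec:.0f} tokens/s, no matter how fast the cores are")
# -> ~24 tokens/s. The arithmetic units mostly wait on weight traffic,
#    which is why decode-phase inference is called "memory-bound".
```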

OpenAI's internal engineers ran into this issue while optimizing Codex (a code generation tool): no matter how they tuned the parameters, response speed was capped by the architecture of Nvidia's GPUs.

In other words, Nvidia's disadvantage in inference is not a matter of effort, but a matter of architecture.

Cerebras' WSE-3 chip takes a completely different approach. The chip is built at wafer scale: at 46,225 square millimeters, larger than a human hand, it integrates 900,000 AI cores and 44 GB of ultra-high-speed SRAM onto a single piece of silicon. The memory sits directly next to the compute cores, cutting the "transport" distance from centimeters to micrometers. The result: inference speeds 15 to 20 times faster than Nvidia's H100.
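
The same arithmetic shows what moving memory on-chip buys. Cerebras quotes roughly 21 PB/s of aggregate SRAM bandwidth for the WSE-3 (a vendor figure); note that 44 GB of SRAM cannot hold the 140 GB of weights from the earlier example on one wafer, so real deployments shard weights across wafers, and this sketch covers only the bandwidth side:

```python
# Comparing the memory-bandwidth ceilings of the two architectures.
# Both figures are vendor-published specs, not independent measurements.

hbm_bandwidth = 3.35e12     # Nvidia H100 SXM: ~3.35 TB/s to external HBM3
sram_bandwidth = 21e15      # Cerebras WSE-3: ~21 PB/s aggregate on-chip SRAM

print(f"Raw bandwidth ratio: ~{sram_bandwidth / hbm_bandwidth:,.0f}x")
# -> ~6,269x on paper. Even after sharding overheads and utilization losses,
#    a sliver of that gap is enough to explain the 15-20x end-to-end
#    speedups cited above.
```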

It's worth noting that Nvidia isn't sitting idle. Its latest Blackwell (B200) architecture, which offers roughly four times the inference performance of the H100, is being deployed at scale. But Blackwell is chasing a moving target: Cerebras keeps iterating its own technology, and it is far from Nvidia's only inference-focused rival.

Nvidia's $20 billion deal: the admission behind the largest acquisition in its history

On December 24, 2025, Nvidia announced the largest acquisition in its history.

The target is Groq.

Groq was a competitor of Cerebras, also built around an SRAM-based chip optimized for inference, the LPU (Language Processing Unit), which at the time ran the fastest publicly benchmarked inference service in the world. Nvidia spent $20 billion to acquire Groq's core technology and founding team, including founder Jonathan Ross and several top chip engineers who had worked on Google's TPU.

It is Nvidia's largest acquisition ever, nearly triple the $7 billion it paid for Mellanox in 2019, its previous record.

Many analysts believe the message behind this money matters far more than the amount: Nvidia believes it has a structural gap in its inference capabilities, and that the gap is large enough to warrant spending $20 billion to plug it.

If Nvidia truly believed its GPUs were unbeatable at inference, it wouldn't have needed to acquire Groq. The acquisition was essentially a $20 billion technology purchase order: an acknowledgment that on-chip SRAM architectures hold a real advantage in inference scenarios, that Nvidia's existing product line couldn't naturally cover that advantage, and that the company was willing to pay top dollar for a gap it couldn't fill on its own.

Of course, Nvidia's official narrative after the acquisition was different—"Deep integration with Groq to provide a more complete inference solution." The technical translation is: We realized our own solutions weren't enough, so we bought someone else's.

OpenAI's $20 billion deal: buying chips is just the surface; the real key is the equity.

Now let's go back to OpenAI.

In January 2026, OpenAI and Cerebras signed a three-year computing power purchase agreement worth $10 billion. At the time, media coverage framed it as "OpenAI diversifying its chip suppliers," and the deal drew little attention.

However, the latest details revealed on April 17 have fundamentally changed the nature of this matter:

First, the procurement amount doubled from $10 billion to $20 billion.

Second, OpenAI will acquire warrants in Cerebras, and as the scale of procurement increases, its shareholding can reach up to 10% of Cerebras' total share capital.

Third, OpenAI will also provide Cerebras with $1 billion in funding for data center construction; in other words, OpenAI is helping its supplier build capacity.

Putting these three details together paints a completely different picture: OpenAI is not just buying chips, OpenAI is incubating a supplier.

This logic has clear precedents in tech history. Apple initially relied on Samsung to build the iPhone's chips under large procurement agreements, then deepened its involvement until it was designing its own A-series and, later, M-series silicon, and control of the chip roadmap shifted from Samsung and Intel to Apple itself. OpenAI's approach is somewhat similar, with one crucial boundary: Apple controlled chip design from early on, while OpenAI remains a purchaser, and Cerebras will develop independently after its IPO, serving more customers. The end state of this path is probably not OpenAI controlling Cerebras outright; more likely, the two companies build a deeply interdependent ecosystem.

On one hand, OpenAI is using $20 billion in purchases and a stake in Cerebras to secure a continuous supply of non-Nvidia inference compute; on the other, it is collaborating with Broadcom on its own ASIC chips, with mass production expected by the end of 2026. The two-pronged approach aims at self-sufficiency in computing power.

Cerebras files for its IPO: what are you actually buying?

On April 17, Cerebras officially filed for an IPO on Nasdaq, targeting a valuation of $35 billion and planning to raise $3 billion.

This valuation is more than four times the $8.1 billion Cerebras was valued at in September 2025. The company completed a new funding round this February at a $23 billion valuation; the $35 billion IPO target represents a 52% premium over that round.
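
A quick check of those figures, using only the numbers in the paragraph above:

```python
# Sanity-checking the valuation arithmetic cited in the article.
sep_2025_valuation = 8.1e9    # September 2025 round
feb_2026_valuation = 23e9     # February 2026 round
ipo_target = 35e9             # April 2026 IPO target

print(f"{ipo_target / sep_2025_valuation:.1f}x the September valuation")   # 4.3x
print(f"{ipo_target / feb_2026_valuation - 1:.0%} premium over February")  # 52%
```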

Those familiar with Cerebras' history know this is its second attempt at going public. The first attempt, in 2024, had to be withdrawn: its core client, G42 (a state-linked technology group from the UAE), accounted for 83% to 97% of that year's revenue, and CFIUS intervened on national security grounds.

This time, G42 has disappeared from the picture, replaced by OpenAI.

In other words, Cerebras' structural problem of customer concentration has not been fundamentally resolved: the name of the major customer has changed, but the dependence remains. The judgment investors must make is whether this major customer is better or worse. From a credit perspective, OpenAI is clearly superior to G42; from a strategic perspective, OpenAI is also incubating Cerebras' future competitor: once its self-developed ASIC matures, it will pose a real replacement threat.

To be fair, Cerebras is actively expanding its customer base, and the prospectus is expected to show more diversified revenue and lower customer concentration. But the question will remain open until OpenAI's self-developed chips enter mass production.

Buying Cerebras stock is essentially a bet that OpenAI will keep choosing Cerebras, and that OpenAI's self-developed ASICs will not arrive ahead of schedule. Neither is a certainty.

Of course, the bullish arguments are also valid: if the inference market grows as projected, even a small share for Cerebras would be substantial. The question isn't whether Cerebras has a chance, but whether the $35 billion price tag reflects the most optimistic scenario.

The two $20 billion figures appeared, almost symmetrically, between the end of 2025 and April 2026.

The world's largest seller of AI chips acquired the technology of a competitor in the inference market.

The world's largest buyer of AI compute incubated a company that challenges Nvidia in the inference market.

Nvidia's $20 billion is defensive: it paid the highest possible price to plug a technological gap it couldn't fill on its own.

OpenAI's $20 billion investment is an offensive move—it's burning through cash to build an inference highway that doesn't rely on Nvidia, while simultaneously acquiring stock options for a tollbooth along that highway.

This war is fought without gunfire, but the flow of funds never lies. Two sums of money tell you more clearly than any AI press conference: control of AI inference infrastructure is being fought over. And this market will account for two-thirds of the industry's total computing power expenditure by 2026.

Cerebras' IPO is only the opening shot in this battle.

Author: 华尔街见闻

Opinions belong to the column author and do not represent PANews.

This content is not investment advice.
