Computing power is being re-centralized: After DeepSeek's price cut, who will control the AI infrastructure?

  • DeepSeek V4 slashed prices to 0.025 yuan per million tokens, boosting A-share computing stocks, but the price drop masks accelerating centralization of computing power among giants like Microsoft and Amazon.
  • Cloud providers' capex surged 64% YoY, GPU supply chains are strained, and inference is expected to account for two-thirds of global AI computing demand.
  • At LA Hacks, Gonka proposed a Proof-of-Work (PoW 2.0) incentive mechanism to aggregate idle global GPUs for AI inference, aiming to break centralization.
  • For developers, computing power lock-in risks resemble platform dependency; Gonka offers a structural alternative, though scaling, governance, and incentives remain challenges.
  • Before centralization completes, decentralized computing networks still have a window of opportunity, but success depends on decentralization across technical, token, and governance dimensions.

—Starting with Gonka's speech at LA Hacks 2026

On April 26, DeepSeek released new pricing for its V4 series APIs: the price for input cache hits across the entire series dropped to one-tenth of the initial launch price, and with a limited-time discount on the Pro version, the processing cost for one million tokens was as low as 0.025 yuan—nearly a hundred times cheaper than a year ago. A-share listed computing power stocks collectively hit their daily limit that day, and market sentiment was electric.
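To put the headline price in perspective, here is a back-of-the-envelope cost estimate. The 0.025 yuan per million tokens figure comes from the announcement; the token volumes in the example are illustrative assumptions, not usage data:

```python
# Back-of-the-envelope API cost estimate at DeepSeek V4's discounted
# cache-hit price of 0.025 yuan per million input tokens.
# The request volume below is an illustrative assumption.

PRICE_PER_MILLION_TOKENS = 0.025  # yuan, limited-time Pro cache-hit price

def monthly_cost(tokens_per_request: int, requests_per_month: int) -> float:
    """Return the monthly input-token bill in yuan."""
    total_tokens = tokens_per_request * requests_per_month
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

# Example: a chatbot averaging 2,000 input tokens per request,
# serving 1 million requests a month (2 billion tokens total).
cost = monthly_cost(tokens_per_request=2_000, requests_per_month=1_000_000)
print(f"{cost:.0f} yuan/month")  # prints "50 yuan/month"
```

At this price, even heavy consumer-scale usage costs tens of yuan per month, which is why the market read the cut as opening the floodgates for application developers.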

But behind the cheers, there's a problem no one's addressing directly: as models become cheaper, the computing power required to run them is becoming increasingly centralized.

The data doesn't lie. In the fourth quarter of 2025, the combined capital expenditures of the four cloud vendors—Microsoft, Amazon, Meta, and Google—increased 64% year-on-year to $118.6 billion; full-year 2026 capital expenditures are projected to rise a further 53% year-on-year, to $570.8 billion. Over the same period, Google raised its 2026 TPU chip shipment target by 50% to 6 million units, and delivery times for Nvidia's H100 series have stretched to several months in some markets.

Pricing power at the model layer is shifting toward developers, but control of the computing power layer is consolidating in the hands of a few giants at an even faster pace. This is a hidden but profound contradiction of the AI era.

Against this backdrop, on April 24, 2026, Gonka protocol co-founders Daniil and David Liberman took the keynote stage at LA Hacks 2026, UCLA's largest annual hackathon. The Liberman brothers addressed hundreds of top engineers about to enter the industry, and the question they posed was particularly pointed at that moment: is decentralized computing power still feasible?

I. The Other Side of the Price Cut Wave

DeepSeek V4's price-cut logic looks like an efficiency dividend from technological advances: the new attention mechanism compresses the token dimension, and combined with DSA sparse attention it significantly reduces demand for computing power and GPU memory. But continued price cuts rest on a premise: that somewhere, computing power remains sufficient and cheap enough.

The reality is that this "sufficient" source of computing power is rapidly converging on a few nodes globally. Michael Hurlston, CEO of Lumentum, a leading optical communications company, recently stated that, based on current trends, the company's production capacity by 2028 is almost entirely sold out. This is not a predicament for an isolated company, but rather a collective strain on the entire AI infrastructure supply chain in the face of rapidly expanding demand.

In his LA Hacks talk, Daniil used a simple but powerful analogy: the Bitcoin network's computing power already exceeds the combined capacity of Google, Microsoft, and Amazon's cloud data centers. And what is that computing power doing? Solving a hash puzzle whose answer no one needs. The same is true of the world's idle GPU capacity: the graphics cards in gamers' machines, the servers in university computer labs, and the spare capacity held by small and medium-sized cloud providers. Together they amount to a massive scale, yet they cannot be used for AI inference for lack of a coordination mechanism.

Gonka is trying to solve this coordination problem—using a proof-of-work incentive mechanism to organize idle GPUs scattered around the world into a network capable of undertaking real AI inference tasks.

II. Inference Is the New Battleground

DeepSeek's price cut has sparked widespread discussion about "AI equality" on the Chinese internet. However, one detail has been overlooked: the price reduction applies to "invocation prices," not "computing costs." As AI applications scale up, the growth in inference calls is exponential—according to industry forecasts, by 2026, inference will account for about two-thirds of global AI computing power consumption.

What does this mean? For every order-of-magnitude reduction in call price, the actual total computing power required will only increase, not decrease. The "democratization" of large-scale models, to some extent, accelerates the centralization of computing power—because only players with massive computing power can sustain the operation of inference services with extremely low profit margins.
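The "cheaper calls, more total compute" dynamic is a Jevons-paradox effect, and a toy model makes it concrete. The demand elasticity below is an illustrative assumption, not measured data:

```python
# Toy Jevons-paradox model: cheaper calls can mean more total compute.
# The demand-growth factor is an illustrative assumption, not data.
import math

def relative_compute_demand(price_drop_factor: float,
                            volume_growth_per_10x: float = 15.0) -> float:
    """Total physical compute demand relative to before the price cut,
    assuming call volume grows by volume_growth_per_10x for every 10x
    the price falls, and compute per call falls in line with the price
    (i.e. the cut reflects a matching efficiency gain)."""
    orders = math.log10(price_drop_factor)
    volume_growth = volume_growth_per_10x ** orders
    return volume_growth / price_drop_factor

# A 100x price cut with 15x volume growth per 10x cut: volume rises
# 15^2 = 225x while per-call compute falls 100x, so total physical
# demand still more than doubles despite the cheaper price.
print(relative_compute_demand(100))  # prints 2.25
```

Whenever demand grows faster than per-call efficiency, total compute demand rises, and only players who already hold massive capacity can serve it at thin margins.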

This is an emerging structural lock-in: whoever controls physical computing power on the inference side controls the true infrastructure gateway of the AI era. From this perspective, the significance of decentralized computing networks is no longer just "50% cheaper" cost optimization, but providing a structural alternative path before centralized lock-in is complete.

III. A Real Test for Young Builders

The participants in LA Hacks—engineers and product managers from top California universities—will soon face a less-than-romantic engineering choice: on which layer of computing power to build their products.

Whose server does your AI product use for inference?

If that platform adjusts its pricing strategy or access policy, do you have the ability to migrate?

Is the user base you help build creating value for yourself, or is it providing leverage to the platform?

Developers already lived through these problems in the Web2 era: when an application's fate is deeply tied to a platform's algorithms or distribution rules, "independence" becomes a word that can be redefined at any moment. Computing power dependence in the AI era will replicate the same logic at the infrastructure layer, and because the switching costs are higher, the lock-in effect will only be stronger.
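One practical mitigation for the migration question is to keep the inference backend behind a thin interface from day one. The sketch below is illustrative: the provider classes are placeholders, not real vendor SDKs:

```python
# Minimal provider-abstraction sketch to keep an AI product portable
# across inference backends. The provider classes are placeholders;
# real adapters would wrap each vendor's or network's actual SDK.

from abc import ABC, abstractmethod

class InferenceProvider(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class HostedProvider(InferenceProvider):
    """Stand-in for a centralized cloud API."""
    def complete(self, prompt: str) -> str:
        return f"[hosted] reply to: {prompt}"

class DecentralizedProvider(InferenceProvider):
    """Stand-in for a decentralized network such as Gonka."""
    def complete(self, prompt: str) -> str:
        return f"[network] reply to: {prompt}"

def answer(provider: InferenceProvider, prompt: str) -> str:
    # Application code depends only on the interface, so swapping
    # backends is a configuration change, not a rewrite.
    return provider.complete(prompt)

print(answer(HostedProvider(), "hello"))
```

The design choice is the point: an abstraction layer does not remove platform dependence, but it keeps the switching cost low enough that "do you have the ability to migrate?" can be answered yes.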

Hackathons, as a format, carry an inherent irony: building something that works within 36 hours, with minimal resources and maximum speed, is precisely the state that decentralized network incentive mechanisms strive for. When Daniil took the stage at LA Hacks, he wasn't just talking about Gonka; he was, in effect, asking the room: will what you build next accelerate this centralization, or open up new possibilities?

IV. PoW 2.0: An Engineering Proposition

Gonka redirects the incentive structure of Proof-of-Work from hash computation to AI inference, so that nearly 100% of the network's computational contribution corresponds directly to real-world tasks. The mechanism has a key engineering requirement: an AI inference task must be verifiable and reproducible—given the same model weights, the same random seed, and the same input, any node can reproduce the computation and verify its validity. This is the core engineering challenge that took Gonka from academic prototype to working network.
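The verifiability requirement can be illustrated with a toy reproduce-and-compare check. This is a sketch of the general pattern, not Gonka's actual protocol: the "model" below is a deterministic stand-in, where a real network would re-run the actual model with pinned weights and seed:

```python
# Toy illustration of verifiable inference: given identical weights,
# seed, and input, any node can re-run the computation and compare a
# hash of the result. The "model" here is a stand-in, not a real LLM.

import hashlib
import random

def run_inference(weights_id: str, seed: int, prompt: str) -> str:
    """Deterministic stand-in for a seeded model forward pass."""
    rng = random.Random(f"{weights_id}:{seed}:{prompt}")
    return f"output-{rng.randint(0, 10**9)}"

def result_digest(weights_id: str, seed: int, prompt: str, output: str) -> str:
    """Commitment to (weights, seed, input, output) for auditing."""
    payload = f"{weights_id}|{seed}|{prompt}|{output}".encode()
    return hashlib.sha256(payload).hexdigest()

# A worker node computes and publishes (output, digest).
out = run_inference("v4-weights", seed=42, prompt="2+2?")
claimed = result_digest("v4-weights", 42, "2+2?", out)

# Any verifier node re-runs the same computation and checks the digest.
redo = run_inference("v4-weights", seed=42, prompt="2+2?")
assert result_digest("v4-weights", 42, "2+2?", redo) == claimed
print("verified")
```

In practice, making a real GPU forward pass bit-exact across heterogeneous hardware is the hard part; the hash comparison itself is the easy part.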

From an economic perspective, the significance of this mechanism lies in the fact that the value of the token is naturally anchored to the cost of physical computing power, rather than liquidity sentiment. Miners who contribute computing power are rewarded, and developers who use computing power pay fees. The incentive loop of the entire system does not rely on the goodwill of any intermediary.

Of course, technical feasibility is only part of the story. A more challenging question is: in an era of rapidly growing computing power demands and major players spending tens of billions of dollars, can a distributed computing network organized through spontaneous community contributions truly compete on scale?

Gonka's early data provides a benchmark: less than a year after its mainnet launch, the network's aggregate computing power expanded from 60 H100 equivalents to over 10,000, a growth achieved through the spontaneous integration of hundreds of independent nodes globally, rather than centralized allocation. This doesn't prove the scaling issue has been resolved, but it demonstrates that incentive mechanisms effectively drove early growth.
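The growth figure implies a steep compounding rate. A quick sanity check, where the 10-month duration is an illustrative assumption within the article's "less than a year":

```python
# Implied compound monthly growth from 60 to roughly 10,000 H100
# equivalents. The 10-month duration is an illustrative assumption.
start, end, months = 60, 10_000, 10
monthly_growth = (end / start) ** (1 / months) - 1
print(f"{monthly_growth:.0%} per month")  # roughly 67% per month
```

Sustaining anything near that rate is implausible long term; the relevant signal is that incentives alone, without centralized allocation, bootstrapped the supply side.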

V. The Window of Opportunity

Historically, dominance over infrastructure has tended to converge rapidly in its early stages; this was true in the railway era, the internet era, and the mobile internet era. Each time, some found their place before the standards solidified, while others realized only after centralization was complete that their room to participate had sharply narrowed.

Where does AI computing infrastructure stand today? Judging by the four major cloud vendors' projected $570.8 billion in 2026 capital expenditure, centralization is accelerating; yet judging by developers' actual usage patterns, a large amount of supply-side capacity remains unintegrated. That gap is where decentralized networks can structurally exist.

In his speech, Daniil cited a contrast: after the dot-com bubble burst in 2000, what remained was not ruins, but a global fiber-optic network that supported the digital economy for the next two decades. After the AI infrastructure investment boom subsides, the computing power protocols and incentive mechanisms that have taken root will become the infrastructure of the next cycle—the only question is which protocols have underlying logic robust enough to keep functioning under pressure.

This isn't a question about a specific project, but rather a problem the entire decentralized AI sector needs to confront: Can governance design truly resist the erosion of single-point control? Will incentive mechanisms remain effective after scaling up? Is the decentralization of the computing network valid simultaneously across the three dimensions of the technical execution layer, token issuance layer, and upgrade decision-making layer?

Conclusion

DeepSeek's price cut has reignited the narrative of "AI democratization." However, democratized inference calls and democratized computing infrastructure are two different things. The former is already happening; whether the latter can happen depends on how many people in the next few years truly treat it as a worthwhile engineering problem to solve, rather than just a nice-sounding narrative.


Author: Gonka

Opinions belong to the column author and do not represent PANews.

This content is not investment advice.

Image source: Gonka. If there is any infringement, please contact the author for removal.
