Author: Egor Shulgin | Co-founder of the Gonka Protocol, former AI algorithm engineer at Apple and Samsung
For years, the most powerful AI systems have been confined to closed "black boxes": massive data centers controlled by a handful of tech giants. In these facilities, tens of thousands of GPUs are packed into the same physical space and tightly connected by ultra-fast internal networks, so that large models can be trained as a single, highly synchronized system.
This arrangement has long been treated as a technological inevitability. Yet the reality is becoming increasingly clear: centralized data centers are not only costly and risky, they are also running into physical limits. Large language models are growing exponentially, and systems trained just a few months ago are already obsolete. The question is no longer simply whether power is too centralized, but whether centralized infrastructure can keep pace with AI's evolution at the physical level.
The Shadow Behind the Prosperity: The Physical Ceiling of Centralization
Today's most advanced models are already squeezing every last drop of performance out of top-tier data centers. Training a more powerful model often means building a new facility from scratch or radically upgrading existing infrastructure. Meanwhile, co-located data centers are running into power-density limits: a significant share of energy goes not into computation but into the cooling systems that keep the silicon from overheating. The result is clear: the ability to train frontier AI models is locked in the hands of a very few companies and heavily concentrated in the US and China.
This centralization is not only an engineering challenge but also a strategic threat. Access to AI capability is increasingly constrained by geopolitics, export controls, energy rationing, and corporate interests. As AI becomes a cornerstone of economic productivity, scientific research, and even national competitiveness, dependence on a handful of centralized hubs turns infrastructure into an Achilles' heel.
But what if this monopoly is not inevitable, but merely a "side effect" of our current training algorithms?
The Overlooked Communication Bottleneck: The Hidden Limits of Centralized Training
Modern AI models are so large that they cannot be trained on a single machine. Foundation models with hundreds of billions of parameters require thousands of GPUs working in parallel, and their progress must be synchronized every few seconds, with such synchronizations happening millions of times over the course of a training run.
The industry's default answer is "co-located training": stacking thousands of GPUs together and connecting them with specialized, expensive network hardware. That fabric keeps every processor aligned in real time, so all copies of the model stay in lockstep throughout training.
This approach is highly effective, but it comes with stringent prerequisites: a high-speed internal network, physical proximity, an extremely stable power supply, and centralized operational control. The moment training has to cross physical boundaries, whether cities, borders, or continents, the approach breaks down. An ordinary internet connection is orders of magnitude slower than a data-center fabric, and with current algorithms, high-performance GPUs would spend most of their time idle, waiting for synchronization signals. By some estimates, training a modern large model over standard internet connections would stretch the training cycle from months to centuries, which is why such attempts were long dismissed as fanciful.
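To make the bottleneck concrete, here is a back-of-envelope sketch in Python. All numbers are illustrative assumptions (a hypothetical 10B-parameter model, 16-bit gradients, rough link speeds), not measurements; the orders of magnitude are what matter.

```python
# Back-of-envelope sketch (all numbers illustrative) of the synchronization
# bottleneck: if every optimizer step requires exchanging a full set of
# gradients, the speed of the link between machines dominates everything.
PARAMS          = 10e9   # hypothetical 10B-parameter model
BYTES_PER_PARAM = 2      # gradients shipped in 16-bit precision
PAYLOAD_GBITS   = PARAMS * BYTES_PER_PARAM * 8 / 1e9   # ~160 Gb per exchange

links_gbps = {
    "in-rack data-center fabric (assumed ~400 Gb/s)": 400,
    "good consumer internet link (assumed ~1 Gb/s)":  1,
}

for name, gbps in links_gbps.items():
    seconds = PAYLOAD_GBITS / gbps
    print(f"{name}: ~{seconds:,.1f} s of pure network time per synchronized step")
```

The sketch ignores latency, topology, compression, and compute/communication overlap, but the gap it shows, fractions of a second versus minutes of network time per step, is exactly what keeps synchronized training inside a single building.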
Paradigm Shift: When "Reducing Communication" Becomes the Core Algorithm
The core assumption of traditional training is that machines must communicate after every tiny step of learning.
Fortunately, a line of research known as "federated learning" brought an unexpected turning point. It rests on a disruptive idea: machines don't need to communicate all the time. They can work independently for long stretches and synchronize only occasionally.
This insight evolved into a broader family of techniques known as "federated optimization." Among them, low-communication approaches stand out: by allowing much more local computation between synchronizations, they make it possible to train models over geographically dispersed, low-bandwidth networks.
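As a minimal illustration of that idea (plain NumPy, with a toy "gradient" standing in for a real forward/backward pass, and arbitrary hyperparameters), the loop below lets each worker take many purely local steps and only averages the models once per round:

```python
# Minimal sketch of low-frequency communication: each worker trains alone for
# LOCAL_STEPS optimizer steps, and the workers only exchange (average) their
# models once per round instead of after every single step.
import numpy as np

NUM_WORKERS = 8
NUM_PARAMS  = 100_000     # toy model size
LOCAL_STEPS = 100         # purely local steps between synchronizations
ROUNDS      = 3
LR          = 0.01

global_params = np.random.default_rng(0).normal(size=NUM_PARAMS).astype(np.float32)

def local_gradient(worker: int, step: int) -> np.ndarray:
    """Stand-in for a gradient computed on this worker's local data shard."""
    g = np.random.default_rng(worker * 1_000_003 + step)
    return g.normal(size=NUM_PARAMS).astype(np.float32)

for r in range(ROUNDS):
    local_models = []
    for w in range(NUM_WORKERS):
        p = global_params.copy()
        for s in range(LOCAL_STEPS):            # no network traffic in this loop
            p -= LR * local_gradient(w, r * LOCAL_STEPS + s)
        local_models.append(p)
    # The only communication: one model exchange per worker per round.
    global_params = np.asarray(local_models).mean(axis=0).astype(np.float32)
    print(f"round {r}: synchronized once after {LOCAL_STEPS} local steps per worker")
```

Compared with synchronizing after every step, the number of exchanges, and therefore the traffic, drops by a factor of LOCAL_STEPS.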
DiLoCo: The Dawn of Global Distributed Training
This technological leap was embodied in the development of DiLoCo (Distributed Low-Communication Training).
DiLoCo no longer requires real-time synchronization, but instead allows each machine to train locally for extended periods before sharing updates. Experimental results are encouraging: models trained using DiLoCo achieve performance comparable to traditional highly synchronized models, but with communication requirements reduced by hundreds of times.
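Structurally, DiLoCo looks like the local-training loop above with one important addition: the change in each replica's weights over a local phase is treated as an "outer gradient," averaged across replicas, and fed to an outer optimizer (the paper pairs AdamW for the inner steps with Nesterov momentum for the outer ones). The sketch below is illustrative only, not the authors' implementation; it uses plain SGD inside and arbitrary toy hyperparameters.

```python
# Simplified DiLoCo-style loop (illustrative, not the authors' code).
# Each replica runs INNER_STEPS local steps; the difference between its
# starting and ending weights is an "outer gradient", which is averaged
# across replicas and applied with a Nesterov-momentum outer update.
import numpy as np

NUM_REPLICAS = 4
NUM_PARAMS   = 100_000
INNER_STEPS  = 100        # local steps between synchronizations
OUTER_ROUNDS = 3
INNER_LR, OUTER_LR, MOMENTUM = 0.01, 0.7, 0.9

global_params  = np.random.default_rng(0).normal(size=NUM_PARAMS).astype(np.float32)
outer_velocity = np.zeros_like(global_params)

def local_gradient(replica: int, step: int) -> np.ndarray:
    """Stand-in for a gradient on this replica's data shard."""
    g = np.random.default_rng(replica * 7_000_003 + step)
    return g.normal(size=NUM_PARAMS).astype(np.float32)

for t in range(OUTER_ROUNDS):
    deltas = []
    for r in range(NUM_REPLICAS):
        p = global_params.copy()
        for s in range(INNER_STEPS):                       # purely local work
            p -= INNER_LR * local_gradient(r, t * INNER_STEPS + s)
        deltas.append(global_params - p)                   # outer gradient
    outer_grad = np.mean(deltas, axis=0)                   # the only communication
    # Nesterov-style outer update of the shared weights.
    outer_velocity = MOMENTUM * outer_velocity + outer_grad
    global_params -= OUTER_LR * (outer_grad + MOMENTUM * outer_velocity)
    print(f"outer round {t}: one synchronization after {INNER_STEPS} inner steps")
```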
Crucially, this makes training outside of controlled data centers feasible. Open-source implementations have demonstrated that large language models can be trained in a peer-to-peer (P2P) environment over standard internet connections, completely eliminating reliance on centralized infrastructure.
The idea, first proposed by DeepMind researchers, has since been adopted by organizations such as Prime Intellect to train models with billions of parameters. What began as a research concept is evolving into a pragmatic path to building top-tier AI systems.
Industry Transformation: The Redistribution of Computing Power
The significance of this shift from centralized to distributed training goes far beyond efficiency.
If large models can be trained over the internet, AI development stops being the exclusive privilege of an elite. Computing power can be contributed from all over the world, by different participants operating in very different environments. This means:
Large-scale collaboration across borders and institutions becomes possible;
Reliance on a handful of infrastructure providers is reduced;
Resilience to geopolitical and supply-chain shocks improves;
A far wider range of people can take part in building foundational AI technologies.
In this new paradigm, the center of power in AI shifts from "who owns the largest data center" to "who can most effectively coordinate global computing power".
Building an Open and Verifiable AI Infrastructure
As training becomes distributed, new challenges arise: trust and verification. In open networks, we must ensure that computational contributions are genuine and that the model has not been maliciously tampered with.
This has spurred strong interest in cryptographic verification methods, and several emerging infrastructure projects are putting these ideas into practice. One example is Gonka, a decentralized network designed specifically for AI inference, training, and verification. Instead of relying on a centralized hub, Gonka coordinates the computing power of independent participants, using algorithmic verification to ensure that contributions are authentic and reliable.
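The article does not specify how such verification works, and the sketch below is not Gonka's protocol. It is only a generic illustration of one common idea, spot-check verification: make each unit of work deterministic given a seed, then have verifiers re-execute a random sample of a contributor's claimed results and compare digests.

```python
# Generic spot-check verification sketch (hypothetical; NOT Gonka's protocol).
# Each unit of work is deterministic given its seed, so a verifier can
# re-execute a random sample of a contributor's claimed results and compare
# digests to detect fabricated work.
import hashlib
import random
import numpy as np

def run_work_unit(seed: int) -> bytes:
    """Deterministic stand-in for one slice of training/inference work."""
    out = np.random.default_rng(seed).normal(size=1_000).astype(np.float32)
    return hashlib.sha256(out.tobytes()).digest()

def spot_check(claimed: dict, sample_size: int = 3) -> bool:
    """Recompute a random subset of claimed results; reject on any mismatch."""
    seeds = random.sample(list(claimed), k=min(sample_size, len(claimed)))
    return all(run_work_unit(seed) == claimed[seed] for seed in seeds)

# Honest contributor reports real digests for work units 0..9.
honest = {seed: run_work_unit(seed) for seed in range(10)}
# A cheater fabricates one result, hoping it is never sampled.
cheater = dict(honest, **{7: b"\x00" * 32})

print("honest passes check: ", spot_check(honest))    # always True
print("cheater passes check:", spot_check(cheater))   # fails ~30% of single checks
```

A single check catches a faked unit only with some probability, but repeated checks across rounds and verifiers compound quickly; designs in this space typically pair such sampling with redundant execution or economic incentives so that cheating becomes unprofitable in expectation.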
Networks like this align naturally with the core of low-communication training: they reduce reliance on high-speed private infrastructure and put the emphasis on efficiency, openness, and resilience. In this context, decentralization is no longer an ideological label but an engineering consequence, because the algorithms no longer need to stay in constant synchronization.
Another Way Out
The history of AI training has been constrained by the physical limitations of communication. Over the years, progress has depended on reducing the physical distance between machines.
But the latest research shows that this is not the only way. By changing how machines collaborate, communicating less rather than more, we can indeed train powerful models over the global internet.
As algorithms evolve, the future of AI may no longer depend on where computing power is located, but on how it is intelligently connected. This shift will make AI development more open and resilient, and ultimately free it from the shackles of centralization.

