The underlying logic of bottleneck transmission in the AI computing power industry chain

Author: qinbafrank

In February, in the article "What Does This War of Capital Expenditure Mean?", we discussed which key links in the computing power industry chain can still extract the greatest value: chips, packaging and testing, storage, optical modules, and so on. The links whose production capacity cannot be expanded quickly, and those with extremely high moats, will capture the benefits of the huge capital expenditures.

There is still significant room for efficiency optimization: distillation, quantization, MoE, and dedicated chips at the inference end, together with liquid cooling and (in the long term) nuclear fusion, could further reduce energy consumption and cost per unit of computing power by 10-100 times. Opportunities should be sought in these areas.

Recently, several investment banks, including Morgan Stanley, JPMorgan Chase, Bank of America, Goldman Sachs, UBS, Citigroup, Bernstein, and HSBC, have released updated reports on AI, semiconductors, power, and storage. The bottleneck in AI hardware has expanded from the single dimension of "GPU supply" to a collective shortage across five dimensions: power, chips, storage, equipment, and materials.

The demand for AI has exceeded every forecast range used in traditional power planning, semiconductor equipment capacity, storage price models, and robot installation assumptions.

Morgan Stanley's global thematic research review points out that global weekly large language model token consumption surged from 6.4 trillion to 22.7 trillion in three months, an increase of roughly 2.5 times (to about 3.5 times the starting level). It projects a US data center power shortfall of 55 gigawatts for 2025-2028. JPMorgan Chase's initiation of coverage on high-performance computing project bonds for data centers points to a 122 gigawatt financing gap over the next five years. The US five-year power plan has surged from 101 gigawatts to 230 gigawatts, with 44% of new projects waiting more than four years to connect to the grid. Bank of America's latest target price report for Alphabet revised its 2026 capital expenditure upwards to $181.5 billion, doubling year-on-year, while free cash flow falls 62% year-on-year. These three sets of data do not come from a single framework; they are independent pictures drawn by three institutions using different research paths.
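The token figures can be checked with simple arithmetic. The following is a minimal sketch (the function name is illustrative, not from any report) that separates the multiple of the starting level from the size of the increase, two numbers that are easy to conflate:

```python
def growth(start: float, end: float) -> tuple[float, float]:
    """Return (multiple of the starting level, fractional increase)."""
    multiple = end / start
    return multiple, multiple - 1.0


# Weekly token consumption in trillions, as quoted from Morgan Stanley.
multiple, increase = growth(6.4, 22.7)
print(f"{multiple:.2f}x the starting level, an increase of {increase:.0%}")
```

On these inputs the end level is about 3.5 times the start, which corresponds to an increase of about 2.5 times the original volume.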

The evolution of bottlenecks in the semiconductor industry chain (especially in the field of AI computing power) follows a clear sequential progression: "computing (GPUs) → storage (HBM, etc.) → optical interconnects → power/liquid cooling." This is the industry consensus for 2025-2026. As AI training/inference clusters expand from single racks (dozens of GPUs) to ultra-large scales (thousands to hundreds of thousands of GPUs), solving the bottleneck in one link will immediately expose the next physical/supply chain constraint, forming a "Leontief-like" complementary constraint (the absence of any one of them will prevent shipment).
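The "Leontief-like" constraint can be made concrete with a small sketch. Under a Leontief production function, output is the minimum over all inputs of the available quantity divided by the quantity required per unit of output. All quantities below are hypothetical and chosen only to show how relieving one constraint exposes the next:

```python
def deployable_racks(supply: dict, per_rack: dict) -> tuple:
    """Leontief output: min over inputs of (available / required per rack).

    Returns the number of racks that can ship and the name of the binding input.
    """
    ratios = {name: supply[name] / need for name, need in per_rack.items()}
    binding = min(ratios, key=ratios.get)
    return ratios[binding], binding


# Hypothetical requirements for one rack (illustrative values only).
PER_RACK = {"gpus": 72, "hbm_gb": 72 * 192, "optical_modules": 100, "power_kw": 120}

# Hypothetical supply available to one buyer.
supply = {"gpus": 72_000, "hbm_gb": 11_059_200, "optical_modules": 90_000, "power_kw": 60_000}

racks, binding = deployable_racks(supply, PER_RACK)
print(f"{racks:.0f} racks, limited by {binding}")

# Relieve the binding constraint and the bottleneck moves to the next link.
relieved = dict(supply, power_kw=200_000)
racks_after, binding_after = deployable_racks(relieved, PER_RACK)
print(f"{racks_after:.0f} racks, limited by {binding_after}")
```

In this toy setup, adding power lifts output only up to the point where HBM becomes the binding input, which is the pattern described above.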

It is necessary to understand why this evolution occurred, the current situation, and the underlying physical/engineering reasons:

1. First-stage bottleneck: GPU computing (the dominant growth driver in 2022-2024)

Core limitations: Wafer capacity and advanced packaging capacity for high-end GPUs (such as NVIDIA Hopper H100 → Blackwell B200 → Rubin).

Why is it a bottleneck? Large AI models require massively parallel computing, and TSMC's 4nm/3nm/2nm logic process capacity plus its CoWoS (2.5D/3D packaging) capacity became the biggest constraint. Even with enough front-end wafers, if the back-end capacity to stack and package logic dies together with HBM cannot keep up, finished GPUs cannot be produced.

Mitigation measures: TSMC is significantly expanding its CoWoS capacity (doubling production capacity in 2024-2025), and NVIDIA Blackwell has already begun large-scale shipments. However, this only unlocks the "computing" aspect, and new problems immediately emerge.

2. Second-stage bottleneck: Storage (HBM high-bandwidth memory, the scarcest resource in 2024-2025)

Key limitations: HBM3/HBM3e/HBM4 production capacity.

Why did the bottleneck pass to storage? While GPU computing power has increased, model parameter counts have exploded (trillions or even tens of trillions of parameters), so moving data (memory bandwidth) has become a "memory wall." HBM can transfer several terabytes of data per second, more than 20 times faster than conventional DDR memory. Because HBM sits right next to the logic chip, data does not have to travel long distances, which also saves energy.
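The memory wall can be illustrated with a rough upper bound. When generating one token at a time with a dense model, every weight has to be read from memory once per token, so throughput cannot exceed memory bandwidth divided by the size of the weights. The sketch below assumes a hypothetical 8 TB/s HBM figure, takes the "more than 20 times" ratio as exactly 20, and uses a hypothetical 70-billion-parameter model with 16-bit weights; none of these values come from the reports cited here:

```python
def max_decode_tokens_per_s(params: float, bytes_per_param: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling: each token streams all weights from memory once."""
    return bandwidth_bytes_per_s / (params * bytes_per_param)


HBM_BW = 8e12         # assumption: 8 TB/s, standing in for "several terabytes per second"
DDR_BW = HBM_BW / 20  # assumption: the 20x ratio taken as exact
PARAMS = 70e9         # assumption: hypothetical 70B-parameter dense model
BYTES_PER_PARAM = 2   # 16-bit weights

hbm_rate = max_decode_tokens_per_s(PARAMS, BYTES_PER_PARAM, HBM_BW)
ddr_rate = max_decode_tokens_per_s(PARAMS, BYTES_PER_PARAM, DDR_BW)
print(f"HBM ceiling: {hbm_rate:.1f} tokens/s, DDR ceiling: {ddr_rate:.1f} tokens/s")
```

The bound ignores batching, caching, and compute time, so it is only a ceiling, but it shows why added compute is wasted when bandwidth does not scale with it.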

A single B200 GPU requires 192GB or more of HBM3e, which puts the HBM in a single 72-GPU rack (NVL72) at roughly 14TB and up; total fast memory per rack, including CPU-attached memory, reaches 30-40TB, with bandwidth requirements far exceeding those of traditional DRAM.
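Multiplying the per-GPU figure across the rack gives the HBM-only total. This is a minimal sketch assuming exactly 192GB per GPU, 72 GPUs per NVL72 rack, and decimal units. The result sits well below the 30-40TB rack-level range, which would be consistent with that range also counting non-HBM memory; that reading is an inference, not something the cited reports state:

```python
GPUS_PER_RACK = 72      # NVL72
HBM_PER_GPU_GB = 192    # assumption: the lower bound of "192GB+"

rack_hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000
print(f"HBM per rack: {rack_hbm_tb:.1f} TB")

# Per-GPU capacity that a 30-40TB rack total would imply if it were all HBM.
for rack_total_tb in (30, 40):
    implied_gb = rack_total_tb * 1000 / GPUS_PER_RACK
    print(f"{rack_total_tb} TB per rack implies {implied_gb:.0f} GB per GPU")
```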

Supply chain status: Only SK Hynix, Samsung, and Micron can mass-produce HBM, and the process is complex (through-silicon vias (TSV) plus die stacking). All three sold out their 2025 output, demand still exceeds supply in 2026, and prices have soared 246% year-on-year. Even if GPU chips are ready, they cannot be assembled and delivered without HBM, which delays the deployment of the entire AI cluster.

Result: Storage has transformed from a "commodity" into a strategic bottleneck, with storage accounting for up to 30% of capital expenditure.

3. The third-stage bottleneck: optical interconnects (currently transitioning in 2025-2026)

Core limitations: The physical limitations of copper cables (NVLink/NVSwitch) in terms of bandwidth, distance, power consumption, and weight.

Why the shift to optics is inevitable: Copper cabling still works within a single rack (72 GPUs), but scaling to multiple racks or interconnecting thousands of GPUs runs into hard limits. Copper suffers from severe attenuation (effective distance <1 meter at 1.8TB/s bandwidth) and excessive bulk (over 5,000 copper cables in an NVL72 rack that weighs 1.36 tons), while conventional pluggable optical modules are power hungry (replacing the rack's copper cabling with them would consume an additional 20,000 watts). Signal integrity, latency, and heat dissipation are also insufficient to support larger clusters.

Solution: Shift to optical interconnects (CPO co-packaged optics + silicon photonics technology). The optical engine is directly packaged next to the GPU/ASIC, using optical fiber for scale-out, resulting in higher bandwidth density, lower power consumption per bit, and longer distances.

NVIDIA placed a major bet on this at GTC 2026 and has already invested in optics companies, with demand for 800G/1.6T optical modules expected to surge. Lumentum (LITE), Broadcom, Coherent, and Ayar Labs are among the new winners.

Current progress: Copper cables have reached their limit, and optical interconnects are changing from "optional" to "mandatory," breaking through the performance ceiling of AI data centers.

4. Fourth-stage bottleneck (currently at the forefront): Power + liquid cooling (becoming the final physical constraint from 2026)

Core limitations: Power wall + thermal wall + grid access.

Why is it the ultimate bottleneck? Per-GPU power draw has risen from 300W to 700-1200W, and a single server rack has jumped from 10-20kW (in the CPU era) to 120-200kW or more. Traditional air cooling tops out at a physical limit of roughly 20-50kW per rack; beyond that, its noise, airflow, and energy consumption become unacceptable.
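A quick calculation shows how fast a rack crosses the air-cooling limit. The sketch below assumes 72 GPUs per rack and counts GPU power only; CPUs, switches, and power conversion losses, which make up the rest of the 120-200kW rack figure, are deliberately left out, and the function names are illustrative:

```python
AIR_COOLING_LIMIT_KW = 50.0  # upper end of the 20-50kW range quoted above


def rack_gpu_power_kw(gpus: int, watts_per_gpu: float) -> float:
    """GPU-only power draw of one rack in kilowatts."""
    return gpus * watts_per_gpu / 1000


def cooling_mode(rack_kw: float) -> str:
    """Return 'air' if the rack fits under the air-cooling limit, else 'liquid'."""
    return "air" if rack_kw <= AIR_COOLING_LIMIT_KW else "liquid"


for watts in (300, 700, 1200):
    kw = rack_gpu_power_kw(72, watts)
    print(f"72 GPUs at {watts}W: {kw:.1f} kW -> {cooling_mode(kw)}")
```

Even before any overhead is added, a 72-GPU rack at 700W per GPU already exceeds the most generous air-cooling figure.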

On the power side: Data centers require gigawatt-level power supplies, grid connection queues can last for several years, and delivery lead times for equipment such as transformers and solid-state transformers have stretched to 100 weeks. Microsoft's CEO once bluntly stated, "We have GPUs, but no power outlets."

On the liquid cooling side: a switch to Direct-to-Chip (DTC) or immersion liquid cooling is necessary, combined with microfluidics, cold plates, and other technologies. TSMC has already demonstrated silicon-based liquid cooling on its CoWoS platform, supporting a TDP of >2.6kW. Liquid cooling/thermal management vendors such as Vertiv (VRT) are becoming the new core of the infrastructure.

The chain reaction is that PUE (Power Usage Effectiveness) must be kept below 1.2, and waste heat recovery and the grid connection of nuclear and renewable power have become new topics. Even if all the previous links are solved, without electricity and cooling, the server racks cannot be installed and put into operation.
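PUE is defined as total facility power divided by IT equipment power, so for a fixed grid connection a lower PUE translates directly into more racks. This is a minimal sketch assuming a hypothetical 1 gigawatt connection and 120kW racks; the function names are illustrative:

```python
def it_power_mw(grid_mw: float, pue: float) -> float:
    """PUE = total facility power / IT power, so IT power = total / PUE."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return grid_mw / pue


def racks_supported(grid_mw: float, pue: float, rack_kw: float) -> int:
    """Whole racks that fit in the IT power budget of a given grid connection."""
    return int(it_power_mw(grid_mw, pue) * 1000 // rack_kw)


GRID_MW = 1000   # assumption: a 1 GW grid connection
RACK_KW = 120    # lower end of the 120-200kW rack range

for pue in (1.5, 1.2):
    print(f"PUE {pue}: {it_power_mw(GRID_MW, pue):.0f} MW for IT, "
          f"{racks_supported(GRID_MW, pue, RACK_KW)} racks")
```

Under these assumptions, moving from a PUE of 1.5 to 1.2 frees up roughly a quarter more rack capacity from the same grid connection, which is why cooling efficiency is treated as part of the power bottleneck.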

The fundamental logic behind the bottleneck shift in the AI computing power industry chain is that AI computing power is not a "single point" problem but a system-level Leontief production function: GPUs, HBM, interconnects, power, and cooling must all be matched to the weakest link. Every time a hyperscaler (Google, Microsoft, Meta, etc.) solves one bottleneck, it immediately pushes capital and innovation to the next link.

Currently (2026), we are in a transition period of "accelerated deployment of optical interconnects + large-scale commercial use of power/liquid cooling". New bottlenecks may emerge in the future (such as lasers, optical fiber materials, or power grid transformers), but the chain of "computing → storage → optics → power/cooling" has become the industry's recognized path.

This also explains why the investment logic has shifted from NVIDIA/TSMC to the three HBM giants (SK Hynix, etc.), optical manufacturers (Lumentum, Coherent), and liquid cooling/power infrastructure (Vertiv, related power companies).

Every shift in bottlenecks reshapes the value distribution across the entire semiconductor and data center industry chain.
