In June 2026, AMD confirmed shipping plans for a new device at AI DevDay in San Francisco. This machine is about the size of an Apple Mac mini, features 128GB of unified memory, and is officially positioned as a local AI development platform. Just a few months earlier, NVIDIA's DGX Spark had already appeared on developers' desktops, also in a palm-sized metal box, also with 128GB of unified memory, and also claiming the ability to run large models with 200 billion parameters locally.

The AMD Ryzen AI Halo developer platform features a Ryzen AI Max+ 395 processor.
Tom's Hardware's review of the HP Z2 Mini G1a puts the price range for machines built on the AMD chip at $2,949 to $3,999. NVIDIA's website lists the DGX Spark starting at $3,999, and some OEM versions were reportedly priced as high as $4,679 in February 2026. AMD appears to have a price advantage, but only on the surface.
The same 128GB, two different routes
At the heart of AMD's Ryzen AI Halo is a Ryzen AI Max+ 395 processor with 16 Zen 5 cores and 40 RDNA 3.5 architecture GPU compute units, alongside a 50 TOPS XDNA 2 NPU. NVIDIA's official hardware documentation describes the DGX Spark differently: a GB10 Grace Blackwell Superchip, a 20-core ARM CPU paired with a Blackwell architecture GPU, lacking an NPU but including a ConnectX-7 200Gbps network card. AMD devices offer 2.5GbE Ethernet and WiFi 7; NVIDIA's offer 10GbE plus WiFi 7, along with that expensive high-speed network card.
The memory specifications are superficially similar. Both are 128GB LPDDR5x. AMD's product page lists the memory bandwidth as 256 GB/s, while NVIDIA's official figure is 273 GB/s. The difference is less than 7%, which is almost imperceptible in most inference tasks.
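The bandwidth gap is easy to check by hand. The short sketch below recomputes it from the two quoted figures and then applies a common rule of thumb, namely that single-batch token generation is bounded by how fast the weights can be streamed from memory once per token. That rule of thumb and the 64 GB model size are illustrative assumptions, not figures from either vendor or from the reviews cited here.

```python
# Bandwidth figures quoted in the text (GB/s).
AMD_BW_GBPS = 256.0
NVIDIA_BW_GBPS = 273.0

# Hypothetical size of the model weights in GB (an assumption for illustration).
HYPOTHETICAL_WEIGHTS_GB = 64.0


def relative_gap(base, other):
    """Fraction by which `other` exceeds `base`."""
    return (other - base) / base


def decode_ceiling_tokens_per_s(bandwidth_gbps, weights_gb):
    """Rough upper bound on single-batch decode speed.

    Assumes every generated token requires reading all weights once and
    ignores KV cache traffic, compute limits, and software overhead.
    """
    return bandwidth_gbps / weights_gb


gap = relative_gap(AMD_BW_GBPS, NVIDIA_BW_GBPS)
print(f"bandwidth gap: {gap:.1%}")

for name, bw in (("AMD", AMD_BW_GBPS), ("NVIDIA", NVIDIA_BW_GBPS)):
    ceiling = decode_ceiling_tokens_per_s(bw, HYPOTHETICAL_WEIGHTS_GB)
    print(f"{name}: at most ~{ceiling:.2f} tokens/s for {HYPOTHETICAL_WEIGHTS_GB:.0f} GB of weights")
```

Under these assumptions the two ceilings differ by the same small percentage as the bandwidth figures, which is consistent with the near-identical token generation speeds reported in the field tests.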
The choice of operating system reveals a more fundamental difference between the two companies. AMD Ryzen AI Halo comes pre-installed with Windows 11 Pro, with an optional Ubuntu 24.04. It boots into a standard PC desktop with a Thunderbolt port and full support for common peripherals. DGX Spark runs DGX OS, a customized version of Ubuntu, and the first thing to do after booting is to configure the CUDA environment and NVIDIA container toolchain.
The Register conducted a detailed field test comparison in December 2025. The conclusion was that during single-batch large language model inference, the token generation speeds of the two machines were very close. However, during the prompt processing phase, the DGX Spark was 2 to 3 times faster. This difference stems from the Blackwell architecture's support for low-precision computation and NVIDIA's years of code path optimization in the inference pipeline. ServeTheHome's review pointed to another dimension: the DGX Spark's ConnectX-7 network card retails for over $900, and its potential value in multi-machine cluster scenarios far exceeds the scope of single-machine inference.
According to tests by media outlets such as Tom's Hardware, the Ryzen AI Halo measures 85mm high, 168mm wide, and 200mm deep, weighing 2.3kg, making it closer in size to a traditional mini workstation. NVIDIA's official documentation shows the DGX Spark to be 150mm square, 50.5mm thick, and weighing 1.2kg. One resembles a stacked hard drive enclosure, the other a router.
ROCm's progress bar is no longer just "good enough".
According to AMD's official announcement, ROCm 7.2 became available in January 2026, and the subsequent 7.2.4 release specifically improved the stability and performance of AI inference workloads. Phoronix provided detailed coverage on the day of the release.
For developers on Linux, the installation process for ROCm is now much simpler than it was two years ago. In March 2026, tech blogger Kunal Ganglani wrote in a detailed ROCm user guide that he completed the entire process from system configuration to running a PyTorch model on an RX 7900 XTX in only about 30 minutes, "while in 2024, doing the same thing would have taken half a day." His blog confirmed that ROCm currently supports four mainstream deep learning frameworks: PyTorch, TensorFlow, JAX, and DGL, and inference engines such as vLLM, Ollama, and llama.cpp all have ROCm backends available.
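A quick way to confirm that such an installation actually worked is to ask PyTorch which backend it was built against. The sketch below is a minimal check, not taken from Ganglani's guide. It relies on the fact that ROCm builds of PyTorch set `torch.version.hip` and expose AMD GPUs through the usual `torch.cuda` namespace. The function names are hypothetical helpers, and the code falls back gracefully when PyTorch is not installed.

```python
def detect_torch_backend():
    """Return 'rocm', 'cuda', 'cpu', or 'no-torch' for the local PyTorch build."""
    try:
        import torch
    except ImportError:
        return "no-torch"
    # ROCm builds set torch.version.hip to a version string; it is None otherwise.
    if getattr(torch.version, "hip", None):
        return "rocm"
    # CUDA builds set torch.version.cuda to a version string.
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


def usable_gpu_count():
    """Number of GPUs PyTorch can use; ROCm devices are reported via torch.cuda."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0


print(f"backend: {detect_torch_backend()}, usable GPUs: {usable_gpu_count()}")
```

Because the ROCm build reuses the `torch.cuda` API, most model code written against the standard PyTorch interface runs unchanged; the differences show up only in custom kernels and CUDA-only libraries.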
However, these advancements cannot stem the tide of CUDA's growth. NVIDIA's software stack has been built up over 17 years, and the number of CUDA-related questions and answers on Stack Overflow is dozens of times greater than that of ROCm. New versions of cutting-edge libraries like FlashAttention and xFormers are typically released as CUDA versions first, with ROCm ports taking weeks to months to arrive. Any custom CUDA kernel that goes beyond the scope of the PyTorch standard API requires manual adaptation on AMD platforms. AMD's official compatibility matrix lists verified framework and GPU combinations, but "verified" and "having enough community discussion threads to find when problems arise" are two different things.
On Reddit's r/LocalLLaMA subreddit, discussions about which device to choose have been ongoing since the end of 2025. The most frequently cited summary comes from the end of Ganglani's blog post: "If you need everything to work perfectly from day one, buy NVIDIA. If you're willing to spend an afternoon fixing things to save $800, ROCm is ready."
AMD seems to understand this very well. Over the past year, the company has not been directly replicating Nvidia's moat, but rather starting afresh outside of it.
In August 2024, AMD announced its acquisition of ZT Systems for $4.9 billion. The Wall Street Journal confirmed the completion of the transaction in March 2025. ZT Systems designs and assembles rack-scale AI server systems for hyperscale data center customers, including giants like Microsoft and Meta that purchase tens of thousands of GPUs annually. AMD gained system design capabilities from individual GPUs to entire racks.
But AMD quickly made a seemingly contradictory decision. In May 2025, according to an official announcement from Sanmina, AMD agreed to divest ZT Systems' data center manufacturing business to Sanmina, an electronics manufacturing services provider, retaining only the design team. The logic is clear: AMD didn't want to become a competitor to its OEM customers. If AMD manufactured its own AI servers, server makers selling systems built on AMD GPUs would immediately become wary. By retaining design capabilities and outsourcing manufacturing, AMD strengthened its capabilities without damaging its ecosystem relationships.
Two more crucial events occurred in the following six months.
In October 2025, AMD officially announced a strategic partnership with OpenAI to deploy 6GW of AMD Instinct GPUs. The first 1GW is scheduled to ship in the second half of 2026. The agreement includes a clause allowing OpenAI to purchase up to 10% of AMD's shares. Reuters and CNBC highlighted this detail in their reports that day. The GPUs supplied to OpenAI will be next-generation Instinct GPUs; AMD did not disclose the specific model.
In February 2026, AMD released another official press release announcing an expanded collaboration with Meta, also deploying 6GW of GPUs. This time, the chips are a custom Meta MI450 variant, planned to begin shipping in the second half of 2026. A CNBC report that day highlighted a detail: just days before this collaboration was announced, Meta also announced an expanded AI chip procurement agreement with Nvidia.
Meta signing long-term contracts with both companies at once is more telling than any technical comparison. For a company investing tens of billions of dollars annually in AI infrastructure, putting all its eggs in one basket is an unacceptable risk. AMD doesn't need to outperform Nvidia across the board; simply offering a viable alternative to Nvidia is enough to secure orders under the "dual-supplier" logic. The scale of the two 6GW contracts suggests that at least OpenAI and Meta now treat AMD as a second supplier in their long-term planning.
Nvidia's response at the same time was a combination of measures.
NVIDIA pushed on several fronts in the enterprise market. The DGX Spark is positioned as a developer desktop device, but its ConnectX-7 network interface card (NIC) ensures it's not an isolated workstation. ServeTheHome's review provides a detailed analysis of the NIC's value for prototyping and for debugging distributed training, concluding that while it is significantly slower than data center-grade NVLink, it's sufficient for small-scale cluster scenarios. This design anchors the DGX Spark within NVIDIA's broader enterprise product line: developers use Spark for prototyping, then migrate code to DGX Stations or cloud-based DGX instances, and finally deploy it to server clusters equipped with H200 or B200 GPUs. CUDA ties the hardware and software toolchain together consistently, from desktop to data center.
NVIDIA also offers the AI Enterprise software subscription suite, which bundles tools such as TensorRT, RAPIDS, and Triton Inference Server and is charged per node. NVIDIA's official product page lists the complete set of tools included in AI Enterprise. This isn't about selling hardware; it's about turning enterprise-level deployment and maintenance into a recurring-revenue business once developers are accustomed to CUDA.
Comparing the paths on both sides, the divergence is clear enough.
Nvidia has created a complete closed-loop system, from chips to systems to software to cloud services. Developers can use optimized tools from day one in this closed loop, at the cost of being tied to a single vendor's ecosystem. AMD, on the other hand, takes an open alternative approach: using the industry-standard x86 architecture, supporting both Windows and Linux systems, making ROCm an open-source stack compatible with mainstream frameworks, and attracting cost-sensitive customers or those who have already decided to diversify their vendors' risk with lower prices.
The Ryzen AI Halo product itself represents the simplest hardware embodiment of this approach. It lacks a custom network card, a dedicated OS, and low-precision training acceleration units. It's a general-purpose PC that happens to include enough unified memory to run 200-billion-parameter models and a reasonably decent GPU. You can use it for large model inference or close the terminal and open Photoshop. Tom's Hardware's report cited the HP Z2 Mini G1a at $2,949, significantly lower than the DGX Spark's starting price of $3,999. Against other OEM versions of the DGX Spark, the price difference could exceed $1,000.
However, this flexibility comes at a cost. Real-world testing by The Register showed that once the workload shifts from single-batch inference to scenarios requiring massively parallel computation, the Blackwell architecture's low-precision advantage and NVIDIA's years of software stack optimization quickly become apparent. If you need a desktop box for Stable Diffusion image generation, NVIDIA's CUDA ecosystem offers a complete set of ready-to-use tools. AMD's RDNA 3.5 architecture does not support the FP4 and FP8 low-precision formats, which puts it at a performance disadvantage in workloads like image generation. This is inherent to the RDNA architecture design and cannot be resolved through driver updates.
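Low precision matters for capacity as well as speed. The back-of-the-envelope sketch below estimates how much memory the weights of a 200-billion-parameter model occupy at different bit widths and compares that with the 128GB of unified memory both machines offer. It counts weights only, ignores KV cache, activations, and the operating system's share of memory, and treats 1 GB as 10^9 bytes. Those simplifications are assumptions made for illustration, and the footprint depends only on bits stored per weight, not on whether the hardware computes natively in that format.

```python
# Figures quoted in the text.
PARAMS = 200e9       # 200 billion parameters
MEMORY_GB = 128.0    # unified memory on both machines

# Bytes stored per parameter at each precision (weights only).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}


def weights_gb(params, bytes_per_param):
    """Approximate weight footprint in GB, taking 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    size = weights_gb(PARAMS, nbytes)
    verdict = "fits" if size <= MEMORY_GB else "does not fit"
    print(f"{fmt}: {size:.0f} GB of weights, {verdict} in {MEMORY_GB:.0f} GB")
```

On this rough accounting, a 200-billion-parameter model fits in 128GB only at around 4 bits per weight, so the headline claim for both boxes implicitly assumes aggressive quantization.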
The box's true home is not inside the box.
Looking back at the timeline, AMD's actions over the past year form a fairly clear pattern.
On the hardware front, the Instinct MI300 and MI325X are in mass production, while the MI350 and MI450 are progressing according to the roadmap. The Ryzen AI Max+ 395 has been transformed from a laptop chip into a desktop APU and integrated into development platforms. At the system level, the company acquired rack-scale design capabilities through the acquisition of ZT Systems, then divested manufacturing while retaining R&D. At the customer level, it secured two long-term contracts of 6GW each, tying itself to two of the world's largest consumers of AI compute, and gave OpenAI the option to join its shareholder list along the way. On the software front, ROCm is iterating at a rate of approximately one version per quarter and catching up on mainstream framework support, but porting cutting-edge libraries and building a community still takes time.
Each step is not isolated. The acquisition of ZT Systems was to enable the design of the kind of massive AI clusters required by OpenAI and Meta, rather than simply selling GPUs to server manufacturers. ROCm's rapid iteration was to ensure that customers who signed 6GW contracts had a usable software stack for deployment, rather than delivering bare metal. The launch of Ryzen AI Halo was to extend the same ROCm ecosystem to the desktop, allowing developers to use a $3,000 machine for local debugging and then deploy their models to the cloud MI450 cluster.
This doesn't mean AMD has caught up with Nvidia. The two 6GW contracts represent future deployment commitments; gigawatt-scale energy capacity reflects the scale of infrastructure planning, not the number of chips already shipped. The specific specifications of the MI450 have not yet been released, and the chip's actual performance, yield rate, and stability after large-scale deployment are all unknowns. ROCm has achieved "usability" on mainstream frameworks, but achieving a state where "the community can help you when problems arise" requires more time to accumulate. And CUDA's 17 years of accumulation cannot be digested through a few quarters of rapid iteration.
Nvidia's competitive advantage isn't just in software. The ConnectX-7 network card in the DGX Spark hints at another dimension of competition: while AMD is vying for developers with cost-effectiveness and openness, Nvidia is locking in teams that need distributed training and large inference pipelines with its cluster scalability. A single DGX Spark costs $3,999, and buying two plus a network cable allows you to run a distributed prototype. In this scenario, AMD's price advantage in single-machine inference counts for much less.
The disagreement between the two companies on AI ultimately boils down to a concrete choice between two palm-sized boxes. You open the AMD box, get a familiar PC environment, install PyTorch using almost identical commands, load the model, and start inference. The process is smooth until you need a library that only has a CUDA backend. You open the NVIDIA box and get a dedicated environment optimized from hardware to drivers to container toolchains. Everything starts as expected, except that the bill is roughly a thousand dollars higher and the cost of switching vendors in the future is already locked in.
AMD didn't directly challenge Nvidia's full-stack empire. Instead, it chose a more pragmatic path: providing a sufficient alternative when Nvidia's pricing and supply chain delivery capabilities couldn't keep up with all customer demands. The two 6GW contracts are the strongest evidence of this strategy to date. Ryzen AI Halo is an extension of this strategy on the desktop; it's not about blindly following the trend of making small AI boxes, but rather taking a step further along the path of "using an open ecosystem and cost advantages to attract developers who don't want to be locked in."


