AI PCs that can run 120B-parameter models locally are here: NVIDIA redefines the "personal AI PC" standard with RTX Spark

Over the past two years, PC manufacturers have repeatedly mentioned one parameter when promoting "AI PCs": NPU computing power. However, whether it's Intel's Lunar Lake with 45 TOPS or AMD's Strix Point with 50 TOPS, these numbers have remained at a relatively modest level. They can perform background blurring, voice noise reduction, and run some small-scale edge models, but that's about it.

On May 31st, NVIDIA unveiled its RTX Spark superchip at GTC 2026, pushing that number to 1 petaflop, or 1000 TOPS. This isn't a 30% or 50% improvement; it's a jump of roughly 20 times, more than a full order of magnitude.

Several other announcements were also made at the event: Microsoft upgraded the native security mechanisms of Windows in conjunction with RTX Spark and introduced NVIDIA's open-source sandbox runtime OpenShell to the Windows platform; Adobe announced a complete overhaul of Photoshop and Premiere from the ground up to specifically adapt to RTX Spark's unified memory architecture; and the first six OEMs confirmed that they will launch thin and light laptops and compact desktops equipped with this chip this fall.

What NVIDIA did at this year's GTC was more than release a new chip. It is trying to set a new hardware standard for the "personal AI computer" category.

When the GPU becomes the main component of the PC

Let's look at the chip itself first. According to data released by NVIDIA at GTC, the RTX Spark integrates a Blackwell architecture GPU with 6144 CUDA cores, paired with a 20-core Arm architecture Grace CPU co-designed with MediaTek, and uses TSMC's 3nm process. The key change lies in the memory architecture: up to 128GB of unified memory, with the CPU and GPU sharing the same memory pool, eliminating the need for data to be moved back and forth between the two.

This is the opposite of the architectural logic of past PCs.

The basic structure of a traditional PC is "an x86 CPU as the main processor and a discrete GPU as an optional accessory." Even with the emerging concept of AI PCs in recent years, Intel and AMD's approach is to integrate an NPU into the CPU as an additional module for AI acceleration, with a computing power generally around 40-50 TOPS. The GPU remains an "external" component.

RTX Spark redistributes those roles. In this SoC the GPU is the protagonist and the CPU takes a back seat. NVIDIA claims 1 petaflop of FP4 AI compute, which it equates to 1000 TOPS, more than 20 times the NPU throughput of the previous generation of AI PCs. This isn't just faster progress on the same track; it's the start of a new one.
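As a quick check of the "more than 20 times" figure, the sketch below divides the claimed 1000 TOPS by the two NPU figures cited earlier. It takes the vendor numbers at face value. NPU TOPS are typically quoted at INT8 while NVIDIA's figure is FP4, so the ratio compares headline numbers rather than like-for-like precision. The function name is purely illustrative.

```python
# Headline AI throughput figures as quoted in the article (vendor claims).
RTX_SPARK_TOPS = 1000  # 1 petaflop FP4, stated by NVIDIA as 1000 TOPS
NPU_TOPS = {"Intel Lunar Lake": 45, "AMD Strix Point": 50}


def speedup_over(npu_tops: float, soc_tops: float = RTX_SPARK_TOPS) -> float:
    """Ratio of the SoC's headline TOPS to an NPU's headline TOPS."""
    return soc_tops / npu_tops


if __name__ == "__main__":
    for name, tops in NPU_TOPS.items():
        print(f"{name}: {speedup_over(tops):.1f}x")
```

Against the 50 TOPS part the ratio is exactly 20; against the 45 TOPS part it is a little over 22.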

The rapid response of OEMs confirms this assessment. According to Nvidia's official announcement and subsequent reports from DIGITIMES, ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI will launch thin and light laptops and compact desktops powered by RTX Spark this fall, with Acer and Gigabyte following suit. Almost all major Windows PC brands have entered the fray.

RTX Spark wasn't created from scratch. In early 2025, the same combination of Blackwell GPU and Grace CPU appeared as Project DIGITS and DGX Spark, but at the time it was positioned as a Linux desktop supercomputer for developers, in a chassis about the size of a small desktop computer. A year later, the architecture has been compressed into the thermal envelope of a thin and light laptop, the operating system has changed from Linux to Windows, and the target users have expanded from AI developers to ordinary consumers and enterprise users. This is the most noteworthy change in the GTC 2026 consumer launch: NVIDIA isn't releasing a developer toy; it is opening the door to the consumer market.

Is a 120B model sufficient for local use?

Ultimately, the numbers for computing power and memory must answer one question: What can be done?

NVIDIA's answer at the launch event was that RTX Spark can run large models with 120B parameters locally, with a context window reaching millions of tokens. What does 120B mean? For reference, today's mainstream setup for local models on consumer hardware is an RTX 4090 with 24GB of VRAM, which can run 30B to 40B parameter models after quantization and compression. Smaller models, at around 9B, run quickly on ordinary consumer-grade graphics cards. Moving from that 9B to 40B range up to 120B redefines the standard for "sufficient" edge AI.

128GB of unified memory is the prerequisite for all of this. In traditional PC architecture, the CPU has its own system memory, and the GPU has its own video memory, with a physical boundary between them. A large model exceeding the video memory capacity either cannot run at all, or requires complex model splitting and memory swapping, resulting in a sharp drop in speed. The unified memory architecture eliminates this bottleneck; model data is directly placed into a 128GB shared pool, accessible to both the CPU and GPU. Apple first proved the consumer-grade feasibility of this technology with Apple Silicon, and now Nvidia is bringing it to the Windows camp.
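To see why 128GB is the threshold that matters, here is a rough estimate of the memory needed for model weights alone at different quantization levels. This is a back-of-the-envelope sketch: it ignores the KV cache, activations, and whatever the operating system itself occupies, and the function names are illustrative rather than taken from any library.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in decimal GB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def weights_fit(params_billion: float, bits_per_weight: int, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory pool."""
    return weight_footprint_gb(params_billion, bits_per_weight) <= memory_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(120, bits)
        print(f"120B at {bits}-bit: {size:.0f} GB, "
              f"fits in 128 GB: {weights_fit(120, bits, 128)}")
    # The 24GB card from the earlier comparison, with a 40B model at 4-bit.
    print(f"40B at 4-bit: {weight_footprint_gb(40, 4):.0f} GB, "
          f"fits in 24 GB: {weights_fit(40, 4, 24)}")
```

Under these assumptions, a 120B model needs about 240GB at 16-bit, 120GB at 8-bit, and 60GB at 4-bit. Only the quantized variants fit in a 128GB pool, and the 8-bit case leaves very little room for anything else.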

In addition to large-model inference, NVIDIA listed use cases including 12K video editing, rendering 3D scenes larger than 90GB, and ray-traced gaming at 1440p above 100fps. What these scenarios share is that they process extremely large amounts of data at once, so traditional PCs either need several times the processing time or cannot run them at all.

There is a gap between "it runs" and "it runs smoothly." NVIDIA hasn't released actual inference speed data for the 120B model on RTX Spark, nor has it provided first-token latency data for contexts of millions of tokens. A key metric for inference speed with long contexts is memory bandwidth. For reference, the DGX Spark, which also uses the GB10 chip, has a measured memory bandwidth of approximately 301GB/s. This bandwidth level is sufficient for running a 120B model, but when processing context windows with millions of tokens, users might have to wait several seconds to see the first output token. The actual bandwidth of the laptop version of RTX Spark may also differ because of power limits.
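The link between bandwidth and speed can be made concrete with a simple upper bound. The sketch assumes a dense model whose weights are all read from memory once per generated token, batch size 1, and no KV-cache traffic. These are simplifying assumptions, not NVIDIA figures, and a mixture-of-experts model that touches only part of its weights per token would come out faster. The estimate covers steady-state generation only; first-token latency on long prompts also depends on compute, which is not modeled here.

```python
def decode_tokens_per_second(bandwidth_gb_s: float,
                             params_billion: float,
                             bits_per_weight: int) -> float:
    """Bandwidth-bound ceiling on generation speed for a dense model.

    Assumes every weight is read once per token, so the ceiling is
    memory bandwidth divided by the size of the weights.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # 301 GB/s is the DGX Spark measurement cited above.
    for bits in (8, 4):
        rate = decode_tokens_per_second(301, 120, bits)
        print(f"120B dense at {bits}-bit: at most {rate:.1f} tokens/s")
```

With the 301GB/s figure, a dense 120B model at 4-bit tops out at about 5 tokens per second, and at 8-bit at about 2.5, which is why unreleased bandwidth and speed data matter so much for judging real-world usability.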

A safety net for AI agents

Beyond computing power, another key announcement was the system-level collaboration between NVIDIA and Microsoft. This is perhaps the most easily overlooked yet most impactful aspect of the GTC 2026 consumer-level announcement.

If a computer capable of running a 120B model is given to an AI agent that can autonomously operate the desktop, click buttons, and read and write files, the security risk is no longer at the level of "whether data will be lost," but rather "whether the agent will do things you don't want it to do." Unless this problem is solved, companies cannot deploy such devices to their employees.

Microsoft and Nvidia have proposed a two-tiered defense. First, Microsoft upgraded Windows' native security mechanisms, providing monitoring and constraints on AI agent behavior at the operating system level. Second, Nvidia officially introduced the OpenShell runtime to the Windows platform. According to Nvidia's official documentation, OpenShell is an open-source sandbox runtime that provides kernel-level isolation. It defines a controllable operating scope for the AI agent, allowing it to autonomously execute tasks within this scope, but with strictly limited permissions, preventing it from accessing core system files, network connections, or sensitive user data.
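The idea of a "controllable operating scope" can be illustrated with a deny-by-default policy check. Everything in this sketch is hypothetical: the class, field names, and paths are invented for illustration and are not OpenShell's actual API or policy format. A real sandbox enforces such limits inside the kernel rather than through string checks in user code, and this sketch does not handle symlinks.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical deny-by-default scope for an AI agent (illustration only)."""
    allowed_roots: tuple = ()    # directories the agent may read and write
    allow_network: bool = False  # network access is off unless granted

    def may_access(self, path: str) -> bool:
        # Collapse ".." segments first so "/workspace/../etc" cannot slip through.
        target = PurePosixPath(posixpath.normpath(path))
        for root in self.allowed_roots:
            root_path = PurePosixPath(posixpath.normpath(root))
            if target == root_path or root_path in target.parents:
                return True
        return False  # anything not explicitly allowed is denied


if __name__ == "__main__":
    policy = AgentPolicy(allowed_roots=("/workspace",))
    for candidate in ("/workspace/report.txt", "/workspace/../etc/passwd", "/etc/passwd"):
        print(candidate, "->", "allow" if policy.may_access(candidate) else "deny")
```

The design choice worth noting is the default: the agent gets nothing unless a path is explicitly listed, which is what turns an autonomous agent from an open-ended risk into something an IT department can reason about.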

The significance of this combination for enterprise procurement is clear. Previously, the concept of "local AI agents" remained in the technology demonstration stage. The hardware could run, but the security framework was empty. No enterprise IT department dared to include devices in this state on their procurement list. Nvidia and Microsoft inserted a standardized isolation layer between hardware and applications, transforming "usable" into "manageable."

The performance overhead of OpenShell itself is a variable to be observed. Sandbox isolation typically incurs some performance penalty, but NVIDIA has not yet released data on the specific impact on inference speed or system responsiveness. The deployment complexity of enterprise IT management interfaces and compatibility with existing security policies are practical issues that will only be verified after OEM devices are released to the market.

Why is Adobe willing to "rebuild from the ground up"?

The level of cooperation from software vendors is often a leading indicator of whether a new hardware platform can gain a foothold.

Adobe's announcement at GTC was the strongest software signal in this round of releases. According to NVIDIA's official blog and Adobe executives, Adobe has begun a ground-up overhaul of Photoshop and Premiere, adapted specifically to the unified memory architecture of RTX Spark, and claims up to 2x performance improvements in AI and graphics processing.

Rebuilding "from the ground up" is not simply adding a plugin or an adaptation layer. On traditional PCs, the CPU and GPU each have their own memory space. When processing a very large PSD file or an 8K video timeline, data has to be moved repeatedly between the two memory pools, which is a major source of wasted performance. RTX Spark's unified memory lets the CPU and GPU share the same 128GB space directly. This structural change has real value for professional creators' workflows. That Adobe is willing to modify its underlying code suggests it sees this architectural direction as more than a one-off marketing gimmick.

However, neither NVIDIA nor Adobe has disclosed the baseline for this "2x" figure. Is it measured against a contemporary x86 processor with a discrete graphics card, or against the NPU solution of a previous-generation AI PC? The two baselines would lead to drastically different conclusions. Until the benchmark conditions are made public, the figure cannot be taken at face value.

Blackmagic Design, ComfyUI, llama.cpp, OTOY, and several game developers also announced support. The participation of ComfyUI and llama.cpp is noteworthy, as they are among the most active open-source tools in current local AI workflows. Early support from the developer community often reflects a platform's ecosystem potential more accurately than promises from major companies.

NVIDIA is using its CUDA ecosystem and unified memory architecture to build, in the Windows camp, an integrated hardware and software experience similar to Apple's. The difference is that Apple built its wall by itself, while NVIDIA needs to convince Microsoft and ISVs to build it together. Adobe's willingness to rebuild from the ground up at least shows that the first brick of this wall has been laid.

Beyond the specifications on paper

Let's get back to the most practical question: Can these devices actually be bought, and what is the experience like once you have them?

According to information released by Nvidia, the first RTX Spark devices will be available this fall, including thin and light laptops and compact desktops from ASUS, Dell, HP, Lenovo, Microsoft Surface, and MSI. Acer and Gigabyte models will follow later. Specific pricing and exact release dates for all OEMs have not yet been announced.

More critical than pricing are several unknowns at the physical level. How will power consumption and heat dissipation be balanced when a chip rated at 1 petaflop is crammed into a thin and light laptop? How will RTX Spark handle everyday office work, and what will battery life look like in non-AI scenarios? Will the real-world bandwidth of the 128GB unified memory be significantly reduced in a laptop form factor because of power limits?

These issues are the real test of industrialization. The peak computing power of a chip on an engineering prototype and its actual performance in the hands of a consumer for 8 hours a day are often two different things. Nvidia emphasized the energy efficiency of RTX Spark at the launch event, but did not provide specific TDP values or battery life data.

From the perspective of the PC industry landscape, RTX Spark signals a new division of labor. For the past thirty years, control over the core silicon of the PC has been held by x86 processor manufacturers. GPU manufacturers have become increasingly important, but their products have always been treated as "components plugged into the motherboard." NVIDIA's latest offering is a complete SoC, integrating everything from the CPU to the GPU to the memory controller, with the Arm-architecture CPU portion co-designed with MediaTek. The power structure of the PC industry chain is shifting from "x86 CPU plus an optional GPU" to "GPU-centric SoC platforms."

This shift won't happen overnight. OEM pricing strategies, actual product energy efficiency, ISV software adaptation progress, and enterprise procurement verification cycles will each help determine whether RTX Spark becomes a new benchmark for the PC industry or just another high-profile but ultimately disappointing technology demonstration. The answer won't be known until at least this fall.