Xiaomi MiMo price cut by 99%, where has the price war among domestic large-format models progressed?

On May 27th, Xiaomi announced a permanent price reduction for its MiMo-V2.5 Pro API, with a maximum reduction of 99%, directly targeting the DeepSeek V4 Pro. Almost simultaneously, Zhipu completed a cumulative price increase of 83% in Q1 2026, with its CEO publicly stating that "despite the price increase, supply still cannot meet demand, and call volume has increased by 400%." One side is a strategy of near-bottom pricing, while the other is a strategy of raising prices against the trend and still achieving double-digit growth. Behind these are two completely different pricing logics. The pricing of domestic large-scale model APIs has shifted from "pricing based on capability" to "pricing based on competition." What is the cost logic behind this round of concentrated price reductions? How large is the actual cost difference between models that have been reduced in price and those that haven't? Has the selection logic been rewritten?

The true extent of Xiaomi MiMo's price reduction: It's not just the "99%" figure.

Let's break down the core facts of this price reduction.

According to an official announcement from Xiaomi MiMo, the MiMo-V2.5 series APIs will be permanently discounted starting May 27th, with a maximum reduction of 99%. Context length tiered billing will be cancelled, and existing subscription limits will be fully reset. The MiMo-V2.5-TTS model is currently free for a limited time.

Regarding pricing benchmarks, discussions within the developer community consistently indicate that MiMo-V2.5 Pro is priced identically to DeepSeek V4 Pro, while the basic MiMo-V2.5 version is priced identically to DeepSeek V4 Flash. Consulting the official DeepSeek API documentation reveals that DeepSeek V4 Pro has an input price of 3 yuan per million tokens, an output price of 6 yuan per million tokens, and a cache hit cost of only 0.025 yuan per million tokens. This suggests that MiMo V2.5 Pro will most likely also be priced within this framework.

The figure "99%" needs to be viewed rationally. It refers to the maximum price reduction from the old to the new price in certain long-context scenarios, not a 90% discount in all scenarios. The truly noteworthy signal is not the percentage reduction, but the method of price reduction: Xiaomi directly uses DeepSeek as the price anchor, eliminating the previous complex billing rules that tiered charges based on context window length. Developers no longer need to manually truncate long texts to save money; this increased billing transparency may be more valuable than a simple price reduction.

Xiaomi's pricing strategy directly targets DeepSeek, placing it in the same price range. Both companies use the MoE architecture (MiMo-V2.5 with a total parameter of 1.02T and an activation parameter of only 42B), are compatible with the OpenAI API format, and now have completely aligned prices, making it virtually cost-free for developers to switch between the two.

A panoramic view of the price-cutting camp: Who is following suit, and what is the underlying logic?

Xiaomi isn't the first to lower prices, nor will it be the last. Looking at the broader group of companies cutting prices, a very clear common characteristic emerges.

DeepSeek set the price anchor for this round. On May 31st, the V4 Pro will end its 25% discount, and the permanent price will be 1/4 of the original price, which is the previously mentioned 3 yuan for input and 6 yuan for output. This is not a temporary promotion, but a long-term pricing.

The price of ByteDance's Doubao is also kept very low. According to data from the GitHub LLM-Price price tracking project, Doubao-Seed-2.0-Pro takes an input of 3.2 yuan per million tokens and outputs 16 yuan per million tokens. According to China Industrial News Network, the daily token usage of ByteDance has exceeded 120 trillion, more than 1000 times that of May 2024.

Alibaba Cloud's Tongyi Qianwen is another major player. According to a Frost & Sullivan analyst report released by Alibaba Cloud, the average daily total consumption of large-scale enterprise-level tokens in China in the second half of 2025 will be 37 trillion tokens, with Alibaba Qianwen accounting for 32.1%, ranking first.

The common characteristic of companies pursuing price reductions is that they are backed by the ecosystems of major companies. Alibaba's Qianwen is tied to Alibaba Cloud, ByteDance's Doubao is the entry point for the computing power consumption of Volcano Engine, and Xiaomi's MiMo targets terminal devices and the developer ecosystem. For these large companies, the API itself is not a profit center; it is a ticket to acquire customers. The real business lies in the subsequent cloud computing, hardware sales, advertising, and terminal ecosystem. Pricing APIs close to marginal cost is profitable as long as it drives growth in larger business lines.

However, there's an easily overlooked issue: after price reductions, there are no hidden reductions in the concurrent QPS limits and SLA guarantees for free or low-priced packages. Official documentation doesn't explicitly disclose this. When making selections, enterprises shouldn't just look at the unit price; they also need to consider whether availability under high-concurrency scenarios has been compromised.

The Counterintuitive Reasoning Behind Price Hikes: Why Did Zhipu and Kimi Raise Prices Instead of Lowering Them?

In contrast to the price-cutting camp are Zhipu and Moon's Dark Side Kimi.

According to CBN, Zhipu's API prices rose by 83% cumulatively in Q1 2026, and the CEO explicitly stated that "despite the price increase, supply still cannot meet demand, and call volume has increased by 400%." Kimi's Moonshot V1 model is currently priced at 10 yuan per million tokens for input and 30 yuan per million tokens for output, which is 3 to 4 times that of similar products from DeepSeek/MiMo.

The price increase wasn't arbitrary. Data from OpenRouter indicates that in February 2026, the number of AI models called in China surpassed that of the United States for the first time, with four of the top five being Chinese models, including Zhipu and Kimi. Zhipu's GLM-5 series excels in complex agent and code generation scenarios, while Kimi K2.5's high price is justified by its long context and inference capabilities.

Here's a counterintuitive business logic: In the agent era, the lowest unit price doesn't necessarily equate to the lowest overall cost. In complex task scenarios, the model's success rate directly determines the total token consumption. A model with a high unit price that outputs correct code on the first try may ultimately consume less tokens compared to a model with a low unit price that requires repeated debugging and retries three to five times. One reason why Zhipu's "price increases still result in supply shortages" is that enterprise customers, after calculating the overall costs, find that the higher-priced model actually has a lower overall cost.

However, it should be noted that without independent third-party evaluation data, it's premature to draw definitive conclusions regarding the actual success rate and token consumption differences of GLM-5 compared to DeepSeek or MiMo in specific scenarios. Enterprises should conduct A/B testing on their actual tasks when selecting a solution, rather than relying on benchmark rankings or vendor marketing claims.

Quantifying the cost gap: The purchasing power of 1 yuan differs by 4 times.

Now let's put those who lowered prices and those who didn't on the same level and make a direct cost comparison.

A basic task unit consists of 1 million input tokens and 1 million output tokens.

DeepSeek V4 Pro / Xiaomi MiMo V2.5 Pro : Input costs 3 yuan plus output costs 6 yuan, with a total cost of approximately 9 yuan.
ByteDance Seed-2.0-Pro : Input cost 3.2 yuan plus output cost 16 yuan, total cost approximately 19.2 yuan.
Kimi Moonshot V1 : Input costs 10 yuan plus output costs 30 yuan, with a total cost of approximately 40 yuan.

The difference between the lowest and highest tiers is 4.4 times. For tasks handling the same number of tokens, Kimi costs nearly four times as much as DeepSeek or MiMo. Considering the reality that longer contexts consume more resources, this gap widens even further in long text scenarios.

This comparison is limited to the publicly available input and output pricing of the basic model APIs. The output price of Qwen3-Max was not found (only the input price of 8.81 yuan/million tokens was found), and the specific unit price of GLM-5 from Zhipu is also missing because it has not yet been updated to public channels after the price increase. Data for these two companies needs to be supplemented.

For tasks like translation, summarization, and simple question answering—which are essentially manual labor tasks—a cost difference of more than four times means there's almost no room for hesitation in choosing the lower-priced model. However, for more complex tasks like multi-round agent calls, long code generation, and long-range inference—which are more intellectually demanding—price comparison shouldn't be the sole basis for decision-making. OmniTools suggests that companies should categorize tasks internally, separating high-frequency, simple tasks from low-frequency, complex tasks for model selection, rather than using a single model to cover all scenarios.

Developer migration costs and new selection logic

Should you change the model after a price reduction? The answer to this question will vary completely depending on the developer.

For developers working on basic scenarios, the migration cost is very low. Both DeepSeek and Xiaomi MiMo are compatible with the OpenAI API format; switching can be completed simply by modifying the model parameter and base URL in the code. Developers in the community have already reported that it basically only requires changing two lines of code. With Xiaomi removing tiered pricing based on context length, developers no longer need to perform separate cost optimizations for long text scenarios, allowing for cleaner code logic.

For applications deeply tied to advanced capabilities of specific models, the situation is different. If a product heavily relies on Kimi's long context window, the specific agent tool call format of Zhipu GLM-5, or the unique output style of a particular model, the migration cost goes far beyond changing two lines of code—it may require redesigning prompts, re-debugging function call chains, and re-handling edge cases. This cost cannot be covered by the API's unit price.

This is precisely the natural stratification that is taking place in the market: general task volume pricing and complex task volume capabilities. Both tracks will coexist, and each will have sufficient market space. The fact that Zhipu increased its price by 83% yet still saw a 400% increase in call volume, and that DeepSeek continued to operate even after its price dropped to 3 yuan, demonstrates that these two logics can coexist without conflict.

For enterprise procurement decision-makers, a multi-model routing mechanism can be established: high-frequency, low-complexity tasks should use the low-price model to reduce basic costs; low-frequency, high-difficulty tasks should use the high-price, high-performance model to ensure task success. Combining the two approaches represents the current cost-optimal solution.

The essence of this round of pricing differentiation: from "technology premium" to "ecosystem subsidy"

Finally, let's answer the core question: Why is one side experiencing a sharp drop while the other side is experiencing a sharp rise?

Xiaomi's official explanation is "full-stack inference optimization and service efficiency improvement," with technical details promised to be disclosed in a subsequent technical blog. Based on known architectural information, MiMo-V2.5 uses a MoE architecture, with only 42 bytes activated out of a total of 1.02T parameters, indeed demonstrating a structural advantage in inference efficiency. DeepSeek is also known for its MoE architecture, and its inference cost is an order of magnitude lower than that of a Dense model with equivalent capabilities.

However, cost reduction through technology is not a sufficient condition. The deeper reason lies in the differences in industrial structure.

The APIs of major tech giants like Alibaba, ByteDance, and Xiaomi are essentially customer acquisition entry points for a larger business ecosystem. Alibaba uses Qianwen to connect with Alibaba Cloud, ByteDance uses Doubao to leverage its Volcano Engine, and Xiaomi uses MiMo to expand its terminal and developer ecosystem. APIs don't have to be profitable; they can even accept long-term slight losses as long as they generate revenue from cloud service subscriptions, computing power consumption, hardware shipments, and advertising. This is a "ecosystem subsidy" logic: these large companies have other profit centers within their ecosystems to subsidize low-priced APIs.

Startups like Zhipu and Kimi don't have this subsidy pool. They must rely on the revenue from the API itself to cover R&D and computing costs, and must pursue positive commercial profits. Given the exponential growth in token consumption in the Agent era, maintaining a low price means losing more money the more you sell; raising prices is actually a rational business choice.

This structural difference will not be bridged in the short term. Large companies will not pursue profits on APIs, and startups cannot afford to burn money and play the game forever. The two pricing logics will coexist for a long time, and the market will eventually form a stable dual-track system.

This is actually good news for developers and enterprise clients. You can complete most of the basic work at a minimal cost, while also having a powerful enough model to handle the truly complex tasks that require "intelligence." The key isn't choosing which model to use, but knowing when to use which one.