A Zhihu question about AI relay stations has brought "cheap tokens," a niche topic that used to concern mostly developers, to a wider audience.
PANews previously started a discussion on Zhihu titled "What is an AI relay station, and what secrets lie behind cheap tokens?" The question was included in the "Token Economics" roundtable, where it sparked heated debate.
The answers didn't stop at yes-or-no judgments like "Are relay stations a gray market?" Many users asked more practical questions: Where do the cheap tokens actually come from? Is the model the user sees the real one? Can a relay station see users' prompts, code, and keys? If AI is only used occasionally, is the risk worth taking?
This shifts the discussion of AI relay stations from "tool selection" to a broader issue of cost and trust. As AI works its way into writing, programming, agents, and enterprise automation, tokens are no longer just a billing unit in model documentation, but a cost that users feel directly.
Beyond the low price, users care most about whether the model is genuine.
On Zhihu, the most debated point was not the price itself but the authenticity of the model.
Among the highly rated answers, one respondent described AI relay stations as "AI-powered scalpers." The description is emotionally charged, but it captures users' most direct concern. The technical barrier to running a relay station is low: open-source projects already provide model routing, key management, balance systems, and compatibility with the OpenAI protocol. The real difficulty lies not in setting up a forwarding service, but in obtaining cheap and stable upstream quotas.
If the upstream source is opaque, the model name the user sees may not be the model actually being called. The answers repeatedly mention risks such as "model swapping," "downgrading," and "shadow APIs." Some users point out that in ordinary Q&A the difference between a high-end model and a low-cost one is not always obvious, which leaves room for fraud. Users may think they are using a flagship model when their requests are in fact routed to a cheaper one, or to a model that a system prompt instructs to pose as the flagship.
This is also the hardest part of cheap tokens to verify. A counterfeit graphics card can be exposed with a benchmark, and fake bandwidth with a speed test, but the output of a large model is inherently random. A better answer today and a worse one tomorrow doesn't prove the model has been replaced. If a relay station serves the genuine model during the trial period and then mixes in a cheaper model during long-term use, ordinary users will find it very hard to detect.
This type of discussion shifts the question from "Is it worth it?" to "Do users know what they are buying?" If the model's origin cannot be verified, cheap tokens are not simply a price discount but a transaction built on information asymmetry.
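The point about randomness can be made concrete with a rough sketch. It assumes a user keeps a fixed set of test questions, grades each answer as pass or fail, and compares the pass rate of a baseline batch against a later batch using a pooled two-proportion z-test. The function names, the counts, and the 2.58 cutoff are illustrative assumptions, not something proposed in the Zhihu answers, and the normal approximation is crude for very small samples.

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Pooled z statistic for the gap between two pass rates (A minus B)."""
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:  # every run passed, or every run failed
        return 0.0
    return (pass_a / n_a - pass_b / n_b) / std_err


def looks_degraded(pass_a: int, n_a: int, pass_b: int, n_b: int,
                   z_threshold: float = 2.58) -> bool:
    """True when the later batch (B) scores far below the baseline (A).

    z_threshold is an assumed cutoff, not a standard from the discussion.
    """
    return two_proportion_z(pass_a, n_a, pass_b, n_b) > z_threshold


# Hypothetical counts: graded answers to the same fixed question set.
print(looks_degraded(1, 1, 0, 1))        # one good run, then one bad run
print(looks_degraded(90, 100, 88, 100))  # small dip across 100 runs
print(looks_degraded(90, 100, 60, 100))  # large drop across 100 runs
```

Only the last comparison is flagged: a single bad answer or a small dip stays within noise, while a large drop across many runs does not. Even then, a flag only indicates a change in quality, not which model was served, and it cannot catch a relay station that serves the genuine model whenever it recognizes the test questions.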
Relay stations aren't necessarily cheaper; it depends on what you compare them to.
Another line of discussion focuses on cost benchmarks. Many users point out that relay stations look cheap because they are usually compared with the official API's pay-as-you-go pricing, rather than with official subscriptions, domestic models, free quotas, or cloud vendor channels.
One answer noted that heavy users who fully use their official subscription quota may pay a lower unit cost than they would at some relay stations. Other users argue that some domestic models are already cheap enough, and that routine development, summarization, translation, and simple coding tasks don't necessarily require going through a relay station to reach overseas models.
This viewpoint does not deny the need for relay stations. Rather, it reminds users to work out their usage patterns first. For occasional Q&A, translation, and summaries of public information, the free quotas of official apps and legitimate tools are often sufficient. For architecture design, code review, and complex reasoning, a stronger model can be reserved for the key steps while lower-cost models handle the implementation details. Only users with a genuine need for continuous, high-frequency, multi-model calls should consider a relay station.
The perceived low price of a relay station largely stems from the choice of comparison. Against the official API's pay-as-you-go pricing, it may look cheap; against subscription plans, domestic models, or free quotas, it is not always the cheapest option. This line of argument hands the question back to the user: first determine your needs, then choose the channel, rather than placing an order simply because you see a discount.
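The dependence on the comparison baseline can be illustrated with a small calculation. The sketch below assumes three channels with entirely made-up prices (a pay-as-you-go official API, a relay station, and a flat-fee subscription with an included allowance) and picks the cheapest one for a given monthly usage. The channel names, fees, and the overage rule are hypothetical and do not reflect any real provider's pricing.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    fixed_monthly: float      # flat fee per month
    price_per_million: float  # usage price per 1M tokens beyond the allowance
    included_millions: float  # tokens (in millions) covered by the flat fee


def monthly_cost(ch: Channel, used_millions: float) -> float:
    """Flat fee plus usage beyond the included allowance."""
    billable = max(0.0, used_millions - ch.included_millions)
    return ch.fixed_monthly + billable * ch.price_per_million


def cheapest(channels: list[Channel], used_millions: float) -> Channel:
    return min(channels, key=lambda ch: monthly_cost(ch, used_millions))


# All figures below are hypothetical, chosen only to show the crossover.
channels = [
    Channel("official_api", fixed_monthly=0.0, price_per_million=10.0, included_millions=0.0),
    Channel("relay", fixed_monthly=0.0, price_per_million=3.0, included_millions=0.0),
    Channel("subscription", fixed_monthly=20.0, price_per_million=10.0, included_millions=50.0),
]

for usage in (1, 40):
    best = cheapest(channels, usage)
    print(usage, best.name, monthly_cost(best, usage))
```

With these invented numbers, the relay station wins at 1M tokens a month and the subscription wins at 40M, which is the pattern the answers describe: the ranking depends on usage volume and on which channels are included in the comparison.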
Once the source of the low price is exposed, the cost of trust becomes apparent.
Regarding the source of cheap tokens, Zhihu users have offered several explanations. More moderate paths include bulk purchasing, enterprise discounts, cloud vendor channels, caching, batch processing, and cross-model routing. Theoretically, these methods allow the relay service to still make a profit even at a lower price than the official list price.
However, the discussion focused more on gray-area supply channels: splitting subscription accounts, sharing account pools, bulk registration to harvest free quotas, regional price differences, refund arbitrage, cashing out cloud provider credits, and more aggressive methods such as "black cards" (stolen or fraudulent payment cards), fraudulent transactions, or API key theft. The answers judged these practices differently, but they all pointed to one thing: low prices do not come from a single source, but from a supply pool pieced together from multiple channels.
This also explains why it's difficult for users to assess risk. A request might go through an official channel today, a subscription account pool tomorrow, and a different model the day after because upstream accounts were banned. Users see the same interface, the same model name, and the same balance page, while the backend may be switching among these sources all the time.
More restrained voices also appeared in the answers. Some users argued that a 90% discount doesn't necessarily mean stolen cards; price cuts can also come from legitimate but opaque bulk discounts, caching, and routing optimization. This reminder matters. Labeling every relay station as illegal or fraudulent cannot explain why the market has persisted for so long. But if a platform does not disclose its sources, limits, failure handling, and data policies, users will find it hard to treat it as trustworthy infrastructure.
In other words, a low price is not the conclusion, but merely the starting point. What truly needs to be calculated is not just the token price, but also the model's authenticity, service stability, balance risk, and data flow.
Once the discussion reaches data security, the risk is no longer just "dumber answers."
Data security is another topic raised frequently in the Zhihu answers. Many users are no longer worried only about whether the model has become "dumber," but about whose servers their prompts, code, business documents, and keys pass through.
In typical chat scenarios, relay stations mainly affect response quality and the billing experience. In AI programming, agent scenarios, and internal enterprise tools, however, requests may contain project structures, error logs, database fields, customer lists, contract terms, business plans, and internal meeting minutes. If a relay station records, retrieves, or resells this content, the risk extends far beyond API billing.
The answers from legal and corporate governance perspectives elaborate on this issue. These answers mention that when businesses and professional service organizations use AI tools to process contracts, case materials, client data, and source code, they need to consider trade secrets, personal information, data export, client confidentiality obligations, and tool reliability. If the data transfer path involves unidentified intermediaries, it becomes difficult for businesses to answer questions such as whether data is retained, whether it is transmitted to third parties, whether it is processed overseas, how long logs are retained, and who has access to the backend.
Agent scenarios amplify this risk. Regular chat only returns text, but an agent may act on the model's output to invoke tools, read files, execute commands, or open links. If a relay station tampers with what the model returns, the risk escalates from "incorrect answer" to "incorrect execution." This is why the answers repeatedly warn against connecting unknown relay stations to production environments, CI pipelines, internal knowledge bases, and automation tools.
This part of the discussion moves relay stations from a consumer-level tool question to an enterprise-level governance question. For individual users, the risks concern account balance, privacy, and user experience; for enterprises, they also include procurement compliance, supplier vetting, employee misuse, and the boundaries of liability after an incident.
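One way to picture the gap between "incorrect answer" and "incorrect execution" is a guard that vets every tool call a model proposes before an agent runs it. The sketch below is a minimal illustration under stated assumptions: the JSON shape with `name` and `arguments`, the tool names, and the blocked path fragments are all hypothetical and not taken from any specific agent framework.

```python
import json

# Hypothetical allowlist: tool name -> argument names the agent accepts.
ALLOWED_TOOLS = {"read_file": {"path"}, "search_docs": {"query"}}
# Hypothetical path fragments that should never be readable by the agent.
BLOCKED_PATH_PARTS = (".env", ".ssh", "id_rsa")


def vet_tool_call(raw: str) -> dict:
    """Parse a model-proposed tool call; raise ValueError unless it is allowed."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("tool call is not valid JSON") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        raise ValueError(f"tool not allowed: {name!r}")
    if not isinstance(args, dict) or set(args) - ALLOWED_TOOLS[name]:
        raise ValueError("unexpected arguments")

    if name == "read_file":
        path = str(args.get("path", ""))
        if ".." in path or any(part in path for part in BLOCKED_PATH_PARTS):
            raise ValueError("path not allowed")
    return call


print(vet_tool_call('{"name": "read_file", "arguments": {"path": "docs/readme.md"}}'))
```

A call such as `{"name": "run_shell", ...}` or a request to read `.env` raises `ValueError` instead of being executed. A guard like this limits what a tampered response can trigger, but it does not make an unknown relay station safe for sensitive workloads.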
The minimum consensus reached in Zhihu discussions: it can be used, but it shouldn't be used by default.
The discussion didn't yield a simple answer: no one can prove that all relay stations are untrustworthy, and no one can prove that cheap tokens are necessarily safe. The closest thing to a consensus is that relay stations can serve as tools for low-sensitivity, substitutable, and interruptible tasks, but should not become the default entry point for all AI tasks.
Summarizing publicly available information, simple translation, toy projects, and low-risk testing are acceptable for small-scale trials. However, proprietary company code, production logs, customer information, contracts, financial documents, investment and financing materials, and sensitive industry data such as medical and legal records should not be handed to unknown relay stations. When agents and automated execution are involved, extra caution is needed around tool calls, file reading, and key exposure.
Many users in the answers gave similar advice: don't make large top-ups; don't tie an entire workflow to a single relay station; keep official APIs, domestic models, or legitimate aggregators as backup channels; use fixed test questions to check model quality regularly; de-identify and abstract inputs whenever possible; and don't connect a relay station to the company's production pipeline.
These suggestions may sound simple, but they are far more valuable than simply recommending a platform. The allure of cheap tokens lies in their lower barrier to entry, but the true cost of using AI isn't just reflected in the price tag. The authenticity of the model, data flow, service stability, balance risk, and compliance responsibilities all lie beyond the price.
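The "de-identify whenever possible" advice can be sketched as a redaction pass that runs before any prompt leaves the user's machine. The patterns below are illustrative assumptions only (an `sk-` style key prefix, email addresses, IPv4 addresses); a real setup would need patterns matched to its own secrets and identifiers, and regex redaction will miss anything it has no pattern for.

```python
import re

# Illustrative patterns only; extend them to match your own secrets.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9_-]{16,}"), "[REDACTED_API_KEY]"),
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[REDACTED_IP]"),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with its placeholder."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


prompt = "key=sk-abcdefghijklmnop1234 from alice@example.com at 10.0.0.5"
print(redact(prompt))
```

The sample prompt comes out as `key=[REDACTED_API_KEY] from [REDACTED_EMAIL] at [REDACTED_IP]`. Redaction reduces what a relay station can see, but it is no substitute for keeping sensitive material off unknown relay stations altogether.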
In the "Token Economics" roundtable, relay stations are just one facet.
This is also why the "Token Economics" roundtable included this question.
In the crypto context, tokens are often discussed as assets, incentives, and governance tools; in the AI context, tokens are more like a measurable production resource. They determine how often users can call a model, whether developers can integrate AI into their workflows, and whether enterprises are willing to put model usage into their long-term budgets.
AI relay stations have sparked heated discussion not because they are particularly novel, but because they bring this sense of cost to the forefront for users. When model capabilities are priced in tokens, affordability, stability, security, and accountability are hard to achieve all at once. Users' real concern isn't just whether there are hidden tricks behind cheap tokens, but how much trust they are giving up in order to save on access fees.
Relay stations may be around for a long time. They solve real pain points in access, payment, pricing, and multi-model integration. However, this Zhihu discussion offers a clear reminder: the easier AI capabilities are to obtain, the more users need to know where their requests go, where the models come from, and what data is left behind.



