Coinbase Cuts AI Spending Nearly in Half by Defaulting to Open-Weight Models Like GLM and Kimi

PANews, June 29 – Coinbase CEO Brian Armstrong shared the company’s experience in optimizing AI spending. He noted that while token usage has grown exponentially, Coinbase has cut its AI spending nearly in half through better defaults, routing, and caching rather than through usage caps and alerts. On defaults, Coinbase uses its LLM gateway to make open-weight models (such as Zhipu’s GLM 5.2 and Moonshot AI’s Kimi 2.7) the default options, while encouraging engineers to pick the right model for each task. Because 91% of the company’s employees have never hit the usage cap, the team chose to shift to cheaper defaults instead of lowering caps. On routing, Coinbase preprocesses prompts and sends each task to the most suitable model based on cache hit rates and model pricing, and Armstrong believes AI can eventually automate this selection. On caching, all of Coinbase’s requests are cache-aware, and the cache hit rate for LibreChat has climbed from 5% to 60%.
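The routing idea described above, choosing a model from cache hit rates and pricing, can be illustrated with a minimal Python sketch. It is not Coinbase's implementation, which the report does not detail. The model labels, prices, hit rates, and task kinds below are all hypothetical placeholders, and the cost formula (blending cached and uncached input prices by hit rate) is one plausible assumption about how such a router might score options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # hypothetical label, not a real model ID
    input_price: float         # assumed USD per 1M uncached input tokens
    cached_input_price: float  # assumed USD per 1M cached input tokens
    cache_hit_rate: float      # assumed fraction of input tokens served from cache
    capable: frozenset         # task kinds this option is treated as good enough for


def expected_input_cost(model: ModelOption, input_tokens: int) -> float:
    """Blend cached and uncached prices by the expected cache hit rate."""
    cached = input_tokens * model.cache_hit_rate
    uncached = input_tokens - cached
    total = cached * model.cached_input_price + uncached * model.input_price
    return total / 1_000_000


def route(task_kind: str, input_tokens: int, options: list) -> ModelOption:
    """Pick the cheapest option that is marked capable of the task."""
    eligible = [m for m in options if task_kind in m.capable]
    if not eligible:
        raise ValueError(f"no model configured for task kind {task_kind!r}")
    return min(eligible, key=lambda m: expected_input_cost(m, input_tokens))


# Illustrative numbers only; none of these come from the report.
OPTIONS = [
    ModelOption("open-weight-default", 0.60, 0.10, 0.60,
                frozenset({"chat", "summarize"})),
    ModelOption("premium-model", 3.00, 0.30, 0.05,
                frozenset({"chat", "summarize", "refactor"})),
]

if __name__ == "__main__":
    for kind in ("chat", "refactor"):
        choice = route(kind, 100_000, OPTIONS)
        print(kind, "->", choice.name)
```

In this sketch the cheaper default wins whenever it is marked capable of the task, and the pricier option is used only for task kinds the default is not listed for. A real gateway would presumably also weigh output tokens, latency, and quality signals.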

Armstrong noted that the goal is not to suppress usage but to build infrastructure that makes exponential growth sustainable. He emphasized keeping context lean, reducing wasted tokens, and providing usage visibility, adding that the higher the AI spend, the higher the expected impact on output.


Author: PA一线

This content is for market information only and is not investment advice.
