Author: xiyu
Want to use Claude Opus 4.6 but don't want your bill to explode at the end of the month? This article will help you cut costs by 60-85%.
I. Where Are the Tokens Spent?
Do you think tokens are just "what you say + what AI says in response"? Actually, they are much more than that.
Hidden costs of every conversation:
- System prompt (~3,000-5,000 tokens): the OpenClaw core instructions; these cannot be modified.
- Context file injection (~3,000-14,000 tokens): AGENTS.md, SOUL.md, MEMORY.md, etc., injected into every conversation. This is the biggest hidden overhead.
- Historical messages: these grow with every turn of the conversation.
- Your input + the AI's output: the only part you thought you were paying for.
A simple "How's the weather today?" message actually consumes 8,000-15,000 input tokens; on Opus, the context alone costs $0.12-0.22.
Cron is even more ruthless: each trigger = a completely new conversation = re-injecting the entire context. A cron job that runs every 15 minutes fires 96 times a day and can rack up $10-20 in Opus fees by the next day.
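The arithmetic above can be sketched quickly. The $15-per-million-token Opus input price and the ~10k-token context per trigger are illustrative assumptions; check current pricing before relying on the numbers:

```python
# Back-of-the-envelope cost of context re-injection.
# ASSUMPTIONS: Opus input at $15 per million tokens, ~10k injected
# tokens per trigger -- illustrative figures, not official pricing.
OPUS_INPUT_PER_MTOK = 15.00

def daily_injection_cost(runs_per_day: int, tokens_per_run: int,
                         price_per_mtok: float = OPUS_INPUT_PER_MTOK) -> float:
    """Dollar cost per day of re-injecting the context on every run."""
    return runs_per_day * tokens_per_run * price_per_mtok / 1_000_000

# One message with 8k injected tokens:
print(daily_injection_cost(1, 8_000))    # → 0.12
# A cron job every 15 minutes = 96 runs/day:
print(daily_injection_cost(96, 10_000))  # → 14.4
```

At roughly $14/day, a single chatty cron job alone explains the "bill explosion" this article opens with.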
Similarly, each heartbeat is essentially a full conversation call, and the shorter the interval, the more expensive it gets.
II. Model Layering: Sonnet for Daily Work, Opus for Key Tasks
This is the number one money-saving trick, and the most effective one. Sonnet costs roughly one fifth of Opus and is more than enough for 80% of daily tasks.
```markdown
Prompt:
Please change OpenClaw's default model to Claude Sonnet,
using Opus only when deep analysis or creative work is needed.
Specifically:
1) Set the default model to Sonnet
2) Use Sonnet by default for cron jobs
3) Specify Opus only for writing and deep-analysis tasks
```
Opus scenarios: long-form article writing, complex code, multi-step reasoning, creative tasks.
Sonnet scenarios: casual conversation, simple Q&A, cron checks, heartbeat, file operations, translation.
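As a sketch, the split above can be expressed as a tiny routing function. The task labels and model names here are illustrative assumptions, not OpenClaw's actual identifiers:

```python
# Minimal model-routing sketch.
# ASSUMPTION: task labels and model names are hypothetical; adapt
# them to whatever identifiers your configuration actually uses.
OPUS_TASKS = {"long_form_writing", "complex_code",
              "multi_step_reasoning", "creative"}

def pick_model(task_type: str) -> str:
    """Default to Sonnet; escalate to Opus only for heavyweight tasks."""
    return "opus" if task_type in OPUS_TASKS else "sonnet"

print(pick_model("translation"))   # → sonnet
print(pick_model("complex_code"))  # → opus
```

The design point is the default: anything not explicitly listed as an Opus task falls through to Sonnet, so new task types are cheap by default.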
Actual test results: After switching, monthly costs decreased by 65%, with almost no difference in user experience.
III. Context Slimming: Eliminating the Hidden Token Hogs
The "noise floor" of each call can be 3,000-14,000 tokens. Slimming down the injected files is the most cost-effective optimization.
```markdown
Prompt:
Help me slim down OpenClaw's context files to save tokens.
Specifically: 1) Trim AGENTS.md: delete unneeded sections (group-chat rules, TTS, unused features) and compress it to under 800 tokens
2) Condense SOUL.md into concise bullet points, 300-500 tokens
3) Clean expired information out of MEMORY.md and keep it under 2,000 tokens
4) Check the workspaceFiles configuration and remove unnecessary injection files
```
Rule of thumb: at 100 Opus calls per day, every 1,000 tokens trimmed from the injection saves roughly $45 per month.
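A quick sanity check of that rule of thumb, assuming Opus input at $15 per million tokens (an illustrative price, not an official quote):

```python
# Verify the $45/month rule of thumb.
# ASSUMPTION: Opus input priced at $15 per million tokens.
tokens_saved_per_call = 1_000
calls_per_day = 100
days_per_month = 30
price_per_mtok = 15.00

monthly_savings = (tokens_saved_per_call * calls_per_day * days_per_month
                   * price_per_mtok / 1_000_000)
print(monthly_savings)  # → 45.0
```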
IV. Cron Optimization: The Most Insidious Cost Killer
```markdown
Prompt: Help me optimize OpenClaw's cron jobs to save tokens.
Please:
1) List all cron jobs with their frequency and model
2) Downgrade all non-creative jobs to Sonnet
3) Merge jobs that run in the same time slot (e.g. combine multiple checks into one)
4) Reduce unnecessarily high frequencies (system checks from every 10 minutes to every 30; version checks from 3x/day to 1x/day)
5) Set delivery to notify on demand, so no messages are sent under normal circumstances
```
Core principle: more frequent is not necessarily better, and most "real-time" requirements are not real requirements. Merging five separate checks into one call replaces five context injections with one, cutting that overhead by 80%.
V. Heartbeat Optimization
```markdown
Prompt: Help me optimize OpenClaw's heartbeat configuration:
1) Set the working-hours interval to 45-60 minutes
2) Make 23:00-08:00 a silent period
3) Trim HEARTBEAT.md down to the minimum number of lines
4) Merge scattered check tasks into the heartbeat for batched execution
```
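Under these settings the call count drops sharply. A rough comparison, assuming a 60-minute interval across 15 active hours (08:00-23:00) versus a naive 15-minute, round-the-clock heartbeat:

```python
# Heartbeat triggers per day under a given schedule.
# ASSUMPTION: 08:00-23:00 working hours = 15 active hours/day.
def heartbeats_per_day(active_hours: int, interval_minutes: int) -> int:
    return active_hours * 60 // interval_minutes

print(heartbeats_per_day(15, 60))  # optimized → 15
print(heartbeats_per_day(24, 15))  # naive     → 96
```

Going from 96 to 15 calls per day cuts heartbeat context-injection cost by about 84% before any model or file changes.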
VI. Precise Search: Save 90% on Input Tokens with qmd
When the agent looks up information, it defaults to reading the full text: a 500-line file holds 3,000-5,000 tokens, yet the agent may only need 10 lines of it, so 90% of the input tokens are wasted.
qmd is a local semantic search tool that builds a full-text + vector index, letting the agent pinpoint the relevant paragraphs instead of reading the entire file. All computation runs locally, with zero API cost.
Use it together with mq (Mini Query) to preview directory structure, extract paragraphs precisely, and search by keyword, reading only the 10-30 lines actually needed each time.
```markdown
Prompt:
Help me configure qmd knowledge-base retrieval to save tokens.
GitHub repo: https://github.com/tobi/qmd
Requirements:
1) Install qmd
2) Build an index for the working directory
3) Add retrieval rules to AGENTS.md that force the agent to search with qmd/mq first instead of reading full files
4) Set up scheduled index updates
```
Actual results: The cost of each data lookup decreased from 15,000 tokens to 1,500 tokens, a reduction of 90%.
The difference between memorySearch and qmd: memorySearch handles recall (MEMORY.md), while qmd handles data lookup (your custom knowledge base); the two do not affect each other.
VII. Memory Search Selection
```markdown
Prompt: Help me configure OpenClaw's memorySearch.
If I don't have many memory files (a few dozen .md files),
would you recommend local embeddings or Voyage AI?
Please explain the cost and retrieval-quality trade-offs of each.
```
Simple conclusion: with few memory files, use local embeddings (zero cost); with many files or strong multilingual requirements, use Voyage AI (200 million free tokens per account).
VIII. Final Configuration List
```markdown
Prompt:
Please optimize my OpenClaw configuration in one pass for maximum token savings, following this checklist:
- Change the default model to Sonnet, keeping Opus only for creative/analysis tasks
- Slim down AGENTS.md / SOUL.md / MEMORY.md
- Downgrade all cron jobs to Sonnet + merge them + reduce their frequency
- Heartbeat interval of 45 minutes + late-night silent period
- Configure qmd precise retrieval to replace full-text reads
- Keep only necessary files in workspaceFiles
- Trim memory files regularly; keep MEMORY.md under 2,000 tokens
```
Configure once, benefit long-term:
1. Model layering: Sonnet for routine work, Opus for key tasks, saving 60-80%.
2. Context optimization: slimmed-down files plus precise qmd search, saving 30-90% of input tokens.
3. Fewer invocations: merged cron jobs, longer heartbeat intervals, and silent periods.
Sonnet 4 is already very powerful; you won't notice the difference in everyday use. Just switch to it when you really need Opus.
Based on practical experience with multi-agent systems; the figures are anonymized estimates.

