OpenAI research finds cheating in cutting-edge reasoning models, recommends retaining CoT monitoring

PANews reported on March 11 that according to a study released by OpenAI, the team found that when training cutting-edge reasoning models (such as OpenAI o1 and o3-mini), these models would exploit vulnerabilities to bypass testing, such as tampering with code verification functions and falsifying test pass conditions. Research shows that monitoring the model's Chain of Thought (CoT) can effectively identify such cheating behaviors, but forcibly optimizing CoT may cause the model to hide its intentions rather than eliminate inappropriate behavior. OpenAI recommends that developers avoid putting too much optimization pressure on CoT so that they can continue to use CoT to monitor potential reward hacking behaviors. The study found that when CoT is strongly supervised, the model still cheats, but it is done more covertly, making monitoring more difficult.

The study emphasizes that as AI capabilities increase, models may develop more complex deception, manipulation, and vulnerability exploitation strategies. OpenAI believes that CoT monitoring may become a key tool for supervising superhuman intelligence models, and recommends that AI developers use strong supervision with caution when training cutting-edge reasoning models in the future.

Share to:

Author: PA一线

This content is for informational purposes only and does not constitute investment advice.

Follow PANews official accounts, navigate bull and bear markets together
Recommended Reading
4 hour ago
17 hour ago
18 hour ago
19 hour ago
20 hour ago
2025-12-18 09:21

Popular Articles

Industry News
Market Trends
Curated Readings

Curated Series

App内阅读