OpenClaw agent task evaluation: Gemini 3 Flash success rate 95.1%, GPT-4o 85.2%

PANews reported on March 8 that SlowMist CISO 23pads posted on X about PinchBench, a benchmark that evaluates how large language models perform on OpenClaw agent tasks. According to the results, Gemini 3 Flash leads with a success rate of 95.1%, followed by minimax-m2.1 and kimi-k2.5 in second and third place at 93.6% and 93.4% respectively. Claude Sonnet 4.5 scores 92.7%, and GPT-4o scores 85.2%.

Author: PA一线

This content is for market information only and is not investment advice.
