OpenClaw agent task evaluation: Gemini 3 Flash success rate 95.1%, GPT-4o 85.2%

PANews reported on March 8 that SlowMist CISO 23pads posted on X about PinchBench, a benchmark that evaluates how well large language models perform on OpenClaw agent tasks. According to the results, Gemini 3 Flash leads with a success rate of 95.1%, followed by minimax-m2.1 at 93.6% and kimi-k2.5 at 93.4%. Claude Sonnet 4.5 scores 92.7%, and GPT-4o scores 85.2%.

Author: PA一线

This content is for market information only and is not investment advice.
