DeepSeek releases visual-primitive reasoning method to enhance complex multimodal reasoning.

PANews reported on April 30 that DeepSeek has proposed a "Visual Primitives" method that addresses the reference gap in multimodal tasks by embedding basic visual units, such as points and boxes, directly into the reasoning chain. The method is built on the DeepSeek-V4-Flash architecture and keeps image token consumption low through compressed key-value caching. On counting and spatial reasoning benchmarks, its performance is comparable to GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash on certain evaluation dimensions. The team said it will open-source some benchmarks and data in the future, and that the model weights will be released after integration.
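For illustration only, the sketch below shows what a reasoning chain interleaved with basic visual references (points and boxes) might look like in general. The record format, field names, and coordinate convention are assumptions for the example, not DeepSeek's published interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical data structures: a reasoning step that can carry a visual
# primitive (a point or a bounding box) alongside its text. Field names and
# coordinate conventions are illustrative assumptions, not DeepSeek's API.

@dataclass
class VisualPrimitive:
    kind: str                                                 # "point" or "box"
    point: Optional[Tuple[float, float]] = None               # (x, y), normalized to [0, 1]
    box: Optional[Tuple[float, float, float, float]] = None   # (x1, y1, x2, y2), normalized

@dataclass
class ReasoningStep:
    text: str
    ref: Optional[VisualPrimitive] = None                     # visual grounding for this step, if any

# A toy counting trace: each step that mentions an object also carries the
# box that grounds that mention, so later steps can refer back to it.
trace = [
    ReasoningStep("Locate the first apple.",
                  VisualPrimitive("box", box=(0.10, 0.20, 0.25, 0.40))),
    ReasoningStep("Locate the second apple.",
                  VisualPrimitive("box", box=(0.55, 0.30, 0.70, 0.50))),
    ReasoningStep("Two distinct boxes were found, so the count is 2."),
]

for i, step in enumerate(trace, 1):
    suffix = f" [{step.ref.kind}: {step.ref.box or step.ref.point}]" if step.ref else ""
    print(f"Step {i}: {step.text}{suffix}")
```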


Author: PA一线

This content is for market information only and is not investment advice.
