DeepSeek releases visual primitive reasoning method to enhance multimodal complex reasoning capabilities.
PANews reported on April 30 that DeepSeek has proposed a "Visual Primitives" method that addresses the reference-gap problem in multimodal tasks by embedding basic visual units, such as points and boxes, into the reasoning chain. The method is built on the DeepSeek-V4-Flash architecture and keeps image-token consumption low through compressed key-value caching. On counting and spatial-reasoning benchmarks, its performance is comparable (along certain dimensions) to GPT-5.4, Claude-Sonnet-4.6, and Gemini-3-Flash. The team said it will open-source some of the benchmarks and data, and release the model weights after integration.
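The report gives no implementation details, so the following is purely an illustrative sketch of the general idea of interleaving visual primitives (points and boxes) with text in a reasoning chain. All class names, token formats, and helper functions here are assumptions for illustration, not DeepSeek's actual representation or API.

```python
from dataclasses import dataclass

# Hypothetical visual primitives; DeepSeek's actual encoding is not public.
@dataclass
class Point:
    x: float  # normalized image coordinates in [0, 1]
    y: float

    def to_token(self) -> str:
        return f"<point x={self.x:.3f} y={self.y:.3f}>"

@dataclass
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def to_token(self) -> str:
        return f"<box x0={self.x0:.3f} y0={self.y0:.3f} x1={self.x1:.3f} y1={self.y1:.3f}>"

def build_reasoning_chain(steps) -> str:
    """Interleave text steps with serialized visual primitives into one string."""
    parts = []
    for step in steps:
        if isinstance(step, (Point, Box)):
            parts.append(step.to_token())
        else:
            parts.append(step)
    return " ".join(parts)

# Example: a counting task where each counted object is grounded by a box.
chain = build_reasoning_chain([
    "Object 1 is at", Box(0.10, 0.20, 0.30, 0.40),
    "Object 2 is at", Box(0.55, 0.25, 0.75, 0.45),
    "so the count is 2.",
])
print(chain)
```

The point of such a scheme is that each reasoning step carries an explicit spatial reference, so claims like "the count is 2" can be checked against concrete image regions rather than free-floating text.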
Author: PA一线