Author: Frank, PANews
If you were given $10,000, which AI would you choose to trust to manage your investments?
Previously, PANews reviewed nof1.ai's AI trading competition (related reading: Six AI "Traders" in a Ten-Day Showdown: A Public Lesson on Trend, Discipline, and Greed). However, that competition covered only one specific market period, so the full trading capabilities of each AI model may not have been demonstrated in that timeframe. It also left open how well AI models actually predict under different conditions. In addition, several AI companies have recently released their latest large models, so the ranking of these models is due for reassessment.
To find out, PANews organized an "AI Trader Showdown" to test how well large AI models judge market trends and plan trades in different scenarios: for example, which timeframes they analyze better, and whether indicators improve the success rate of their predictions.
We extended the timeline to the present, randomly selecting 100 real market data slices from Binance BTC historical data to construct three extreme testing scenarios: "4-hour naked candlestick chart," "15-minute short-term chart," and "4-hour full indicator chart." The six participants represent the pinnacle of computing power in China and the US: Gemini-3-pro, Doubao-1.6-vision, DeepSeek V3.2, Grok 4.1, GPT-5.1, and Qwen3-max.
This test used 15-minute candlestick data for the Binance BTC spot trading pair from August 2017 to the present, and 4-hour candlestick data from 2021 to the present. For each period, fifty chart images of 100 candlesticks each were randomly generated. The 4-hour charts came in two types: one with only candlesticks and trading volume, and another with indicators such as EMA, SMA, Bollinger Bands, MACD, and RSI. The 15-minute charts were all naked candlestick charts (with trading volume). Along with each chart, the AI was given the specific price or indicator values corresponding to that chart. All AI output results can be viewed here.
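The random slicing described above can be sketched in a few lines. This is a minimal illustration and not PANews's actual pipeline: the function name `sample_windows`, the seed, and the choice to allow overlapping windows are all assumptions, and plain integers stand in for real candlestick rows.

```python
import random

def sample_windows(candles, n_windows=50, window_len=100, seed=None):
    """Pick n_windows random contiguous slices of window_len candles.

    Hypothetical helper: windows may overlap, which the article
    does not specify either way.
    """
    if len(candles) < window_len:
        raise ValueError("not enough candles for a single window")
    rng = random.Random(seed)
    last_start = len(candles) - window_len
    # randint is inclusive on both ends, so last_start is a valid start index
    starts = [rng.randint(0, last_start) for _ in range(n_windows)]
    return [candles[s:s + window_len] for s in starts]

# Stand-in data: integers in place of real OHLCV rows
candles = list(range(10_000))
windows = sample_windows(candles, seed=42)
print(len(windows), len(windows[0]))
```

Fixing the seed makes the same 50 slices reproducible, which matters if every model is meant to see identical charts.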
4-hour chart with indicators
4-hour pure candlestick chart
During testing, every large model received exactly the same data and instructions, so the test also probed the models' multimodal capabilities. The one exception was DeepSeek: because it offers only text-based models, it received the numerical data but no images.
Gemini 3: The King of Naked-Chart Trading, Hobbled by "Indicators"
Gemini 3 is currently the most popular AI model, and based on media reviews and tests since its release on November 18th, it can be considered the most powerful multimodal AI model available. However, in this trading prediction test, Gemini 3's results were not the best, only average. In the three scenarios (4-hour chart without indicators, 4-hour chart with indicators, and 15-minute chart without indicators), Gemini 3 performed best in the 4-hour chart without indicators scenario, achieving a win rate of 39.58%, followed by the 15-minute chart without indicators scenario at 34.04%. However, with indicators (for the same timeframe), the accuracy rate for the 4-hour period actually dropped to 31%, the worst among the three scenarios.
From this perspective, Gemini 3 seems to excel at pure candlestick chart patterns, while adding indicators makes it more susceptible to interference. Without indicators, Gemini 3 is also more willing to open positions: on pure candlestick charts it enters the market in 95% of cases, while this drops to 71% once indicators are added. It's worth noting that Gemini 3 is also the only model to profit in the 4-hour pure candlestick chart scenario.
In the 15-minute scenario, Gemini 3 performed best overall, with a total position profit of 15.34%, while in the scenario with indicators it actually lost 21.18%. However, this profit looks like a short-term stroke of luck. Looking at the profit-loss ratio data for each scenario, Gemini 3's expected profit (win rate × profit-loss ratio) is always below 1, which means that in the long run it would lose money.
DeepSeek V3.2: A Rock-Solid "Ultra-Short-Term Scalping Machine"
DeepSeek is the model with the best overall win rate among the six models, and it is also the most stable. In the three scenarios (4-hour bare candlestick chart, 4-hour chart with indicators, and 15-minute bare candlestick chart), the win rates are 40%, 41.38%, and 42.86%, respectively. This shows that DeepSeek's predictive ability is relatively stable across different timeframes and with or without indicators.
However, DeepSeek's overall profitability was poor because of its low profit-loss ratio, which averaged only 1.25. It tends to take profits too early and does not let profits run. Consequently, its expected profit is almost always around 0.5, indicating a lack of long-term profitability. Furthermore, DeepSeek is relatively conservative in its position-opening decisions, with an overall position-opening rate of only 58%.
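As a rough check on the "around 0.5" figure, the sketch below applies the article's expected-profit definition (win rate × profit-loss ratio) to DeepSeek's three reported win rates and its average profit-loss ratio of 1.25. Pairing the average ratio with the average win rate is an assumption, since per-scenario ratios are not given. The other two functions are not from the article: they use the common textbook form of per-trade expectancy, p·R − (1 − p), measured in units of the average loss, and the break-even win rate 1 / (1 + R) that follows from it.

```python
def expected_profit(win_rate, pl_ratio):
    """The article's metric: win rate multiplied by profit-loss ratio."""
    return win_rate * pl_ratio

def expectancy_per_unit_loss(win_rate, pl_ratio):
    """Textbook per-trade expectancy in units of the average loss.

    Not the article's metric; included only for comparison.
    """
    return win_rate * pl_ratio - (1.0 - win_rate)

def break_even_win_rate(pl_ratio):
    """Win rate at which the textbook expectancy is exactly zero."""
    return 1.0 / (1.0 + pl_ratio)

# DeepSeek's win rates in the three scenarios, as reported in the article
deepseek_win_rates = [0.40, 0.4138, 0.4286]
avg_pl_ratio = 1.25  # average profit-loss ratio reported in the article
avg_win_rate = sum(deepseek_win_rates) / len(deepseek_win_rates)

print(round(expected_profit(avg_win_rate, avg_pl_ratio), 2))  # 0.52
print(expectancy_per_unit_loss(avg_win_rate, avg_pl_ratio) < 0)  # True
print(round(break_even_win_rate(avg_pl_ratio), 3))  # 0.444
```

Under these assumptions, a profit-loss ratio of 1.25 needs a win rate of roughly 44% to break even, which DeepSeek's 40% to 43% does not reach.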
Doubao: The "All-Around MVP" of this competition
In this test, Doubao-1.6-vision achieved the best overall results. In the 4-hour chart with indicators, it achieved the highest win rate, 50%, with a final return of 22.2%. In the 15-minute short-term timeframe, it also achieved an overall return of 8.2%. It is the only model that was consistently profitable across both of these dimensions (short-term and 4-hour with indicators).
Furthermore, Doubao-1.6-vision did not get these results by being conservative: its average position-opening rate exceeded 92%, meaning it opened positions in the vast majority of situations. That said, its performance is also highly dependent on indicator signals; the difference in total profit with and without indicators is 38%. Additionally, Doubao-1.6-vision had a relatively high profit/loss ratio in both scenarios with positive returns, which is another reason for its excellent overall performance.
Grok 4.1: A "Radical Gambler" from xAI
Grok 4.1's overall style is bold but relies heavily on indicators, and it is willing to pursue larger profits. Across the three scenarios, Grok 4.1 reached a passable win rate (34.69%) only in the 4-hour chart with indicators; the win rates in the other two scenarios were extremely low. In the 4-hour chart with only candlestick patterns, the win rate was only 14.58%, and in the 15-minute chart it was 26.53%. Yet its average position-opening rate was as high as 98%, indicating a willingness to open positions in almost every scenario. From this perspective, Grok 4.1's style is more like that of a gambler who can't control their impulses.
Grok 4.1's profit/loss ratio is often quite high, averaging 2, the highest among all models. Overall, however, entrusting your funds to Grok 4.1 is not a wise choice.
GPT 5.1: The Extremely Cautious "Die-Hard Bear" Pessimist
GPT 5.1's trading style is the complete opposite of Grok 4.1's. GPT 5.1 is extremely cautious, choosing to remain on the sidelines in most cases. Out of 150 tests, it executed only 52 trades, an average position-opening rate of only about 35%.
However, even this cautious approach didn't yield better win rates for GPT 5.1. In its best-case scenario, it achieved only a 35% win rate. Furthermore, comparing the 4-hour and 15-minute charts, GPT 5.1 is clearly less adept at finding entry points on the longer timeframe; even with technical indicators, its win rate in the 4-hour timeframe was only 27%. It was only in the 15-minute timeframe, thanks to a higher profit/loss ratio (2.02), that it achieved positive returns, ultimately reaching 9.9%.
In addition, GPT 5.1 is characterized by a pronounced pessimism and a strong penchant for short selling. Over 70% of its orders are short positions.
Qwen 3: A Risk-Averse Trader of Few Words
Qwen 3 was clearly the most cautious large model, opening only 44 positions across all tests, a position opening rate of only 29%. However, like GPT, this extreme caution did not translate into a higher win rate. Its average win rate was only 34%, with its best performance occurring in a 4-hour chart with indicators.
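The position-opening rates of the two most cautious models can be reproduced from the trade counts reported in the article. The sketch below assumes all 150 tests (three scenarios of 50 charts each) count toward the denominator; the helper name `opening_rate` is illustrative.

```python
TOTAL_TESTS = 150  # 3 scenarios x 50 charts, per the article's setup

def opening_rate(positions_opened, total_tests=TOTAL_TESTS):
    """Share of tests in which the model actually opened a position."""
    return positions_opened / total_tests

# Trade counts as reported in the article
positions_opened = {"GPT-5.1": 52, "Qwen3-max": 44}

for model, opened in positions_opened.items():
    print(f"{model}: {opening_rate(opened):.1%}")
# GPT-5.1: 34.7%
# Qwen3-max: 29.3%
```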
Furthermore, Qwen 3 also boasts a relatively high profit/loss ratio of 1.96. This suggests it's a risk-averse trader, better suited to reducing the number of trades while allowing profits to run. In a 4-hour timeframe with indicators, Qwen 3's expected profit is also closest to profitability, reaching 0.95, the highest among all models.
Data summary
Summary
These AI-simulated trades suggest the following insights.
First, for the vast majority of models, indicators produce better results than candlestick charts alone. With indicators, the average win rate of the six models reached 38%, while without indicators it was only 30%.
Second, AI may be better suited to short-term trading than to longer timeframes. In the 15-minute pure candlestick chart scenario, the average win rate of the six models reached 34%, higher than the 30% of the 4-hour timeframe. Three of the six models were profitable (Gemini, GPT, Doubao), and the average profit/loss ratio was generally good.
Third, completely entrusting your trading to AI is not advisable. In this test, the expected profit of all AI models was below 1. This means that, in the long run, given these win rates and profit/loss ratios, they will all ultimately lose money. The only difference is the speed at which they lose. It should be noted, however, that the AI models were not specifically tuned and only relatively simple, commonly used indicators were used. Therefore, if you want AI to trade on your behalf, you may need a more complex tuning process and more backtesting data.
As this computing power showdown concludes and we look at the final account balances, the most important lesson may not be "which model is the strongest," but rather "where the boundaries of AI trading lie." The ultimate conclusion is that while today's AI may not be able to directly replace a top-tier fund manager, each model has evolved into a relatively sophisticated trading assistant in some particular area. Some excel at chart analysis, some at risk control, and some at using data analysis to achieve a stable win rate. Given the growing expectations for AI, however, the question of whether it can replace humans in trading remains complex.
