Author: DWF Ventures
Compiled by: Deep Tide TechFlow
Deep Dive: AI agents already account for nearly one-fifth of on-chain activity, and they have outperformed humans in well-defined scenarios such as yield optimization. In autonomous trading, however, even the top AI agents achieve less than one-fifth of the top human performance. This research breaks down the real-world performance of AI across different DeFi scenarios and is worth reading for anyone interested in automated trading.

Key points
Automation and agent activity currently account for about 19% of all on-chain activity, but true end-to-end autonomy has not yet been achieved.
In narrow, well-defined use cases such as yield optimization, agents have outperformed both humans and bots. For multifaceted actions such as trading, however, humans outperform agents.
Among agents, model selection and risk management have the greatest impact on trading performance.
As agents are adopted at scale, several risks arise around trust and enforcement, including Sybil attacks, strategy crowding, and privacy trade-offs.
Agent activity continues to grow
Agent activity has grown steadily over the past year, with both transaction volume and number of transactions increasing. We've seen significant growth driven by Coinbase's x402 protocol, with players like Visa, Stripe, and Google joining in and launching their own standards. Much of the infrastructure currently being built is designed to serve two types of scenarios: channels between agents or agent calls triggered by humans.
While stablecoin trading has gained widespread support, the current infrastructure still relies on traditional payment gateways as its underlying layer, meaning it remains dependent on centralized counterparties. Therefore, the "fully autonomous" endgame where agents can self-finance, self-execute, and continuously optimize based on changing conditions has not yet been achieved.

Agents are not entirely new to DeFi. For years, on-chain protocols have featured automation through bots that capture MEV or earn excess returns that would be impossible without code. These systems worked very well with well-defined parameters that did not change frequently or require additional oversight. However, markets have become more complex over time. This is where a new generation of agents is emerging, with on-chain markets becoming a testing ground for such activity over the past few months.
How agents actually perform
According to the report, agent activity has grown exponentially, with over 17,000 agents launched since 2025. The total amount of automated/agent activity is estimated to cover over 19% of all on-chain activity. This is not surprising, as it is estimated that over 76% of stablecoin transfers are generated by bots. This indicates significant room for growth in agent activity within DeFi.
Agent autonomy ranges widely, from chatbot-like experiences that require high levels of human oversight to agents that can formulate strategies adapted to market conditions from a target given as input. Compared to bots, agents offer several key advantages, including the ability to respond to and act on new information within milliseconds, and the ability to scale to thousands of markets while maintaining the same level of rigor.
Most agents currently sit between the analyst and co-pilot levels, as most of them are still in the testing phase.

Yield optimization: agents excel
Liquidity provision is an area where automation has become increasingly common, with agents holding a total TVL exceeding $39 million. This figure primarily measures assets directly deposited by users into the agent, but excludes capital routed through vaults.
Giza Tech, one of the largest protocols in the space, launched its first agent app, ARMA, late last year, designed to enhance yield capture for major DeFi protocols. It has attracted over $19 million in assets under management and generated over $4 billion in agent trading volume. The high ratio of trading volume to total assets under management indicates that the agent frequently rebalances capital, enabling it to achieve higher yield capture. Execution is automated once capital is deposited into the contract, providing users with a simple, one-click experience with minimal oversight.
ARMA's performance is measurably strong, generating an annualized yield of over 9.75% on USDC. Even after accounting for rebalancing fees and the agent's 10% performance fee, the yield still surpasses typical lending on Aave or Morpho. Nevertheless, scalability remains a critical issue, as these agents have yet to be battle-tested at the scale of major DeFi protocols.
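The ARMA figures above can be checked with quick arithmetic. The following is a minimal sketch that treats the article's "over X" figures as point values. The rebalancing cost is a hypothetical placeholder, since the source does not quantify it, and the sketch assumes the performance fee is charged on yield net of that cost.

```python
# Back-of-the-envelope check of the ARMA figures quoted above.
# Inputs are the article's "over X" figures treated as point values.
# REBALANCING_COST is a hypothetical placeholder, not a reported number.

AUM_USD = 19_000_000          # assets under management
VOLUME_USD = 4_000_000_000    # cumulative agent trading volume
GROSS_APY = 0.0975            # annualized USDC yield
PERFORMANCE_FEE = 0.10        # agent's fee, assumed to be charged on yield
REBALANCING_COST = 0.0025     # assumed annual drag from rebalancing (hypothetical)


def turnover_ratio(volume: float, aum: float) -> float:
    """How many times the managed capital has been traded over."""
    return volume / aum


def net_apy(gross: float, perf_fee: float, rebalancing_cost: float) -> float:
    """Yield after the rebalancing drag, then the performance fee on the remainder."""
    return (gross - rebalancing_cost) * (1 - perf_fee)


if __name__ == "__main__":
    print(f"turnover: {turnover_ratio(VOLUME_USD, AUM_USD):.1f}x")
    print(f"net APY:  {net_apy(GROSS_APY, PERFORMANCE_FEE, REBALANCING_COST):.2%}")
```

Under these assumptions the capital has turned over roughly 210 times, and the net yield stays above 8.5%. Whether that beats a given lending market depends on the prevailing Aave or Morpho rate at the time, which the sketch does not model.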
Trading: humans are far ahead
However, for more complex actions like trading, the outcomes are far more diverse. Current trading models operate based on human-defined inputs and provide outputs according to pre-defined rules. Machine learning extends this by enabling models to update their behavior based on new information without explicit reprogramming, pushing them into a co-pilot role. With the addition of fully autonomous agents, the trading landscape will undergo a dramatic transformation.
Several trading competitions have been held, both between agents and between humans and agents, and the results show significant differences between the models. TradeXYZ held a human-agent trading competition for the stocks listed on its platform. Each account had an initial capital of $10,000, with no restrictions on leverage or trading frequency. The results overwhelmingly favored humans, with top human performers outperforming top agents by more than five times.
Meanwhile, Nof1 hosted an agent trading competition between its models (Grok-4, GPT-5, Deepseek, Kimi, Qwen3, Claude, and Gemini) to test different risk profiles, ranging from capital preservation to maximum leverage. The results revealed several factors that can help explain the performance differences:
Holding time: There is a strong correlation. Models that hold each position for an average of 2-3 hours perform significantly better than models that frequently flip positions.
Expected value: This measures whether the model makes money on average per trade. Interestingly, only the top 3 models have a positive expected value, meaning that most models lose money on average per trade.
Leverage: Models running lower average leverage of 6-8x outperformed models running more than 10x, as higher leverage accelerates losses.
Prompt: Monk Mode is the best-performing prompt by far, while Situational Awareness is the worst. Given the characteristics of each, this suggests that focusing on risk management and minimizing external information sources leads to better performance.
Base model: Grok 4.20 outperformed the other models by more than 22% across different prompting strategies and was the only model that was profitable on average.
Other factors, such as long/short preference, trade size, and confidence scores, either lack sufficient data or have not been shown to have any positive correlation with model performance. Overall, the results indicate that agents tend to perform better within well-defined constraints, suggesting that human intervention in target allocation remains crucial.
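Expected value and win rate are related but distinct, and the leverage point above follows from simple multiplication. The sketch below illustrates both with made-up trade lists; none of the numbers are Nof1 data, and the function names are hypothetical.

```python
# Illustrative only: the trade lists below are made-up numbers, not Nof1 data.
from statistics import mean


def win_rate(pnls: list[float]) -> float:
    """Fraction of trades that closed with a profit."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def expected_value(pnls: list[float]) -> float:
    """Average profit or loss per trade."""
    return mean(pnls)


def leveraged_return(price_move: float, leverage: float) -> float:
    """Return on margin for an underlying price move, floored at a total loss."""
    return max(price_move * leverage, -1.0)


# Wins often, but the rare loss is large: high win rate, negative expected value.
frequent_small_wins = [10.0, 10.0, 10.0, 10.0, -60.0]
# Wins rarely, but the winner is large: low win rate, positive expected value.
rare_large_wins = [-10.0, -10.0, -10.0, -10.0, 60.0]

if __name__ == "__main__":
    for name, pnls in [("frequent_small_wins", frequent_small_wins),
                       ("rare_large_wins", rare_large_wins)]:
        print(name, f"win rate={win_rate(pnls):.0%}", f"EV={expected_value(pnls):+.1f}")
    # The same 5% adverse move at leverage levels inside the two bands discussed above.
    for lev in (7, 12):
        print(f"{lev}x leverage, -5% move: {leveraged_return(-0.05, lev):.0%}")
```

A model can win four trades out of five and still carry a negative expected value if its losses are large enough, which is why expected value per trade is the more telling measure. The leverage function ignores funding costs and liquidation mechanics, so it understates how quickly high leverage can end a position.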

How to evaluate an agent
Given that agents are still in their early stages, there is currently no comprehensive evaluation framework. Historical performance is often used as the benchmark for evaluating agents, but it is shaped by underlying factors that are stronger indicators of agent quality:
Performance under different volatility levels: including disciplined loss control when conditions worsen, demonstrating that the agent is able to identify off-chain factors that could affect trading profitability.
Transparency versus privacy: Each side has its own trade-offs. A transparent agent whose transactions can be actively copied essentially has no strategic advantage. A private agent carries the risk of extraction by its creator, who holds information users cannot see and can easily exploit it at their expense.
Information source: The data sources an agent connects to are crucial in determining how it makes decisions. It is essential that these sources are trustworthy and that the agent does not depend on a single one.
Security: Having smart contract audits and a proper escrow structure to ensure backup measures are in place for black swan events is very important.
Next steps for agents
There is still much work to be done on the infrastructure front for large-scale agent adoption. This boils down to key issues surrounding agent trust and enforcement. The lack of safeguards for autonomous agent actions has led to instances of mismanagement of funds.
Launched in January 2026, ERC-8004 became the first on-chain registry enabling autonomous agents to discover each other, build verifiable reputations, and collaborate securely. This is a key unlock for DeFi composability, as trust scores are embedded in the smart contracts themselves, allowing permissionless activity between agents and protocols. However, it does not guarantee that agents will always behave non-maliciously, as vulnerabilities such as reputation collusion and Sybil attacks can still occur. Significant room for improvement therefore remains in areas such as insurance, security, and the economic staking of agents.
As agent activity expands in DeFi, strategy crowding becomes a structural risk. Yield farms are the most obvious precedent, where returns are compressed as strategies become more widespread. The same dynamics may apply to agent trading. If a large number of agents are trained on similar data and optimized for similar objectives, they will converge on similar positions and similar exit signals.
The CoinAlg paper, published by Cornell University in January 2026, formalized a version of this problem. Transparent agents can be arbitraged because their transactions are predictable and can be front-run. Private agents avoid this risk but introduce a different one: the creator retains an informational advantage over their users and can extract value through the internal knowledge that opacity is supposed to protect.
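The yield-farm precedent for strategy crowding can be illustrated with a toy model. The sketch below assumes a fixed annual reward pool shared pro rata across all deposited capital. The pool size and capital levels are hypothetical and chosen only to show the shape of the effect, and real pools rarely pay a fixed amount.

```python
# Toy model of strategy crowding: a fixed annual reward pool shared pro rata.
# The pool size and capital levels are hypothetical, chosen only for illustration.

ANNUAL_REWARDS_USD = 1_000_000  # assumed fixed yearly fees and incentives paid out


def pool_apy(total_capital_usd: float,
             annual_rewards_usd: float = ANNUAL_REWARDS_USD) -> float:
    """Yield each dollar earns when rewards are split across all deposited capital."""
    if total_capital_usd <= 0:
        raise ValueError("total capital must be positive")
    return annual_rewards_usd / total_capital_usd


if __name__ == "__main__":
    # As more agents route capital into the same strategy, per-dollar yield falls.
    for capital in (5_000_000, 10_000_000, 20_000_000, 40_000_000):
        print(f"capital ${capital:>11,}: APY {pool_apy(capital):.2%}")
```

In this model, every doubling of capital in the strategy halves the yield for every participant. Agents optimizing for the same objective would all see the same signal and move together, which is the convergence the text describes.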
Agent activity will only continue to accelerate, and the infrastructure laid today will determine how the next phase of on-chain finance operates. As usage increases, agents will iterate and become more adept at adapting to user preferences. The main differentiator will therefore be trust: the most trustworthy infrastructure will gain the largest market share.


