Mainstream discussion of real-world assets (RWAs) is dominated by traditional financial products: US Treasuries, private credit, gold-pegged tokens, and tokenized real estate. The logic is simple: digitize assets that the financial world already values and move them onto the blockchain to improve accessibility, transparency, and liquidity. But what if this narrow focus is itself a blind spot? This article explores why the most valuable asset class may be the one the current RWA discourse overlooks: data. As we enter the era of decentralized AI, data deserves a more prominent seat at the RWA table.
What is RWA?
Real-world assets are tangible or intangible assets from the physical world or the traditional economy, such as real estate, bonds, or commodities, that are represented on-chain in tokenized form. These tokens can represent ownership, income rights, or other forms of economic utility, with the goal of bringing off-chain value into decentralized finance (DeFi). RWAs are the bridge between the real economy and the digital world: they unlock liquidity in traditionally illiquid assets and make finance programmable.
Most RWA discussion today still replicates the financial system it is supposed to disrupt. Tokenized U.S. Treasury bonds are growing rapidly, the private credit market is moving to Web3, and even real estate and commodities have found on-chain counterparts. This focus creates a blind spot: it reduces blockchain innovation to a technical renovation of the existing financial structure rather than a genuine exploration of new carriers of value. It also risks becoming a closed loop that reinforces traditional financial logic instead of developing a new paradigm, which limits the potential of RWAs to reshape global markets and release economic value.
Why is “data” a valuable RWA?
RWAs can be seen as a new type of "stock": no longer tied only to companies, but anchored to asset classes with long-term economic utility. In this framework, data is not only valuable but also strategically significant. It is the next main battlefield of global AI competition after chips.
As we have discussed in previous articles, high-quality datasets are quickly becoming the “digital gold” in the AI arms race. Today’s companies are competing not only for computing power, but also for clean, real, diverse, and global human data, which is the fuel for training and fine-tuning AI models.
In addition, according to market estimates, the big data market was worth US$325.4 billion in 2023 and is expected to reach US$1,035.4 billion by 2032, which implies compound annual growth of roughly 14% and points to the economic value at stake.
Just as gold ETFs have become a mainstream tool in the capital market, data-based RWAs also have the potential to open up a new trillion-dollar market. The logic behind this is consistent with how the capital market evaluates the exclusive data assets of AI companies: high-quality data itself constitutes an investable asset class.
Another key driver of data's value is its scarcity. In the era of AI, high-quality human-generated data is becoming scarce and precious. As synthetic content floods the Internet, the real, clean, and diverse data required to train models is increasingly rare, and this scarcity further amplifies its value.
More importantly, data comes from real-world human behavior and activities, and has clear utility. You may not be able to touch it, but you can tokenize it, trade it, license it, and earn money from it.
Unlike bond tokens that sit idle in your wallet, data is meant to be used. Its utility is inherent, and demand is growing across industries: from healthcare to autonomous driving to climate analysis, almost every sector depends on data it can draw insight from. The more unique, verified, and structured a dataset is, the more valuable it is. Whether it is detailed consumer behavior trajectories, high-resolution satellite imagery, or anonymized medical records, data has become the cornerstone of decision-making in various industries.
How to tokenize datasets as real-world assets?
The core mechanism of RWA allows data to be expressed as blockchain tokens, which brings clear ownership, fine-grained permission control, divisibility, and convenient transfer. For example, a research institution can tokenize a specific scientific dataset, allowing other researchers to purchase partial access rights or contribute jointly to a shared data pool.
Data tokenization means representing datasets as blockchain assets so that they can be traded, divided, and traced to their origin. Just as ownership of gold or real estate can be put on-chain, tokenized data can anchor access rights, licensing income, or model call rights.
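To make these properties concrete, here is a minimal sketch of a dataset token, kept in memory rather than on a real blockchain. Everything in it is an assumption made for illustration: the `DatasetToken` class, its method names, the choice to record only a SHA-256 fingerprint of the data, and the policy that holding at least one share grants access. It is not a description of any existing token standard.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class DatasetToken:
    """Hypothetical in-memory stand-in for an on-chain dataset token."""

    name: str
    content_hash: str  # fingerprint of the off-chain data, used to check origin
    total_shares: int
    balances: dict[str, int] = field(default_factory=dict)

    @classmethod
    def mint(cls, name: str, data: bytes, owner: str, total_shares: int) -> DatasetToken:
        # Only the hash is recorded; the raw data itself stays off-chain.
        digest = hashlib.sha256(data).hexdigest()
        return cls(name, digest, total_shares, {owner: total_shares})

    def transfer(self, sender: str, receiver: str, shares: int) -> None:
        # Divisibility and transfer: move part of a holding to another account.
        if shares <= 0 or self.balances.get(sender, 0) < shares:
            raise ValueError("invalid or insufficient shares")
        self.balances[sender] -= shares
        self.balances[receiver] = self.balances.get(receiver, 0) + shares

    def has_access(self, account: str, min_shares: int = 1) -> bool:
        # Assumed policy: holding at least min_shares grants access rights.
        return self.balances.get(account, 0) >= min_shares

    def verify_origin(self, data: bytes) -> bool:
        # Provenance check: does this copy match the fingerprint set at mint?
        return hashlib.sha256(data).hexdigest() == self.content_hash


data = b"sensor readings v1"
token = DatasetToken.mint("lab-climate-set", data, owner="institute", total_shares=1000)
token.transfer("institute", "researcher", 50)  # researcher buys partial rights
```

In this sketch the research institution from the example above keeps 950 of 1,000 shares after selling 50 to another researcher, and anyone holding a copy of the data can check it against the recorded fingerprint.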
Challenges and considerations
The process of using data as RWA is bound to be long and complex, and there are currently few mature frameworks, technical standards or infrastructure in the market. The main challenges include:
Smart contract design: The technical implementation is relatively simple. The harder problem is designing a contract structure that transparently reflects data ownership, licensing rights, and profit distribution.
Revenue flow and utility: The value of data tokens depends on whether they are actually used, for example by AI developers who pay by call volume. Mechanisms are needed to route that revenue into contracts and distribute it while preventing abuse of the system.
Valuation conundrum: How can a dataset be valued objectively? Its value may depend on uniqueness, timeliness, quality, relevance, and ability to generate insights. Developing a widely accepted valuation mechanism will be key.
Provenance and quality verification: Ensuring that tokenized data is always authentic, accurate, and timely, especially for dynamic datasets, is technically challenging.
Privacy and security: When data is tokenized and transmitted on-chain, how can sensitive content be protected? Cutting-edge encryption schemes and access control mechanisms are required.
Privacy compliance: Tokenizing human-generated data may raise a series of issues regarding data privacy regulations (such as GDPR, HIPAA). The existing legal system needs to keep pace with the times to adapt to decentralized data ownership and consent-based authorization mechanisms.
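The pay-per-call revenue flow named among the challenges above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `distribute_revenue`, the pro rata rule, the use of integer currency units, and the decision to give any rounding remainder to the largest holder are all hypothetical choices, and the sketch does not address abuse prevention at all.

```python
from __future__ import annotations


def distribute_revenue(
    balances: dict[str, int], calls: int, price_per_call: int
) -> dict[str, int]:
    """Split pay-per-call revenue pro rata to share holdings (integer units)."""
    if calls < 0 or price_per_call < 0:
        raise ValueError("calls and price must be non-negative")
    total_shares = sum(balances.values())
    if total_shares <= 0:
        raise ValueError("no shares outstanding")

    revenue = calls * price_per_call
    # Floor division keeps every payout a whole number of units.
    payouts = {
        holder: revenue * shares // total_shares
        for holder, shares in balances.items()
    }
    # Assumed rule: units lost to rounding go to the largest holder,
    # so the payouts always sum exactly to the revenue collected.
    remainder = revenue - sum(payouts.values())
    if remainder:
        largest = max(balances, key=lambda holder: balances[holder])
        payouts[largest] += remainder
    return payouts


# 1,200 model calls at 3 units each, split between two holders.
payouts = distribute_revenue({"institute": 950, "researcher": 50}, 1200, 3)
```

Working in integer units avoids floating-point drift, which matters once many small payments accumulate. A real contract would also need metering, authentication of callers, and rules for disputed usage.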
Conclusion: The “missing puzzle piece” of RWA?
If the mission of RWA is to bring the most valuable elements of the real world into Web3, then "data" must not be left out. It is the fuel of the AI economy, the invisible foundation behind all intelligent systems, and may also be the most liquid, programmable, and globalized type of RWA currently available.
With the rise of decentralized AI, the market will increasingly need open, permissionless access to high-quality data, and tokenized data is the most elegant infrastructure for that future. Data RWAs need not remain a fringe direction; they have the potential to become the next core theme of the RWA narrative. And this story has just begun.
Author: Dr. Max Li, Founder of OORT and Professor at Columbia University
Originally published in Forbes: https://www.forbes.com/sites/digital-assets/2025/07/09/why-is-the-ai-engine-data-the-most-overlooked-real-world-asset/
