ZK-Machine Learning: Preserving Privacy while Advancing AI

ScalingX ｜2023-05-31 15:17

zkML essentially integrates zero-knowledge (ZK) technology into AI software to overcome its limitations in privacy protection, data authenticity verification, and more.

The arrival of generative AI like ChatGPT and Midjourney has opened up new possibilities in various fields like design and art, software development, publishing, and even finance. Generative AI has been nothing short of a wonder that promises to supercharge human productivity and take us to the next level of creativity.

Developing the likes of ChatGPT and Midjourney into what they are today took years of research and massive amounts of data to train the underlying AI models. ChatGPT, for one, had to be trained on around 570GB of data drawn from web pages, books, and other sources. A good portion of this data might have come from users who had no idea their personal data was being used to train AI software. While most of the data collected might be harmless to the users it originated from, some sensitive or private data may inevitably have fallen into the mix and been fed to the model, all without those users' consent.

Given the privacy concerns such a system creates, awareness of data privacy and security issues is growing. Calls have been made to balance exploiting the advantages of AI with safeguarding individual privacy rights. Fortunately, a promising technology can help bridge this gap: zero-knowledge proofs (ZKPs).

What is zkML?

A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that a statement is true, without revealing any information beyond the fact that the statement is true. Zero-knowledge (ZK) technology has been steadily gaining momentum since 2022, with substantial growth across the blockchain sector. Projects in the ZK space have been making significant advances in scalability and privacy.
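
To make the prover and verifier roles concrete, here is a minimal sketch of a classic interactive proof of knowledge, Schnorr's identification protocol, in which the prover shows it knows a secret exponent x behind a public value y without sending x. This is an illustration only, not a construction used by any project named in this article: the group parameters are toy values chosen for readability, all function names are hypothetical, and the interactive protocol is zero-knowledge only against an honest verifier.

```python
import secrets

# Toy group parameters (assumptions for illustration; far too small for real use).
P = 23  # prime modulus
Q = 11  # prime order of the subgroup generated by G
G = 2   # generator of that subgroup: pow(G, Q, P) == 1


def public_key(x):
    """Public value y = G^x mod P, where x is the prover's secret."""
    return pow(G, x, P)


def commit(r):
    """Prover's first message: t = G^r mod P for a fresh random nonce r."""
    return pow(G, r, P)


def respond(x, r, c):
    """Prover's answer to the verifier's challenge c: s = r + c*x mod Q."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier accepts iff G^s == t * y^c mod P. It never sees x or r."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_protocol(x):
    """One honest run of the three-message protocol."""
    y = public_key(x)
    r = secrets.randbelow(Q)  # prover's one-time nonce
    t = commit(r)
    c = secrets.randbelow(Q)  # verifier's random challenge
    s = respond(x, r, c)
    return verify(y, t, c, s)
```

An honest prover always passes because G^s = G^(r + c*x) = t * y^c. A prover who does not know x can only answer correctly by guessing the challenge in advance, which is why real deployments use very large groups.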

Machine learning is a branch of artificial intelligence that concentrates on developing systems capable of learning from past data, identifying patterns, and making logical decisions without significant human involvement. It is a data analysis technique that automates the creation of analytical models by utilizing various types of digital information such as numerical data, textual content, user interactions, and visual data.

In supervised machine learning, inputs are provided to a trained model with fixed parameters, and the model generates an output that other systems can use. Both the input data and the model parameters may need to stay confidential: the inputs may include sensitive details such as personal financial or biometric information, while the parameters may themselves be confidential, for example those of a biometric authentication model.

Combine zero-knowledge technology with AI, and you get zero-knowledge machine learning (zkML), an ethical and powerful new technology that can revolutionize the way we work.

In a recent publication titled "The Cost of Intelligence," the Modulus Labs team benchmarked several existing ZKP systems across a set of models of varying sizes. Currently, the main application of ZK in on-chain ML is to validate that a computation was performed correctly. Given sufficient time and development, ZKPs, particularly SNARKs (Succinct Non-Interactive Arguments of Knowledge), could advance to the point where they also protect users' privacy from an inquisitive verifier by preventing disclosure of their inputs.

Use Cases of zkML

While zkML is still an emerging technology with plenty of unexplored possibilities, several prominent use cases have garnered attention. Some notable applications of zkML include:

Computational integrity (validity ML)

Validity proofs like SNARKs and STARKs can verify the correctness of a computation, and this ability extends to ML tasks: a proof can attest that a specific input, run through a specific model, leads to a particular output. The ease of proving and verifying that an output results from a specific model and input makes it possible to run ML models off-chain on specialized hardware while verifying the ZKPs on-chain. For instance, Giza is assisting Yearn, a decentralized finance (DeFi) yield aggregator protocol, in demonstrating on-chain the accurate execution of a complex yield strategy that uses ML.
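
The statement behind validity ML can be written as a relation: "output y is what the model committed to by h produces on input x." The sketch below checks that relation by naive re-execution, which requires seeing the weights; a SNARK or STARK would replace this check with a short proof that can be verified without re-running the model. The toy integer model, the JSON-plus-SHA-256 commitment, and all function names are assumptions made for illustration, not part of any system named above.

```python
import hashlib
import json


def commit_model(weights, bias):
    """Hash commitment to the model parameters (hypothetical canonical-JSON encoding)."""
    encoded = json.dumps({"weights": weights, "bias": bias}, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def infer(weights, bias, x):
    """Toy integer model: one linear layer followed by ReLU."""
    acc = sum(w * xi for w, xi in zip(weights, x)) + bias
    return max(acc, 0)


def relation_holds(commitment, x, y, weights, bias):
    """The relation a validity proof would attest to, checked here by re-execution.

    True iff the supplied parameters match the public commitment AND
    running them on x yields exactly y.
    """
    return commit_model(weights, bias) == commitment and infer(weights, bias, x) == y
```

In a real deployment the commitment, input, and output would be public values on-chain, while the weights would stay with the prover as a private witness.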

Fraud Detection

Anomaly detection models can be trained on smart contract data and then endorsed by DAOs (Decentralized Autonomous Organizations) as metrics for automating security procedures. This proactive approach allows actions such as pausing a contract to be triggered automatically when potentially malicious activity is detected.
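
As a rough sketch of that control flow, the example below uses a simple z-score test as a stand-in for a trained anomaly detection model and flips a pause flag when a new observation looks abnormal. Everything here is hypothetical: the `PausableContract` class merely imitates a contract with a pause switch, and the threshold is an arbitrary assumption. In a zkML setting, the detector's output would additionally come with a proof that the endorsed model produced it.

```python
from statistics import mean, stdev


def fit_baseline(history):
    """'Train' the stand-in detector: mean and sample standard deviation of past values."""
    return mean(history), stdev(history)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


class PausableContract:
    """Hypothetical stand-in for a smart contract with a pause switch."""

    def __init__(self):
        self.paused = False

    def pause(self):
        self.paused = True


def guard(contract, value, baseline):
    """Pause the contract if the new observation is anomalous; return the paused state."""
    if is_anomalous(value, baseline):
        contract.pause()
    return contract.paused
```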

Transparency in ML as a Service (MLaaS)

When multiple companies offer ML models through their APIs, the opaque nature of an API makes it hard for users to ascertain whether the service provider is actually serving the model it claims. Providing validity proofs alongside an ML model API would give users transparency, enabling them to verify which specific model they are using.

Filtering in Web3 Social Media

The decentralized nature of web3 social applications is expected to lead to an increase in spam and malicious content. An ideal approach for a social media platform would be to utilize an open-source ML model that is collectively agreed upon by the community. Additionally, the platform could provide proofs of the model's inference when opting to filter a post. Daniel Kang's analysis of the Twitter algorithm using zkML provides further insights into this topic.

Preserving Privacy

The healthcare industry prioritizes privacy and confidentiality of patient data. By leveraging zkML, medical researchers and institutions have the ability to develop models using encrypted patient data, ensuring the protection of individual records. This enables collaborative analysis without the need to share sensitive information, thereby promoting progress in disease diagnosis, treatment efficacy, and public health research.

Overview of projects exploring zkML

Numerous applications of zkML are in the experimental phase and frequently appear at hackathons as part of new projects. zkML opens fresh avenues for designing smart contracts, and several ongoing projects are actively exploring its applications.

Image via @bastian_wetzel

Modulus Labs: Engaging in both practical applications and pertinent research using zkML. They have exemplified the application of zkML through projects like RockyBot, an on-chain trading bot, and Leela vs. the World, a chess game where the entire human population competes against a verified, on-chain version of the Leela chess engine.
Giza: A protocol backed by StarkWare that enables the deployment of AI models on-chain using a fully trustless approach.
Worldcoin: A proof-of-personhood protocol that uses zkML. Worldcoin leverages custom hardware to process detailed iris scans, which are incorporated into its Semaphore implementation. These iris scans enable essential functionalities such as membership attestation and voting.

Conclusion

Much like how ChatGPT and Midjourney took countless iterations to get to where they are today, zkML is still undergoing continuous improvement and optimization, iterating to overcome both technical and practical challenges:

Quantization with minimal accuracy loss
Managing circuit size, particularly in multi-layer networks
Efficient proofs for matrix multiplication
Addressing adversarial attacks
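
Two of these challenges can be illustrated together. Proof systems work over integers in a finite field, so floating-point weights must first be quantized to fixed-point integers. Separately, checking a matrix product can be much cheaper than recomputing it. The sketch below quantizes two small matrices with an assumed scale of 8 fractional bits and then checks their integer product with Freivalds' algorithm, a classical randomized test. It is shown only to convey the intuition; it is not a zero-knowledge proof and not a claim about which technique any zkML project uses. All names are hypothetical.

```python
import random

SCALE = 2 ** 8  # fixed-point scale (assumption): 8 fractional bits


def quantize(matrix):
    """Map floats to integers by scaling and rounding; accuracy is lost in the rounding."""
    return [[round(v * SCALE) for v in row] for row in matrix]


def matmul(a, b):
    """Plain integer matrix product, O(n^3) multiplications."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(m, v):
    """Matrix-vector product, O(n^2) multiplications."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def freivalds(a, b, c, rounds=20, rng=random):
    """Probabilistically check that a @ b == c using only matrix-vector products.

    A correct c always passes. A wrong c survives each round with
    probability at most 1/2, so at most 2**-rounds overall.
    """
    n = len(c[0])
    for _ in range(rounds):
        r = [rng.randrange(2) for _ in range(n)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False
    return True
```

Note that the product of two quantized matrices carries a scale of SCALE squared, so dividing an entry by `SCALE ** 2` recovers the approximate real-valued result.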

zkML is advancing at an accelerating pace, and it is expected to reach a level of maturity comparable to the broader ML space in the near future, especially with the continued development of hardware acceleration technologies.

Incorporating ZKPs into AI systems can offer an enhanced level of security and privacy for both users and the organizations utilizing these systems. Therefore, we eagerly anticipate further product innovations in the field of zkML, where the combination of ZKPs and blockchain technology creates a secure and reliable environment for AI/ML operations within the permissionless world of Web3.