For 15 years, you've been training AI for Google—you just didn't know it.

The article reveals how reCAPTCHA has been used by Google to collect free user data for training AI systems. When users complete verification online, they are actually labeling images to improve Google Maps and Waymo's autonomous driving technology. Key points include:

reCAPTCHA evolved from a simple verification tool to a data labeling platform for training visual recognition.
Users label images of traffic lights, crosswalks, etc., from Street View, helping AI learn real-world objects.
Massive scale: 200 million verifications daily, 500,000 hours of free labor, worth $5 million per day.
This data underpins Waymo, valued at $45 billion.
The latest version, v3, collects data by analyzing user behavior, so the practice continues without users ever seeing a challenge.
Irony: Users prove they are human by doing tasks AI couldn't do, and in doing so help make themselves replaceable.

Every time you click "identify traffic lights" or "select all pedestrian crossings" on a webpage, you think you're just proving you're not a robot. But in reality, you're providing free training data for Google's AI system. This has been going on for over 15 years, involving hundreds of millions of users worldwide, ultimately building Google Maps' visual recognition capabilities and Waymo, the self-driving car company now valued at $45 billion. Throughout this process, no one asked for your consent, no one told you the truth, and no one paid you a single penny.

Original post by @sharbel

Compiled by: Big Claws | PANews Lobster

500,000 hours of free human labor. Every day. Contributed by people who think they're just logging into their bank accounts.

reCAPTCHA is the most successful covert data harvesting operation in internet history. At its peak, 200 million people were completing its verification every day. Almost none of them knew what they were actually building.

Waymo—Google's self-driving car company—is now valued at $45 billion. A significant portion of its key training data comes from you. For free. From every website you visit.

Here is the full story.

Starting point: A clever idea

In 2000, spam bots were destroying the internet. Forums were flooded with spam, and email inboxes were overwhelmed. Websites desperately needed a way to distinguish between humans and machines.

Carnegie Mellon University professor Luis von Ahn solved this problem. He invented CAPTCHA: distorted text that only humans could read. Bots could not get past it, but humans could.

But von Ahn saw even greater possibilities. Millions of people were already expending their cognitive energy on these validations. What if that energy could be used to do two things at once?

In 2007, he launched reCAPTCHA. The ingenious part was that it didn't display random gibberish, but rather two words. One was known to the system, and the other came from a real scanned book that computers couldn't yet recognize. Your answer helped complete the digitization work.
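The two-word mechanism can be sketched in a few lines of Python. This is only an illustration of the idea as described above, not reCAPTCHA's actual implementation: the function names, the case-insensitive comparison, and the three-vote threshold are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def check_challenge(control_answer: str, control_truth: str) -> bool:
    """The user passes if the known (control) word is typed correctly."""
    return control_answer.strip().lower() == control_truth.strip().lower()


def record_unknown_guess(votes: Counter, passed: bool, unknown_answer: str) -> None:
    """Count a guess for the unknown word only if the control word was right."""
    if passed:
        votes[unknown_answer.strip().lower()] += 1


def accepted_transcription(votes: Counter, min_votes: int = 3) -> Optional[str]:
    """Return the leading guess once it reaches min_votes (assumed threshold)."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Four hypothetical users see the same pair: control word "morning" plus one
# scanned word the OCR software could not read.
votes = Counter()
answers = [("morning", "upon"), ("morning", "upon"), ("mornin", "open"), ("Morning", "Upon")]
for control, unknown in answers:
    passed = check_challenge(control, "morning")
    record_unknown_guess(votes, passed, unknown)

result = accepted_transcription(votes)
print(result)
```

The third user mistypes the control word, so that guess for the unknown word is discarded. The other three agree, and their shared answer is accepted as the transcription.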

The scanned text came from The New York Times archive and Google Books, a corpus said to total 130 million books.

You thought you were logging in, but you were actually doing OCR (Optical Character Recognition) for the world's largest digital library.

In 2009, Google acquired reCAPTCHA.

Google changed the game

The era of distorted text ended around 2012.

Google faced a new problem. Street View cars were photographing every road on Earth, but the photos were just raw data. For AI to be truly useful, it needs to understand what it "sees": road signs, crosswalks, traffic lights, storefront signs.

So Google redesigned reCAPTCHA as v2. The verification content changed from distorted text to an image grid: "Click on all squares containing traffic lights." "Select every pedestrian crossing." "Identify store signs."

These images came directly from Google Street View.

Every click you make is a label. Every choice you make tells Google's computer vision model: this pixel is a traffic light, this shape is a pedestrian crossing.

You're not passing a test, you're building a dataset.
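The post does not say how individual clicks are combined into a label. One common approach in crowdsourced annotation is majority voting, sketched below in Python. Everything here is an assumption for illustration: the function name, the tile numbering, and the 50% agreement threshold are hypothetical, and nothing in the source confirms that Google aggregates answers this way.

```python
from collections import defaultdict
from typing import Iterable, Set


def aggregate_tile_labels(submissions: Iterable[Set[int]], min_agreement: float = 0.5) -> Set[int]:
    """Return the tiles selected by at least min_agreement of all users.

    Each submission is the set of tile indices one user clicked for the
    same image grid and the same prompt (e.g. "traffic lights").
    """
    submissions = list(submissions)
    if not submissions:
        return set()
    counts = defaultdict(int)
    for clicked in submissions:
        for tile in clicked:
            counts[tile] += 1
    total = len(submissions)
    return {tile for tile, n in counts.items() if n / total >= min_agreement}


# Five hypothetical users answer the same 3x3 grid (tiles numbered 0..8).
submissions = [{0, 1, 4}, {0, 1}, {0, 1, 5}, {1, 4}, {0, 1, 4}]
labels = aggregate_tile_labels(submissions)
print(sorted(labels))
```

Tile 5 was clicked by only one of the five users, so it is treated as noise. Tiles 0, 1, and 4 clear the threshold and become the grid's labels.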

The scale that nobody talks about

At its peak, 200 million reCAPTCHAs were completed every day.

Each verification takes about 10 seconds, which means 2 billion seconds of human labor every day, or more than 500,000 hours per day.

The market price for professional data annotation ranges from $10 to $50 per hour. Even at the lowest rate, the labor extracted for free each day is worth more than $5 million.
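The arithmetic behind these figures is easy to check. The short Python calculation below uses only the numbers quoted in this section (200 million verifications, 10 seconds each, $10 to $50 per hour); the variable names are just for illustration. The headline figures of 500,000 hours and $5 million are rounded down from the exact results.

```python
verifications_per_day = 200_000_000   # peak daily reCAPTCHA completions
seconds_per_verification = 10         # approximate time per challenge
rate_low, rate_high = 10, 50          # USD per hour for professional annotation

total_seconds = verifications_per_day * seconds_per_verification
hours_per_day = total_seconds / 3600  # 3,600 seconds in an hour

value_low = hours_per_day * rate_low
value_high = hours_per_day * rate_high

print(f"{total_seconds:,} seconds per day")
print(f"{hours_per_day:,.0f} hours per day")
print(f"${value_low:,.0f} to ${value_high:,.0f} per day")
```

At the $10 rate the daily value works out to roughly $5.6 million, and at the $50 rate to roughly $27.8 million.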

Moreover, reCAPTCHA isn't confined to a single application; it's ubiquitous—in every bank, every government portal, every e-commerce platform, and every login page on the internet. You have no choice. Want to access your account? Label the dataset first.

Google never asked for your opinion, never paid you, and never even told you about this.

What does all of this build?

This data fed directly into two products.

Google Maps. The world's most widely used navigation tool. Its ability to read road signs, locate businesses, and understand urban geography is partly built on billions of manual annotations contributed by people who were just trying to log in to a website.

There's also Waymo.

Waymo is Google's self-driving car project, which became an independent subsidiary in 2016. For safe navigation, self-driving cars need to recognize thousands of visual patterns with near-perfect accuracy: traffic lights, crosswalks, pedestrians, and stop signs.

The real training data required for these recognition capabilities? It's labeled by millions of people using reCAPTCHA—who are completely unaware of it.

In 2024, Waymo completed over 4 million paid rides. It currently operates in San Francisco, Los Angeles, and Phoenix, and continues to expand. Its valuation is $45 billion.

The foundation of this edifice was laid, for free, by internet users who only wanted to send and receive emails.

Why can't anyone replicate all of this?

Data annotation is prohibitively expensive. Companies like Scale AI, Appen, and Labelbox exist solely to solve this problem. They employ hundreds of thousands of workers to annotate images, sometimes paying less than a dollar per hour.

Google solved this problem in a completely different way: they made labeling mandatory. No payment was given, no consent was required; instead, it was treated as an "entry fee" for accessing every website on the web.

The result: billions of labeled images covering the globe, spanning all weather conditions, all times of day, and every city on Earth.

No labeling company can do that. The internet itself is that factory, and everyone in it is an employee who has never signed a contract.

What you are still doing today

Released in 2018, reCAPTCHA v3 doesn't show you any verification challenges at all. It observes how you move your mouse, how you scroll the page, and how long you hover over things. Your behavioral fingerprint tells it whether you're human.

This behavioral data is also fed back into Google's AI system.

You never actively chose to join; there was never a checkbox for you to select it. And right now, you're still doing it on most of the websites you visit.

An irony that should give everyone pause for thought

Luis von Ahn's initial idea was a stroke of genius: redirect the cognitive energy that humanity would otherwise spend on filtering out junk toward something of value, digitizing the world's books.

Google's application of this idea is another matter entirely.

They took a security mechanism that users had no choice but to use, deployed it across the entire internet, and reaped the benefits, building commercial products worth tens of billions of dollars.

Users received nothing, not even the right to know.

The deepest irony is this: you spent years proving yourself to be human—by performing visual recognition tasks that AI couldn't yet accomplish. And once AI learned this skill, human visual annotation became superfluous.

You proved yourself to be human by making yourself replaceable.

Sources: Carnegie Mellon University, Google Blog (2009), WebProNews, MakeUseOf, MIT Technology Review, Waymo public disclosure documents.