Clearview doesn’t know how many individuals are in its database

Clearview AI’s facial recognition database now has 30 billion images, according to CEO Hoan Ton-That. It’s the latest illustration of the platform’s rapid – and massive – scaling: Clearview announced that it had reached the 10 billion-image milestone in February of this year, and went on to reach 20 billion images in late March.

The latest development comes as Clearview seeks to expand its business activities into the private sector, having launched a commercial solution designed to enable selfie-based identity verification earlier this year. The system is designed to use facial recognition to match an end user to their official ID, and can also be set up to match a user against images in a given client’s database.

That framework wouldn’t necessarily benefit from an increase in the size of Clearview’s own image database, which exists mainly to support the company’s police and security-focused facial recognition solution. That system has made Clearview AI notorious among privacy advocates, owing primarily to its trawling of the internet, including social media profiles, to collect face images against which large-scale biometric searches may be performed. This approach to data collection has drawn not only criticism but also serious fines from privacy regulators.

For his part, Ton-That confirmed to FindBiometrics that Clearview “continues to collect images from the public internet,” but stressed that the company is only taking data from “public social media posts,” for which there is no expectation of privacy. The system “doesn’t collect any private data,” the chief executive said.

Ton-That also offered some further clarity about how Clearview AI’s system operates, explaining that it doesn’t contain a database of unique individuals – indeed, Clearview’s administrators don’t actually know how many individuals are represented in its 30 billion face images.

“We don’t have a way from the way our database is setup to calculate the number of unique individuals,” he said. “Our database just returns the most similar to least similar photos in a search, and doesn’t have a concept of an ‘identity’, just an ordered list of results.”
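
To illustrate the kind of search Ton-That describes, here is a minimal sketch of a similarity-ranked lookup. It is not Clearview’s implementation: the photo IDs, the three-number “embeddings,” and the `rank_photos` helper are all hypothetical, and cosine similarity is simply an assumed scoring method. The point is only that such a search returns an ordered list of photos without ever grouping them into identities.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_photos(query, photos):
    """Return (photo_id, score) pairs ordered from most to least similar.

    Nothing here records which photos show the same person; the output
    is only an ordered list of individual photos.
    """
    scored = [(photo_id, cosine_similarity(query, vector))
              for photo_id, vector in photos.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: each photo is stored as a small vector.
PHOTOS = {
    "photo_a": [1.0, 0.0, 0.0],
    "photo_b": [0.9, 0.1, 0.0],
    "photo_c": [0.0, 1.0, 0.0],
}

results = rank_photos([1.0, 0.0, 0.0], PHOTOS)
for photo_id, score in results:
    print(photo_id, round(score, 3))
```

Because each stored photo is scored independently against the query, counting unique individuals would require a separate clustering step that a store like this does not perform.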

Such an approach may at least partially address the concerns of privacy advocates. But Ton-That has also been eager to respond to broader criticisms of facial recognition technology with respect to racialized outcomes.

Many facial recognition algorithms have been found to show discrepancies in accuracy between subjects of different demographic groups. One study conducted by the National Institute of Standards and Technology (NIST) in 2019, for example, looked at 189 different algorithms, and found that depending on the system, Asian and African American subjects could be up to 100 times more likely to be misidentified than white males. Such disparities can have serious real-world consequences, such as wrongful arrests.

The disparities are generally believed to stem from inadequate training of machine learning systems, such as by using datasets of primarily white faces. But in a recent letter to the Pittsburgh Post-Gazette, Ton-That argued that the state of the art has advanced considerably in recent years.

“As of 2022, independent assessment by the National Institute of Standards and Technology, the world’s experts in algorithm evaluation, shows at least 25 algorithms exist that are 99 percent or more accurate at picking the correct matching image out of a lineup of millions of images, across all demographic groups,” he argued, adding, “These are the facts about the accuracy of facial recognition technology in 2022.”

For Clearview AI’s part, the company argued in a recent patent win announcement that its “unique data preparation and distributed training algorithms” helped to eliminate demographic bias in its facial recognition system. In that sense, at least, its enormous dataset appears to be a real asset in minimizing facial recognition’s potential for discrimination.