Web giant insists anti-bot service isn't used for personalized ads – but cookie claims don't quite add up
Analysis Six years ago, Google revised its reCAPTCHA service, designed to filter out bots, scrapers, and other automated web browsing, and allow humans through to websites.
The v2 update in 2014 added an iframe or HTML Inline Frame, which is a way of embedding one web page in another. Then there was the v3 update in 2018, which added machine learning to the mix, to reduce the need for interaction with bot detection challenges.
reCAPTCHA makes it possible for the internet giant to challenge netizens to prove they are real people, by completing picture puzzles and the like, while providing plumbing to potentially funnel information about folks into its advertising business. Google insists it doesn't use reCAPTCHA data for personalized adverts, and says as much in the reCAPTCHA terms of service.
Yet the Silicon Valley corp's fine print and other disclosures stop short of saying reCAPTCHA is completely quarantined from all ad-related data collection. And privacy researchers now argue that the company needs to clarify that point.
Zach Edwards, co-founder of web analytics biz Victory Medium, found that Google's reCAPTCHA JavaScript code makes it possible for the mega-corp to conduct "triangle syncing," a way for two distinct web domains to associate the cookies they set for a given individual. In such an event, if a person visits a website implementing tracking scripts tied to either of those two advertising domains, both companies would receive network requests linked to the visitor and either could display an ad targeting that particular individual.
Two different domains generally shouldn't have access to the same set of cookie data, based on the distinction between first-party and third-party resources in the web browser security model. But triangle syncing dissolves that separation.
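To make the mechanism concrete, here is a minimal Python sketch of generic cookie syncing by redirect. It is not Google's code and may differ from what Edwards observed. The domains ads-a.example and ads-b.example, the Browser and Tracker classes, and the partner_uid parameter are all hypothetical names invented for illustration. The simulated browser keeps a separate cookie jar per domain, yet the second tracker still learns the first tracker's identifier because it arrives in the URL rather than in a cookie.

```python
import itertools
from urllib.parse import urlencode, urlparse, parse_qs


class Browser:
    """Simulated browser: one cookie value per domain, never shared."""

    def __init__(self):
        self.jars = {}  # domain -> identifier stored in that domain's cookie


class Tracker:
    """Hypothetical tracking service living on a single domain."""

    def __init__(self, domain, prefix):
        self.domain = domain
        self._ids = (f"{prefix}-{n}" for n in itertools.count(1))
        self.partner_ids = {}  # our identifier -> partner's identifier

    def identify(self, browser):
        # Read our own cookie, or set one if this browser is new to us.
        if self.domain not in browser.jars:
            browser.jars[self.domain] = next(self._ids)
        return browser.jars[self.domain]

    def sync_url(self, browser, partner):
        # Build a redirect to the partner with our identifier in the query.
        query = urlencode({"partner": self.domain,
                           "partner_uid": self.identify(browser)})
        return f"https://{partner.domain}/sync?{query}"

    def handle_sync(self, browser, url):
        # Partner side: pair our own cookie with the identifier in the URL.
        params = parse_qs(urlparse(url).query)
        my_id = self.identify(browser)
        self.partner_ids[my_id] = params["partner_uid"][0]
        return my_id


browser = Browser()
a = Tracker("ads-a.example", "A")
b = Tracker("ads-b.example", "B")

redirect = a.sync_url(browser, b)   # browser loads a resource from tracker A
b.handle_sync(browser, redirect)    # browser follows the redirect to tracker B
print(b.partner_ids)                # {'B-1': 'A-1'}
```

Neither simulated domain ever reads the other's cookie jar. The link between the two identifiers exists only in the second tracker's own table, which is why blocking cross-domain cookie reads does not by itself prevent this kind of matching.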
Triangle of ad success?
"Triangle syncs expand an advertising universe and make it possible to target someone across more domains," Edwards told The Register.
It's a common practice in advertising, he said, that lets two separate companies with two separate domains share data, such as the identifiers associated with a particular individual. And it's also done within a single company like Google that operates more than one domain and wants to track internet users across the different domains.
"So reCAPTCHA's gstatic.com domain doing a triangle sync to google.com basically ensures that a user can be found/tracked if either of those domains is embedded into a website," Edwards said.
According to Google, the company doesn't use reCAPTCHA for triangle syncing and reCAPTCHA loads static resources from two places on gstatic.com, with no cookies written or read. No triangle request or sync is done as part of this process, we were told. And the gstatic.com domain is supposedly "cookieless," in that it has been designed to be unable to collect cookie data.
Yet, reCAPTCHA JavaScript code hosted at Google's gstatic.com domain includes multiple references to cookies. And visiting a web page embedded with a reCAPTCHA widget does set a google.com "NID" preference cookie, even if you try to block third-party cookies.
...