For more official policy you'd have to ask either Amit or Matt. I can't speak for the company here.
Speaking only for myself, and only on the ethics: I generally feel that any site that allows itself to be indexed is pretty happy with Google (or Bing) doing whatever they can to rank it better. Even for the links a site does publish, it can add rel=nofollow if it doesn't want search engines to use them but still wants its pages indexed (Yelp has done this, for instance).
For me that's the ethical boundary. Sites have various ways of indicating their wishes, and that ought to be respected in spirit beyond the technical details.
Legally, the technologies that make the internet work all rely on the idea of fair use, so it is very important whether something is "fair."
I've seen no statement that Google throws out Toolbar (and other) clickstream data for sites/pages that Googlebot can't visit (which includes not just robots-excluded pages but also login-required ones). Not that I think such data should be thrown out; that's not what robots.txt was meant for, and the user arguably has more claim to that interaction trail than the site does. But that seems to be the standard you're suggesting.
If Google doesn't want IE features or the Bing Toolbar observing its site interactions, it can disallow such visitors. A steep price to pay, at too coarse a level of control? Yes, just like a site deciding to bar Googlebot.
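To make the "disallow such visitors" point concrete, here is a minimal sketch of how robots.txt-level exclusion works, using Python's standard-library urllib.robotparser. The robots.txt contents, the crawler names, and the example.com URL are all illustrative, not taken from any real site's policy:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bars one named crawler, allows everyone else.
robots_txt = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The barred crawler may not fetch; any other agent may.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # False
print(rp.can_fetch("Bingbot", "https://example.com/page"))    # True
```

This is exactly the coarse, all-or-nothing lever the comment describes: a site can shut a crawler (or, by user-agent sniffing, a toolbar's fetches) out entirely, but there's no finer-grained way to say "index me, just don't learn from my visitors."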
I would agree that a 'fair use'-like analysis makes sense.
I would further agree that any site solely, or predominantly, powered by indirect observations of Google users would be an unfair taking. You'd crush such a site in court.
Meanwhile, a site that tallies incoming clicks with Google referrers, for itself or for a network of participating sites (as with analytics inserts), even republishing summaries of Google source URLs and search terms as public data, is almost certainly fair use. It's taking data Google drops freely into third-party site logs and making a transformative report of it.
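The "data dropped into third-party logs" is just the Referer header, which at the time carried the search query. A sketch of the tally described above, using only the standard library; the referrer URL and the q= parameter layout are illustrative of how Google search referrers looked then:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical Referer header as it would appear in a site's access log.
referrer = "https://www.google.com/search?q=best+pizza+nyc"

# Pull the search terms out of the referrer's query string.
params = parse_qs(urlparse(referrer).query)
search_terms = params.get("q", [""])[0]
print(search_terms)  # → "best pizza nyc"
```

Aggregating these per-page is all a "Google source URLs and search terms" report amounts to: no crawling of Google, just reading what visitors' browsers already handed over.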
What Bing is doing seems to me somewhere in between. The mechanism avoids literal copying of specific artifacts, but the net effect in some cases approaches the same result. As with other fair-use analysis, it's rarely black-and-white: the magnitude of the information used, its effect on the market, and the value-added transformation afterward all matter. I don't know how a court would rule in such a suit, but the discovery process would surely be fun for spectators like myself!