Google has been powering almost all search queries for an eternity in internet years. They know an awful lot about what we all search for. And they keep pushing into new ways to index data and mine people's activity.
It started out pretty simple:
- Public content on the web (web pages)
- Search queries
- Websites viewed as a result of search queries
And they kept adding individually trackable data on:
- Google Groups (including the entire known history of Usenet)
  - posted content
  - activity patterns (who responds to whom, etc...)
- Analytics
  - tracking visitors to enabled websites, including data about where they are, what they do on the site, how they found the site, and what they click on
- Advertising tracking
  - every page that serves you ads. and how you got there. and what you did there. and where you went afterward.
- AdSense
  - if you put AdSense ads on your site, they have your banking info as well.
- DoubleClick
- Google Alerts - search queries that you care enough to subscribe to.
- Gmail
  - all messages you send or receive
  - contacts
  - connection info when using other protocols (POP, IMAP, SMTP)
- Google Talk
  - chat data
  - contacts
  - activity patterns (chat status, times active, locations, etc...)
- Google Calendar
  - calendar data
  - contacts
    - who has access to shared calendars
    - who is invited to your events
  - subscriptions
    - what calendars do you subscribe to?
    - who subscribes to your calendars?
- Google Contacts - your address book. everyone you know.
- Orkut - professional social network (if you're Brazilian...)
- Google Docs
  - document content
  - contacts and activity
    - collaborators
    - viewers of published documents
- Maps & Earth
  - your location via GPS, cell, IP location, etc...
  - searched locations
  - customized maps (paths, locations, notes, areas of interest)
  - directions (from, to, method of travel)
  - Street View
    - images of locations
    - data sniffed by camera vehicle
    - Wi-Fi hotspot location matching
  - Satellite view
- Google DNS
  - every server name your computer looks up, for any protocol, including the time, location, and IP address of your requests.
- Bookmark sync
- Tasks
- GReader
  - subscribed feeds
  - activity
    - read items
    - starred items
    - shared items
    - time and location of user activity while reading feeds
  - people you follow, and what they do
  - people who follow you, and what they do
- FeedBurner
  - everyone that subscribes to a blog powered by FeedBurner
    - who they are
    - where they are
    - what app(s) they use to read feeds
    - matching other sites of interest to those subscribers
- News
  - sources of news read by a person
  - news items read
- Picasa Galleries
  - photo data
  - geolocation (time and place of photos)
  - contacts (invited to view photos)
  - face recognition (are you in a crowd photo somewhere?)
- Translated content
  - pages translated
  - source and target language(s)
- Online video (Google Video and YouTube)
  - content uploaded
  - activity
    - views
    - searches
    - comments
  - contacts (subscriptions, etc...)
  - faves, playlists, etc...
- iGoogle Gadgets - which gadgets and data sources are grouped together? which ones are used most often?
- Google Desktop
  - searches (for Google query lookup)
  - usage data
- Google Books
  - which online books you read
  - which parts of these books you read
- Google Notebook - content of notes
- Google Wave (if anybody used it)
  - content
  - activity
  - contacts
  - other apps and data that you integrate
- Buzz - wtf is buzz, anyway? but they index whatever it is...
- Google Checkout
  - purchase history - merchant, item, price, time and location, etc...
  - credit card info
- Google Health
  - medical history
  - hospitals and clinics you've used
  - prescriptions
- Android
  - where you are (location data sent by phone)
  - what apps you have
  - who you call, and who calls you
And now they're adding:
- Visitors to websites using WebFonts
- TV viewing activity
- Web apps through the proposed Chrome Web Store
I'm probably missing a bunch of stuff. Much of this is pretty innocuous. Much of it is opt-in, or voluntarily contributed. But the sheer scope and scale of the managed data, and the widely varied sources of the data, make it possible for some interesting connections to be made. Sure, much of it is claimed to be anonymized, but there's not really any such thing as true anonymity once enough separate pieces of data can be matched to the same person.
What is to stop Google from connecting the dots to say "show me a list of people who have searched for 'alternative medicine', have visited an out-of-country clinic, have a history of cancer, and have searched for 'google jobs' and 'insurance plan'"?
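To show how little work that kind of query takes once separate datasets share an identifier, here is a minimal Python sketch. Everything in it is made up for illustration: the user IDs, the dataset names, and the assumption that the data sits in tidy sets. It says nothing about how Google actually stores or queries anything.

```python
# Hypothetical data only: each set holds pseudonymous user IDs that
# happen to be shared across otherwise unrelated datasets.
searched = {
    "alternative medicine": {"u1", "u2", "u5"},
    "google jobs": {"u2", "u3", "u5"},
    "insurance plan": {"u2", "u5", "u7"},
}
visited_foreign_clinic = {"u2", "u4", "u5"}  # e.g. derived from location data
cancer_history = {"u2", "u6"}                # e.g. derived from health records


def connect_the_dots(*groups):
    """Return the IDs that appear in every one of the given groups."""
    return set.intersection(*groups)


matches = connect_the_dots(
    searched["alternative medicine"],
    searched["google jobs"],
    searched["insurance plan"],
    visited_foreign_clinic,
    cancer_history,
)
print(sorted(matches))  # prints ['u2']
```

None of the five sets is especially revealing on its own. The intersection is what narrows a crowd down to one person, which is why "anonymized" stops meaning much once the sources can be joined.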
My point is, if any government agency proposed tracking this level of data on individuals, there would be (should be) riots in the streets. At the very least, it would be a high profile election issue.
But we just accept it. Google makes things easy, so we just ignore what's going on. People complain about the "evil" of the iPhone App Store, because fart apps are not approved. But we then ignore that this much data is being systematically collected on us, by a company that chants "don't be evil."
This isn't meant as a paranoid "the government is keeping aliens in Area 51, and cars that run on water, man. WATER!" post. But there is something big going on, and we're complicit in it.
Update:
Oh, and they get private user data from social networking sites through advertising, without users' consent. Nice.