Something strong has been brewing in Mountain View, California, at the HQ Google programmers call home. In response to the recent explosive growth in online information, instant connection, and communication, Google has revised its index and named the new search indexing system Caffeine.
This revision is in keeping with the recent changes to the look of Google’s search results page and their keyword search tool. Google’s new search skin allows you to sort the search results by news feeds, videos, shopping, blogs and discussions.
Here’s why the change to the indexing system is important: it’s focused on finding the latest relevant content.
Let’s break it down:
When you search Google, you’re searching their index of the web, not the actual web itself. The index is informed by the spiderbots that constantly go out, ‘read’ websites and report back. The old index was layered, and to renew a layer with current web information meant searching and analyzing the entire web. Google found that this layered format meant that there was a lag between the time that new information was added to the web, and when that information was added to the layer.
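To make the "index, not the web" distinction concrete, here is a minimal sketch of an inverted index that is rebuilt in one full pass. It is a toy illustration only, not Google's actual design: the URLs, page text, and the helper names `build_index` and `search` are all hypothetical. It shows why a page published after the last full rebuild stays invisible to searchers until the next one.

```python
from collections import defaultdict


def build_index(pages):
    """Build an inverted index mapping each word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Look a word up in the index (never in the live pages themselves)."""
    return sorted(index.get(word.lower(), set()))


# The "web" at crawl time (hypothetical URLs and content).
web = {
    "example.com/a": "world cup schedule",
    "example.com/b": "coffee brewing guide",
}
index = build_index(web)  # snapshot produced by a full crawl

# A new page appears after the crawl has finished.
web["example.com/c"] = "world cup final score"

stale = search(index, "cup")  # the index has not seen the new page yet
index = build_index(web)      # full re-crawl and rebuild of the index
fresh = search(index, "cup")  # now the new page is findable

print(stale)
print(fresh)
```

The gap between `stale` and `fresh` is the lag described above: the larger the web, the longer a full rebuild takes, and the longer new content waits to show up in results.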
The new indexing system analyzes portions of the web, and updates the index continuously. According to Google, Caffeine “processes hundreds of thousands of pages in parallel.” By breaking the web into smaller sections and sending out search groups, Caffeine can get closer to reporting what’s relevant in real time. Here’s how they pictured it on their official blog when they announced the change:
The exponential growth in popularity of YouTube, Facebook, Twitter, and FourSquare reflects the desire for real-time connectivity, and coverage of events – large and small – in real time. What’s being said now, filmed now, happening now, and where it’s happening this instant, has changed the content on the web and what we expect to find when we search it. Caffeine reflects that change.
Here’s a timely example. Look at the search results for the query ‘world cup’:
Right after the URL at the bottom of the link is a time stamp. The stamp reflects when the information was indexed, that is, Google’s most recent stored version of that page. The 4-minute stamp seems impressive, but look at Google’s search results in their ‘Updates’ category:
We find that the information is coming in so quickly it’s posting as ‘seconds ago,’ and the stream of new postings is rolling so quickly it’s hard to keep up!
Google’s goal has always been to deliver the most relevant information. But in this 24-hour news cycle and Twittering world, ‘current’ is becoming a bigger part of ‘relevant.’ The Caffeine indexing system is Google’s latest quest for relevancy. Does it work for you?