Finally, on 8th June 2010 Google posted an article announcing the launch of its new search engine indexing architecture. It’s been a long time coming: it was first talked about in August 2009, and there were already plenty of rumours that it was live.
As the web has evolved, Google have needed to stay ahead of other search engines, and the best way to achieve this is to provide the best results.
This new web indexing architecture allows Google to digest the many pages of content and media that go live and, critically, to display them in its search engine results pages (SERPs) far sooner than before.
What does the change bring? Well, Google have stated that:
“Caffeine provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered. Whether it’s a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before.”
Content is indexed at a faster rate
Under the old method, Google would crawl a web page and then process its information through a complex series of layers that helped evaluate the page and its relevance. Once processed, the page would reach the main layer and be added to the search index. This created a bottleneck: some of the other layers were refreshed more often, but “the main layer would update every couple of weeks” because it was rebuilt in batches.
The new method cuts this delay dramatically, because the search index is updated continuously, with pages processed in small portions in parallel rather than in huge batches for separate layers. It doesn’t stop there, though; Google have taken this architecture to another level.
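To see why batch rebuilds create a bottleneck, here is a toy model rather than anything resembling Google’s real pipeline. It assumes (hypothetically) that one new page is crawled per day, that the batch index is republished every 14 days, echoing “every couple of weeks”, and that a continuously updated index publishes each page one day after it is crawled. All of these numbers are illustrative assumptions.

```python
"""Toy model of index latency: batch rebuilds vs continuous updates.

Hypothetical assumptions, for illustration only: one page is crawled per day,
the batch index is rebuilt every 14 days, and the continuous index publishes
each page one day after it is crawled. These are not Google's real figures.
"""

BATCH_INTERVAL_DAYS = 14   # "the main layer would update every couple of weeks"
CONTINUOUS_DELAY_DAYS = 1  # hypothetical per-page processing delay


def batch_publish_day(crawl_day: int, interval: int = BATCH_INTERVAL_DAYS) -> int:
    """A page only becomes searchable at the next scheduled rebuild."""
    # Rebuilds happen on days interval, 2*interval, ...; a page crawled on a
    # rebuild day is assumed to miss that rebuild and wait for the next one.
    return (crawl_day // interval + 1) * interval


def continuous_publish_day(crawl_day: int, delay: int = CONTINUOUS_DELAY_DAYS) -> int:
    """Each page is processed on its own and published after a fixed delay."""
    return crawl_day + delay


def mean_latency(publish, crawl_days) -> float:
    """Average number of days between crawling a page and it being searchable."""
    delays = [publish(day) - day for day in crawl_days]
    return sum(delays) / len(delays)


if __name__ == "__main__":
    crawl_days = range(28)  # one page per day for four weeks
    print(f"batch:      {mean_latency(batch_publish_day, crawl_days):.1f} days")       # 7.5 days
    print(f"continuous: {mean_latency(continuous_publish_day, crawl_days):.1f} days")  # 1.0 days
```

Under these assumptions a page waits 7.5 days on average for a batch rebuild but only a day with continuous updates. The exact numbers mean nothing; the point is that with batches the wait is dominated by the rebuild interval rather than by how long each page takes to process.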
Content on the web
Google has always treated every piece of web content as a ‘document’, processing all of its various factors. The new foundation behind Google Caffeine not only increases the amount of content being indexed; it can also identify that content by type. Content isn’t simply text on a page anymore: today the web offers alternatives such as video, presentations, PDF documents, blogs, newspapers, forums, e-books and, importantly, social media.
How is it even possible to process all this data?
In truth, I don’t know. The amount of information being processed, and the storage it needs, is huge. Google said the storage would amount to a pile of iPods stretching around 40 miles. Great Britain is roughly 600 miles from tip to tip, so it would take 15 times that storage, or 9,375,000 iPods, to make a line straight down Great Britain. How many would it take to cover the equator?
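The iPod figures can be checked with a few lines of arithmetic: a line down Great Britain needs 15 of Google’s 40-mile piles. The 625,000-iPods-per-pile figure is derived here from the article’s own numbers (9,375,000 ÷ 15), and the equator length of roughly 24,901 miles is the usual approximation; both are assumptions of this sketch rather than figures from Google.

```python
# Back-of-the-envelope check of the iPod comparison.
# Assumed inputs: 625,000 iPods per 40-mile pile (derived as 9,375,000 / 15)
# and an equator length of about 24,901 miles.

IPODS_PER_PILE = 625_000                  # derived from the article's numbers
PILE_MILES = 40                           # "a pile of iPods ... around 40 miles"
IPODS_PER_MILE = IPODS_PER_PILE // PILE_MILES  # 15,625 iPods per mile

GREAT_BRITAIN_MILES = 600                 # "roughly 600 miles from tip to tip"
EQUATOR_MILES = 24_901                    # approximate circumference at the equator


def piles_to_cover(miles: int) -> float:
    """How many 40-mile piles of iPods are needed to cover a distance."""
    return miles / PILE_MILES


def ipods_to_cover(miles: int) -> int:
    """How many iPods, laid end to end, are needed to cover a distance."""
    return miles * IPODS_PER_MILE


print(f"Great Britain: {piles_to_cover(GREAT_BRITAIN_MILES):.0f} piles, "
      f"{ipods_to_cover(GREAT_BRITAIN_MILES):,} iPods")  # 15 piles, 9,375,000 iPods
print(f"Equator: {piles_to_cover(EQUATOR_MILES):.0f} piles, "
      f"{ipods_to_cover(EQUATOR_MILES):,} iPods")        # 623 piles, 389,078,125 iPods
```

On these assumptions, circling the equator would take around 389 million iPods, a little over 600 times the storage Google described.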
How the changes will affect organic search
At the moment it is basically a caffeine boost to the search engine index, which benefits both content distributors and Google searchers. It will likely help Google respond more quickly to content that receives attention from the likes of Mashable and Digg. However, this is only the first step for Google:
“We’ve built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.”
Deliver even more relevant search results? This may be something to think about in the future: we may be looking not only at factors like relevancy but also at how up to date the information is.
For instance, I ran a search for “Google Caffeine”, and the first page included an article from the Telegraph, posted on 10th November 2009, about Google revealing that it was working on Caffeine. Given the huge amount of content published about Google Caffeine since its official launch, has this page now been made redundant, especially as the newer pages will describe the changes that have actually been confirmed?
Thankfully, the official ‘Google Caffeine is live’ announcements (yes, they posted it twice, on different blogs) are already ranking at the top, but then that’s Google. There is also a YouTube video, courtesy of Search Engine Land, in which Matt Cutts talks about Google Caffeine on its official launch.
Effects already noticed due to Caffeine
Here at Zen Web Solutions, we see many websites ‘go live’ and then wait for them to be indexed. Recently we’ve monitored this more closely to see the effects of Caffeine, and it seems Google has improved in this area. One monitored domain was picked up within the first hour or two, then indexed properly the next day. In under a week, all of the site’s pages, including new articles, were fully indexed, along with their Google ‘cached’ versions.
Others in the SEO industry have noticed minor changes; for example, Google’s John Mueller (JohnMu) confirmed that link counts in Google Webmaster Tools were increasing because the tool had started to use more data from Caffeine.
That’s quite a benefit for SEO: you don’t have to wait around for a site to be indexed before pushing on with any optimisation. Interestingly, the site isn’t yet in Bing…