
Problems at the Googleplex?


(c)2006 John Herron, CISSP at NIST.org

Webmasters are panicking, corporate offices are scrambling, accountants are warning, and blogs are full of theories. Over what? The latest virus or exploit? No, Google search results.

Google now means so much to the bottom lines of some companies that falling out of favor with the gods at Google can mean going out of business. So when many companies found that they had gone from thousands (in some cases hundreds of thousands) of searchable pages down to ONE (yes, "1"), you can imagine the gut-wrenching feeling. Did they just get banned from Google? If so, it could take months or years to get re-included. The owner of one privately held company was actually talking about a huge outlay for Google AdWords advertising. In his words, "I have a hundred mouths to feed." The irony of paying Google for advertising because his company was no longer searchable there did not escape him. But as he put it, "What choice do I have?"

So what is going on? No one is really sure, and Google isn't saying. However, the SEO experts (SEO is Search Engine Optimization, for those new to the magical art of Google optimizing) say not to panic. It has been known for several months that Google is working on new methods of indexing the web. The new data centers (DCs), known as Big Daddy (BD) DCs to the SEOs, are meant primarily to better weed out the search engine spammers. We've all tried to search for something only to find dozens of bogus pages that were put there simply to attract people: pages with nothing but advertising. These sites may put up dozens of nearly duplicate pages with text designed only to fool the search engines. So the goal at Google is a more relevant search experience, something we all want.

But Big Daddy alone doesn't explain how the content of hundreds (thousands?) of legitimate companies stopped being searchable. BD has been in testing for several months now, and problems as large as this one would have shown up in testing. There are dozens of theories as to what's going on. One theory is that this is not an inherently BD problem, but a transition phase. It wouldn't do anyone any good to simply import all the old data; Google needs to clear out any old data that might be the least bit suspect, crawl those sites again, re-index everything, and rank those pages according to the new rules. The theory sounds good on the surface, but Google wouldn't have to remove the old content first. It's not an old-fashioned spring cleaning where you throw everything in the house out on the yard and just put back the stuff you need. They could easily have tagged those pages for deletion once all the new data was collected.

The most compelling explanation I've seen is the good old-fashioned "oops". Something went wrong, possibly connected to the Big Daddy move, possibly not. It looks like a problem that will fix itself over time, and apparently that is what Google is letting happen. After reviewing several sites and their log files, and reading more blog and forum postings than I care to think about, it does look like Google has pushed the big red button to switch over to Big Daddy. But BD apparently wasn't ready for the switch, and it started off with data that was several months out of date. For example, non-BD servers would contain cached pages that were recent, but BD servers would show pages that were cached several months ago, or no cached pages at all. This is especially problematic for newer sites that don't have old pages in Google for people to find. These new sites previously had to pass several Google algorithm checks over the course of weeks or months in order to rank high enough to be "findable". It appears that they have to go through that process again, probably because these scores are stored separately and simply aren't available on BD, since they didn't exist in the version of the data that the BD servers started with.

Webmasters will probably notice a lot of Googlebot activity in their logs, as Google's bots seem to be crawling sites and collecting data at a faster rate than ever before. This activity is to be expected considering the scenario above. People searching Google may have a harder time than normal finding recent pages for many sites. But give things a couple of weeks and they will probably be back to normal. Hopefully better than normal, if Big Daddy proves it's worth the effort.
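For webmasters who want to see that crawl activity for themselves, here is a minimal Python sketch that counts Googlebot requests per day. It assumes the server writes Apache-style "combined" log lines with the user agent at the end; the function name and the sample lines (including the 192.0.2.x addresses) are made up for illustration, and since a user agent string can be faked, the numbers are a rough guide only.

```python
import re
from collections import Counter

# Date part of a combined-format timestamp, e.g. [12/Mar/2006:10:15:32 -0500]
DATE_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day that mention Googlebot (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Simple substring check on the whole line; user agents can be spoofed.
        if "googlebot" not in line.lower():
            continue
        match = DATE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Illustrative sample lines, not real log data.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    '192.0.2.10 - - [12/Mar/2006:10:15:32 -0500] "GET / HTTP/1.1" 200 5120 "-" "%s"' % GOOGLEBOT_UA,
    '192.0.2.10 - - [12/Mar/2006:10:15:40 -0500] "GET /about.html HTTP/1.1" 200 2048 "-" "%s"' % GOOGLEBOT_UA,
    '192.0.2.77 - - [12/Mar/2006:11:02:01 -0500] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '192.0.2.10 - - [13/Mar/2006:09:00:00 -0500] "GET /news.html HTTP/1.1" 200 4096 "-" "%s"' % GOOGLEBOT_UA,
]

if __name__ == "__main__":
    for day, hits in sorted(googlebot_hits_per_day(sample).items()):
        print(day, hits)
```

To run it against a real log, pass an open file instead of the sample list, for example `googlebot_hits_per_day(open("access.log"))`, where the file name is whatever your server uses. A sudden jump in the daily count would fit the re-crawl scenario described above, though it would not prove it.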

Microsoft recently announced that it is making big improvements to its MSN Search engine, so this is probably the beginning of a search engine war.

=================================================================
Reprinted with permission from http://www.nist.org
=================================================================