This happened the last time we moved servers (about six months ago). After the move we kept a close watch on the bots, and we started facing THE PROBLEM with a few of them: cache loss.
I checked everything: robots.txt, .htaccess, the PHP programs, frames, everything possible. I validated robots.txt and ran XHTML validation on all the pages to make sure I wasn't doing anything wrong.
It did no good. The number kept going down, from over 20,000 to 10,000, and from 10,000 to 5,000. It started worrying me and my team, since search engines contribute a big share of traffic (almost 60% in our case).
Then I started investigating:
- Investigation part 1:
I changed my user agent to Googlebot's to browse the site the way Googlebot would. I could still access all the pages.
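The post doesn't say which tool was used to spoof the user agent; a minimal sketch of the same check in Python (with a hypothetical URL, since the real site isn't named) might look like this:

```python
import urllib.request

# Hypothetical URL standing in for one of the affected pages.
url = "https://www.example.com/some-page.html"

# Build a request that carries Googlebot's desktop User-Agent string.
req = urllib.request.Request(
    url,
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    },
)

# urllib stores header keys capitalized, hence "User-agent" here.
print(req.get_header("User-agent"))

# To actually fetch the page as "Googlebot":
#   with urllib.request.urlopen(req) as resp:
#       html = resp.read()
```

If the page comes back fine with this header, the server isn't blocking the bot by user agent, which is exactly what this step ruled out.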
- Investigation part 2:
I checked the log files manually. I could find no trace of Googlebot.
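A check like this can also be scripted instead of done by eye. Here is a small sketch that scans access-log lines for Googlebot's user agent; the log lines below are made up for illustration, real ones would come from the server's access log:

```python
# Hypothetical access-log lines in the common combined log format.
log_lines = [
    '66.249.66.1 - - [10/May/2007:06:25:11 +0000] "GET /robots.txt HTTP/1.1" '
    '200 120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
    '+http://www.google.com/bot.html)"',
    '10.0.0.5 - - [10/May/2007:06:26:40 +0000] "GET /index.php HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

# Keep only the requests whose User-Agent mentions Googlebot.
googlebot_hits = [line for line in log_lines if "Googlebot" in line]
print(f"Googlebot requests found: {len(googlebot_hits)}")
# prints "Googlebot requests found: 1"
```

In our case, the equivalent search over the real logs came back empty: the bot simply wasn't visiting.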
- Investigation part 3:
I made sure Google wasn't having problems at its end. I read almost all the recent search engine postings at WebmasterWorld, Search Engine Watch, digg.com, WebProWorld, hedir.com, and blogs like mattcutts.com. I found nothing. Our other sites were not losing their cache either.
- Investigation parts 4 to 100:
I ran every other check I could think of.
No way out – the last shot
When we saw there was no way out, we decided to shift the servers back. Then, while testing with a live HTTP header viewer, I saw that robots.txt was being served with the content type "text/html".
Our servers were not sending the content type "text/plain" for .txt files. I asked the question at various forums, and everyone said it shouldn't make any difference. I had no other options, so I decided to send the right content type, "text/plain". I configured it and left the rest to God.
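The post doesn't say exactly how the content type was configured, but since .htaccess is mentioned earlier, on an Apache server the fix could plausibly be a one-line MIME mapping like this (a sketch, not the author's exact configuration):

```apache
# Hypothetical .htaccess fix: map the .txt extension back to the
# correct MIME type so robots.txt is served as text/plain.
AddType text/plain .txt
```

After a change like this, a quick look at the response headers for /robots.txt should show "Content-Type: text/plain" instead of "text/html".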
It was the Eureka moment: Google started visiting us again and soon cached all the pages. Believe it or not, the header matters to Googlebot. They may fix this on their end later, but it certainly mattered for us at the time.