Monday, March 27, 2006

Google-specific robots.txt

Matt Cutts: Gadgets, Google, and SEO » 2006 » March: "one more way to block Googlebot by using wildcards in robots.txt (Google supports wildcards like '*' in robots.txt). Here's how:
1. Add a parameter like "http://www.mattcutts.com/blog/some-random-post.html?googlebot=nocrawl" to pages that you don't want fetched by Googlebot.
2. Add the following to your robots.txt:

User-agent: Googlebot
Disallow: *googlebot=nocrawl

That's it. We may see links to the pages with the nocrawl parameter, but we won't crawl them. At most, we would show the url reference (the uncrawled link), but we wouldn't ever fetch the page."
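
To see why that one Disallow line is enough, here is a minimal sketch in Python of how Google-style wildcard matching behaves: the Disallow value is matched as a prefix against the URL's path plus query string, with * standing in for any run of characters and a trailing $ anchoring the end. The helper names rule_to_regex and is_disallowed are ours for illustration, not part of any Google or standard-library API.

import re
from urllib.parse import urlsplit

def rule_to_regex(pattern):
    # Translate a Google-style robots.txt path pattern into a regex:
    # '*' matches any run of characters, a trailing '$' anchors the end,
    # and everything else is literal, anchored at the start of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_disallowed(url, disallow_pattern):
    # Robots rules are matched against the path plus the query string.
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return rule_to_regex(disallow_pattern).search(path) is not None

rule = "*googlebot=nocrawl"
for url in (
    "http://www.mattcutts.com/blog/some-random-post.html?googlebot=nocrawl",
    "http://www.mattcutts.com/blog/some-random-post.html",
):
    print(url, "->", "blocked" if is_disallowed(url, rule) else "allowed")

Run as-is, this prints "blocked" for the tagged URL and "allowed" for the plain one. Note that Python's standard urllib.robotparser does plain prefix matching per the original robots.txt convention, so a hand-rolled check like this is needed to mirror Google's wildcard extension.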
