One of the very first things you should do when setting up your Magento webstore, SEO-wise, is to check that your robots.txt file is configured correctly.
The robots.txt file on your webstore lets you prevent duplicate content of your website from being indexed by search engines, and lets you hide content that you do not want indexed or shown on search result pages.
Robots.txt, or the Robots Exclusion Standard, is a protocol used to communicate with web robots and web crawlers, allowing you to instruct them what to crawl and what to ignore on your webstore.
The protocol itself contains the following directives, which allow you to fine-tune the way web robots and crawlers interact with your Magento website:
Crawl-delay directive – this directive allows you to set a delay in seconds between successive crawler requests to your webstore. Note that not all crawlers honor it ( Googlebot, for example, ignores it ).
Allow directive – this directive marks areas of the website that are allowed for crawling.
Disallow directive – this directive marks areas of the website that you do not want crawlers to access. Keep in mind that it blocks crawling, not indexing: a disallowed URL can still appear in search results if other pages link to it.
Host directive – this directive is used to set a preferred domain ( most commonly used to choose between the www and non-www versions of your domain ). It is a non-standard directive, historically supported only by Yandex.
The robots.txt protocol also allows you to target a specific bot or crawler ( e.g. Googlebot ) or match all of them, by using the User-agent directive ( User-agent: Googlebot or User-agent: * ).
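If you want to see how user-agent groups combine before going live, Python's standard library ships a robots.txt parser, urllib.robotparser, that you can feed rules directly. The rules and bot names below are made up purely for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration: one group targeting Googlebot
# and a catch-all group for every other crawler.
robots_txt = """\
User-agent: Googlebot
Disallow: /checkout/

User-agent: *
Crawl-delay: 10
Disallow: /checkout/
Disallow: /wishlist/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A crawler obeys only the most specific group that matches it, so
# Googlebot ignores the * group and /wishlist/ stays crawlable for it.
print(rp.can_fetch("Googlebot", "/checkout/"))     # False
print(rp.can_fetch("Googlebot", "/wishlist/"))     # True
print(rp.can_fetch("SomeOtherBot", "/wishlist/"))  # False
print(rp.crawl_delay("SomeOtherBot"))              # 10
```

This also demonstrates why you should repeat shared Disallow rules inside a bot-specific group: once a crawler matches its own group, the catch-all rules no longer apply to it.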
Here's an example of a boilerplate robots.txt, with comments, that you could use as a starting point for creating your own robots.txt file:
# robots.txt boilerplate for Magento websites
User-agent: *
Allow: /

# sitemap – you can set a path to your sitemap using this directive
Sitemap: http://www.your-magento-website-name-here.com/sitemap.xml

# disallow pages – you can disallow the crawling or indexation of
# the pages of your website like this
Disallow: /terms-and-conditions/

# disallow directories – make sure you disallow crawling of Magento's
# directories just in case your web server is not configured correctly
# and your directory listings are exposed
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /stats/
Disallow: /var/

# disallow files – Magento ships with some standard files that there's
# really no need to have indexed on your website
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# disallow website content – there are certain areas of the website which you
# might want to block for robots as they are targeted at users only and have
# no real value in search results
Disallow: /catalog/product_compare/
Disallow: /catalog/product/gallery/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /contacts/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
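One detail worth verifying before you deploy: Disallow rules are prefix matches, so Disallow: /checkout/ also blocks every URL beneath that path, while URLs that merely contain the word are unaffected. Here's a quick sanity-check sketch with Python's urllib.robotparser against a subset of the boilerplate above (the blanket Allow: / line is omitted because Python's parser applies rules first-match in file order, unlike Google's longest-match behavior):

```python
from urllib.robotparser import RobotFileParser

# A subset of the Magento boilerplate, parsed locally as a sanity check.
rules = """\
User-agent: *
Disallow: /checkout/
Disallow: /catalogsearch/
Disallow: /app/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Disallow is a prefix match: everything under /checkout/ is blocked...
print(rp.can_fetch("*", "/checkout/onepage/"))        # False
# ...but a path that merely contains the word is unaffected.
print(rp.can_fetch("*", "/category/checkout-bags/"))  # True
```

Running a few of your real store URLs through a check like this is a cheap way to catch a Disallow rule that accidentally blocks category or product pages.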