Robots.txt: what is it and what does it do?
robots.txt is a plain text file placed in the root (topmost folder) of your website. It tells search engine robots (also known as crawlers) which parts of your site they should not crawl. You only need it if you want certain pages NOT to be crawled and displayed as search results, e.g. on google.com.
The contents of the file must follow a strict format, but it is still simple to understand. The file is completely optional, so it may or may not already exist on your site. If you decide to create one, place it in the root folder and edit it with a plain text editor such as Notepad on Windows or TextEdit on a Mac. Place the exact rules at the top of the file and nothing else.
An example of disallowing every search engine from crawling your site, which will prevent your site from showing in search results:
User-agent: *
Disallow: /
Or, to disallow just a folder and its contents:
User-agent: *
Disallow: /cgi-bin/
To disallow a single page:
User-agent: *
Disallow: /secret.html
And finally multiple pages and folders:
User-agent: *
Disallow: /images/
Disallow: /secret.html
Disallow: /secret2.html
We recommend you use robots.txt carefully to avoid accidentally blocking your site from search engines, which would leave users unable to find your site on the Internet.
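One way to reduce the risk of an accidental block is to test a rule set locally before uploading it. The sketch below uses Python's standard urllib.robotparser module on the multiple-pages example above. The helper name is_allowed, the user agent ExampleBot, and the paths /index.html and /images/logo.png are our own illustrations, not part of any real site. Individual search engines may interpret rules slightly differently from this parser, so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The "multiple pages and folders" example from this article.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /secret.html
Disallow: /secret2.html
"""


def is_allowed(rules: str, path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if `user_agent` may crawl `path` under the given rules.

    `is_allowed` and the default "ExampleBot" agent are hypothetical names
    used only for this illustration.
    """
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/index.html", "/secret.html", "/images/logo.png"):
        verdict = "allowed" if is_allowed(RULES, path) else "blocked"
        print(f"{path}: {verdict}")
```

Running the script should report /index.html as allowed and the other two paths as blocked. Replacing RULES with the contents of your own file lets you confirm that the pages you want found are still reachable.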

