Limitations of robots.txt (and the power of .htaccess)

Trying to keep the robots at bay

Q: If robots get your web site into search engines, why on earth would you want to keep them away?
A: Because many robots do things other than put your site into search engines; most notably, they harvest email addresses so that you get spam.
So, the textbook solution is to put a file called 'robots.txt' into the highest-level directory on your site (if your hosting company allows this).
There is a robots.txt standard (the Robots Exclusion Protocol) to follow when you build the file.
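As a sketch of the standard, a minimal robots.txt might look like this (the directory names are purely illustrative, not from any real site):

```
# Rules for all robots
User-agent: *
# Ask robots not to crawl these directories
Disallow: /cgi-bin/
Disallow: /private/
```

Each 'User-agent' line names which robots the rules below it apply to ('*' means all), and each 'Disallow' line names a path prefix those robots are asked to stay out of.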
Whether a robot looks for, reads, and obeys this file is entirely up to the robot.
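To see what a well-behaved robot does with such rules, here is a small sketch using Python's standard urllib.robotparser module (the rules and URLs are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# The rules a polite robot would have fetched from /robots.txt
rules = """
User-agent: *
Disallow: /private/
"""

# Parse the rules directly rather than fetching a live URL
rp = RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved robot asks before fetching each page
print(rp.can_fetch("AnyBot", "https://example.com/private/page.html"))  # False
print(rp.can_fetch("AnyBot", "https://example.com/public.html"))        # True
```

The point is that the check happens inside the robot's own code: a rude robot simply never makes that call.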
'But wait a minute,' I hear you say. 'Surely the people who create robots to collect email addresses have little or no respect for my privacy and will ignore the file?' And the answer is: 'Correct!'
Later on I discuss ways to avoid these robots - but the techniques used vary depending on the type of hosting you chose when you started. This is a great example of why I'm glad I chose to host with a Linux hosting company.
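As a taste of what is to come, on an Apache ('Nix) host a sketch like the following in .htaccess can refuse requests from a robot by its User-Agent string (this assumes mod_rewrite is available on your host; the agent name 'BadEmailHarvester' is purely illustrative):

```
RewriteEngine On
# Match the robot's User-Agent string, case-insensitively
RewriteCond %{HTTP_USER_AGENT} BadEmailHarvester [NC]
# Refuse the request with 403 Forbidden instead of serving the page
RewriteRule .* - [F]
```

Unlike robots.txt, this is enforced by the server, so the robot's cooperation is not required - which is exactly why the type of hosting you chose matters.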
Still, robots.txt is occasionally a useful file, and the major search engines respect it.
[Content of this page last reviewed: 12-Jun-2004]
copyright © 2001-2007 bpresent