Additional Resources
General ROBOTS.TXT Information
Martijn Koster's site about the Robot Exclusion Protocol.
This is the official definitive site on robot exclusion.
Who's Knocking on the Door?
This article by Rhoda Schueller presents a very good explanation of why a web
developer would want to utilize robot exclusion.
Spider Spotting Chart
This is a simplified chart provided by Search Engine Watch. All of
this information is available from the RoboGen database.
Server-side Robot Enforcement and Fighting Spam
Note: These are long-term ideas for the war on spam, not a quick-fix filtering solution. If you are looking for a good filter for Outlook, take a look at SpamNet. We also have an article on using SpamAssassin in Ximian Evolution.
Robotcop, robots.txt: it's the law
Robotcop is an open-source module for web servers that helps webmasters enforce the
disallow rules in the robots.txt file. Basically, if a spider reads the robots.txt file
and attempts to load any of the disallowed files, its IP will be firewalled from
viewing any pages at all. This has implications for fighting the robots that harvest
e-mail addresses for spamming.
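The enforcement idea described above can be sketched in a few lines of Python. This is only an illustration of the rule "read robots.txt, then request a disallowed path, and you are blocked"; it is not Robotcop's actual code. The `RobotEnforcer` class, the sample rules, and the in-memory block list are all assumptions made for the example, and a real deployment would hand the offending address to a firewall rather than a Python set.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


class RobotEnforcer:
    """Toy model of server-side robots.txt enforcement (not Robotcop itself)."""

    def __init__(self, robots_txt):
        self.rules = RobotFileParser()
        self.rules.parse(robots_txt.splitlines())
        self.read_rules = set()  # client IPs that have fetched robots.txt
        self.blocked = set()     # client IPs refused from now on

    def handle(self, ip, path, user_agent="*"):
        """Return True if the request would be served, False if refused."""
        if ip in self.blocked:
            return False
        if path == "/robots.txt":
            self.read_rules.add(ip)
            return True
        # A client that read the rules and still requests a disallowed
        # path is treated as a misbehaving robot and blocked entirely.
        if ip in self.read_rules and not self.rules.can_fetch(user_agent, path):
            self.blocked.add(ip)
            return False
        return True
```

In this sketch a client that never requested robots.txt is not penalized, which mirrors the description above: only spiders that have seen the rules and ignore them are shut out.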
Using Apache to stop bad robots.
Block the spiders that collect e-mail addresses for spamming purposes. Create a "honey pot" for crawlers and list it as disallowed in your robots.txt file. If a crawler requests it anyway, it is caught red-handed! The method is not foolproof, but it is interesting.
Firewall rules and Robotcop take this idea to the next level.
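As a rough sketch of the honey-pot idea, the Python below builds a robots.txt entry for a trap directory and then scans access-log lines for clients that requested it. The trap path `/trap/`, the function name `find_offenders`, and the assumption that the log uses the Common Log Format are all hypothetical choices for this example; the linked article's own method may differ.

```python
import re

# Hypothetical honey-pot directory; no legitimate page should link to it.
TRAP_PATH = "/trap/"

# robots.txt content that tells well-behaved robots to stay out of the trap.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"

# Matches the start of a Common Log Format line (assumed log format):
# client IP, two ignored fields, timestamp, then the request method and path.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def find_offenders(log_lines):
    """Return the set of client IPs that requested anything under TRAP_PATH."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(TRAP_PATH):
            offenders.add(match.group("ip"))
    return offenders
```

The resulting set of addresses could then be fed to whatever blocking mechanism the server uses. As the text notes, this is not foolproof: a robot that never reads robots.txt and never follows the hidden link will not be caught.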
Fighting Spam with DNS
Not directly related to robots.txt, but has implications for fighting spam in general.
RoboGen Links
RoboGen Help File
The documentation for the RoboGen software is also available online.
BORIS THE SPIDER STRIKES AGAIN (Article no longer available on Internet.com, but it can be found on the author's personal site.)
(Boardwatch Sept. 1999). RoboGen is
mentioned on the second page of this article on web spiders by Thom Stark.
Free Webmaster Tools, Issue 11. (Article no longer available online)
The editors of this eZine selected RoboGen as the #1 web site of the week in the July 1999 issue.






