Rietta: Web Apps Where Security Matters

Introduction to RoboGen

This product is legacy software that is no longer maintained or supported by Rietta Inc. This page is preserved for historical purposes. See the listing of Rietta's legacy software for the complete collection.

Search engines such as Excite and AltaVista use web spiders, also known as robots, to build the indexes for their search databases. These robots traverse HTML trees by loading pages and following hyperlinks, and they report the text and/or meta-tag information to create search indexes. ROBOTS.TXT is a file that spiders consult for instructions on how the site is to be cataloged. It is an ASCII text file that sits in the document root of the server and defines which documents and/or directories conforming spiders are forbidden to index.
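For illustration, a minimal ROBOTS.TXT following the exclusion protocol might look like this (the directory names are hypothetical examples, not taken from any particular site):

```
# Applies to all conforming robots
User-agent: *
# Forbid indexing of these directories
Disallow: /cgi-bin/
Disallow: /tmp/
```

Each record names a user-agent (the robot's identifier, or * for all robots) followed by one or more Disallow lines listing paths that conforming spiders must not index.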

The robot exclusion protocol was introduced by Martijn Koster in 1994 to address problems arising from the increasing popularity of the Internet and the toll web spiders were taking on system resources. Some problems were caused by robots rapid-firing requests, that is, loading pages in rapid succession. Others included robots indexing information deep in directory trees, indexing temporary information, and even accessing CGI scripts. The robot exclusion protocol was quickly adopted by webmasters and web robot makers as a way to organize and control the indexing process.

Since then, the size of the Internet has increased dramatically and millions of people are using it. The number of web robots crawling the web is greater than before and it is more important than ever for all web sites to have a properly created and maintained ROBOTS.TXT file.

With RoboGen, you create robot exclusion files by selecting All Robots or a specific user-agent and adding documents and/or directories, either by entering the path names manually or by selecting them over FTP. Once all the restrictions and directives are set, you can save the robots.txt file to your hard drive or upload it directly to your server.

It is important to remember that robot exclusion files are not a security measure. Some robots will simply ignore the file, and others may deliberately load the very documents that the file marks as disallowed. Robot exclusion files are therefore only useful for controlling what appears in search engines.
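A short sketch of how a conforming robot honors these rules, using Python's standard urllib.robotparser module (the Disallow paths below are hypothetical examples):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from text rather than
# fetched from a live server.
rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A conforming robot skips disallowed paths...
print(parser.can_fetch("*", "/cgi-bin/search.cgi"))  # False
# ...but is free to index everything else.
print(parser.can_fetch("*", "/index.html"))  # True
```

Note that this check is purely voluntary: nothing in the protocol stops a non-conforming robot from requesting the disallowed URLs anyway, which is exactly why the file is not a security control.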