Re: David Bartosik: Why Robots.txt by Matt Benya
I came across a blog article, David Bartosik: Why Robots.txt by Matt Benya (Archive.org Link), which happens to mention RoboGen, a program for editing robots.txt files that I wrote nearly six years ago! I do enjoy finding references to my previous work. Mr. Benya's explanation of the robots.txt file reminds me of a situation I came across a few weeks ago.
I had logged into one of the web servers and noticed the system was not responding as snappily as usual. It turned out the load average was at 15, driven by a large number of instances of a customer's CGI script. Fortunately, the scripts were all running under a single user account, so I was able to find the relevant log file and inspect its tail end. It showed that ZyBorg, the crawler from wisenutbot.com, was rapidly requesting the site's dynamically generated pages, which are served by a Perl CGI script. To get the server load under control, I created a robots.txt for the site that blocked the ZyBorg user-agent from the Perl scripts. Fortunately, the robot complied with the exclusion standard and the rapid-fire crawling stopped.
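For the curious, the fix amounts to only a couple of lines. The /cgi-bin/ path below is just an example rather than that site's actual layout; the Disallow line should point at wherever the Perl scripts really live:

    # Keep the ZyBorg crawler away from the CGI scripts (example path)
    User-agent: ZyBorg
    Disallow: /cgi-bin/

Note that this shuts out only the one misbehaving crawler; every other robot can still index the site normally.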
While this story has nothing to do with RoboGen (I used Vim in the SSH session), it does show one concrete example of the continued applicability of the robot exclusion standard.