Rietta
Rietta: Web Apps Where Security Matters
You are reading The Rietta Blog, a publication about the web since 2005.

Re: David Bartosik: Why Robots.txt by Matt Benya

I came across a blog article, David Bartosik: Why Robots.txt by Matt Benya, which happens to mention RoboGen, a program for editing robots.txt files that I wrote nearly six years ago! I do enjoy finding references to my previous work. Mr. Benya’s explanation on of the robots.txt file reminds me of a situation I came across a few weeks ago.

I had logged into one of the web servers and noticed the system was not responding as snappily as usual. I turns out the load average was at 15%, caused by a large number of instances of a customer CGI script. Fortunately, these scripts were being run by a particular user so I was able to find and inspect the tail end of the log file and determined that ZyBorg, from wisenutbot.com, was rapidly accessing the dynamically generated site by a CGI-interface Perl script. In order to get the server load under control, I created a robots.txt for the site and blocked the ZyBorg user-agent from indexing the Perl scripts for the site. Fortunately the robot did comply with the exclusion standard and the rapid-fire crawling stopped.

While this story has nothing to do with RoboGen, I used Vim in the SSH session, it does show one concrete example of the continued applicability of the robot exclusion standard.

About Frank Rietta

Frank Rietta's photo

Frank Rietta is specialized in working with startups, new Internet businesses, and in developing with the Ruby on Rails platform to build scalable businesses. He is a computer scientist with a Masters in Information Security from the College of Computing at the Georgia Institute of Technology. He teaches about security topics and is a contributor to the security chapter of the 7th edition of the "Fundamentals of Database Systems" textbook published by Addison-Wesley.