Rietta
Rietta: Web Apps Where Security Matters
You are reading The Rietta Blog, a publication about the web since 2005.

Grep to Extract E-Mail Addresses From a Text File

Sometimes you just need a list of e-mail addresses from text files on your computer. I have personally needed this while managing an e-mail server.

Here is the scenario: given a text file that has e-mail addresses intermixed with other text, extract a sorted list of the unique e-mail addresses.

While there are commercial applications to do this, if you have a Unix-based system then you have all of the tools that you need available at the command line.

For an input file called EMAIL_SAMPLES.TXT, this will work:

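grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' EMAIL_SAMPLES.TXT | sort -f | uniq -i

Let’s break down the pipeline:

  • grep -E -o scans the text file for matches to the given extended regular expression and prints each match on its own line
  • '[[:alnum:]+._-]+@[[:alnum:]+._-]+' is a regular expression that matches one or more letters, digits, plus signs, dots, underscores, or hyphens on each side of an @. It is a pragmatic approximation of an e-mail address rather than a full validator, so a period that ends a sentence directly after an address is captured along with it
  • sort -f takes the list of e-mail addresses produced by grep and sorts them alphabetically, ignoring case so that addresses differing only in capitalization end up on adjacent lines
  • uniq -i filters out repeated e-mail addresses so that each is listed only once. It compares adjacent lines only, which is why the list must be sorted first. The -i flag instructs it to use a case-insensitive comparison of lines.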
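
To see the pipeline at work, here is a minimal sketch that runs it against a throwaway file. The sample text and every address in it are made up for illustration, and the grep -E spelling with a + quantifier (so that each side of the @ must be non-empty) and sort -f are assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; the addresses are invented for this sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Contact alice@example.com or Bob.Smith@Example.org for details.
Reply to ALICE@EXAMPLE.COM, not to carol+lists@example.net
bob.smith@example.org wrote: nothing else to extract here
EOF

# Extract every match, sort ignoring case, then drop case-insensitive duplicates.
emails=$(grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' "$sample" | sort -f | uniq -i)
printf '%s\n' "$emails"

# Clean up the temporary file.
rm -f "$sample"
```

The five matches collapse to three addresses, one per line. Which capitalization survives for a duplicated address may depend on how sort orders the otherwise equal lines in your locale.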

If you need to run this more than once, it makes sense to create a custom shell script like this:

~/bin/emails.sh
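#!/usr/bin/env bash
# Print a sorted list of the e-mail addresses found in the file given as $1,
# listing each address once regardless of capitalization.
if [ -f "$1" ]; then
  grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' "$1" | sort -f | uniq -i
else
  echo "Expected a file at $1, but it doesn't exist." >&2
  exit 1
fi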

Make the script executable with chmod +x ~/bin/emails.sh and, provided ~/bin is on your PATH, the same can now be accomplished by running emails.sh EMAIL_SAMPLES.TXT.
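
The following sketch exercises both branches of such a script without touching ~/bin. It writes a copy of the script into a temporary directory, runs it on a small made-up input, and then runs it on a path that does not exist. The directory layout, the sample addresses, the variable names, and the extended-regex spelling of the pattern are all assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory instead of ~/bin (an assumption of this sketch).
workdir=$(mktemp -d)

# Write a copy of the script; the quoted heredoc keeps "$1" literal.
cat > "$workdir/emails.sh" <<'EOF'
#!/usr/bin/env bash
if [ -f "$1" ]; then
  grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' "$1" | sort -f | uniq -i
else
  echo "Expected a file at $1, but it doesn't exist." >&2
  exit 1
fi
EOF
chmod +x "$workdir/emails.sh"

# Hypothetical input with one repeated address.
printf 'To: dave@example.com\nCc: erin@example.org, dave@example.com\n' \
  > "$workdir/EMAIL_SAMPLES.TXT"

# Success path: the addresses are printed once each, sorted.
found=$("$workdir/emails.sh" "$workdir/EMAIL_SAMPLES.TXT")
printf '%s\n' "$found"

# Failure path: a missing file yields an error on stderr and a non-zero status.
if "$workdir/emails.sh" "$workdir/missing.txt" 2>/dev/null; then
  missing_status=0
else
  missing_status=$?
fi
echo "exit status for a missing file: $missing_status"

rm -rf "$workdir"
```

Because the error message goes to standard error and the script exits non-zero, a caller can redirect the address list to a file and still detect a bad path.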

This goes to show just how flexible the standard Unix tools are. They can be connected together to accomplish really neat tasks without the need for more complicated code. Anyone with a Mac OS X or Linux system has everything it takes at their fingertips.

About Frank Rietta

Frank Rietta specializes in working with startups and new Internet businesses, and in developing with the Ruby on Rails platform to build scalable businesses. He is a computer scientist with a Master's in Information Security from the College of Computing at the Georgia Institute of Technology. He teaches about security topics and is a contributor to the security chapter of the 7th edition of the "Fundamentals of Database Systems" textbook published by Addison-Wesley.
