Grep to Extract E-Mail Addresses from a Text File

Sometimes you just need a list of e-mail addresses from text files on your computer. I have personally needed this while managing an e-mail server.

Here is the scenario: given a text file that has e-mail addresses intermixed with other text, extract a sorted list of unique e-mail addresses.

There are commercial applications for this, but if you have a Unix-based system then all of the tools you need are already available at the command line.

For an input file called EMAIL_SAMPLES.TXT, this will work:

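grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' EMAIL_SAMPLES.TXT | sort -f | uniq -i

Let’s break down the pipeline:

  • grep -E -o scans the text file for matches to the requested extended regular expression and prints each match on its own line
  • '[[:alnum:]+._-]+@[[:alnum:]+._-]+' is a deliberately loose regular expression for e-mail addresses: one or more letters, digits, plus signs, dots, underscores, or hyphens on each side of an @. It does not validate addresses, and it will pick up a full stop that directly follows an address at the end of a sentence.
  • sort -f takes the list of e-mail addresses produced by grep and sorts them alphabetically, ignoring case so that addresses differing only in capitalization end up on adjacent lines
  • uniq -i filters out repeated e-mail addresses so that each is only listed once. It only compares adjacent lines, which is why the list is sorted first. The -i flag instructs it to use a case-insensitive comparison of lines.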
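
To see what each stage contributes, here is a minimal sketch that runs the stages one at a time on a small inline sample. The sample text and the variable names are made up for illustration. The pattern uses grep -E with + so that a bare @ is not reported, and sort -f so that case variants become neighbours; both are choices of this sketch.

```shell
# Hypothetical sample text; none of these addresses come from a real file.
sample='Contact alice@example.com or Bob.Smith@Example.org today.
A lone @ sign is not an address, but ALICE@EXAMPLE.COM is.
Also try bob.smith@example.org and carol+lists@mail.example.net soon.'

# Pattern assumed by this sketch: one or more allowed characters on each side of @.
pattern='[[:alnum:]+._-]+@[[:alnum:]+._-]+'

# Stage 1: grep -o prints every match on its own line, in input order.
matches=$(printf '%s\n' "$sample" | grep -E -o "$pattern")

# Stage 2: sort -f ignores case, so case variants of an address become neighbours.
sorted=$(printf '%s\n' "$matches" | sort -f)

# Stage 3: uniq -i collapses neighbouring lines that differ only in case.
unique=$(printf '%s\n' "$sorted" | uniq -i)

printf '%s\n' "$unique"
```

Stage 1 finds five matches and the final list has three lines, one per distinct address. Which capitalization of a repeated address survives depends on how sort orders the tied lines, so treat the output as case-insensitive.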

If you need to run this more than once, it makes sense to create a custom shell script like this:

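  #!/usr/bin/env bash
  # Print the unique e-mail addresses found in the file given as the first argument.
  if [ -f "$1" ]; then
    # -E enables +, so at least one character is required on each side of the @;
    # sort -f puts case variants on adjacent lines so that uniq -i can merge them.
    grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' "$1" | sort -f | uniq -i
  else
    echo "Expected a file at $1, but it doesn't exist." >&2
    exit 1
  fi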

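Save it as emails.sh and make it executable with chmod +x emails.sh. The same result then comes from running ./emails.sh EMAIL_SAMPLES.TXT, or plain emails.sh EMAIL_SAMPLES.TXT if the script sits in a directory on your PATH.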
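
As a quick end-to-end check, the following sketch writes a copy of the script into a throwaway directory, marks it executable, and runs it against a one-line sample. The temporary directory, the sample addresses, and the variable names are assumptions made for illustration; the copy of the script uses the grep -E and sort -f form of the pipeline.

```shell
# Throwaway directory standing in for wherever you keep scripts (hypothetical).
workdir=$(mktemp -d)

# Write a copy of the script so this example runs on its own.
cat > "$workdir/emails.sh" <<'EOF'
#!/usr/bin/env bash
if [ -f "$1" ]; then
  grep -E -o '[[:alnum:]+._-]+@[[:alnum:]+._-]+' "$1" | sort -f | uniq -i
else
  echo "Expected a file at $1, but it doesn't exist." >&2
  exit 1
fi
EOF

# The script needs the executable bit before it can be called directly.
chmod +x "$workdir/emails.sh"

# Hypothetical sample input with the same address in two capitalizations.
printf '%s\n' 'Ask dave@example.com or Dave@Example.com for help' > "$workdir/EMAIL_SAMPLES.TXT"

# Call the script by path and capture its output.
found=$("$workdir/emails.sh" "$workdir/EMAIL_SAMPLES.TXT")
printf '%s\n' "$found"

# A missing file takes the error branch; record the exit status without aborting.
status=0
"$workdir/emails.sh" "$workdir/no-such-file" 2>/dev/null || status=$?
printf 'exit status for a missing file: %s\n' "$status"

# Remove the throwaway directory.
rm -rf "$workdir"
```

Checking the exit status this way is handy when the script is called from another script that should stop if the input file is missing.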

This goes to show just how flexible the standard Unix tools are. They can be connected together to accomplish really neat tasks without the need for more complicated code. Anyone with a Mac OS X or Linux system has all that it takes at their fingertips.