Grep to Extract E-Mail Addresses from a Text File
Sometimes you just need a list of e-mail addresses from text files on your computer. I have personally needed this while managing an e-mail server.
Here is the scenario: given a text file that has e-mail addresses intermixed with other text, extract a sorted list of the unique e-mail addresses.
While there are commercial applications to do this, if you have a Unix-based system then you have all of the tools that you need available at the command line.
For an input file called EMAIL_SAMPLES.TXT, this will work:
grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' EMAIL_SAMPLES.TXT | sort | uniq -i
Let’s break down the pipeline:
grep -o
scans the text file for matches to the given regular expression and prints each match on its own line.
'[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*'
is a regular expression that matches text shaped like an e-mail address: a run of letters, digits and the punctuation listed in the brackets, then an @, then another such run.
sort
takes the list of e-mail addresses produced by grep and sorts it alphabetically.
uniq -i
filters out repeated e-mail addresses so that each is listed only once. uniq only compares adjacent lines, which is why the list is sorted first. The -i flag makes that comparison case-insensitive.
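The pattern above is deliberately simple, and it has a few rough edges.

- Because it uses *, either side of the @ may be empty. A stray @ (as in "lunch @ noon") is therefore printed as if it were an address.
- Inside a bracket expression, backslashes are taken literally, so \ is also accepted as an address character.
- uniq -i only merges adjacent lines. Whether Alice@example.com and alice@example.com collapse into one entry therefore depends on how sort orders them in your locale.

The sketch below is a hypothetical, stricter variant, not the command used above. The function name extract_emails, the exact pattern and the sample text are all assumptions made for illustration. It makes three changes:

- It uses grep -E so that + means "one or more".
- It requires the domain to end in a dot followed by letters.
- It sorts with sort -f so that case variants end up next to each other.

Like any regular expression for this job, it is a heuristic. It will not accept every address the e-mail standards allow (quoted local parts, for example).

```shell
#!/usr/bin/env bash
# Hypothetical stricter variant of the extraction pipeline (not the original command).
extract_emails() {
    # -o: print only the matches; -h: no file-name prefixes; -E: extended regex ("+" = one or more).
    # sort -f folds case while comparing, so case variants of an address become adjacent;
    # LC_ALL=C pins the locale so the ordering does not depend on the user's settings.
    # uniq -i then keeps one line per address regardless of case.
    grep -ohE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+' "$@" | LC_ALL=C sort -f | uniq -i
}

# Made-up sample text for illustration; note the stray "@" in the second line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Contact Alice@example.com or bob@mail.example.org.
Lunch is @ noon; cc alice@example.com and carol.smith+news@example.co.uk
EOF

# Expect three lines: one Alice entry, bob@mail.example.org and carol.smith+news@example.co.uk.
extract_emails "$sample"
rm -f "$sample"
```

The function also reads standard input when given no file. It accepts several files at once, since -h (supported by both GNU and BSD grep) keeps grep from prefixing each match with a file name.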
If you need to run this more than once, it makes sense to save it as a small shell script, for example emails.sh:
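#!/usr/bin/env bash
# Print the unique e-mail addresses found in the file given as the only argument.
if [ $# -ne 1 ]; then
    echo "Usage: $0 FILE" >&2
    exit 1
fi
if [ -f "$1" ]; then
    grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' "$1" | sort | uniq -i
else
    echo "Expected a file at $1, but it doesn't exist." >&2
    exit 1
fi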
After making the script executable with chmod +x emails.sh, the same result can be had by running ./emails.sh EMAIL_SAMPLES.TXT.
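To try the script end to end without touching real data, the sketch below does the following:

- writes a copy of the extraction script to emails.sh in a throwaway directory;
- makes it executable;
- runs it once against a small sample file and once against a path that does not exist.

The scratch directory, the sample contents and the name missing.txt are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no existing files are touched; it is removed when the shell exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Write a copy of the extraction script as emails.sh and make it executable.
cat > emails.sh <<'EOF'
#!/usr/bin/env bash
if [ -f "$1" ]; then
    grep -o '[[:alnum:]+\.\_\-]*@[[:alnum:]+\.\_\-]*' "$1" | sort | uniq -i
else
    echo "Expected a file at $1, but it doesn't exist." >&2
    exit 1
fi
EOF
chmod +x emails.sh

# Made-up sample input for illustration only.
printf 'Ping dave@example.net or erin@example.net today.\nAlso dave@example.net again.\n' > EMAIL_SAMPLES.TXT

./emails.sh EMAIL_SAMPLES.TXT                        # prints dave@example.net, then erin@example.net
./emails.sh missing.txt || echo "exit status: $?"    # error on stderr, then "exit status: 1"
```

The second call is expected to fail. The || keeps the demo going and reports the script's exit status.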
This goes to show just how flexible the standard Unix tools are. They can be connected together to accomplish really neat tasks without the need for more complicated code. Anyone with a Mac OS X or Linux system has all it takes at their fingertips.