It's been quite a while since I did anything with awk, so I wasn't sure how well ghostdog's code would work. It looked like it would handle only alphabetic characters, with no more than one component on each side of the "@". So I made up a test file (test.txt):
    this is a test file foo@bar.com we are looking for moo@drop.dhcp.bar.com email
    addresses inside, 00test@leo.bar.com, a text file with no
    particular fname.lname@bar.baz.net other par72@take.the.bus.au restrictions
    on the format or locations of the 23skidoo@bar.co.uk addresses inside the file.
    Let's try one at the end joe27@aol.com.
-
I ran ghostdog's awk script on this and got the output:
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com,
    fname.lname@bar.baz.net
    23skidoo@bar.co.uk
-
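Note that this output has FIVE email addresses, but the file has SEVEN, so something is wrong. The two that are omitted (par72@take.the.bus.au and joe27@aol.com) have a digit immediately before the "@", so it looks like I was close but not quite right about how much awk would match with this RE. It prints the whole space-delimited field $i whenever that field matches /[[:alpha:]]@[[:alpha:]]/, so digits elsewhere in the address and multiple dot-separated components get through; only a non-letter right beside the "@" causes a miss.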
But note that it also caught the comma following the third address "00test@leo.bar.com," which it should not include in the email address.
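To see why the comma survives while the digit-adjacent addresses vanish, here is a minimal sketch of the field-by-field matching described above. It is a reconstruction, not ghostdog's actual script (which isn't reproduced here): the function name match_fields is made up, and explicit [A-Za-z] ranges stand in for [[:alpha:]] so it also runs on awks without POSIX character classes.

```sh
# Hypothetical reconstruction: print every whitespace-delimited field that
# has a letter on each side of "@". [A-Za-z] stands in for [[:alpha:]].
match_fields() {
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /[A-Za-z]@[A-Za-z]/) print $i }' "$@"
}

# Two lines borrowed from test.txt, fed on stdin.
printf '%s\n' 'addresses inside, 00test@leo.bar.com, a text file' \
              'other par72@take.the.bus.au restrictions' | match_fields
# prints: 00test@leo.bar.com,
```

The first address comes out with its trailing comma because awk prints the entire field, and the second is skipped because "2@t" has no letter before the "@".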
Here's a Perl one-liner:
    perl -wne'while(/[\w\.]+@[\w\.]+/g){print "$&\n"}' test.txt
This gives the output
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com.
which is almost correct: it does not include the comma following the third address, although it does include the period after the last one, because "." is in the character class.
Here's a corrected version:
    perl -wne'while(/[\w\.]+@[\w\.]+\w+/g){print "$&\n"}' test.txt
This yields
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com
I'm sure ghostdog74's awk script could also easily be fixed, but as I said, it's been a long time and I'm not sure how much I want to play with it. ;)
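For what it's worth, one possible way to fix the awk side is to stop printing whole fields and instead pull out each match with match() and substr(). This is only a sketch, not ghostdog74's script: the function name is invented, and the explicit character ranges are an assumption chosen for portability (they approximate Perl's \w in the C locale).

```sh
# Hypothetical awk counterpart to the corrected Perl one-liner.
extract_emails_awk() {
  awk '{
    line = $0
    # match() sets RSTART/RLENGTH to the leftmost-longest match; the final
    # class has no ".", so the match cannot end in a period or comma.
    while (match(line, /[A-Za-z0-9_.]+@[A-Za-z0-9_.]+[A-Za-z0-9_]/)) {
      print substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)   # continue after this match
    }
  }' "$@"
}

printf '%s\n' 'inside, 00test@leo.bar.com, other par72@take.the.bus.au here' \
              'one at the end joe27@aol.com.' | extract_emails_awk
# prints:
#   00test@leo.bar.com
#   par72@take.the.bus.au
#   joe27@aol.com
```

Called as extract_emails_awk test.txt, it should read the file instead of stdin.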
HTH,
Paul