
Extract email addresses from big file.

superc0red
Newbie
 
Join Date: Feb 2007
Posts: 5
#1: May 17 '07
Hey.

I have a big text file full of data,
and I want to extract the email addresses from it.

How can I do it?

arne
Expert
 
Join Date: Oct 2006
Posts: 306
#2: May 17 '07

Quote:

Originally Posted by superc0red

Hey.

I have a big text file full of data,
and I want to extract the email addresses from it.

How can I do it?

I guess there are plenty of ways to do it. Any constraints on the tool/language?
superc0red
Newbie
 
Join Date: Feb 2007
Posts: 5
#3: May 17 '07

Perl, or a shell script using awk/sed/cut?
arne
Expert
 
Join Date: Oct 2006
Posts: 306
#4: May 17 '07

Quote:

Originally Posted by superc0red

Perl, or a shell script using awk/sed/cut?

Perl is certainly a reasonable choice, yes. If I had to do it, I would use it.
Motoma
Moderator
 
Join Date: Jan 2007
Location: Maine, USA
Posts: 2,904
#5: May 17 '07

Regular expressions would be a great way to do this. Try looking at the sed tool.
ghostdog74
Expert
 
Join Date: Apr 2006
Posts: 512
#6: May 18 '07

    awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /[[:alpha:]]@[[:alpha:]]/) {
          print $i
        }
      }
    }' "file"
Newbie
 
Join Date: Feb 2007
Posts: 5
#7: May 18 '07

Thanks for the code, dude :)
prn
Expert
 
Join Date: Apr 2007
Location: Muncie, IN
Posts: 237
#8: May 21 '07

It's been quite a while since I did anything with awk, so I wasn't sure how well ghostdog's code would work. It looked like it should handle only alphabetics with no more than one component on each side of the "@". So I made up a test file (test.txt):

    this is a test file foo@bar.com we are looking for moo@drop.dhcp.bar.com email
    addresses inside, 00test@leo.bar.com, a text file with no
    particular fname.lname@bar.baz.net other par72@take.the.bus.au restrictions
    on the format or locations of the 23skidoo@bar.co.uk addresses inside the file.
    Let's try one at the end joe27@aol.com.

I ran ghostdog's awk script on this and got the output:
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com,
    fname.lname@bar.baz.net
    23skidoo@bar.co.uk

Note that this output has FIVE email addresses, but the file has SEVEN, so something is wrong. The two that are omitted have a digit immediately before the "@", so I was close but not quite right about how much awk would match with this RE. awk splits each line on whitespace, and the script prints the whole field $i whenever that field contains a match for /[[:alpha:]]@[[:alpha:]]/, that is, a letter on each side of the "@".

But note that it also caught the comma following the third address "00test@leo.bar.com," which it should not include in the email address.

Here's a Perl one-liner:
    perl -wne'while(/[\w\.]+@[\w\.]+/g){print "$&\n"}' test.txt
This gives the output
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com.
which is almost correct: it does not include the comma following the third address, although it does include the period at the end of the last one.

Here's a corrected version:
    perl -wne'while(/[\w\.]+@[\w\.]+\w+/g){print "$&\n"}' test.txt
This yields
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com
I'm sure ghostdog74's awk script could also easily be fixed, but as I said, it's been a long time and I'm not sure how much I want to play with it. ;)
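
For anyone who would rather stay with awk, one possible fix is sketched below. It is not ghostdog74's code: the function name extract_emails_awk and the character class are assumptions made for illustration. Instead of testing whole whitespace-separated fields, it uses match() to pull out each address-like run and then trims trailing punctuation. It is a rough pattern match, not a real address validator.

```shell
# Hypothetical helper (not from the thread): print every address-like
# token found in the given files, or on stdin, one per line.
extract_emails_awk() {
  awk '
  {
    line = $0
    # match() finds the leftmost-longest match and sets RSTART/RLENGTH.
    while (match(line, /[A-Za-z0-9._-]+@[A-Za-z0-9._-]+/)) {
      addr = substr(line, RSTART, RLENGTH)
      # Drop trailing punctuation such as a sentence-ending period.
      sub(/[._-]+$/, "", addr)
      # Skip leftovers that have nothing after the "@".
      if (addr ~ /@./) print addr
      # Continue scanning after the text just matched.
      line = substr(line, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Called as extract_emails_awk test.txt, this should print all seven addresses from the test file above, without the trailing comma or period.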

HTH,
Paul
peripatetic
Newbie
 
Join Date: Jul 2007
Posts: 1
#9: Jul 27 '07

Hi.
Thanks for this. I was using it for a while and thought it was wonderful. However, it misses the hyphen, which is a legitimate character in email addresses. Here's an updated version.

    perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
I also piped it through sort to get a sorted, unique list of emails.
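
If Perl is not available, roughly the same pipeline can be sketched with grep. This assumes a grep that supports the -o and -h options (GNU and BSD grep do). The function name unique_emails is made up for illustration, and the pattern is a simplification that requires the match to end in a letter or digit, not a full address validator.

```shell
# Hypothetical helper: sorted, de-duplicated address-like tokens
# from the given files, or from stdin when no file is given.
unique_emails() {
  # -o prints only the matched text, -h omits file names, -E enables ERE.
  # The final [A-Za-z0-9] keeps a trailing period or hyphen out of the match.
  grep -ohE '[A-Za-z0-9._-]+@[A-Za-z0-9._-]*[A-Za-z0-9]' "$@" | sort -u
}
```

Usage would mirror the Perl version: unique_emails emails.txt > output.txt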
Motoma
Moderator
 
Join Date: Jan 2007
Location: Maine, USA
Posts: 2,904
#10: Jul 27 '07

Quote:

Originally Posted by peripatetic

Hi.
Thanks for this. I was using it for a while and thought it was wonderful. However, it misses the hyphen, which is a legitimate character in email addresses. Here's an updated version.

    perl -wne'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
I also piped it through sort to get a sorted, unique list of emails.

Great catch, peripatetic! Thanks for the addition, and welcome to The Scripts!
Newbie
 
Join Date: Jan 2008
Posts: 3
#11: Feb 9 '08

Guys, can this Perl script be used on websites? Can I replace the file with a web address? Or how can I get the emails included in a website?


Let's say www.domain.com/aa.php=1 has some emails saved inside
and www.domain.com/aa.php=2 also has some. How can I make a loop over all the aa.php=variable pages and get the emails from all of them?
Thanks in advance, and sorry for my English.
Newbie
 
Join Date: Mar 2008
Posts: 1
#12: Mar 19 '08

I have a big file with many email addresses. How do I extract only the email addresses? If possible, please mention the software I can use.
Freakin
Newbie
 
Join Date: May 2008
Posts: 1
#13: May 21 '08

How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?
gpraghuram
Expert
 
Join Date: Mar 2007
Location: Chennai
Posts: 1,258
#14: May 22 '08

Quote:

Originally Posted by Freakin

How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?


Try combining the find command with xargs and the Perl script given here, like this:

    find . -name "*.txt" | xargs perl <script given here>
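
A plain find | xargs pipeline splits on whitespace, so file names containing spaces get broken up. One variation on the idea, sketched here with find's own -exec ... {} + in place of xargs and with grep standing in for the Perl script, is shown below. The function name emails_in_dir and the simplified pattern are assumptions for illustration only, and grep's -o and -h options are assumed to be available (GNU and BSD grep have them).

```shell
# Hypothetical helper: collect address-like tokens from every *.txt
# file under a directory (default: the current directory).
emails_in_dir() {
  dir=${1:-.}
  # -exec ... {} + hands many file names to one grep call and keeps
  # names with spaces intact; sort -u removes duplicates across files.
  find "$dir" -type f -name '*.txt' -exec \
    grep -ohE '[A-Za-z0-9._-]+@[A-Za-z0-9._-]*[A-Za-z0-9]' {} + | sort -u
}
```

For example, emails_in_dir /path/to/files > output.txt would write one sorted list for the whole tree.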


Raghu