Extract email addresses from big file.

 
  #1
May 17th, 2007, 01:06 PM
Newbie
Join Date: Feb 2007
Posts: 5

Hey.

I have a big text file with data,
and I want to extract the email addresses.

How can I do it?

  #2
May 17th, 2007, 01:14 PM
arne
Expert
Join Date: Oct 2006
Posts: 307

Quote:
Originally Posted by superc0red
Hey.

I have a big text file with data,
and I want to extract the email addresses.

How can I do it?
I guess there are plenty of ways to do it. Any constraints on the tool/language?

  #3
May 17th, 2007, 02:08 PM
Newbie
Join Date: Feb 2007
Posts: 5

Perl, or a shell script using awk, sed, or cut?

  #4
May 17th, 2007, 03:19 PM
arne
Expert
Join Date: Oct 2006
Posts: 307

Quote:
Originally Posted by superc0red
Perl, or a shell script using awk, sed, or cut?
Perl is certainly a reasonable choice, yes. If I had to do it, I would use it.

  #5
May 17th, 2007, 05:00 PM
Motoma
Moderator
Join Date: Jan 2007
Location: Maine, USA
Age: 25
Posts: 2,898

Regular expressions would be a great way to do this. Try looking at the sed tool.
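As a rough sketch of the regular-expression approach: sed prints whole lines by default, so the example below swaps in `grep -ohE`, which prints only the matching text. The function name `extract_emails` and the pattern are illustrative assumptions, not something posted in this thread, and the pattern is a deliberate simplification that will miss some valid addresses.

```shell
# Hypothetical helper: print every e-mail-like token found on stdin
# or in the files named as arguments, one per line.
# -o prints only the matched text, -h hides file names, -E enables
# extended regular expressions.
extract_emails() {
    # The domain part must end in a letter or digit, so a trailing
    # period or comma after an address is not included in the match.
    grep -ohE '[A-Za-z0-9._-]+@[A-Za-z0-9.-]*[A-Za-z0-9]' "$@"
}

# Example run on a throwaway line with made-up addresses.
printf 'write to foo@bar.com, or to joe27@aol.com.\n' | extract_emails
```

Called with file names, for example `extract_emails file`, it scans those files instead of standard input.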

  #6
May 18th, 2007, 03:32 AM
Expert
Join Date: Apr 2006
Posts: 512

    awk '
    {
      # check every whitespace-separated field on the line
      for (i = 1; i <= NF; i++) {
        # print the whole field if it contains a letter, @, then a letter
        if ($i ~ /[[:alpha:]]@[[:alpha:]]/) {
          print $i
        }
      }
    }' "file"

  #7
May 18th, 2007, 07:44 AM
Newbie
Join Date: Feb 2007
Posts: 5

Thanks for the code, dude :)

  #8
May 21st, 2007, 03:20 PM
prn
Expert
Join Date: Apr 2007
Location: Muncie, IN
Posts: 232

It's been quite a while since I did anything with awk, so I wasn't sure how well ghostdog's code would work. It looked like it should handle only alphabetics with no more than one component on each side of the "@". So I made up a test file (test.txt):

    this is a test file foo@bar.com we are looking for moo@drop.dhcp.bar.com email
    addresses inside, 00test@leo.bar.com, a text file with no
    particular fname.lname@bar.baz.net other par72@take.the.bus.au restrictions
    on the format or locations of the 23skidoo@bar.co.uk addresses inside the file.
    Let's try one at the end joe27@aol.com.

I ran ghostdog's awk script on this and got the output:
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com,
    fname.lname@bar.baz.net
    23skidoo@bar.co.uk

Note that this output has FIVE email addresses, but the file has SEVEN, so something is wrong. The two that are omitted (par72@take.the.bus.au and joe27@aol.com) have a digit immediately before the "@", so I was close but not quite right about how much awk would match with this RE. The script prints the whole whitespace-delimited field $i whenever some part of it matches /[[:alpha:]]@[[:alpha:]]/, which is why addresses with several dot-separated components still come through intact.

But note that it also caught the comma following the third address "00test@leo.bar.com," which it should not include in the email address.

Here's a Perl one-liner:
    perl -wne 'while(/[\w\.]+@[\w\.]+/g){print "$&\n"}' test.txt
This gives the output
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com.
which is almost correct (and does not include the comma following number 3, although it does include the period at the end).

Here's a corrected version:
    perl -wne 'while(/[\w\.]+@[\w\.]+\w+/g){print "$&\n"}' test.txt
This yields
    foo@bar.com
    moo@drop.dhcp.bar.com
    00test@leo.bar.com
    fname.lname@bar.baz.net
    par72@take.the.bus.au
    23skidoo@bar.co.uk
    joe27@aol.com
I'm sure ghostdog74's awk script could also easily be fixed, but as I said, it's been a long time and I'm not sure how much I want to play with it. ;)

HTH,
Paul
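
For what it's worth, one possible way to fix the awk script is sketched below. Instead of testing whole fields, it uses awk's match() function to pull out just the matching substring, so digits next to the "@" are accepted and trailing punctuation is dropped. The function name `awk_emails` and the character ranges are assumptions made for illustration; this is not ghostdog74's code.

```shell
# Hypothetical helper: scan each input line for e-mail-like substrings
# rather than testing whole whitespace-separated fields.
awk_emails() {
    awk '
    {
        line = $0
        # match() sets RSTART and RLENGTH to the leftmost-longest match
        while (match(line, /[A-Za-z0-9._-]+@[A-Za-z0-9.-]*[A-Za-z0-9]/)) {
            print substr(line, RSTART, RLENGTH)
            # keep scanning the rest of the line after this address
            line = substr(line, RSTART + RLENGTH)
        }
    }' "$@"
}

# Example run on two lines taken from the test file above.
printf '%s\n' \
    'addresses inside, 00test@leo.bar.com, a text file with no' \
    'Let us try one at the end joe27@aol.com.' | awk_emails
```

Explicit ranges such as A-Za-z0-9 are used instead of [[:alnum:]] only because some older awk builds do not understand POSIX character classes.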

  #9
July 27th, 2007, 05:48 AM
Newbie
Join Date: Jul 2007
Posts: 1

Hi.
Thanks for this. I was using it for a while and thought it was wonderful. However it misses the legitimate hyphen character within emails. Here's an updated version.

    perl -wne 'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
I also piped it through sort to get a sorted, unique list of emails.

  #10
July 27th, 2007, 07:01 AM
Motoma
Moderator
Join Date: Jan 2007
Location: Maine, USA
Age: 25
Posts: 2,898

Quote:
Originally Posted by peripatetic
Hi.
Thanks for this. I was using it for a while and thought it was wonderful. However it misses the legitimate hyphen character within emails. Here's an updated version.

    perl -wne 'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' emails.txt | sort -u > output.txt
I also piped it through sort to get a sorted, unique list of emails.
Great catch peripatetic! Thanks for the addition, and welcome to The Scripts!

  #11
February 9th, 2008, 01:17 AM
Newbie
Join Date: Jan 2008
Posts: 3

Guys, can this Perl script be used on websites? Can I replace the file with a web address? If not, how can I get the emails included in a website?

Let's say www.domain.com/aa.php=1 has some emails saved inside, and www.domain.com/aa.php=2 also has some emails. How can I make a loop over every aa.php=variable page and get the emails from all of them?
Thanks in advance, and sorry for my English.

  #12
March 19th, 2008, 02:17 PM
Newbie
Join Date: Mar 2008
Posts: 1

I have a big file with many email addresses. How do I extract only the email addresses? If possible, please mention the software I can use.

  #13
May 21st, 2008, 07:53 AM
Newbie
Join Date: May 2008
Posts: 1

How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?

  #14
May 22nd, 2008, 01:30 AM
gpraghuram
Expert
Join Date: Mar 2007
Location: Chennai
Age: 30
Posts: 1,247

Quote:
Originally Posted by Freakin
How would I use a script like this on a group of files that are in a directory to retrieve email addresses from all of them?

Try to combine the find command with xargs and the perl script given here like this.

find . -name "*.txt" | xargs perl <script given here>


Raghu
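
A slightly more defensive sketch of the same idea follows. It assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do), so that file names containing spaces survive, and it reuses the hyphen-aware Perl one-liner posted earlier in this thread. The function name `emails_in_tree` is hypothetical.

```shell
# Hypothetical helper: collect e-mail-like strings from every *.txt file
# under a directory (default: the current directory), sorted and unique.
emails_in_tree() {
    # -print0 / -0 separate file names with NUL bytes, so names with
    # spaces or newlines are passed to perl intact.
    find "${1:-.}" -type f -name '*.txt' -print0 |
        xargs -0 perl -wne 'while(/[\w\.\-]+@[\w\.\-]+\w+/g){print "$&\n"}' |
        sort -u
}
```

Running `emails_in_tree /some/dir > output.txt` would then write the de-duplicated list to a file, in the same way as the sort -u pipeline shown earlier; the directory name here is only a placeholder.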