isolate email addresses

Hi,

I have a very large text file export of email messages from an email
client. I've imported it into Access.

The data isn't split into fields in any way; each record is one message
which contains, among everything else, an email address somewhere in the
body. IOW, every record looks more or less like this (nonsense text
substituted in):

posuere dignissim, iaculis eget, turpis. Suspendisse consectetuer neque id
odio. Ut ullamcorper. Quisque hendrerit my*****@mydomain.com neque eu dui.
In lacinia purus in felis. Sed nonummy, nisi pharetra facilisis blandit,
lacus urna bibendum diam,

If you look closely, you'll see that in the middle is the email address:

my*****@mydomain.com

Every record is more or less like this.

What I need to do is delete all other text from the records except what
qualifies as an email address.

So, I guess logically - I need to first find the @ character, then move
left and right until I encounter the first space in both directions.
Everything in between those spaces would be the email address. Everything
else needs to be tossed.

Help?
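
The find-the-@-and-scan idea described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name TokenAroundAt is made up, it treats only the space character as a delimiter (not tabs, line breaks or punctuation), and it looks only at the first "@" in the text.

```vba
' Hypothetical helper (not from the thread): returns the space-delimited
' token around the first "@" in s, or "" if s contains no "@".
Public Function TokenAroundAt(ByVal s As String) As String
    Dim atPos As Long
    Dim startPos As Long
    Dim endPos As Long

    atPos = InStr(1, s, "@")
    If atPos = 0 Then Exit Function

    ' Last space before the "@"; InStrRev returns 0 if there is none,
    ' which makes the token start at position 1.
    startPos = InStrRev(s, " ", atPos) + 1

    ' First space after the "@"; 0 means the token runs to the end.
    endPos = InStr(atPos, s, " ")
    If endPos = 0 Then endPos = Len(s) + 1

    TokenAroundAt = Mid$(s, startPos, endPos - startPos)
End Function
```

For example, TokenAroundAt("write to bob@example.com today") should return bob@example.com. Anything stuck to the address without a space, such as a trailing comma, would come back as part of the token.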
Apr 30 '06 #1
4 Replies


an**@anon.ocm (Jim Bradstreet) wrote in
news:97*******************@216.196.97.136:

[snip]

Public Function ExtractedEmailAddresses(ByVal s As String) As String
    ' Requires that VBScript be installed (the default for Windows).
    Dim r As Object
    Dim m As Variant
    Dim ms As Variant
    Set r = CreateObject("VBScript.RegExp")
    With r
        .Global = True
        .IgnoreCase = True
        .Pattern = "[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,3}"
    End With
    Set ms = r.Execute(s)
    ' Collect every match, separated by semicolons.
    For Each m In ms
        ExtractedEmailAddresses = ExtractedEmailAddresses & ";" & m.Value
    Next m
    ' Strip the leading semicolon.
    ExtractedEmailAddresses = Replace(ExtractedEmailAddresses, ";", "", , 1)
End Function

Private Sub TestExtractedEmailAddresses()
    ' The string literal is joined with line continuations so it compiles.
    Debug.Print ExtractedEmailAddresses( _
        "posuere dignissim, iaculis eget, turpis. Suspendisse " & _
        "consectetuer neque id odio. Ut ullamcorper. Quisque " & _
        "hendrerit my*****@mydomain.com neque eu dui. In lacinia " & _
        "purus in felis. Sed nonummy, nisi pharetra facilisis " & _
        "blandit,lacus urna bibendum diam")
    ' prints my*****@mydomain.com
End Sub
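
To actually strip the records down to the addresses, one option might be an update query that calls the function above. The sketch below assumes a table named tblMessages with a text or memo field named Body; both names are placeholders, since the thread never gives the real ones. It also assumes the function lives in a standard module, so that Access can see it from a query.

```vba
' Assumed names (not from the thread): table "tblMessages", field "Body".
Public Sub KeepOnlyEmailAddresses()
    ' Overwrites Body in place, so try it on a copy of the table first.
    ' Null values are skipped because the function takes a String.
    CurrentDb.Execute _
        "UPDATE tblMessages " & _
        "SET Body = ExtractedEmailAddresses([Body]) " & _
        "WHERE Body Is Not Null", dbFailOnError
End Sub
```

Records with no recognisable address would end up with an empty Body, so it may be safer to write the result to a new field instead of overwriting the original.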
--
Lyle Fairfield
Apr 30 '06 #2

We'll leave it up to the interested programmer to support top-level
domain names like .info in this otherwise elegant solution.

-Tom.

On Sun, 30 Apr 2006 23:11:41 GMT, Lyle Fairfield
<ly***********@aim.com> wrote:

[snip]

May 1 '06 #3

Tom van Stiphout <no*************@cox.net> wrote in
news:u9********************************@4ax.com:

[snip]

Yeppers. It's kinda old and I didn't have info on info when I wrote the
guts of it.

Maybe I'll attend to it some day.

--
Lyle Fairfield
May 1 '06 #4

Lyle Fairfield <ly***********@aim.com> wrote in
news:Xn*********************************@216.221.81.119:

[snip]

A quick fix is to change the pattern to:

[\w-\.]{1,}\@([\da-zA-Z-]{1,}\.){1,}[\da-zA-Z-]{2,4}

This will find

so*****@domain.info

but the problem is that if we have

so*****@domain.comdon't write this guy

the address will be

so*****@domain.comd

That is, we will have to delimit the address with white space to be 100%
accurate. I'm not sure what to do about that; maybe just expecting the
file to be that way is enough.
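
One way the white-space requirement might be built into the pattern itself is sketched below. This is a hypothetical variant, not something tested against the original data: the function name DelimitedEmailAddresses is made up, and it relies on the lookahead syntax (?=...) that VBScript.RegExp supports. Because the delimiter now ends the match, the upper limit on the last part of the domain is dropped.

```vba
' Hypothetical variant (not from the thread): only returns addresses that
' are delimited by white space or by the start/end of the string.
Public Function DelimitedEmailAddresses(ByVal s As String) As String
    Dim r As Object
    Dim m As Object
    Dim result As String

    Set r = CreateObject("VBScript.RegExp")
    With r
        .Global = True
        .IgnoreCase = True
        ' (^|\s)     address must follow white space or start the string
        ' (...)      SubMatches(1): the address itself
        ' (?=\s|$)   address must be followed by white space or the end
        .Pattern = "(^|\s)([\w.-]+@([\da-z-]+\.)+[\da-z-]{2,})(?=\s|$)"
    End With

    For Each m In r.Execute(s)
        result = result & ";" & m.SubMatches(1)
    Next m

    ' Drop the leading ";" (Mid$ returns "" when nothing matched).
    DelimitedEmailAddresses = Mid$(result, 2)
End Function
```

The trade-off is that an address with text run onto the end of it is skipped entirely instead of being returned in a mangled form, and so is an address followed directly by punctuation such as a comma or full stop.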

--
Lyle Fairfield
May 1 '06 #5
