a more precise re for email addys

rbt
Is it possible to write an re that _only_ matches email addresses? I've
been googling around and have found several examples on the Web, but all
of them produce too many false positives... here are examples from
Google that I've experimented with:

re.compile(r'([\w\.\-]+@[\w\.\-]+)')
re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
re.compile(r'(\S+)@(\S+)')

All of these will find email addys, but they also find other things.
Could someone demonstrate how to write a more accurate re for emails?
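
To make the false positives concrete, here is a minimal sketch that runs the three patterns above over a made-up sentence. The sample text and the helper name find_hits are assumptions for illustration only; they are not from the original post.

```python
import re

# The three patterns from the post, written as raw strings.
PATTERNS = [
    re.compile(r'([\w\.\-]+@[\w\.\-]+)'),
    re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}'),
    re.compile(r'(\S+)@(\S+)'),
]

# Hypothetical sample: one plausible address and one look-alike token.
SAMPLE = "ping bob@example.com and --@--.x please"


def find_hits(text):
    """Return, for each pattern, the list of substrings it matches in text."""
    return [[m.group(0) for m in p.finditer(text)] for p in PATTERNS]


for pattern, hits in zip(PATTERNS, find_hits(SAMPLE)):
    print(pattern.pattern, '->', hits)
```

On this sample, each of the three patterns reports the look-alike token "--@--.x" alongside the plausible address, which is the kind of false positive described above.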

BTW, this is not for spam, but like any tool could be used in a bad way.

Thanks!
Jan 18 '06 #1
Jim
There is a precise one in a Perl module, I believe.
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
Can you swipe that?

Jim

Jan 18 '06 #2
OMG, that is so ugly :D

Jim wrote:
There is a precise one in a Perl module, I believe.
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
Can you swipe that?

Jim


Jan 18 '06 #3

rbt> re.compile(r'([\w\.\-]+@[\w\.\-]+)')
rbt> re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
rbt> re.compile(r'(\S+)@(\S+)')

rbt> All of these will find email addys, but they also find other
rbt> things.

I think the only way to decide if your regular expression does what you want
is to provide a set of strings it must accept and another set which it must
reject. Supply those two sets and I'm sure any number of people here can
come up with a regular expression that distinguishes the two sets.
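
The accept/reject approach can be sketched as a tiny test harness. This is only an illustration: the function name check_pattern, the two sample sets, and the candidate pattern are all assumptions, and real sets would come from whatever data the regex has to handle.

```python
import re


def check_pattern(pattern, must_accept, must_reject):
    """Return (missed, leaked): strings wrongly rejected and wrongly accepted."""
    missed = [s for s in must_accept if not pattern.fullmatch(s)]
    leaked = [s for s in must_reject if pattern.fullmatch(s)]
    return missed, leaked


# Hypothetical sample sets; replace with strings from the real data.
MUST_ACCEPT = ['bob@example.com', 'first.last@mail.example.org']
MUST_REJECT = ['--@--.x', 'no-at-sign.example.com', 'a@b@c.com']

# One candidate pattern (the first one from the original post).
CANDIDATE = re.compile(r'[\w.\-]+@[\w.\-]+')

missed, leaked = check_pattern(CANDIDATE, MUST_ACCEPT, MUST_REJECT)
print('wrongly rejected:', missed)
print('wrongly accepted:', leaked)
```

A pattern is good enough for the job when both lists come back empty for the sets that matter to you.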

Skip
Jan 18 '06 #4

Jim> http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

Maybe Cafe Express could be convinced to put that on a t-shirt...

Skip
Jan 18 '06 #5
* rbt wrote:
Is it possible to write an re that _only_ matches email addresses?


No. The only way to check if the matched thing is a mail address is to send
a mail and ask the supposed receiver whether he got it.

The grammar in RFC 2822 nearly matches anything with an @ in it. So, how
accurate your regex needs to be depends heavily on the context of the
usage. For example, my suggestion for web form checkers is always to just
look for an @ char and do the rest using the human component.
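
Taken literally, that suggestion is a one-line check. A minimal sketch (the function name has_at_sign is an assumption, and confirming the address by actually sending mail is left out):

```python
def has_at_sign(value):
    """Return True if value contains an '@' character at all.

    Deliberately permissive: anything stricter is left to a human or to a
    confirmation message sent to the address.
    """
    return '@' in value


print(has_at_sign('bob@example.com'))
print(has_at_sign('bob.example.com'))
```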

nd
--
Already I've seen people (really!) write web URLs in the form:
http:\\some.site.somewhere
[...] How soon until greengrocers start writing "apples $1\pound"
or something? -- Joona I Palaste in clc
Jan 18 '06 #6
rbt
Jim wrote:
There is a precise one in a Perl module, I believe.
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
Can you swipe that?

Jim


I can swipe it... but it causes my head to explode. I get unbalanced
parentheses errors when trying to make it work as a Python re... it makes
more sense when broken up like this:

(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
+(?:(?:(?:\r\n)... \000-\031]
....
....
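
For what it's worth, Python's re module has a flag, re.VERBOSE, that lets a long pattern be spread over several lines with comments. The sketch below uses a much simpler pattern than the Perl one (which is not reproduced here); the pattern itself and the name VERBOSE_PATTERN are assumptions for illustration.

```python
import re

# With re.VERBOSE, whitespace in the pattern is ignored and '#' starts a
# comment, so each piece can sit on its own line.
VERBOSE_PATTERN = re.compile(r"""
    [\w.\-]+          # local part: word characters, dots, hyphens
    @                 # the separator
    [\w.\-]+          # host labels
    \.[a-zA-Z]{2,}    # assumed rule: alphabetic final label, 2+ letters
""", re.VERBOSE)

print(bool(VERBOSE_PATTERN.fullmatch('bob@example.com')))
print(bool(VERBOSE_PATTERN.fullmatch('bob@localhost')))
```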
Jan 18 '06 #7
Does it really need to be a regular expression? Why not just write a
short function that breaks apart the input and validates each part?

def IsEmail(addr):
    'Returns True if addr appears to be a valid email address'

    # we don't allow stuff like foo@ba*@biff.com
    if addr.count('@') != 1:
        return False
    name, host = addr.split('@')

    # verify the hostname (is an IP or has a valid TLD, etc.)
    hostParts = host.split('.')
    ...

That way you'd have a nice, readable chunk of code that you could tweak
as needed (for example, maybe you'll find that the RFC is too liberal
so you'll end up needing to add additional rules to exclude "bad"
addresses).
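
One way the elided part could be filled in is sketched below. The hostname rules here are assumptions chosen for illustration (at least two labels, no leading or trailing hyphen, alphabetic final label); they are not the poster's rules, they do not cover IP-address hosts, and they are stricter than the RFC in some ways and looser in others.

```python
def is_email(addr):
    """Return True if addr appears to be a plausible email address."""
    # Exactly one '@', as in the function above.
    if addr.count('@') != 1:
        return False
    name, host = addr.split('@')
    if not name:
        return False

    # Assumed rule: the host needs at least two dot-separated labels.
    host_parts = host.split('.')
    if len(host_parts) < 2:
        return False

    for part in host_parts:
        # Assumed rule: labels are non-empty and do not start or end with '-'.
        if not part or part.startswith('-') or part.endswith('-'):
            return False
        # Assumed rule: only letters, digits and hyphens inside a label.
        if not all(ch.isalnum() or ch == '-' for ch in part):
            return False

    # Assumed rule: the final label is alphabetic and at least 2 letters.
    tld = host_parts[-1]
    return tld.isalpha() and len(tld) >= 2


for sample in ['bob@example.com', 'foo@ba*@biff.com', 'bob@localhost']:
    print(sample, is_email(sample))
```

Because each rule is a separate, commented check, tightening or loosening one of them is a local change.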

Jan 18 '06 #8
rbt
da*********@gmail.com wrote:
Does it really need to be a regular expression? Why not just write a
short function that breaks apart the input and validates each part?

def IsEmail(addr):
    'Returns True if addr appears to be a valid email address'

    # we don't allow stuff like foo@ba*@biff.com
    if addr.count('@') != 1:
        return False
    name, host = addr.split('@')

    # verify the hostname (is an IP or has a valid TLD, etc.)
    hostParts = host.split('.')
    ...

That way you'd have a nice, readable chunk of code that you could tweak
as needed (for example, maybe you'll find that the RFC is too liberal
so you'll end up needing to add additional rules to exclude "bad"
addresses).


Good idea. I'll see what I can do with this. Thanks!
Jan 18 '06 #9
sk**@pobox.com wrote:
rbt> re.compile(r'([\w\.\-]+@[\w\.\-]+)')
rbt> re.compile(r'[\w\-][\w\-\.]+@[\w\-][\w\-\.]+[a-zA-Z]{1,4}')
rbt> re.compile(r'(\S+)@(\S+)')

rbt> All of these will find email addys, but they also find other
rbt> things.

I think the only way to decide if your regular expression does what you want
is to provide a set of strings it must accept and another set which it must
reject. Supply those two sets and I'm sure any number of people here can
come up with a regular expression that distinguishes the two sets.


Doesn't the relevant RFC state that the only way to
determine a valid email address is to send to it and
see if the mail server likes it?

I believe it explicitly warns against validating email
addresses, since you will invariably end up refusing to
accept some valid email addresses.
--
Steven.

Jan 19 '06 #10
rbt
da*********@gmail.com wrote:
Does it really need to be a regular expression? Why not just write a
short function that breaks apart the input and validates each part?

def IsEmail(addr):
    'Returns True if addr appears to be a valid email address'

    # we don't allow stuff like foo@ba*@biff.com
    if addr.count('@') != 1:
        return False
    name, host = addr.split('@')

    # verify the hostname (is an IP or has a valid TLD, etc.)
    hostParts = host.split('.')
    ...

That way you'd have a nice, readable chunk of code that you could tweak
as needed (for example, maybe you'll find that the RFC is too liberal
so you'll end up needing to add additional rules to exclude "bad"
addresses).


Just to follow up on this: I found that doing something like this,
along with a more generic RE, gives much better results. Thanks
for the idea!
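
The combination described here might look roughly like the sketch below: a loose regex finds candidates in free text, and a small function filters them. The actual code was not posted, so every name and rule here (CANDIDATE, plausible, extract_addresses, the punctuation that gets stripped, the host rules) is an assumption.

```python
import re

# Generic pattern: anything non-blank around an '@'.
CANDIDATE = re.compile(r'\S+@\S+')


def plausible(addr):
    """Assumed filter: one '@', a non-empty name, and a dotted host."""
    if addr.count('@') != 1:
        return False
    name, host = addr.split('@')
    labels = host.split('.')
    # Assumed rule: each label is letters/digits, optionally with hyphens.
    labels_ok = all(label.replace('-', '').isalnum() for label in labels)
    return bool(name) and len(labels) >= 2 and labels_ok


def extract_addresses(text):
    """Find candidates with the loose regex, then keep the plausible ones."""
    found = []
    for match in CANDIDATE.finditer(text):
        # Strip punctuation that commonly clings to addresses in prose.
        candidate = match.group(0).strip('.,;:()<>"\'')
        if plausible(candidate):
            found.append(candidate)
    return found


text = ("Contact bob@example.com, or <alice@mail.example.org>; "
        "ignore foo@bar and a@b@c.com.")
print(extract_addresses(text))
```

The regex stays simple because it only has to over-match; the readable function is where the rules live and where they can be tweaked.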
Jan 19 '06 #11
