I need some help with a regexp please

Hi,

I am trying to get a regexp to validate email addresses but can't get
it quite right. The problem is I can't quite find the regexp to deal
with ignoring the case james..ki**@fred.com, which is not valid. Here's
my attempt, neither of my regexps work quite how I want:

import os
import re

# Sample text to search (the addresses appear masked in this thread).
s = ('Hi james..ki**@fred.com dr******@blarg.com ji*@home.com @@not '
     'sc*****@home.space.com partridge in a pear tree')
r = re.compile(r'\w+\.?\w+@[^@\s]+\.\w+')
#r = re.compile(r'[a-z\-\.]+@[a-z\-\.]+')

# Collect the unique matches.
addys = set()
for a in r.findall(s):
    addys.add(a)

# Print them in sorted order.
for a in sorted(addys):
    print(a)

This gives:
dr******@blarg.com
ji*@home.com
ki**@fred.com <-- shouldn't be here :(
sc*****@home.space.com

Nearly there but no cigar :)

I can't see the wood for the trees now :) Can anyone suggest a fix
please?

Thanks,
Tony

Sep 21 '06 #1

codefire wrote:
> Hi,
>
> I am trying to get a regexp to validate email addresses but can't get
> it quite right. The problem is I can't quite find the regexp to deal
> with ignoring the case james..ki**@fred.com, which is not valid. Here's
> my attempt, neither of my regexps work quite how I want:
>
> import os
> import re
> s = ('Hi james..ki**@fred.com dr******@blarg.com ji*@home.com @@not '
>      'sc*****@home.space.com partridge in a pear tree')
> r = re.compile(r'\w+\.?\w+@[^@\s]+\.\w+')
> #r = re.compile(r'[a-z\-\.]+@[a-z\-\.]+')
> addys = set()
> for a in r.findall(s):
>     addys.add(a)
> for a in sorted(addys):
>     print(a)
>
> This gives:
> dr******@blarg.com
> ji*@home.com
> ki**@fred.com <-- shouldn't be here :(
> sc*****@home.space.com
>
> Nearly there but no cigar :)
>
> I can't see the wood for the trees now :) Can anyone suggest a fix
> please?
>
> Thanks,
> Tony

'[\w.]+@\w+(\.\w+)*'
Works for me, and SHOULD for you, but I haven't tested it all that
much.
Good luck.
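
A quick check of the pattern above may help. This is only a sketch: the sample addresses are made up for illustration (the ones in this thread are masked), and it is written for Python 3. Two things are worth noticing. With a capturing group, findall() returns only the text of the group, so a non-capturing variant is shown next to the posted pattern. And because [\w.]+ allows consecutive dots, the double-dot address is still accepted.

```python
import re

# Hypothetical, unmasked sample addresses (the thread's originals are masked).
s = 'Hi james..kirk@fred.com drwho@blarg.com @@not partridge'

# The suggested pattern, as posted: findall() returns only the group's text.
posted = re.compile(r'[\w.]+@\w+(\.\w+)*')
group_only = posted.findall(s)

# Same pattern with a non-capturing group, so findall() returns whole matches.
whole = re.compile(r'[\w.]+@\w+(?:\.\w+)*')
matches = whole.findall(s)

print(group_only)  # ['.com', '.com']
print(matches)     # ['james..kirk@fred.com', 'drwho@blarg.com']
```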

Sep 21 '06 #2
On 2006-09-21, codefire <to**********@gmail.com> wrote:
> I am trying to get a regexp to validate email addresses but
> can't get it quite right. The problem is I can't quite find the
> regexp to deal with ignoring the case james..ki**@fred.com,
> which is not valid. Here's my attempt, neither of my regexps
> work quite how I want:

I suggest a websearch for email address validators instead of
writing of your own.

Here's a hit that looks useful:

http://aspn.activestate.com/ASPN/Coo...n/Recipe/66439

--
Neil Cerutti
Next Sunday Mrs. Vinson will be soloist for the morning service.
The pastor will then speak on "It's a Terrible Experience."
--Church Bulletin Blooper
Sep 21 '06 #3
codefire wrote:
> Hi,
>
> I am trying to get a regexp to validate email addresses but can't get
> it quite right. The problem is I can't quite find the regexp to deal
> with ignoring the case james..ki**@fred.com, which is not valid. Here's
> my attempt, neither of my regexps work quite how I want:
>
> import os
> import re
> s = ('Hi james..ki**@fred.com dr******@blarg.com ji*@home.com @@not '
>      'sc*****@home.space.com partridge in a pear tree')
> r = re.compile(r'\w+\.?\w+@[^@\s]+\.\w+')
> #r = re.compile(r'[a-z\-\.]+@[a-z\-\.]+')
> addys = set()
> for a in r.findall(s):
>     addys.add(a)
> for a in sorted(addys):
>     print(a)
>
> This gives:
> dr******@blarg.com
> ji*@home.com
> ki**@fred.com <-- shouldn't be here :(
> sc*****@home.space.com
>
> Nearly there but no cigar :)
>
> I can't see the wood for the trees now :) Can anyone suggest a fix
> please?

The problem is that your pattern doesn't start out by confirming that
it's either at the start of a line or after whitespace. You could do
this with a "look-behind assertion" if you wanted.
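
One way the look-behind idea could be written is sketched below. It assumes Python 3 and uses made-up, unmasked addresses in place of the masked ones in this thread. The negative look-behind (?<!\S) is one possible spelling of "start of the string or after whitespace"; it checks the preceding character without consuming it.

```python
import re

# Hypothetical, unmasked sample text (the thread's addresses are masked).
s = ('Hi james..kirk@fred.com drwho@blarg.com jim@home.com @@not '
     'scooby@home.space.com partridge in a pear tree')

# (?<!\S) is a negative look-behind: the match may only start at the
# beginning of the string or right after whitespace.
r = re.compile(r'(?<!\S)\w+\.?\w+@[^@\s]+\.\w+')

found = sorted(set(r.findall(s)))
print(found)  # ['drwho@blarg.com', 'jim@home.com', 'scooby@home.space.com']
```

With the anchor in place, the engine can no longer start a match in the middle of the double-dot address, so the partial match on its tail disappears.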

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

Sep 21 '06 #4
Hi,

thanks for the advice guys.

Well took the kids swimming, watched some TV, read your hints and
within a few minutes had this:

r = re.compile(r'[^.\w]\w+\.?\w+@[^@\s]+\.\w+')

This works for me. That is, if you have an invalid email such as
tony..bATblah.com it will reject it (note the double dots).
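
Two side effects of the leading [^.\w] are worth checking, sketched here under Python 3 with made-up addresses: the class consumes a character, so each match carries that character as a prefix, and an address at the very start of the string has nothing in front of it to consume. The look-behind variant is an alternative for comparison, not part of the post.

```python
import re

# Hypothetical sample text; the first address sits at the very start.
s = 'jim@home.com tony..b@blah.com drwho@blarg.com'

# Pattern from the post: [^.\w] consumes one character before the address.
consuming = re.compile(r'[^.\w]\w+\.?\w+@[^@\s]+\.\w+')
# Variant using a negative look-behind, which checks without consuming.
lookbehind = re.compile(r'(?<![.\w])\w+\.?\w+@[^@\s]+\.\w+')

with_prefix = consuming.findall(s)
clean = lookbehind.findall(s)
print(with_prefix)  # [' drwho@blarg.com']
print(clean)        # ['jim@home.com', 'drwho@blarg.com']
```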

Anyway, now know a little more about regexps :)

Thanks again for the hints,

Tony

Sep 21 '06 #5
codefire wrote:
> Hi,
>
> thanks for the advice guys.
>
> Well took the kids swimming, watched some TV, read your hints and
> within a few minutes had this:
>
> r = re.compile(r'[^.\w]\w+\.?\w+@[^@\s]+\.\w+')
>
> This works for me. That is, if you have an invalid email such as
> tony..bATblah.com it will reject it (note the double dots).
>
> Anyway, now know a little more about regexps :)

A little more is unfortunately not enough. The best advice you got was
to use an existing e-mail address validator. The definition of a valid
e-mail address is complicated. You may care to check out "Mastering
Regular Expressions" by Jeffrey Friedl. In the first edition, at least
(I haven't looked at the 2nd), he works through assembling a 4700+ byte
regex for validating e-mail addresses. Yes, that's 4KB. It's the best
advertisement for *not* using regexes for a task like that that I've
ever seen.

Cheers,
John

Sep 21 '06 #6
"John Machin" <sj******@lexicon.net> writes:
> A little more is unfortunately not enough. The best advice you got was
> to use an existing e-mail address validator. The definition of a valid
> e-mail address is complicated. You may care to check out "Mastering
> Regular Expressions" by Jeffrey Friedl. In the first edition, at least
> (I haven't looked at the 2nd), he works through assembling a 4700+ byte
> regex for validating e-mail addresses. Yes, that's 4KB. It's the best
> advertisement for *not* using regexes for a task like that that I've
> ever seen.

The best advice I've seen when people ask "How do I validate whether
an email address is valid?" was "Try sending mail to it".

It's both Pythonic, and truly the best way. If you actually want to
confirm, don't try to validate it statically; *use* the email address,
and check the result. Send an email to that address, and don't use it
any further unless you get a reply saying "yes, this is the right
address to use" from the recipient.
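
A minimal sketch of that confirmation loop, assuming Python 3. The names pending, confirmed, build_confirmation and confirm are hypothetical, as is the sender address; the message is only built here, not sent.

```python
import secrets
from email.message import EmailMessage

# Hypothetical in-memory store: token -> address awaiting confirmation.
pending = {}
confirmed = set()

def build_confirmation(address, sender='noreply@example.com'):
    """Create a confirmation message carrying a one-time token."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = address
    msg['Subject'] = 'Please confirm your address'
    msg.set_content('Reply with this code to confirm: ' + token)
    return token, msg  # msg could be handed to smtplib.SMTP.send_message()

def confirm(token):
    """Mark the address as usable only once its token comes back."""
    address = pending.pop(token, None)
    if address is not None:
        confirmed.add(address)
    return address

token, msg = build_confirmation('tony@example.com')
print(confirm(token), confirm('bogus'))  # tony@example.com None
```

Nothing in the sketch inspects the address itself; whether it is usable is decided entirely by whether the token comes back.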

The sending system's mail transport agent, not regular expressions,
determines which part is the domain to send the mail to.

The domain name system, not regular expressions, determines what
domains are valid, and what host should receive mail for that domain.

Most especially, the receiving mail system, not regular expressions,
determines what local-parts are valid.

--
\ "I believe in making the world safe for our children, but not |
`\ our children's children, because I don't think children should |
_o__) be having sex." -- Jack Handey |
Ben Finney

Sep 22 '06 #7
Ben Finney wrote:
> "John Machin" <sj******@lexicon.net> writes:
>
>> A little more is unfortunately not enough. The best advice you got was
>> to use an existing e-mail address validator. The definition of a valid
>> e-mail address is complicated. You may care to check out "Mastering
>> Regular Expressions" by Jeffrey Friedl. In the first edition, at least
>> (I haven't looked at the 2nd), he works through assembling a 4700+ byte
>> regex for validating e-mail addresses. Yes, that's 4KB. It's the best
>> advertisement for *not* using regexes for a task like that that I've
>> ever seen.
>
> The best advice I've seen when people ask "How do I validate whether
> an email address is valid?" was "Try sending mail to it".

That only applies if it's a likely-looking email address. If someone
asks me to send mail to "splurge.!#$%*&^from@thingie?><{}_)" I will
probably assume that it isn't worth my time trying.

If the email looks syntactically correct, *then* it's worth further
validation by trying a delivery attempt.

> It's both Pythonic, and truly the best way. If you actually want to
> confirm, don't try to validate it statically; *use* the email address,
> and check the result. Send an email to that address, and don't use it
> any further unless you get a reply saying "yes, this is the right
> address to use" from the recipient.

This is a rather scatter-shot approach. Many possibilities can be
properly eliminated by judicious lexical checks before delivery is
considered.

> The sending system's mail transport agent, not regular expressions,
> determines which part is the domain to send the mail to.
>
> The domain name system, not regular expressions, determines what
> domains are valid, and what host should receive mail for that domain.
>
> Most especially, the receiving mail system, not regular expressions,
> determines what local-parts are valid.

Nevertheless, I am *not* going to try delivery to (for example) a
non-local address that doesn't contain an "at" sign.
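
A deliberately small lexical check of that kind might look like the sketch below (Python 3; the function name worth_trying is hypothetical). It only rejects addresses that cannot be routed at all and makes no judgement about the local part.

```python
def worth_trying(address):
    """Cheap pre-delivery check: reject only what clearly cannot be routed."""
    # Split at the LAST "@": everything before it is the local part.
    local, at, domain = address.rpartition('@')
    if not at or not local or not domain:
        return False          # no "@" at all, or an empty side
    if any(ch.isspace() for ch in domain):
        return False          # a domain name cannot contain whitespace
    return True

print(worth_trying('tony'))              # False: no "@"
print(worth_trying('tony@example.com'))  # True
```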

regards
Steve

Sep 22 '06 #8
Steve Holden <st***@holdenweb.com> writes:
> Ben Finney wrote:
>> The best advice I've seen when people ask "How do I validate
>> whether an email address is valid?" was "Try sending mail to it".
>
> That only applies if it's a likely-looking email address. If someone
> asks me to send mail to "splurge.!#$%*&^from@thingie?><{}_)" I will
> probably assume that it isn't worth my time trying.

You, as a human, can possibly make that decision, if you don't care
about turning away someone who *does* have such an email address. How
can an algorithm do so? There are many valid email addresses that look
as bizarre as the example you gave.

>> The sending system's mail transport agent, not regular
>> expressions, determines which part is the domain to send the mail
>> to.
>>
>> The domain name system, not regular expressions, determines what
>> domains are valid, and what host should receive mail for that
>> domain.
>>
>> Most especially, the receiving mail system, not regular
>> expressions, determines what local-parts are valid.
>
> Nevertheless, I am *not* going to try delivery to (for example) a
> non-local address that doesn't contain an "at" sign.

Would you try delivery to an email address that contains two or more
"@" symbols? If not, you will be denying delivery to valid RFC2821
addresses.

This is, of course, something you're entitled to do. But you've then
consciously chosen not to use "is the email address valid?" as your
criterion, and the original request for such validation becomes moot.
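
To illustrate the point about multiple "@" symbols: a quoted local part may itself contain "@". The address below is made up, and the "exactly one @" rule is a stand-in for a typical naive check (Python 3).

```python
# A quoted local part may itself contain "@", so this address has two.
address = '"tony@work"@example.com'

# Naive rule: exactly one "@". It turns this address away.
naive_ok = address.count('@') == 1

# Splitting at the last "@" still finds the domain to route to.
local, _, domain = address.rpartition('@')
print(naive_ok, local, domain)  # False "tony@work" example.com
```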

--
\ "During my service in the United States Congress, I took the |
`\ initiative in creating the Internet." -- Al Gore |
_o__) |
Ben Finney

Sep 22 '06 #9
Ant

John Machin wrote:
> ....
> A little more is unfortunately not enough. The best advice you got was
> to use an existing e-mail address validator.

We got bitten by this at the last place I worked - we were using a
regex email validator (from Microsoft IIRC), and we kept having
problems with specific email addresses from Ireland. There are stacks of
Irish email addresses out there of the form paddy.o'reilly@domain -
perfectly valid email addresses, but they don't satisfy the usual naive
versions of regex validators.
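
The failure is easy to reproduce. In this sketch (Python 3) the address and both patterns are illustrative stand-ins, not the validator mentioned above: a local part limited to word characters and dots has no room for the apostrophe.

```python
import re

# Hypothetical address in the style described; example.ie is a placeholder.
address = "paddy.o'reilly@example.ie"

# A typical naive validator: word characters and dots only in the local part.
naive = re.compile(r'[\w.]+@\w+(?:\.\w+)+')
accepted = naive.fullmatch(address) is not None

# Allowing the apostrophe in the local part lets this address through.
relaxed = re.compile(r"[\w.']+@\w+(?:\.\w+)+")
print(accepted, relaxed.fullmatch(address) is not None)  # False True
```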

We use an even worse validator at my current job, but the feeling the
management have (not one I agree with) is that unusual email addresses,
whilst perhaps valid, are uncommon enough not to worry about....

Sep 22 '06 #10
