Help with Regular Expressions

I have been looking at the Python re module and have been trying to
make sense of a simple function that I'd like to do. However, no amount
of reading or googling has helped me with this. Forgive my
stone-headedness. I have done this with .NET and Java in the past but
damn if I can't get it done with Python for some reason. As such I am
sure it is something even simpler.

I am trying to find some matches and have them put into a list when
processing is done. I'll use a simple example like email addresses.

My input is the following:
wordList = ['myname1', 'myname1@domain.tld', 'myname2@domain.tld',
'myname4@domain', 'myname5@domain.tldx']

My regular expression would be something like '\w\@\w\.\w' (I realize
it could and should be more detailed but that's not the point for now).

I would like to find out how to output the matches for this expression
of my 'wordList' into a neat list variable. How do I get this done?

Thanks,

Harlin Seritt

Aug 10 '05 #1
You need to enclose the '\w's in parentheses if you want the pieces
back separately: the groups() method of a match object only returns the
parts of the pattern that are enclosed in parentheses. Also, you need to
use '+' so that \w won't match just a single alphanumeric character, but
one or more. The '.' has to stay escaped, as you already have it,
because an unescaped '.' matches any character. So your regular
expression would be more like

r'(\w+)@(\w+)\.(\w+)'

Anyways, you can use a list comprehension and the groups() method of a
match object to build a list of tuples:
[re.match(r'(\w+)@(\w+)\.(\w+)', address).groups() for address in
wordList]

On a side note, some of the entries in your list don't match the
pattern. For those, re.match returns None, and calling groups() on None
raises an AttributeError. You should use

wordList = ['myname1@domain.tld', 'myname2@domain.tld',
'myname5@domain.tldx']
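
If the original list has to stay as it is, one option is to test each match object before calling groups(). This is only a minimal sketch: the names EMAIL_RE and split_addresses are made up for illustration, the masked addresses are written out as myname1, myname2 and so on as an assumption, and non-matching entries are simply skipped.

```python
import re

# Hypothetical names for illustration; the pattern is the one suggested above.
EMAIL_RE = re.compile(r'(\w+)@(\w+)\.(\w+)')

def split_addresses(words):
    """Return (user, domain, tld) tuples for the entries that match."""
    parts = []
    for word in words:
        m = EMAIL_RE.match(word)   # None when the entry does not match
        if m is not None:
            parts.append(m.groups())
    return parts

# Assumed spelling of the list from the original question.
wordList = ['myname1', 'myname1@domain.tld', 'myname2@domain.tld',
            'myname4@domain', 'myname5@domain.tldx']
print(split_addresses(wordList))
```

Entries such as 'myname1' and 'myname4@domain' are left out of the result instead of raising an exception.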

Aug 10 '05 #2
that's more of a list manipulation question than a regular expression
question, of course. to apply a regular expression to all items in a
list, apply it to all items in a list. a list comprehension is the shortest
way to do this:
>>> out = [word for word in wordList if re.match(r"\w+@\w+\.\w+", word)]
>>> out
['myname1@domain.tld', 'myname2@domain.tld', 'myname5@domain.tldx']

</F>

Aug 10 '05 #3
Ahh that's it Frederik. That's what I was looking for. The regular
expression problems I will take care of, but first wanted to walk
before running. ;)

Thanks,

Harlin Seritt

Aug 10 '05 #4
Forgive another question here, but what is the 'r' for when used with
expression: r'\w+...' ?

Aug 10 '05 #5
r'..' or r".." are "raw strings", in which backslashes do not introduce
an escape sequence, so you don't have to write '\\' when you need a
backslash in the string, e.g. r'\w+' == '\\w+'.
They are useful for regular expressions (because the re module parses
the '\X' sequences itself) and for Windows paths (e.g. r'C:\newfile.txt').
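
A short sketch of the difference, using only the standard library. The sample text 'hello world' is made up for illustration.

```python
import re

# In a raw string the backslash is kept as an ordinary character.
assert r'\w+' == '\\w+'
assert len(r'\n') == 2      # a backslash followed by the letter n
assert len('\n') == 1       # a single newline character

# The Windows path from the post: without the r prefix, the '\n' in it
# would be turned into a newline character.
path = r'C:\newfile.txt'
print(path)

# The re module receives the backslash and interprets \w itself.
print(re.match(r'\w+', 'hello world').group())
```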

And you should append a '$' to the regular expression, because
r"\w+@\w+\.\w+" would match 'f**@example.com-+*junk', too.
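
To illustrate the effect of the trailing '$', here is a small sketch. The sample address 'foo@example.com-+*junk' is invented for this example. re.fullmatch (available since Python 3.4, so not an option when this thread was written) is shown as an alternative.

```python
import re

address = 'foo@example.com-+*junk'   # made-up input for illustration

unanchored = re.compile(r'\w+@\w+\.\w+')
anchored = re.compile(r'\w+@\w+\.\w+$')

# match() only anchors at the start, so trailing junk is ignored.
print(bool(unanchored.match(address)))           # True
# With '$' the pattern must also reach the end of the string.
print(bool(anchored.match(address)))             # False
print(bool(anchored.match('foo@example.com')))   # True
# fullmatch() requires the whole string to match, even without '$'.
print(bool(unanchored.fullmatch(address)))       # False
```

Note that '$' also matches just before a newline at the very end of the string, while fullmatch() does not accept that trailing newline.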

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
Aug 10 '05 #6
If your re demands get more complicated, you could take a look at
pyparsing. The code is a bit more verbose, but many find it easier to
compose their expressions using pyparsing's classes, such as Literal,
OneOrMore, Optional, etc., plus a number of built-in helper functions
and expressions, including delimitedList, quotedString, and
cStyleComment. Pyparsing is intended for writing recursive-descent
parsers, but can also be used (and is best learned) with simple
applications such as this one.

Here is a simple script for parsing your e-mail addresses. Note the
use of results names to give you access to the individual parsed fields
(the re module supports a similar capability through named groups).

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

from pyparsing import Literal, Word, Optional, \
    delimitedList, alphanums

# define format of an email address
AT = Literal("@").suppress()
emailWord = Word(alphanums + "_")
emailDomain = delimitedList(emailWord, ".", combine=True)
emailAddress = emailWord.setResultsName("user") + \
    Optional(AT + emailDomain).setResultsName("host")

# parse each word in wordList
wordList = ['myname1', 'myname1@domain.tld', 'myname2@domain.tld',
            'myname4@domain', 'myname5@domain.tldx']

for w in wordList:
    addr = emailAddress.parseString(w)
    print(w)
    print(addr)
    print("user:", addr.user)
    print("host:", addr.host)
    print()

Will print out:
myname1
['myname1']
user: myname1
host:

myname1@domain.tld
['myname1', 'domain.tld']
user: myname1
host: domain.tld

myname2@domain.tld
['myname2', 'domain.tld']
user: myname2
host: domain.tld

myname4@domain
['myname4', 'domain']
user: myname4
host: domain

myname5@domain.tldx
['myname5', 'domain.tldx']
user: myname5
host: domain.tldx
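
For comparison, the similar capability in the re module is named groups. The following is only a rough sketch: the pattern and the names EMAIL_RE and parse_address are assumptions made for illustration, chosen to mirror the grammar above (a word, optionally followed by "@" and a dot-separated host).

```python
import re

# Named groups, written (?P<name>...), label each captured field.
# Illustrative pattern only, not a real e-mail validator.
EMAIL_RE = re.compile(r'(?P<user>\w+)(?:@(?P<host>\w+(?:\.\w+)*))?$')

def parse_address(word):
    """Return {'user': ..., 'host': ...}, or None if the word does not match."""
    m = EMAIL_RE.match(word)
    return m.groupdict() if m else None

# Assumed spelling of the list from the original question.
wordList = ['myname1', 'myname1@domain.tld', 'myname2@domain.tld',
            'myname4@domain', 'myname5@domain.tldx']

for w in wordList:
    fields = parse_address(w)
    if fields is not None:
        print(w, "user:", fields['user'], "host:", fields['host'])
```

One difference from the pyparsing version: when the host part is absent, the named group is None rather than an empty string.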

Aug 10 '05 #7
FYI, matching all compliant email addresses is ridiculously complicated.
Before you spend too much time on it, you might want to borrow the
complete and thoroughly explained example in Mastering Regular Expressions (O'Reilly):

http://www.oreilly.com/catalog/regex/
Aug 10 '05 #8
Be careful with that book though: its RE examples are Perl-centric and
do not use exactly the same implementation that Python uses. However,
it's a good place to start.

This will also be useful
http://www.amk.ca/python/howto/regex/

Aug 10 '05 #9
As a slightly unrelated pyparsing question, is there a good set of API
documentation around for pyparsing?

I've looked into it for my mud client, but for now have gone with
DParser because I need (desire) custom token generation sometimes.
Pyparsing looks easier to internationalize, though.
Aug 10 '05 #10
