I have been looking at the Python re module and have been trying to
work out how to do something simple with it. However, no amount
of reading or googling has helped me with this. Forgive my
stone-headedness. I have done this with .NET and Java in the past, but
damn if I can't get it done with Python for some reason. As such, I am
sure it is something even simpler.
I am trying to find some matches and have them put into a list when
processing is done. I'll use a simple example like email addresses.
My input is the following:
wordList = ['myname1', 'm******@domain.tld', 'm******@domain.tld',
'myname4@domain', 'm******@domain.tldx']
My regular expression would be something like '\w\@\w\.\w' (I realize
it could and should be more detailed but that's not the point for now).
I would like to find out how to collect the items of my 'wordList' that
match this expression into a neat list variable. How do I get this done?
Thanks,
Harlin Seritt
You need to enclose the '\w's in parentheses if you want the individual
pieces back: the groups() method of a match object only returns the parts
you enclose in parentheses. Also, you need to use the '+' so that \w won't
just match the first alphanumeric character, but will match one or more.
You also need to keep the '.' escaped, because an unescaped '.' matches
any character. So your regular expression would be more like
r'(\w+)@(\w+)\.(\w+)'
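The three points above can be checked directly. This is a minimal sketch using a hypothetical address, since the real ones in the thread are masked: the single-character pattern fails, the '+' version matches and exposes its parenthesized parts through groups(), group(0) returns the whole match even without parentheses, and an unescaped '.' lets a string with no dot slip through.

```python
import re

# Hypothetical sample address (the real ones in the thread are masked).
address = "name2@domain.tld"

# The original pattern: each \w matches exactly ONE word character,
# so 'n' is followed by 'a' instead of '@' and the match fails.
single = re.match(r"\w@\w\.\w", address)

# With '+', each \w+ matches one or more word characters.
plus = re.match(r"(\w+)@(\w+)\.(\w+)", address)

# An unescaped '.' matches any character, so a string with no dot matches too.
loose = re.match(r"\w+@\w+.\w+", "name2@domainXtld")
strict = re.match(r"\w+@\w+\.\w+", "name2@domainXtld")

print(single)                      # None
print(plus.groups())               # ('name2', 'domain', 'tld')
print(plus.group(0))               # name2@domain.tld (whole match, no parentheses needed)
print(loose is not None, strict is None)  # True True
```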
Anyway, you can use a list comprehension and the groups() method of a
match object to build a list of tuples:
[re.match(r'(\w+)@(\w+)\.(\w+)', address).groups() for address in wordList]
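On a side note, some of the entries in your list don't match the pattern.
For those, re.match() returns None, so calling .groups() on the result
raises an AttributeError. For this example you should use
wordList = ['m*****@domain.tld', 'm*******@domain.tld',
'm*****@domain.tldx']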
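Instead of editing the input list, the comprehension can skip the entries for which re.match() returns None. A minimal sketch, assuming hypothetical stand-in addresses in place of the masked ones:

```python
import re

# Hypothetical stand-ins for the masked entries in the thread.
wordList = ["name1", "name2@domain.tld", "name3@domain.tld",
            "name4@domain", "name5@domain.tldx"]

pattern = re.compile(r"(\w+)@(\w+)\.(\w+)")

# Match each word once, drop the failed (None) matches, then take groups().
matches = (pattern.match(word) for word in wordList)
parts = [m.groups() for m in matches if m is not None]

print(parts)
# [('name2', 'domain', 'tld'), ('name3', 'domain', 'tld'), ('name5', 'domain', 'tldx')]
```

'name1' has no '@' and 'name4@domain' has no '.', so both are skipped rather than raising AttributeError.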
That's more of a list manipulation question than a regular expression
question, of course. To apply a regular expression to all items in a
list, apply it to all items in a list. A list comprehension is the shortest
way to do this:
>>> out = [word for word in wordList if re.match(r"\w+@\w+\.\w+", word)]
>>> out
['m******@domain.tld', 'm******@domain.tld', 'm******@domain.tldx']
</F>
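The one-liner above can be run against sample data as-is. This sketch uses hypothetical stand-in addresses and precompiles the pattern; compiling is optional, since the re module caches recently used patterns, but it keeps the expression in one named place.

```python
import re

# Hypothetical stand-ins for the masked entries in the thread.
wordList = ["name1", "name2@domain.tld", "name3@domain.tld",
            "name4@domain", "name5@domain.tldx"]

# Compile once; email_re.match(word) behaves like re.match(pattern, word).
email_re = re.compile(r"\w+@\w+\.\w+")

# Keep each word whose beginning matches the pattern.
out = [word for word in wordList if email_re.match(word)]
print(out)  # ['name2@domain.tld', 'name3@domain.tld', 'name5@domain.tldx']
```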
Ahh, that's it, Fredrik. That's what I was looking for. I'll take care of
the regular expression problems myself, but first I wanted to walk
before running. ;)
Thanks,
Harlin Seritt
Forgive another question here, but what is the 'r' for in an
expression like r'\w+...'?
r'..' or r".." are "raw strings", in which backslashes do not introduce an
escape sequence, so you don't have to write '\\' if you need a backslash
in the string; e.g. r'\w+' == '\\w+'.
This is useful for regular expressions (because the re module parses the
'\X' sequences itself) or for Windows paths (e.g. r'C:\newfile.txt').
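The difference is easy to see with the path example above. In this small sketch, the normal string turns "\n" into a single newline character, while the raw string keeps the backslash and the 'n' as two separate characters.

```python
# A raw string keeps backslashes literally; a normal string needs them doubled.
assert r"\w+" == "\\w+"
print(len(r"\w+"))   # 3: backslash, 'w', '+'

# Without the r prefix, "\n" in this Windows path becomes a newline character.
plain = "C:\newfile.txt"
raw = r"C:\newfile.txt"

print("\n" in plain)  # True
print("\n" in raw)    # False
```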
And you should append a '$' to the regular expression, because
r"\w+@\w+\.\w+" would match 'f**@example.com-+*junk', too.
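A minimal sketch of the anchoring point, using a hypothetical input string. re.match() only anchors at the start, so trailing junk is ignored unless '$' is appended. One caveat worth knowing: '$' also matches just before a final newline, whereas re.fullmatch() (available since Python 3.4) requires the entire string to match.

```python
import re

pattern = r"\w+@\w+\.\w+"
text = "name2@domain.tld-+*junk"   # hypothetical input with trailing junk

# re.match() anchors only at the start, so the trailing junk is ignored.
unanchored = re.match(pattern, text)

# With '$' appended, the match must extend to the end of the string.
anchored = re.match(pattern + "$", text)

# '$' still accepts a single trailing newline; re.fullmatch() does not.
dollar_newline = re.match(pattern + "$", "a@b.c\n")
full_newline = re.fullmatch(pattern, "a@b.c\n")

print(unanchored.group(0))                        # name2@domain.tld
print(anchored)                                   # None
print(dollar_newline is not None, full_newline is None)  # True True
```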
--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
If your re demands get more complicated, you could take a look at
pyparsing. The code is a bit more verbose, but many find it easier to
compose their expressions using pyparsing's classes, such as Literal,
OneOrMore, Optional, etc., plus a number of built-in helper functions
and expressions, including delimitedList, quotedString, and
cStyleComment. Pyparsing is intended for writing recursive-descent
parsers, but can also be used (and is best learned) with simple
applications such as this one.
Here is a simple script for parsing your e-mail addresses. Note the
use of results names to give you access to the individual parsed fields
(regular expressions also support a similar capability through named groups).
Download pyparsing at http://pyparsing.sourceforge.net.
-- Paul
from pyparsing import Literal, Word, Optional, delimitedList, alphanums

# define format of an email address
AT = Literal("@").suppress()    # match '@' but drop it from the results
emailWord = Word(alphanums + "_")
# dot-separated words, joined back into one token such as 'domain.tld'
emailDomain = delimitedList(emailWord, ".", combine=True)
emailAddress = emailWord.setResultsName("user") + \
    Optional(AT + emailDomain).setResultsName("host")

# parse each word in wordList
wordList = ['myname1', 'm******@domain.tld', 'm******@domain.tld',
            'myname4@domain', 'm******@domain.tldx']
for w in wordList:
    addr = emailAddress.parseString(w)
    print(w)
    print(addr)
    print("user:", addr.user)
    print("host:", addr.host)   # empty string when there is no '@' part
    print()
This will print out:

myname1
['myname1']
user: myname1
host:

my*****@domain.tld
['myname1', 'domain.tld']
user: myname1
host: domain.tld

my*****@domain.tld
['myname2', 'domain.tld']
user: myname2
host: domain.tld

myname4@domain
['myname4', 'domain']
user: myname4
host: domain

my*****@domain.tldx
['myname5', 'domain.tldx']
user: myname5
host: domain.tldx
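For comparison, regular expressions offer something similar to results names through named groups, (?P<name>...). This is a rough sketch, not a full equivalent of the pyparsing grammar: the sample addresses are hypothetical stand-ins, \w is used in place of alphanums + "_", and the trailing '$' requires the whole word to match, whereas parseString() by default stops at the first character it cannot parse.

```python
import re

# Hypothetical stand-ins for the masked entries in the thread.
wordList = ["name1", "name2@domain.tld", "name3@domain.tld",
            "name4@domain", "name5@domain.tldx"]

# 'user' is required; the '@host' part is optional, and the host may hold
# dot-separated words, roughly like Optional(AT + delimitedList(...)).
email_re = re.compile(r"(?P<user>\w+)(?:@(?P<host>\w+(?:\.\w+)*))?$")

results = []
for w in wordList:
    m = email_re.match(w)
    if m is not None:
        # groupdict() maps each group name to its text, or None if unmatched.
        results.append(m.groupdict())

for r in results:
    print("user:", r["user"], "host:", r["host"] or "")
```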
FYI, matching all compliant email addresses is ridiculously complicated.
Before you spend too much time on it, you might want to borrow the
complete and thoroughly explained example in Mastering Regular Expressions (O'Reilly): http://www.oreilly.com/catalog/regex/
Be careful with that book, though: its RE examples are Perl-centric, and
Perl's implementation is not exactly the same as the one Python uses.
However, it's a good place to start.
This will also be useful: http://www.amk.ca/python/howto/regex/
As a slightly unrelated pyparsing question, is there a good set of API
documentation around for pyparsing?
I've looked into it for my MUD client, but for now have gone with
DParser because I need (desire) custom token generation sometimes.
Pyparsing looks easier to internationalize, though.