Pattern Matching

Hello-

I'm running Python 2.2.3 on Windows XP "Professional" and am reading a file
with 1 very long line of text (the line consists of multiple records with no
cr/lf). What I would like to do is scan for the occurrence of a specific
pattern of characters which I expect to repeat many times in the file.
Suppose I want to search for "Start: mm/dd/yyyy" and capture the mm/dd/yyyy
data for processing each time I find it. This is the type of problem I used
to solve with <duck>Perl<\duck> in a former lifetime using regular
expressions. The following does not work, but is the flavor of what I want
to do:

long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.

Another way to handle this is to replace all of the tildes with linefeeds
(tildes are the end of segment marker), or split the records on the tilde
and go from there. I'd just like to know how I could do it with the regular
expressions.

Thanks for your help,
--greg

Greg Lindstrom (501) 975-4859
NovaSys Health gr************@novasyshealth.com

"We are the music makers, and we are the dreamers of dreams" W.W.
Jul 18 '05 #1
On Mon, 19 Jul 2004, Greg Lindstrom wrote:
The following does not work, but is the flavor of what I want to do:

long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.


That line tastes distinctly Perlish ;)
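
For comparison, here is a rough sketch of what a literal translation of that Perl-style loop could look like. re.match() only tests the beginning of the string and keeps no memory of a previous match, so a while loop around it either never runs or never ends. The sketch uses search() on a compiled pattern with an explicit start position instead. It assumes a current Python 3 interpreter rather than 2.2.3, and it swaps \D (non-digit) for \d (digit); the names pattern, dates and pos are made up for illustration.

```python
import re

# Sample data from the question; note the third record has no colon after "Start".
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

# \d matches a digit; \D would match a NON-digit and so never match a date.
pattern = re.compile(r'Start: (\d+/\d+/\d+)')

dates = []
pos = 0
while True:
    m = pattern.search(long_line_of_text, pos)  # scan forward from pos
    if m is None:
        break
    dates.append(m.group(1))  # the text captured by the parentheses
    pos = m.end()             # resume just past this match

print(dates)
```

Only the first two records are collected here, because the third one in the sample text is written "Start" without the colon.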

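What you want to write in Python is the following (using \d, which matches a
digit; \D matches a non-digit, so the original pattern never matches a date):

for match in re.finditer(r'Start:\ (\d+/\d+/\d+)', long_line_of_text):
    date = match.group(1)  # process the captured date here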

re.finditer() returns an iterator over all non-overlapping occurrences of the
pattern in the string, yielding a match object for each one.
match.group() returns the actual text of the match, and match.group(n)
returns the text of group n.
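
A small self-contained sketch of the difference between match.group() and match.group(1), assuming Python 3 and a pattern written with \d (digit) rather than \D. The variable names are illustrative only.

```python
import re

# Sample data from the question; the third record has no colon after "Start".
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

results = []
for match in re.finditer(r'Start: (\d+/\d+/\d+)', long_line_of_text):
    whole = match.group()     # the entire matched text, including "Start: "
    date = match.group(1)     # only the part inside the parentheses
    offset = match.start(1)   # index in the string where group 1 begins
    results.append((whole, date, offset))

for whole, date, offset in results:
    print(repr(whole), '->', date, 'at offset', offset)
```

Each match object also carries positions (start(), end(), span()), which is handy when the surrounding record has to be sliced out of the long line.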

I'm curious, though, why do you escape the space? My guess is it's
something from Perl that I don't remember.
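
One possible explanation, offered as a guess since the original poster does not say: with Perl's /x modifier, whitespace in a pattern is ignored unless it is escaped. Python's counterpart is the re.VERBOSE flag. Without that flag the backslash is harmless but unnecessary. The sketch below, assuming Python 3 and a pattern using \d, shows both cases.

```python
import re

text = 'Start: 1/1/2004'

# In an ordinary pattern, "\ " and " " both match one literal space.
plain = re.search(r'Start: (\d+/\d+/\d+)', text)
escaped = re.search(r'Start:\ (\d+/\d+/\d+)', text)

# Under re.VERBOSE, unescaped whitespace in the pattern is ignored,
# so only the escaped form still requires a space in the text.
verbose_plain = re.search(r'Start: (\d+/\d+/\d+)', text, re.VERBOSE)
verbose_escaped = re.search(r'Start:\ (\d+/\d+/\d+)', text, re.VERBOSE)

print(plain.group(1), escaped.group(1))
print(verbose_plain, verbose_escaped.group(1))
```

Writing the pattern as a raw string (r'...') keeps Python itself from interpreting the backslashes before the re module sees them.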

Jul 18 '05 #2
Greg Lindstrom <gr************@novasyshealth.com> wrote:
long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.


p = re.compile(your_pattern_from_above)
matches = p.findall(long_line_of_text)

matches will be a list of the strings captured by the parentheses.
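
A runnable sketch of the findall() approach, assuming Python 3 and a pattern with \d (digit) in place of \D. The second call is an illustrative extra showing what findall() returns when the pattern has more than one group.

```python
import re

# Sample data from the question; the third record has no colon after "Start".
long_line_of_text = ('Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.'
                     '~Start 5/1/2004 morestuff.~')

# One group: findall() returns a list of the captured strings.
p = re.compile(r'Start: (\d+/\d+/\d+)')
matches = p.findall(long_line_of_text)

# Several groups: findall() returns a list of tuples, one tuple per match.
parts = re.findall(r'Start: (\d+)/(\d+)/(\d+)', long_line_of_text)

print(matches)
print(parts)
```

findall() builds the whole list at once, which is fine for one long line; finditer() is the lazier choice when match positions are also needed.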

--
Kristofer Pettijohn
kr*******@cybernetik.net
Jul 18 '05 #3
"Greg Lindstrom" <gr************@novasyshealth.com> writes:
Hello-

I'm running Python 2.2.3 on Windows XP "Professional" and am reading a file
with 1 very long line of text (the line consists of multiple records with no
cr/lf). What I would like to do is scan for the occurrence of a specific
pattern of characters which I expect to repeat many times in the file.
Suppose I want to search for "Start: mm/dd/yyyy" and capture the mm/dd/yyyy
data for processing each time I find it. This is the type of problem I used
to solve with <duck>Perl<\duck> in a former lifetime using regular
expressions. The following does not work, but is the flavor of what I want
to do:

long_line_of_text = 'Start: 1/1/2004 and some stuff.~Start: 2/3/2004 stuff.~Start 5/1/2004 morestuff.~'
while re.match('Start:\ (\D?/\D?/\D+)', long_line_of_text):
    # process the date string here which I hoped to catch in the parentheses above.

I'd like this to keep matching and processing the string as long as it keeps
matching the pattern, bopping down the string as it goes.

Another way to handle this is to replace all of the tildes with linefeeds
(tildes are the end of segment marker), or split the records on the tilde
and go from there. I'd just like to know how I could do it with the regular
expressions.


In addition to previous answers, a useful resource might be:
http://gnosis.cx/TPiP/
Jul 18 '05 #4
