tricky regular expressions

So regular expressions have been good to me so far, but now my problem
is a bit trickier. The string I'm getting data from looks like this:

myString =
[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : next
[USELESS DATA]
Title: song #2
......

I'm using this code to search myString:
.....
pattern = '''(?x)
Title:\s+(.+)
'''
Titles = re.findall(pattern, myString)

.....
The problem is that I only want the "Titles" which are either:

a) Followed by "Request : Play"
b) Followed by "Request : next"

I'm not sure if I should use RE's or some other mechanism. Thanks

Feb 7 '06 #1
Ernesto wrote:
I'm not sure if I should use RE's or some other mechanism. Thanks

I think a line-based state machine parser could be a better idea. Much
simpler to build and debug if not faster to execute.
Feb 7 '06 #2
[hint: posting the same question in several newsgroups generally
does not help to get responses any quicker]

Ernesto wrote:
The string I'm getting data from looks like this:
[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : next
[USELESS DATA]
Title: song #2
.....
The problem is that I only want the "Titles" which are either:
a) Followed by "Request : Play"
b) Followed by "Request : next"

I'm not sure if I should use RE's or some other mechanism.


I'd advise against REs - they can quickly get messy. What
I'd do is just what you have described:
1) read all lines that are not [USELESS DATA] (i.e. lines
beginning with either Title or Request) into a list
2) walk through this list, deleting all "Title" lines that
are not followed by an appropriate "Request" line.

Your description is not very exact - but if anything else
needs to be filtered out, just do so. The general idea is to
break your task into smaller steps (instead of one huge RE)
that are easier to manage, write and understand.
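
A minimal sketch of the two steps above, assuming the input is one string with one item per line as in the original post. The function name `titles_followed_by_request` and the `wanted` parameter are made up for illustration, and comparing the request case-insensitively is an assumption (the sample has both "Play" and "next"):

```python
def titles_followed_by_request(text, wanted=("play", "next")):
    """Return titles whose next Title/Request line is a wanted request."""
    # Step 1: keep only the lines that matter, dropping the useless data.
    kept = [line.strip() for line in text.splitlines()
            if line.strip().startswith(("Title:", "Request"))]

    # Step 2: keep a title only if the line right after it is a wanted request.
    result = []
    for current, following in zip(kept, kept[1:]):
        if not current.startswith("Title:"):
            continue
        if following.startswith("Request"):
            # Assumption: compare the request name case-insensitively.
            request = following.partition(":")[2].strip().lower()
            if request in wanted:
                result.append(current.partition(":")[2].strip())
    return result


sample = """\
[USELESS DATA]
Request : Play
[USELESS DATA]
Title: Beethoven's 5th
[USELESS DATA]
Request : next
[USELESS DATA]
Title: song #2
"""
print(titles_followed_by_request(sample))
```

On the sample, only "Beethoven's 5th" is kept, because "song #2" has no request after it. A title directly followed by another title is dropped under this reading of "followed by".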
Feb 7 '06 #3

Xavier Morel wrote:
Ernesto wrote:
I'm not sure if I should use RE's or some other mechanism. Thanks

I think a line-based state machine parser could be a better idea. Much
simpler to build and debug if not faster to execute.


What is a line-based state machine ?

Feb 7 '06 #4
Ernesto wrote:
Xavier Morel wrote:
Ernesto wrote:
I'm not sure if I should use RE's or some other mechanism. Thanks

I think a line-based state machine parser could be a better idea. Much
simpler to build and debug if not faster to execute.


What is a line-based state machine ?

Parse your file line-by-line (since it seems that it's the way your data
is organized).

Keep state information somewhere.

Change your state based on the current state and the data being fed to
your parser.

For example, here you basically have 3 states:

No Title, which is the initial state of the machine (it has not
encountered any title yet, and you do stuff based on titles)

Title loaded, when you've met a title. "Title loaded" loops on itself:
if you meet a "Title: whatever" line, you change the title currently
stored but you stay in the "Title loaded" state (you change the current
state of the machine from "title loaded" to "title loaded").

Request loaded, which can be reached only when you're in the "Title
loaded" state and then encounter a line starting with "Request: ". When you
reach that stage, do your processing (you have a title loaded, which is
the latest title you encountered, and you have a request loaded, which
is the request that immediately follows the loaded title), then you go
back to the "No Title" state, since you've processed (and therefore
unloaded) the current title.

So, the state diagram could kind of look like that:
(it's supposed to be a single state diagram, but i suck at ascii
diagrams so i'll create one mini-diagram for each state)

NoTitle =0> TitleLoaded

=0>
Event: on encountering a line starting with "Title: "
Action: save the title (to whatever variable you see fit)
Change state to: TitleLoaded

TitleLoaded =1> TitleLoaded
||
2
\/
Request

=1>
Event: on encountering a line starting with "Title: "
Action: save the title (replace the current value of your title variable)
Change state to: TitleLoaded

=2>
Event: on encountering a line starting with "Request: "
Action: save the request?; immediately process the Request state
Change state to: Request

Request =3> NoTitle
||
4
\/
TitleLoaded

=3>
Event: the Request state is reached, the request is either "Play" or "Next"
Action: Do whatever you want to do; nuke the content of the title variable
Change state to: NoTitle

=4>
Event: the Request state is reached, the request is neither "Play" nor
"Next"
Action: Nuke the content of the request variable (if you saved it), do
nothing else
Change state to: TitleLoaded
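
The states and transitions 0 to 4 above could be sketched in Python roughly like this. It is only an illustration: the names `titles_with_requests` and `wanted` are invented, the Request state is handled inline because it is processed immediately, and matching on the prefix "Request" (so that both "Request: " and the "Request : " spelling from the sample data are accepted) is an assumption:

```python
def titles_with_requests(lines, wanted=("play", "next")):
    """Yield (title, request) pairs using the state machine described above."""
    state = "NoTitle"
    title = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Title:"):
            # Transitions 0 and 1: store (or replace) the title.
            title = line.partition(":")[2].strip()
            state = "TitleLoaded"
        elif line.startswith("Request") and state == "TitleLoaded":
            # Transition 2: a request arrives while a title is loaded.
            request = line.partition(":")[2].strip()
            if request.lower() in wanted:
                # Transition 3: process the pair, then unload the title.
                yield title, request
                title = None
                state = "NoTitle"
            # Transition 4: any other request is dropped and the
            # machine stays in TitleLoaded with the same title.
        # Every other line is useless data and changes nothing.


sample_lines = [
    "[USELESS DATA]", "Request : Play",
    "[USELESS DATA]", "Title: Beethoven's 5th",
    "[USELESS DATA]", "Request : next",
    "[USELESS DATA]", "Title: song #2",
]
print(list(titles_with_requests(sample_lines)))
```

Because it takes any iterable of lines, the same function can be fed an open file object instead of a list.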

As a final note, i'd recommend reading "Text Processing in Python". Even
though it puts quite a big emphasis on functional programming (which you
may or may not appreciate), it's an extremely good introduction to
text-file handling, parsing and processing.
Feb 7 '06 #5
try to google for "finite state machine" OR "state machine" OR FSM

titles = ["USELESS DATA", "Request : Play",
          "USELESS DATA", "Title: Beethoven's 5th",
          "USELESS DATA", "Request : next", "USELESS DATA",
          "Title: song# 2 ", "USELESS DATA", "Request : Play",
          "USELESS DATA", "Title: Beethoven's 5th",
          "USELESS DATA", "Request : next", "USELESS DATA",
          "Title: song# 3 ", "USELESS DATA", "Request : Play"]
for i in range(len(titles)):
    if titles[i].startswith("Title:"):
        x = 1
        try:
            # Scan forward to the next "Play" or "next" request.
            while titles[i + x] not in ("Request : Play", "Request : next"):
                x += 1
            print(titles[i], titles[i + x])
        except IndexError:
            # Ran off the end of the list: no request follows this title.
            pass
HTH

Petr Jakes
PS: just wondering why you are asking the same question in two different
topics....

Feb 7 '06 #6

Petr Jakes wrote:
PS: just wondering why you are asking the same question in two different
topics....


Thanks for the help, Petr. That happened accidentally. I meant to
only put that in the python topic. Apologies...

Feb 7 '06 #7
I had a somewhat similar problem that got solved with a line-based
state machine parser.
Cheers,

Elezar

Feb 8 '06 #8
