Bytes | Software Development & Data Engineering Community

Regular Expression AND match

I'm writing a song lyric database (effectively to drive a projector -
so the database contains the full song lyrics).

I'm using a nice simple Python database called KirbyBase which uses
regular expressions to search.

A simple search for a phrase I can handle - I think I can even build a
regular expression that ignores punctuation :-) - but I'm struggling
with implementing an expression that will find several words
independently (i.e. a basic 'search engine').

What I'd like is a reg. exp. that will match if a field contains word
1 *and* word 2 *and* word 3...... (for example) - but in no particular
order......

Can anyone offer any clues (ideally without me having to index every
song separately from the database - which would kind of defeat the
object)...

Regards,
Fuzzyman

--

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #1
On Fri, 19 Mar 2004 07:21:09 -0800, Fuzzyman wrote:
> What I'd like is a reg. exp. that will match if a field contains word
> 1 *and* word 2 *and* word 3...... (for example) - but in no particular
> order......

To build and test your regexes, use Kodos. It's a very good Python regex
debugger.
http://kodos.sourceforge.net/

Regards,
Jul 18 '05 #2
Regular expressions are not a good tool for this purpose.

Here's one bad solution:

import re
from itertools import permutations  # was left as "an exercise for the reader"

def rxseq(seq):  # Return a RE that has seq[0] followed by seq[1] etc
    return ".*".join(re.escape(w) for w in seq)  # escape punctuation in words

def rxand(seq):  # Return an RE that matches each permutation of seq
    return "|".join([rxseq(p) for p in permutations(seq)])

This fails when one word can overlap another, for instance if
word1="aba" and word2="b", "ab" or "ba": the text "aba" contains both
words, but neither ordering (e.g. "aba.*b" or "b.*aba") matches it.

You could also use a bunch of lookahead assertions, something like
(?=.*word1)(?=.*word2)
but you'll also come to regret this choice.
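A sketch of the lookahead idea with Python's `re`. It assumes whole-word, case-insensitive matching and that the pattern is applied with `re.match`; how KirbyBase actually applies its patterns isn't stated in the thread. `rx_all_words` is a hypothetical helper name.

```python
import re


def rx_all_words(words):
    """Build one regex that matches text containing every word, in any order."""
    # One lookahead per word. Each (?=.*\bWORD\b) scans ahead from the same
    # starting point without consuming text, so word order and overlap with
    # the other words do not matter.
    lookaheads = "".join(r"(?=.*\b%s\b)" % re.escape(w) for w in words)
    # DOTALL lets ".*" cross the line breaks in multi-line lyrics.
    # IGNORECASE and whole-word \b matching are assumptions about what a
    # lyric search wants.
    return re.compile(lookaheads, re.DOTALL | re.IGNORECASE)


rx = rx_all_words(["grace", "amazing"])
# match() tries the lookaheads only at position 0, which is all that is needed.
print(bool(rx.match("Amazing grace,\nhow sweet the sound")))  # True
print(bool(rx.match("Amazing love")))                          # False
```

Each lookahead rescans the text from the start, so the cost grows with (number of words) x (length of lyrics); for a few thousand songs that is usually tolerable.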

Jeff
PS If you liked these regular expressions and would like to buy more,
visit e-bay where I'm selling a RE that matches multiples of 3 base 10.
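The PS is a joke, but such an RE does exist: a decimal number is a multiple of 3 exactly when its digit sum is, and the digit sum mod 3 can be tracked by a three-state automaton, which can be turned into a regex by eliminating states. The pattern below is a derivation along those lines, not something posted in the thread; `MULT3` and `is_multiple_of_3` are hypothetical names, and `fullmatch` needs Python 3.4 or later.

```python
import re

# Digit classes grouped by value mod 3.
Z, O, T = "[0369]", "[147]", "[258]"

# Derived by state elimination from the 3-state automaton that tracks the
# digit sum mod 3: each repetition of the outer group returns the running
# remainder to 0, so a full match means the number is a multiple of 3.
MULT3 = re.compile(
    "(?:{Z}|{T}{Z}*{O}|(?:{O}|{T}{Z}*{T})(?:{Z}|{O}{Z}*{T})*(?:{T}|{O}{Z}*{O}))+"
    .format(Z=Z, O=O, T=T)
)


def is_multiple_of_3(digits):
    """True if the decimal string `digits` denotes a multiple of 3."""
    return MULT3.fullmatch(digits) is not None


print([n for n in range(20) if is_multiple_of_3(str(n))])  # [0, 3, 6, 9, 12, 15, 18]
```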

Jul 18 '05 #3
> PS If you liked these regular expressions and would like to buy more,
> visit e-bay where I'm selling a RE that matches multiples of 3 base 10.


An exercise given by an undergraduate professor for a theory of
computation course (my friend was the TA) was as follows...

Given a binary string ab that interleaves the bits of two n-bit binary
strings a and b, that is...
a = a1 a2 a3 ... an
b = b1 b2 b3 ... bn
ab = a1 b1 a2 b2 ... an bn

Create a regular expression to determine if string a (taken as a binary
integer) is exactly 3 times string b (also taken as a binary integer).
Strangely enough, it is actually possible. I'll leave it as an exercise
to the reader.
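One possible answer, as a sketch under these assumptions: a1 and b1 are the most significant bits, both strings are n bits long (leading zeros allowed), and the pairs are read left to right. Reading a pair (ai, bi) updates d = 3*B - A over the prefixes read so far to d' = 2d + 3*bi - ai; once d > 0 or d < -2 it can never return to 0, so only three live states (d = 0, -1, -2) remain, and eliminating them gives the pattern below. `A_IS_3B`, `interleave` and `a_is_3b` are hypothetical names.

```python
import re

# Bits are read in pairs (a_i, b_i), most significant first. The pattern
# comes from an automaton tracking d = 3*B - A over the prefixes read so far:
#   "00" keeps d = 0, "10" moves 0 -> -1, "11" moves -1 -> 0,
#   "00" moves -1 -> -2, "11" loops on -2, "01" moves -2 -> -1.
# Any other step makes d too large or too small to ever return to 0.
A_IS_3B = re.compile(r"(?:00|10(?:00(?:11)*01)*11)*")


def interleave(a, b, n):
    """Interleave the n-bit binary forms of a and b: a1 b1 a2 b2 ... an bn."""
    sa, sb = format(a, "0%db" % n), format(b, "0%db" % n)
    return "".join(x + y for x, y in zip(sa, sb))


def a_is_3b(ab):
    """True if the interleaved string encodes a == 3 * b."""
    return A_IS_3B.fullmatch(ab) is not None


print(interleave(9, 3, 4), a_is_3b(interleave(9, 3, 4)))  # 10000111 True
```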

- Josiah
Jul 18 '05 #4
Jeff Epler <je****@unpythonic.net> wrote in message news:<ma************************************@python.org>...
> Regular expressions are not a good tool for this purpose.


Hmm... I'm not sure if I've been helped or not :-)
Thanks anyway.....

I've ended up just doing a search in the database for the longest
word... (which returns all songs containing that word) and then
checking the text of *each* of those songs for all the other
words...

It's not noticeably slow (only 1800 songs)...
If I do this again I might try to find a useful indexer...
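The two-pass workaround described above can be sketched like this. It assumes a plain list of (title, lyrics) pairs in place of KirbyBase (whose API isn't shown in the thread) and whole-word, case-insensitive matching; `find_songs` and `word_rx` are hypothetical names. Searching on the longest word first is presumably meant to narrow the candidates the most.

```python
import re


def word_rx(word):
    # Whole-word, case-insensitive match (an assumption about the search).
    return re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)


def find_songs(songs, words):
    """Titles of songs whose lyrics contain every word, in any order.

    `songs` stands in for the database: a list of (title, lyrics) pairs.
    Pass 1 mimics the database regex search on the longest word;
    pass 2 checks each returned song for the remaining words.
    """
    if not words:
        return [title for title, _ in songs]
    words = sorted(words, key=len, reverse=True)
    first = word_rx(words[0])
    candidates = [(title, text) for title, text in songs if first.search(text)]
    others = [word_rx(w) for w in words[1:]]
    return [title for title, text in candidates
            if all(rx.search(text) for rx in others)]


songs = [
    ("Amazing Grace", "Amazing grace, how sweet the sound"),
    ("Be Thou My Vision", "Be Thou my vision, O Lord of my heart"),
]
print(find_songs(songs, ["sweet", "grace"]))  # ['Amazing Grace']
```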

Odd that you can't do this easily with regular expressions - I suppose
it doesn't compile down to a neat test.... but then it's hardly a
complex search... OTOH I have *no idea* how regular expressions
actually work (and no need to find out)...

Regards
Fuzzy

Jul 18 '05 #5
You didn't tell us you were looking for full-text indexing.

This may be what you want, as your database grows:
http://www.divmod.org/Home/Projects/Lupy/

I haven't used it. The only two full-text indexing packages I've used
are Glimpse and Swish-E. I still use Swish-E daily to search a
2.5-million-line software product. On a reasonably fast machine, it can
find 37 instances of the phrase "int main" in 1.2 seconds.

It's no Google, but it works great. Calling out to it from Python
should pose no great difficulty.
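For a sense of what full-text indexing buys here, a toy sketch of the data structure such tools are generally built around: an inverted index (word -> set of documents), under which an AND query is just a set intersection and word order no longer matters. This illustrates the general idea only, not how Lupy or Swish-E actually work; `build_index` and `search_all` are hypothetical names.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Inverted index: map each lowercased word to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search_all(index, words):
    """AND query: ids of docs that contain every word, in any order."""
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict.
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets)


docs = {
    1: "Amazing grace, how sweet the sound",
    2: "How great Thou art",
    3: "Be Thou my vision",
}
index = build_index(docs)
print(search_all(index, ["thou", "how"]))  # {2}
```

The index has to be rebuilt or updated when songs change, which is the cost of not scanning every lyric on each query.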

Jeff

Jul 18 '05 #6
