Regular Expression AND match

I'm writing a song lyric database (effectively to drive a projector -
so the database contains the full song lyrics).

I'm using a nice simple Python database called KirbyBase which uses
regular expressions to search.

A simple search for a phrase I can handle - I think I can even build a
regular expression that ignores punctuation :-) - but I'm struggling
with implementing an expression that will find several words
independently (i.e. a basic 'search engine').

What I'd like is a reg. exp. that will match if a field contains word
1 *and* word 2 *and* word 3...... (for example) - but in no particular
order......

Can anyone offer any clues (ideally without me having to index every
song separately from the database - which would kind of defeat the
object)...

Regards,
Fuzzyman

--

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #1
On Fri, 19 Mar 2004 07:21:09 -0800, Fuzzyman wrote:
> I'm writing a song lyric database (effectively to drive a projector -
> so the database contains the full song lyrics).
>
> I'm using a nice simple Python database called KirbyBase which uses
> regular expressions to search.
>
> A simple search for a phrase I can handle - I think I can even build a
> regular expression that ignores punctuation :-) - but I'm struggling
> with implementing an expression that will find several words
> independently (i.e. a basic 'search engine').
>
> What I'd like is a reg. exp. that will match if a field contains word
> 1 *and* word 2 *and* word 3...... (for example) - but in no particular
> order......
>
> Can anyone offer any clues (ideally without me having to index every
> song separately from the database - which would kind of defeat the
> object)...
>
> Regards,
> Fuzzyman

To build and test your regexes, use Kodos. It is a very good Python
regex debugger.
http://kodos.sourceforge.net/

Regards,
Jul 18 '05 #2
Regular expressions are not a good tool for this purpose.

Here's one bad solution:

    def permutations(seq):
        raise NotImplementedError("exercise for the reader")

    def rxseq(seq):  # Return an RE that has seq[0] followed by seq[1] etc.
        return ".*".join(seq)

    def rxand(seq):  # Return an RE that matches each permutation of seq
        return "|".join([rxseq(p) for p in permutations(seq)])

This fails when one part can overlap another, for instance if
word1="aba" and word2="b", "ab" or "ba".
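
The overlap problem is easy to reproduce. This is only a sketch: it fills
in permutations with itertools.permutations (the post leaves that as an
exercise), and the re.escape call is an addition so that punctuation in a
word is taken literally.

```python
import re
from itertools import permutations

def rxseq(seq):
    # seq[0] followed, somewhere later, by seq[1], and so on
    return ".*".join(re.escape(word) for word in seq)

def rxand(seq):
    # One alternative per ordering of the words
    return "|".join(rxseq(p) for p in permutations(seq))

pattern = rxand(["aba", "b"])

# "aba" contains both words, but only overlapping, so no ordering matches
overlapping = re.search(pattern, "aba")

# Here the two words sit apart, so the "b.*aba" alternative matches
separate = re.search(pattern, "b then aba")
```

Note also that the number of alternatives grows as n! with the number of
words, so the pattern gets large quickly.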

You could also use a bunch of lookahead assertions, something like
(?=.*word1)(?=.*word2)
but you'll also come to regret this choice.
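
For what it's worth, the lookahead idea might look like this in Python.
Treat it as a sketch: and_pattern and its whole_words flag are made-up
names, and the word-boundary and case-insensitive behaviour are
assumptions about what a lyric search would want. Each lookahead rescans
the field from the start, which is one reason it may not scale well.

```python
import re

def and_pattern(words, whole_words=True):
    # One lookahead per word; each one scans the field independently,
    # so the order of the words in the text does not matter.
    template = r"(?=.*\b%s\b)" if whole_words else r"(?=.*%s)"
    lookaheads = "".join(template % re.escape(w) for w in words)
    # DOTALL lets "." cross line breaks in multi-line lyrics
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

lyrics = "Amazing grace,\nhow sweet the sound"
rx = and_pattern(["sound", "amazing"])
found = rx.match(lyrics) is not None                      # both words present
missing = and_pattern(["amazing", "love"]).match(lyrics)  # "love" is absent
```

Use match rather than search here: the pattern consumes no characters, so
anchoring it at the start avoids retrying it at every position.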

Jeff
PS If you liked these regular expressions and would like to buy more,
visit e-bay where I'm selling a RE that matches multiples of 3 base 10.

Jul 18 '05 #3
> PS If you liked these regular expressions and would like to buy more,
> visit e-bay where I'm selling a RE that matches multiples of 3 base 10.


An exercise given by an undergraduate professor for a theory of
computation course (my friend was the TA) was as follows...

Given a binary string ab that is the alternation of elements of binary
string a and binary string b, that is (in set notation)...
a = {a1, a2, a3,...,an}
b = {b1, b2, b3,...,bn}
ab = {a1, b1, a2, b2,..., an, bn}

Create a regular expression to determine if string a (taken as a binary
integer) is exactly 3 times string b (also taken as a binary integer).
Strangely enough, it is actually possible. I'll leave it as an exercise
to the reader.

- Josiah
Jul 18 '05 #4
Jeff Epler <je****@unpythonic.net> wrote in message news:<ma************************************@python.org>...
> Regular expressions are not a good tool for this purpose.
>
> Here's one bad solution:
>
>     def permutations(seq):
>         raise NotImplementedError("exercise for the reader")
>
>     def rxseq(seq):  # Return an RE that has seq[0] followed by seq[1] etc.
>         return ".*".join(seq)
>
>     def rxand(seq):  # Return an RE that matches each permutation of seq
>         return "|".join([rxseq(p) for p in permutations(seq)])
>
> This fails when one part can overlap another, for instance if
> word1="aba" and word2="b", "ab" or "ba".
>
> You could also use a bunch of lookahead assertions, something like
> (?=.*word1)(?=.*word2)
> but you'll also come to regret this choice.
>
> Jeff
> PS If you liked these regular expressions and would like to buy more,
> visit e-bay where I'm selling a RE that matches multiples of 3 base 10.


Hmm... I'm not sure if I've been helped or not :-)
Thanks anyway.....

I've ended up just doing a search in the database for the longest
word.... (which returns all songs containing that word) and then
checking the text of *each* song for all the other
words...............

It's not noticeably slow (only 1800 songs)...
If I do this again I might try and find a useful indexer..........
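
Roughly, the two-stage search looks like the sketch below. The songs dict
is a stand-in for the database table (KirbyBase's own query call is not
shown here), and find_songs is just an illustrative name.

```python
def find_songs(songs, words):
    """Return titles of songs whose lyrics contain every word in `words`."""
    wanted = [w.lower() for w in words]
    if not wanted:
        return []
    # Stage 1: the longest word is probably the rarest, so it prunes the most.
    # In the real program this step is the single database search.
    longest = max(wanted, key=len)
    candidates = [(title, lyrics.lower()) for title, lyrics in songs.items()
                  if longest in lyrics.lower()]
    # Stage 2: plain substring checks for the remaining words
    rest = [w for w in wanted if w != longest]
    return [title for title, text in candidates
            if all(w in text for w in rest)]

songs = {
    "Song A": "Amazing grace, how sweet the sound",
    "Song B": "Amazing love, how can it be",
}
hits = find_songs(songs, ["how", "grace", "amazing"])
```

The substring test also matches inside longer words ("race" would hit
"grace"), so a stricter check may be wanted.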

Odd that you can't do this easily with regular expressions - I suppose
it doesn't compile down to a neat test.... but then it's hardly a
complex search... OTOH I have *no idea* how regular expressions
actually work (and no need to find out)...

Regards
Fuzzy

http://www.voidspace.org.uk/atlantib...thonutils.html
Jul 18 '05 #5
You didn't tell us you were looking for full-text indexing.

This may be what you want, as your database grows:
http://www.divmod.org/Home/Projects/Lupy/

I haven't used it. The only two full-text indexing packages I've used
are Glimpse and Swish-E. I still use Swish-E daily to search in a
2.5 million line software product. On a reasonably fast machine, it can
find 37 instances of the phrase "int main" in 1.2 seconds.

It's no google, but it works great. Calling out to it from Python
should pose no great difficulty.
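
If an external tool is more than you need, the core idea of a full-text
index fits in a few lines. This is a toy sketch, not how Lupy or Swish-E
work internally; build_index, search_all and the word-splitting rule are
all assumptions made for illustration.

```python
import re
from collections import defaultdict

def build_index(songs):
    # Map each lower-cased word to the set of song titles containing it
    index = defaultdict(set)
    for title, lyrics in songs.items():
        for word in re.findall(r"[a-z0-9']+", lyrics.lower()):
            index[word].add(title)
    return index

def search_all(index, words):
    # AND query: intersect the title sets of every query word
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

songs = {
    "Song A": "Amazing grace, how sweet the sound",
    "Song B": "Amazing love, how can it be",
}
index = build_index(songs)
both = search_all(index, ["how", "Amazing"])
only_a = search_all(index, ["grace", "amazing"])
```

The index is built once, after which each AND query is a set
intersection rather than a scan of every song.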

Jeff

Jul 18 '05 #6
