need some regular expression help

Chris

I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

Thanks for any help!

Oct 7 '06 #1

Diez B. Roggisch

Chris wrote:

I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

This is not possible with regular expressions - they can't "remember"
how many parens they already encountered.

You will need a real parser for this - pyparsing seems to be the most
popular choice today; I personally like spark. I'm sure you'll find an
example grammar that parses simple arithmetic expressions like
the one above.

Diez

Oct 7 '06 #2

John Machin

Chris wrote:

I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

No, there is no such pattern. You will have to code up a function.

Consider what your spec really is: '42^((2x+2)sin(x)) +
(log(2)/log(5))' has the same number of left and right parentheses; so
does the zero-length string; so does ') + (' -- perhaps you need to add
'and starts with a "("'

Consider what you are going to do with input like this:

print '(' + some_text + ')'

Maybe you need to do some lexical analysis and work at the level of
tokens rather than individual characters.
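
A minimal sketch of what working at the token level could look like, assuming a toy lexer with only four token classes (the names `TOKEN_RE` and `paren_tokens` are made up for illustration, and escaped quotes inside strings are not handled). It reports only those parentheses that sit outside quoted string literals:

```python
import re

# Hypothetical token classes for illustration only; a real lexer would
# follow the grammar of whatever language is actually being scanned.
TOKEN_RE = re.compile(r"""
    (?P<string>'[^']*'|"[^"]*")   # quoted string literal, no escapes handled
  | (?P<lparen>\()
  | (?P<rparen>\))
  | (?P<other>[^'"()]+)           # anything else, kept as one chunk
""", re.VERBOSE)


def paren_tokens(text):
    """Yield '(' and ')' tokens that occur outside string literals."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup in ('lparen', 'rparen'):
            yield m.group()


line = "print '(' + some_text + ')'"
print(list(paren_tokens(line)))   # the quoted parentheses are not reported
```

Any balance check can then run over these tokens instead of over raw characters.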

Which then raises the usual question: you have a perception that
regular expressions are the solution -- to what problem??

HTH,
John

Oct 7 '06 #3

hanumizzle

On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch <de***@web.de> wrote:

Chris wrote:
I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

This is not possible with regular expressions - they can't "remember"
how many parens they already encountered.

Remember that regular expressions are used to represent regular
grammars. Most regex engines actually aren't regular in that they
support fancy things like look-behind/ahead and capture groups...IIRC,
these cannot be part of a true regular expression library.

With that said, the quote-unquote regexes in Lua have a special
feature that supports balanced expressions. I believe Python has a
PCRE lib somewhere; you may be able to use the experimental ??{ }
construct in that case.

-- Theerasak

Oct 8 '06 #4

Roy Smith

In article <11*********************@e3g2000cwe.googlegroups.com>,
"Chris" <ch*********@gmail.com> wrote:

I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?

Thanks for any help!

Why does it need to be a regex? There is a very simple and well-known
algorithm which does what you want.

Start with i=0. Walk the string one character at a time, incrementing i
each time you see a '(', and decrementing it each time you see a ')'. At
the end of the string, the count should be back to 0. If at any time
during the process, the count goes negative, you've got mis-matched
parentheses.

The algorithm runs in O(n), same as a regex.

Regex is a wonderful tool, but it's not the answer to all problems.
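
The same counter can also be used to pull out the groups asked for in the original post, not just to validate the string. A rough sketch, assuming that stray ')' characters are skipped and that groups never closed are dropped (the function name `balanced_groups` is hypothetical):

```python
def balanced_groups(text):
    """Return every top-level '(...)' substring whose parentheses balance."""
    groups = []
    depth = 0      # current nesting level
    start = None   # index of the '(' that opened the current top-level group
    for i, ch in enumerate(text):
        if ch == '(':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ')':
            if depth == 0:
                continue   # assumption: a stray ')' is simply skipped
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])
    return groups


print(balanced_groups('42^((2x+2)sin(x)) + (log(2)/log(5))'))
# ['((2x+2)sin(x))', '(log(2)/log(5))']
```

This is still a single pass over the string, so it stays O(n).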

Oct 8 '06 #5

Tim Chase

Why does it need to be a regex? There is a very simple and well-known
algorithm which does what you want.

Start with i=0. Walk the string one character at a time, incrementing i
each time you see a '(', and decrementing it each time you see a ')'. At
the end of the string, the count should be back to 0. If at any time
during the process, the count goes negative, you've got mis-matched
parentheses.

The algorithm runs in O(n), same as a regex.

Regex is a wonderful tool, but it's not the answer to all problems.

Following Roy's suggestion, one could use something like:

>>> s = '42^((2x+2)sin(x)) + (log(2)/log(5))'
>>> d = {'(':1, ')':-1}
>>> sum(d.get(c, 0) for c in s)
0

If you get a sum() > 0, then you have too many "(", and if you
have sum() < 0, you have too many ")" characters. A sum() of 0
means there's the same number of parens. It still doesn't solve
the aforementioned problem of things like ')))(((' which is
balanced, but psychotic. :)

-tkc

Oct 8 '06 #6

Diez B. Roggisch

hanumizzle wrote:

On 7 Oct 2006 15:00:29 -0700, Diez B. Roggisch <de***@web.de> wrote:

Chris wrote:
I need a pattern that matches a string that has the same number of '('
as ')':
findall( compile('...'), '42^((2x+2)sin(x)) + (log(2)/log(5))' ) = [
'((2x+2)sin(x))', '(log(2)/log(5))' ]
Can anybody help me out?
This is not possible with regular expressions - they can't "remember"
how many parens they already encountered.

Remember that regular expressions are used to represent regular
grammars. Most regex engines actually aren't regular in that they
support fancy things like look-behind/ahead and capture groups...IIRC,
these cannot be part of a true regular expression library.

Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times....

With that said, the quote-unquote regexes in Lua have a special
feature that supports balanced expressions. I believe Python has a
PCRE lib somewhere; you may be able to use the experimental ??{ }
construct in that case.

Even if it has - I'm not sure if it really does you good, for several
reasons:

- regexes - even enhanced ones - don't build trees. But that is what
  you ultimately want from an expression like sin(log(x))

- even if they are more powerful these days, the theory of
  context-free grammars still applies. So if what you need isn't
  LL(k) but LR(k), how do you specify that to the regex engine?

- regexes are useful because of their compact notation; parsers
  allow for better-structured output
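
To illustrate the first point, here is a small hand-written sketch (not pyparsing or spark, just a stack-based loop; the name `parse_nested` is made up) that turns nested parentheses into a nested list, which is the kind of tree a flat regex match cannot give you:

```python
def parse_nested(text):
    """Parse text into a nested list: each '(...)' becomes a sub-list."""
    root = []
    stack = [root]   # stack[-1] is the list currently being filled
    buf = []         # characters of the current run of non-paren text

    def flush():
        # Move any pending text into the list currently being filled.
        if buf:
            stack[-1].append(''.join(buf))
            del buf[:]

    for ch in text:
        if ch == '(':
            flush()
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == ')':
            flush()
            if len(stack) == 1:
                raise ValueError("unmatched ')'")
            stack.pop()
        else:
            buf.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unmatched '('")
    return root


print(parse_nested('sin(log(x))'))   # ['sin', ['log', ['x']]]
```

It only groups by parentheses; operators and function names are left as plain strings, so a real expression parser would still need a grammar on top.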
Diez

Oct 8 '06 #7

Theerasak Photha

On 8 Oct 2006 01:49:50 -0700, Diez B. Roggisch <de***@web.de> wrote:

Even if it has - I'm not sure if it really does you good, for several
reasons:

- regexes - even enhanced ones - don't build trees. But that is what
  you ultimately want from an expression like sin(log(x))

- even if they are more powerful these days, the theory of
  context-free grammars still applies. So if what you need isn't
  LL(k) but LR(k), how do you specify that to the regex engine?

- regexes are useful because of their compact notation; parsers
  allow for better-structured output

Just wait for Perl 6 :D

-- Theerasak

Oct 8 '06 #8

bearophileHUGS

Tim Chase:

It still doesn't solve the aforementioned problem
of things like ')))(((' which is balanced, but psychotic. :)

This may solve the problem:

def balanced(txt):
    # +1 for every '(', -1 for every ')', 0 for anything else
    d = {'(':1, ')':-1}
    tot = 0
    for c in txt:
        tot += d.get(c, 0)
        if tot < 0:
            # a ')' appeared before its matching '('
            return False
    return tot == 0

print(balanced("42^((2x+2)sin(x)) + (log(2)/log(5))")) # True
print(balanced("42^((2x+2)sin(x) + (log(2)/log(5))")) # False
print(balanced("42^((2x+2)sin(x))) + (log(2)/log(5))")) # False
print(balanced(")))(((")) # False

A possible alternative for Py 2.5. The dict solution looks better, but
this may be faster:

def balanced2(txt):
    tot = 0
    for c in txt:
        tot += 1 if c=="(" else (-1 if c==")" else 0)
        if tot < 0:
            return False
    return tot == 0

Bye,
bearophile

Oct 8 '06 #9

Fredrik Lundh

be************@lycos.com wrote:

The dict solution looks better, but this may be faster:

it's slightly faster, but both your alternatives are about 10x slower
than a straightforward:

def balanced(txt):
    return txt.count("(") == txt.count(")")

</F>

Oct 8 '06 #10

Mirco Wahab

Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):

Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times....

Right, in Perl, this would be a no-brainer;
it's documented all over the place, like:

my $re;

$re = qr{
  (?:
    (?> [^\\()]+ | \\. )
  |
    \( (??{ $re }) \)
  )*
}xs;

where you have a 'delayed execution'
of the

(??{ $re })

which in the end makes the whole thing
a recursive one; it gets expanded and
executed if the match finds its way
to it.

Above regex will match balanced parens,
as in:

my $good = 'a + (b / (c - 2)) * (d ^ (e+f)) ';
my $bad1 = 'a + (b / (c - 2) * (d ^ (e+f)) ';
my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';

if you do:

print "ok \n" if $good =~ /^$re$/;
print "ok \n" if $bad1 =~ /^$re$/;
print "ok \n" if $bad2 =~ /^$re$/;

This is documented in some depth, e.g. in
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
(topic: Recursive Regexes)

Regards

M.

Oct 8 '06 #11

Diez B. Roggisch

Mirco Wahab schrieb:

Thus spoke Diez B. Roggisch (on 2006-10-08 10:49):
Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times....

Right, in Perl, this would be a no-brainer;
it's documented all over the place, like:

my $re;

$re = qr{
  (?:
    (?> [^\\()]+ | \\. )
  |
    \( (??{ $re }) \)
  )*
}xs;

where you have a 'delayed execution'
of the

(??{ $re })

which in the end makes the whole thing
a recursive one; it gets expanded and
executed if the match finds its way
to it.

Above regex will match balanced parens,
as in:

my $good = 'a + (b / (c - 2)) * (d ^ (e+f)) ';
my $bad1 = 'a + (b / (c - 2) * (d ^ (e+f)) ';
my $bad2 = 'a + (b / (c - 2)) * (d) ^ (e+f) )';

if you do:

print "ok \n" if $good =~ /^$re$/;
print "ok \n" if $bad1 =~ /^$re$/;
print "ok \n" if $bad2 =~ /^$re$/;

This is documented in some depth, e.g. in
http://japhy.perlmonk.org/articles/tpj/2004-summer.html
(topic: Recursive Regexes)

That clearly is a recursive grammar rule, and thus it can't be regular
anymore :) But first of all, I find it ugly - the clean separation of
lexical and syntactical analysis is better here, IMHO - and secondly,
what are the properties of that parsing? Is it LL(k), LR(k), backtracking?

Diez

Oct 8 '06 #12

bearophileHUGS

Fredrik Lundh wrote:

it's slightly faster, but both your alternatives are about 10x slower
than a straightforward:
def balanced(txt):
    return txt.count("(") == txt.count(")")

I know, but if you read my post again you see that I have shown those
solutions to mark ")))(((" as bad expressions. Just counting the parens
isn't enough.
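
A quick side-by-side sketch of the two checks on a few made-up inputs (the names `count_only` and `running_depth` are just labels for this comparison):

```python
def count_only(txt):
    # Only compares totals; the order of the parentheses is ignored.
    return txt.count("(") == txt.count(")")


def running_depth(txt):
    # Tracks nesting depth and fails as soon as a ')' has no opener.
    depth = 0
    for c in txt:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


for sample in ["(a(b))", ")))(((", "(()"]:
    print("%-10r count_only=%-5s running_depth=%s"
          % (sample, count_only(sample), running_depth(sample)))
```

The two agree on "(a(b))" and "(()" but disagree on ")))(((", which the count-only version accepts.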

Bye,
bearophile

Oct 8 '06 #13

Roy Smith

"Diez B. Roggisch" <de***@web.de> wrote:

Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times....

What old times? I've been working with regex for mumble years and there's
always been the problem that every implementation supports a slightly
different syntax. Even back in the "good old days", grep, awk, sed, and ed
all had slightly different flavors.

Oct 8 '06 #14

Theerasak Photha

On 10/8/06, Roy Smith <ro*@panix.com> wrote:

"Diez B. Roggisch" <de***@web.de> wrote:
Certainly true, and it always gives me a hard time because I don't know
to what extent a regular expression nowadays might do the job because
of these extensions. It was so much easier back in the old times....

What old times? I've been working with regex for mumble years and there's
always been the problem that every implementation supports a slightly
different syntax. Even back in the "good old days", grep, awk, sed, and ed
all had slightly different flavors.

Which grep? Which awk? :)

-- Theerasak

Oct 8 '06 #15
