473,414 Members | 1,757 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,414 software developers and data experts.

re question - finiding matching ()

Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),

Currently I can only get the "addr((2 *2)" using re.compile("\w+\([^\)]*\)").
To solve the problem a hand crafted search is used :-(

Is there a better way?

Thanks.
Miki
Jul 18 '05 #1
9 1377
On Sun, 2004-01-18 at 09:51, Miki Tebeka wrote:
Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


The pattern you have is so simple and the necessary regex (it seems to
me) so unnecessarily complex (for a 100% regex solution) that I think
using only regex's for this problem is overkill. Anyway, here's a
test-based approach:

#!/usr/bin/env python

import re
import unittest

def findOperands(verb, text):
"""Find operands in text for verb.

E.g., findOperands('add', 'add(1, 2, 3)') == ['1', '2', '3']

"""
raw = '%s\((.*)\)' % (verb,)
pat = re.compile(raw)
matches = pat.findall(text)
if not matches:
raise RuntimeError('No matches found for pattern %s.' % (raw,))
assert len(matches) == 1
match = matches[0]
operands = match.split(',')
for i, item in enumerate(operands):
operands[i] = item.strip()
return operands

class test(unittest.TestCase):

def test(self):
text = 'add(2, 3)'
operands = findOperands('add', text)
expected = ['2', '3']
self.assertEquals(operands, expected)

text = 'add((2 * 2), 100)'
operands = findOperands('add', text)
expected = ['(2 * 2)', '100']
self.assertEquals(operands, expected)

text = 'add(1, 2, 3)'
operands = findOperands('add', text)
expected = ['1', '2', '3']
self.assertEquals(operands, expected)

text = 'multiply(1, 2, 3, 4)'
operands = findOperands('multiply', text)
expected = ['1', '2', '3', '4']
self.assertEquals(operands, expected)

text = 'add 2, 3'
self.assertRaises(RuntimeError, findOperands, 'add', text)

unittest.main()

Jul 18 '05 #2
On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:
Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2),
100)"),

Currently I can only get the "addr((2 *2)" using
re.compile("\w+\([^\)]*\)"). To solve the problem a hand crafted
search is used :-(

Is there a better way?

Thanks.
Miki

Hello,

You may need "recursive patterns" to do this but regular expressions
cannot handle this. You can simulate recursive pattern by limiting the
recursivity level. For example the expression inside () should be
[^\(\)]+ at level 0. At level 1, you can match only zero or one pair:
(?:\([^\(\)]+\)|[^\(\)]+)* and so on.

You can build such an expression recursively:

def make_par_re(level=6):
if level < 1:
return r'[^\(\)]+'
else:
return r'(?:\(%s\)|%s)*'%(make_par_re(level-1), make_par_re(0))

par_re = re.compile(r"\w+\(%s\)"%make_par_re())

But in this case you are limited to 6 levels.

Now you can try this :

for m in par_re.findall("add((2*2), 100) some text sub(a, b*(10-c),
f(g(a,b), h(c, d)))"):
print m

I don't really like this solution because the expressions are ugly (try
print make_par_re(6)).

Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here:
http://www.python.org/sigs/parser-sig/

Best regards,
Christophe.
Jul 18 '05 #3
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.

Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters:

tokenize_re = re.compile("\(|\)|[^()]*")

Now, use this expression to tokenize your input string:
tokens = tokenize_re.findall("add((2 * 2), 100)")
print tokens

['add', '(', '(', '2 * 2', ')', ', 100', ')']
To match your language, the first token must be 'add':
if tokens[0] != 'add': # no match
The second token must be '(':
if tokens[1] != '(': # no match
Now, start scanning and counting parens, until you get back to 0
nesting_level = 0
for t in tokens[1:]:
if t == ')':
nesting_level -= 1
if nesting_level == 0:
# End of desired string
else if t == '(':
nesting_level += 1
# if you end the loop without reaching nesting_level 0
# then there were not enough close-parens
# no match
You could also implement this part recursively (as you likely would if
you were actually compiling the string into bytecode).

Jeff

Jul 18 '05 #4
Perhaps you'd like to enhance your test-suite:
# Do improperly-nested parens cause an exception? (part 1)
text = 'add((1,2,3)'
self.assertRaises(RuntimeError, findOperands, 'add', text)

# Do improperly-nested parens cause an exception? (part 2)
text = 'add(1,2,3))'
self.assertRaises(RuntimeError, findOperands, 'add', text)

# Do multiple statements on one line behave as expected?
text = 'add(1,2); add(1,2)'
expected = ['2', '3']
self.assertEquals(operands, expected)

Jeff

Jul 18 '05 #5
Miki Tebeka wrote:
To all of you regexp gurus out there...
So not asking me, but anyway...
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2),
100)"),

Currently I can only get the "addr((2 *2)" using
re.compile("\w+\([^\)]*\)"). To solve the problem a hand crafted search is
used :-(

Is there a better way?


Iff you are looking into Python code, you could also use the compiler
module. The following script scans itself for occurences of add() function
calls.

<example.py>
import compiler

def sample():
""" add(xxx) <- will not be found """
x.add(1) # <- will not be found
add(a*(b+c))
add(a, b)
add()
add((1*1),2)+add((2))

class Visitor:
def visitCallFunc(self, node):
if getattr(node.getChildren()[0], "name", None) == "add":
print node

tree = compiler.parseFile(__file__)

compiler.visitor.walk(tree, Visitor())
</example.py>

Peter
Jul 18 '05 #6
mi*********@zoran.com (Miki Tebeka) writes:
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


You can't do that with classic regexps. You need a parser, not a scanner.
Jul 18 '05 #7
Jeff Epler <je****@unpythonic.net> wrote in message news:<ma**************************************@pyt hon.org>...
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.

Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters: .... Now, start scanning and counting parens, until you get back to 0


That's the way I'd recommend. However, it doesn't generalize to cases
where there are multiple types of parentheses. For that situation,
you can use:

LEFT_PARENS = '([{'
RIGHT_PARENS = ')]}'
PAREN_MATCHES = dict(zip(RIGHT_PARENS, LEFT_PARENS))

def balancedParens(tokens):
parenStack = []
for token in tokens:
if token in LEFT_PARENS:
parenStack.append(token)
elif token in RIGHT_PARENS:
if parenStack:
correspondingLeftParen = parenStack.pop()
if PAREN_MATCHES[token] != correspondingLeftParen:
return False
else:
return False
return True

(There's probably a simpler way, but I can't think of one right now.)
Jul 18 '05 #8
"Christophe Delord" <no.spam> wrote in message
news:20040118192118.61fea23f.no.spam@box...
On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:
<snip>
Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here:
http://www.python.org/sigs/parser-sig/

Best regards,
Christophe.


Another parser not listed on this page can be found at
http://pyparsing.sourceforge.net. The download includes an algebraic
expression parser, that handles parenthesis nesting.

-- Paul
Jul 18 '05 #9
| Paul Rubin said |
mi*********@zoran.com (Miki Tebeka) writes:
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


You can't do that with classic regexps. You need a parser, not a scanner.


Or, to put it slightly differently:

Regexps are a type of system that is not mathematically powerful enough to
handle this type of identification. The next step up in power are
parsers.

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
by: gsv2com | last post by:
One of my weaknesses has always been pattern matching. Something I definitely need to study up on and maybe you guys can give me a pointer here. I'm looking to remove all of this code and just...
176
by: Thomas Reichelt | last post by:
Moin, short question: is there any language combining the syntax, flexibility and great programming experience of Python with static typing? Is there a project to add static typing to Python? ...
17
by: Andrew McLean | last post by:
I have a problem that is suspect isn't unusual and I'm looking to see if there is any code available to help. I've Googled without success. Basically, I have two databases containing lists of...
1
by: Henry | last post by:
I have a table that stores a list of zip codes using a varchar column type, and I need to perform some string prefix pattern matching search. Let's say that I have the columns: 94000-1235 94001...
5
by: olaufr | last post by:
Hi, I'd need to perform simple pattern matching within a string using a list of possible patterns. For example, I want to know if the substring starting at position n matches any of the string I...
1
by: solarin | last post by:
Hi, I've developed a program under VS 6.0. I can compile it and run it, but when I try to debbug , all my breakpoints are dissabled and I can see the following messages: Loaded...
2
by: santhoshs | last post by:
Hello I am required to parse two files that contain email addresses and figure out a way to get the matching and non-matching email addresses from both the files. I was able to get the matching...
11
by: tech | last post by:
Hi, I need a function to specify a match pattern including using wildcard characters as below to find chars in a std::string. The match pattern can contain the wildcard characters "*" and "?",...
1
by: sora | last post by:
Hi, I've developed a MFC program under VS 6.0. My debugger *was* working fine and I've used it often for my project. Then, one day, the errors below appear and they prevent me from using the...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.