473,785 Members | 2,329 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

re question - finiding matching ()

Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),

Currently I can only get the "addr((2 *2)" using re.compile("\w+ \([^\)]*\)").
To solve the problem a hand crafted search is used :-(

Is there a better way?

Thanks.
Miki
Jul 18 '05 #1
9 1397
On Sun, 2004-01-18 at 09:51, Miki Tebeka wrote:
Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


The pattern you have is so simple and the necessary regex (it seems to
me) so unnecessarily complex (for a 100% regex solution) that I think
using only regex's for this problem is overkill. Anyway, here's a
test-based approach:

#!/usr/bin/env python

import re
import unittest

def findOperands(ve rb, text):
"""Find operands in text for verb.

E.g., findOperands('a dd', 'add(1, 2, 3)') == ['1', '2', '3']

"""
raw = '%s\((.*)\)' % (verb,)
pat = re.compile(raw)
matches = pat.findall(tex t)
if not matches:
raise RuntimeError('N o matches found for pattern %s.' % (raw,))
assert len(matches) == 1
match = matches[0]
operands = match.split(',' )
for i, item in enumerate(opera nds):
operands[i] = item.strip()
return operands

class test(unittest.T estCase):

def test(self):
text = 'add(2, 3)'
operands = findOperands('a dd', text)
expected = ['2', '3']
self.assertEqua ls(operands, expected)

text = 'add((2 * 2), 100)'
operands = findOperands('a dd', text)
expected = ['(2 * 2)', '100']
self.assertEqua ls(operands, expected)

text = 'add(1, 2, 3)'
operands = findOperands('a dd', text)
expected = ['1', '2', '3']
self.assertEqua ls(operands, expected)

text = 'multiply(1, 2, 3, 4)'
operands = findOperands('m ultiply', text)
expected = ['1', '2', '3', '4']
self.assertEqua ls(operands, expected)

text = 'add 2, 3'
self.assertRais es(RuntimeError , findOperands, 'add', text)

unittest.main()

Jul 18 '05 #2
On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:
Hello All,

To all of you regexp gurus out there...

I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2),
100)"),

Currently I can only get the "addr((2 *2)" using
re.compile("\w+ \([^\)]*\)"). To solve the problem a hand crafted
search is used :-(

Is there a better way?

Thanks.
Miki

Hello,

You may need "recursive patterns" to do this but regular expressions
cannot handle this. You can simulate recursive pattern by limiting the
recursivity level. For example the expression inside () should be
[^\(\)]+ at level 0. At level 1, you can match only zero or one pair:
(?:\([^\(\)]+\)|[^\(\)]+)* and so on.

You can build such an expression recursively:

def make_par_re(lev el=6):
if level < 1:
return r'[^\(\)]+'
else:
return r'(?:\(%s\)|%s) *'%(make_par_re (level-1), make_par_re(0))

par_re = re.compile(r"\w +\(%s\)"%make_p ar_re())

But in this case you are limited to 6 levels.

Now you can try this :

for m in par_re.findall( "add((2*2), 100) some text sub(a, b*(10-c),
f(g(a,b), h(c, d)))"):
print m

I don't really like this solution because the expressions are ugly (try
print make_par_re(6)) .

Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here:
http://www.python.org/sigs/parser-sig/

Best regards,
Christophe.
Jul 18 '05 #3
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.

Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters:

tokenize_re = re.compile("\(| \)|[^()]*")

Now, use this expression to tokenize your input string:
tokens = tokenize_re.fin dall("add((2 * 2), 100)")
print tokens

['add', '(', '(', '2 * 2', ')', ', 100', ')']
To match your language, the first token must be 'add':
if tokens[0] != 'add': # no match
The second token must be '(':
if tokens[1] != '(': # no match
Now, start scanning and counting parens, until you get back to 0
nesting_level = 0
for t in tokens[1:]:
if t == ')':
nesting_level -= 1
if nesting_level == 0:
# End of desired string
else if t == '(':
nesting_level += 1
# if you end the loop without reaching nesting_level 0
# then there were not enough close-parens
# no match
You could also implement this part recursively (as you likely would if
you were actually compiling the string into bytecode).

Jeff

Jul 18 '05 #4
Perhaps you'd like to enhance your test-suite:
# Do improperly-nested parens cause an exception? (part 1)
text = 'add((1,2,3)'
self.assertRais es(RuntimeError , findOperands, 'add', text)

# Do improperly-nested parens cause an exception? (part 2)
text = 'add(1,2,3))'
self.assertRais es(RuntimeError , findOperands, 'add', text)

# Do multiple statements on one line behave as expected?
text = 'add(1,2); add(1,2)'
expected = ['2', '3']
self.assertEqua ls(operands, expected)

Jeff

Jul 18 '05 #5
Miki Tebeka wrote:
To all of you regexp gurus out there...
So not asking me, but anyway...
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2),
100)"),

Currently I can only get the "addr((2 *2)" using
re.compile("\w+ \([^\)]*\)"). To solve the problem a hand crafted search is
used :-(

Is there a better way?


Iff you are looking into Python code, you could also use the compiler
module. The following script scans itself for occurences of add() function
calls.

<example.py>
import compiler

def sample():
""" add(xxx) <- will not be found """
x.add(1) # <- will not be found
add(a*(b+c))
add(a, b)
add()
add((1*1),2)+ad d((2))

class Visitor:
def visitCallFunc(s elf, node):
if getattr(node.ge tChildren()[0], "name", None) == "add":
print node

tree = compiler.parseF ile(__file__)

compiler.visito r.walk(tree, Visitor())
</example.py>

Peter
Jul 18 '05 #6
mi*********@zor an.com (Miki Tebeka) writes:
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


You can't do that with classic regexps. You need a parser, not a scanner.
Jul 18 '05 #7
Jeff Epler <je****@unpytho nic.net> wrote in message news:<ma******* *************** *************** *@python.org>.. .
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.

Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters: .... Now, start scanning and counting parens, until you get back to 0


That's the way I'd recommend. However, it doesn't generalize to cases
where there are multiple types of parentheses. For that situation,
you can use:

LEFT_PARENS = '([{'
RIGHT_PARENS = ')]}'
PAREN_MATCHES = dict(zip(RIGHT_ PARENS, LEFT_PARENS))

def balancedParens( tokens):
parenStack = []
for token in tokens:
if token in LEFT_PARENS:
parenStack.appe nd(token)
elif token in RIGHT_PARENS:
if parenStack:
correspondingLe ftParen = parenStack.pop( )
if PAREN_MATCHES[token] != correspondingLe ftParen:
return False
else:
return False
return True

(There's probably a simpler way, but I can't think of one right now.)
Jul 18 '05 #8
"Christophe Delord" <no.spam> wrote in message
news:2004011819 2118.61fea23f.n o.spam@box...
On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:
<snip>
Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here:
http://www.python.org/sigs/parser-sig/

Best regards,
Christophe.


Another parser not listed on this page can be found at
http://pyparsing.sourceforge.net. The download includes an algebraic
expression parser, that handles parenthesis nesting.

-- Paul
Jul 18 '05 #9
| Paul Rubin said |
mi*********@zor an.com (Miki Tebeka) writes:
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),


You can't do that with classic regexps. You need a parser, not a scanner.


Or, to put it slightly differently:

Regexps are a type of system that is not mathematically powerful enough to
handle this type of identification. The next step up in power are
parsers.

Sam Walters.

--
Never forget the halloween documents.
http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?"""

Jul 18 '05 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
6981
by: gsv2com | last post by:
One of my weaknesses has always been pattern matching. Something I definitely need to study up on and maybe you guys can give me a pointer here. I'm looking to remove all of this code and just use pattern matching to determine if the proper amount of numeric characters has been met. Here is the function I've already done. Any help you can give in a pattern matching solution would be much appreciated and very educational.
176
8189
by: Thomas Reichelt | last post by:
Moin, short question: is there any language combining the syntax, flexibility and great programming experience of Python with static typing? Is there a project to add static typing to Python? Thank you, -- greetz tom
17
14073
by: Andrew McLean | last post by:
I have a problem that is suspect isn't unusual and I'm looking to see if there is any code available to help. I've Googled without success. Basically, I have two databases containing lists of postal addresses and need to look for matching addresses in the two databases. More precisely, for each address in database A I want to find a single matching address in database B. I'm 90% of the way there, in the sense that I have a simplistic...
1
2738
by: Henry | last post by:
I have a table that stores a list of zip codes using a varchar column type, and I need to perform some string prefix pattern matching search. Let's say that I have the columns: 94000-1235 94001 94100 If I run a pattern matching search for the string "940", then I should get the top two rows of data back. If I run a pattern matching search for the string "94", then I should get all the three rows of data back.
5
5759
by: olaufr | last post by:
Hi, I'd need to perform simple pattern matching within a string using a list of possible patterns. For example, I want to know if the substring starting at position n matches any of the string I have a list, as below: sentence = "the color is $red" patterns = pos = sentence.find($)
1
6395
by: solarin | last post by:
Hi, I've developed a program under VS 6.0. I can compile it and run it, but when I try to debbug , all my breakpoints are dissabled and I can see the following messages: Loaded 'D:\prj\simulador HMS\Enviar Datos al sim\CMS\Debug\CMS.exe', no matching symbolic information found. Loaded 'ntdll.dll', no matching symbolic information found. Loaded 'C:\WINDOWS\system32\kernel32.dll', no matching symbolic
2
2119
by: santhoshs | last post by:
Hello I am required to parse two files that contain email addresses and figure out a way to get the matching and non-matching email addresses from both the files. I was able to get the matching entries by using the following code: #!/usr/bin/perl open f1,"file1" or die "cannot open file:$!\n"; open f2,"file2" or die "cannot open file2:$!\n"; open out,">match.out" or die $@; @file1=<f1>; @file2 =<f2>; close f1;close f2;
11
4846
by: tech | last post by:
Hi, I need a function to specify a match pattern including using wildcard characters as below to find chars in a std::string. The match pattern can contain the wildcard characters "*" and "?", where "*" matches zero or more consecutive occurrences of any character and "?" matches a single occurrence of any character. Does boost or some other library have this capability? If boost does have this, do i need to include an entire
1
4643
by: sora | last post by:
Hi, I've developed a MFC program under VS 6.0. My debugger *was* working fine and I've used it often for my project. Then, one day, the errors below appear and they prevent me from using the debugger. I know that I get these warnings because the OS symbols arn't installed (a few of them loaded fine). As far as I can remember, I didn't touch anything that should mess with the debugger so I don't know how to fix it and I need to be able to...
0
9647
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9489
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10357
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10101
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8988
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5396
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5528
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4063
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2893
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.