Hello All,
To all of you regexp gurus out there...
I'd like to find all of the sub strings in the form "add(.*)"
The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
Currently I can only get the "addr((2 *2)" using re.compile("\w+ \([^\)]*\)").
To solve the problem a hand crafted search is used :-(
Is there a better way?
Thanks.
Miki 9 1397
On Sun, 2004-01-18 at 09:51, Miki Tebeka wrote: Hello All,
To all of you regexp gurus out there...
I'd like to find all of the sub strings in the form "add(.*)" The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
The pattern you have is so simple and the necessary regex (it seems to
me) so unnecessarily complex (for a 100% regex solution) that I think
using only regex's for this problem is overkill. Anyway, here's a
test-based approach:
#!/usr/bin/env python
import re
import unittest
def findOperands(ve rb, text):
"""Find operands in text for verb.
E.g., findOperands('a dd', 'add(1, 2, 3)') == ['1', '2', '3']
"""
raw = '%s\((.*)\)' % (verb,)
pat = re.compile(raw)
matches = pat.findall(tex t)
if not matches:
raise RuntimeError('N o matches found for pattern %s.' % (raw,))
assert len(matches) == 1
match = matches[0]
operands = match.split(',' )
for i, item in enumerate(opera nds):
operands[i] = item.strip()
return operands
class test(unittest.T estCase):
def test(self):
text = 'add(2, 3)'
operands = findOperands('a dd', text)
expected = ['2', '3']
self.assertEqua ls(operands, expected)
text = 'add((2 * 2), 100)'
operands = findOperands('a dd', text)
expected = ['(2 * 2)', '100']
self.assertEqua ls(operands, expected)
text = 'add(1, 2, 3)'
operands = findOperands('a dd', text)
expected = ['1', '2', '3']
self.assertEqua ls(operands, expected)
text = 'multiply(1, 2, 3, 4)'
operands = findOperands('m ultiply', text)
expected = ['1', '2', '3', '4']
self.assertEqua ls(operands, expected)
text = 'add 2, 3'
self.assertRais es(RuntimeError , findOperands, 'add', text)
unittest.main()
On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote: Hello All,
To all of you regexp gurus out there...
I'd like to find all of the sub strings in the form "add(.*)" The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
Currently I can only get the "addr((2 *2)" using re.compile("\w+ \([^\)]*\)"). To solve the problem a hand crafted search is used :-(
Is there a better way?
Thanks. Miki
Hello,
You may need "recursive patterns" to do this but regular expressions
cannot handle this. You can simulate recursive pattern by limiting the
recursivity level. For example the expression inside () should be
[^\(\)]+ at level 0. At level 1, you can match only zero or one pair:
(?:\([^\(\)]+\)|[^\(\)]+)* and so on.
You can build such an expression recursively:
def make_par_re(lev el=6):
if level < 1:
return r'[^\(\)]+'
else:
return r'(?:\(%s\)|%s) *'%(make_par_re (level-1), make_par_re(0))
par_re = re.compile(r"\w +\(%s\)"%make_p ar_re())
But in this case you are limited to 6 levels.
Now you can try this :
for m in par_re.findall( "add((2*2), 100) some text sub(a, b*(10-c),
f(g(a,b), h(c, d)))"):
print m
I don't really like this solution because the expressions are ugly (try
print make_par_re(6)) .
Anyway a better solution would be to use a syntactic parser. You can
write your own by hand or make your choice here: http://www.python.org/sigs/parser-sig/
Best regards,
Christophe.
Regular Expressions cannot perform the simple task of matching an
arbitrary number of parentheses. You could write an expression that
will work as long as the nesting isn't more than N levels, but the
expression quickly becomes very painful.
Instead, you can use some method to split the string into parts: one
part for "(", one for ")" and one for any other sequence of characters:
tokenize_re = re.compile("\(| \)|[^()]*")
Now, use this expression to tokenize your input string: tokens = tokenize_re.fin dall("add((2 * 2), 100)") print tokens
['add', '(', '(', '2 * 2', ')', ', 100', ')']
To match your language, the first token must be 'add':
if tokens[0] != 'add': # no match
The second token must be '(':
if tokens[1] != '(': # no match
Now, start scanning and counting parens, until you get back to 0
nesting_level = 0
for t in tokens[1:]:
if t == ')':
nesting_level -= 1
if nesting_level == 0:
# End of desired string
else if t == '(':
nesting_level += 1
# if you end the loop without reaching nesting_level 0
# then there were not enough close-parens
# no match
You could also implement this part recursively (as you likely would if
you were actually compiling the string into bytecode).
Jeff
Perhaps you'd like to enhance your test-suite:
# Do improperly-nested parens cause an exception? (part 1)
text = 'add((1,2,3)'
self.assertRais es(RuntimeError , findOperands, 'add', text)
# Do improperly-nested parens cause an exception? (part 2)
text = 'add(1,2,3))'
self.assertRais es(RuntimeError , findOperands, 'add', text)
# Do multiple statements on one line behave as expected?
text = 'add(1,2); add(1,2)'
expected = ['2', '3']
self.assertEqua ls(operands, expected)
Jeff
Miki Tebeka wrote: To all of you regexp gurus out there...
So not asking me, but anyway...
I'd like to find all of the sub strings in the form "add(.*)" The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
Currently I can only get the "addr((2 *2)" using re.compile("\w+ \([^\)]*\)"). To solve the problem a hand crafted search is used :-(
Is there a better way?
Iff you are looking into Python code, you could also use the compiler
module. The following script scans itself for occurences of add() function
calls.
<example.py>
import compiler
def sample():
""" add(xxx) <- will not be found """
x.add(1) # <- will not be found
add(a*(b+c))
add(a, b)
add()
add((1*1),2)+ad d((2))
class Visitor:
def visitCallFunc(s elf, node):
if getattr(node.ge tChildren()[0], "name", None) == "add":
print node
tree = compiler.parseF ile(__file__)
compiler.visito r.walk(tree, Visitor())
</example.py>
Peter mi*********@zor an.com (Miki Tebeka) writes: I'd like to find all of the sub strings in the form "add(.*)" The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
You can't do that with classic regexps. You need a parser, not a scanner.
Jeff Epler <je****@unpytho nic.net> wrote in message news:<ma******* *************** *************** *@python.org>.. . Regular Expressions cannot perform the simple task of matching an arbitrary number of parentheses. You could write an expression that will work as long as the nesting isn't more than N levels, but the expression quickly becomes very painful.
Instead, you can use some method to split the string into parts: one part for "(", one for ")" and one for any other sequence of characters:
.... Now, start scanning and counting parens, until you get back to 0
That's the way I'd recommend. However, it doesn't generalize to cases
where there are multiple types of parentheses. For that situation,
you can use:
LEFT_PARENS = '([{'
RIGHT_PARENS = ')]}'
PAREN_MATCHES = dict(zip(RIGHT_ PARENS, LEFT_PARENS))
def balancedParens( tokens):
parenStack = []
for token in tokens:
if token in LEFT_PARENS:
parenStack.appe nd(token)
elif token in RIGHT_PARENS:
if parenStack:
correspondingLe ftParen = parenStack.pop( )
if PAREN_MATCHES[token] != correspondingLe ftParen:
return False
else:
return False
return True
(There's probably a simpler way, but I can't think of one right now.)
"Christophe Delord" <no.spam> wrote in message
news:2004011819 2118.61fea23f.n o.spam@box... On 18 Jan 2004 07:51:38 -0800, Miki Tebeka wrote:
<snip> Anyway a better solution would be to use a syntactic parser. You can write your own by hand or make your choice here: http://www.python.org/sigs/parser-sig/
Best regards, Christophe.
Another parser not listed on this page can be found at http://pyparsing.sourceforge.net. The download includes an algebraic
expression parser, that handles parenthesis nesting.
-- Paul
| Paul Rubin said | mi*********@zor an.com (Miki Tebeka) writes: I'd like to find all of the sub strings in the form "add(.*)" The catch is that I might have () in the string (e.g. "add((2 * 2), 100)"),
You can't do that with classic regexps. You need a parser, not a scanner.
Or, to put it slightly differently:
Regexps are a type of system that is not mathematically powerful enough to
handle this type of identification. The next step up in power are
parsers.
Sam Walters.
--
Never forget the halloween documents. http://www.opensource.org/halloween/
""" Where will Microsoft try to drag you today?
Do you really want to go there?""" This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: gsv2com |
last post by:
One of my weaknesses has always been pattern matching. Something I
definitely need to study up on and maybe you guys can give me a pointer
here.
I'm looking to remove all of this code and just use pattern matching to
determine if the proper amount of numeric characters has been met. Here is
the function I've already done. Any help you can give in a pattern matching
solution would be much appreciated and very educational.
|
by: Thomas Reichelt |
last post by:
Moin,
short question: is there any language combining the syntax, flexibility and
great programming experience of Python with static typing? Is there a
project to add static typing to Python?
Thank you,
--
greetz tom
|
by: Andrew McLean |
last post by:
I have a problem that is suspect isn't unusual and I'm looking to see if
there is any code available to help. I've Googled without success.
Basically, I have two databases containing lists of postal addresses and
need to look for matching addresses in the two databases. More
precisely, for each address in database A I want to find a single
matching address in database B.
I'm 90% of the way there, in the sense that I have a simplistic...
|
by: Henry |
last post by:
I have a table that stores a list of zip codes using a varchar column type,
and I need to perform some string prefix pattern matching search. Let's say
that I have the columns:
94000-1235
94001
94100
If I run a pattern matching search for the string "940", then I should get
the top two rows of data back. If I run a pattern matching search for the
string "94", then I should get all the three rows of data back.
|
by: olaufr |
last post by:
Hi,
I'd need to perform simple pattern matching within a string using a
list of possible patterns. For example, I want to know if the substring
starting at position n matches any of the string I have a list, as
below:
sentence = "the color is $red"
patterns =
pos = sentence.find($)
| |
by: solarin |
last post by:
Hi,
I've developed a program under VS 6.0. I can compile it and run it, but
when I try to debbug , all my breakpoints are dissabled and I can see
the following messages:
Loaded 'D:\prj\simulador HMS\Enviar Datos al sim\CMS\Debug\CMS.exe', no
matching symbolic information found.
Loaded 'ntdll.dll', no matching symbolic information found.
Loaded 'C:\WINDOWS\system32\kernel32.dll', no matching symbolic
|
by: santhoshs |
last post by:
Hello
I am required to parse two files that contain email addresses and figure out a way to get the matching and non-matching email addresses from both the files. I was able to get the matching entries by using the following code:
#!/usr/bin/perl
open f1,"file1" or die "cannot open file:$!\n";
open f2,"file2" or die "cannot open file2:$!\n";
open out,">match.out" or die $@;
@file1=<f1>;
@file2 =<f2>;
close f1;close f2;
|
by: tech |
last post by:
Hi, I need a function to specify a match pattern including using
wildcard characters as below
to find chars in a std::string.
The match pattern can contain the wildcard characters "*" and "?",
where "*" matches zero or more consecutive occurrences of any
character and "?" matches a single occurrence of any character.
Does boost or some other library have this capability? If boost does
have this, do i need to include an entire
|
by: sora |
last post by:
Hi,
I've developed a MFC program under VS 6.0. My debugger *was* working fine and I've used it often for my project. Then, one day, the errors below appear and they prevent me from using the debugger. I know that I get these warnings because the OS symbols arn't installed (a few of them loaded fine).
As far as I can remember, I didn't touch anything that should mess with the debugger so I don't know how to fix it and I need to be able to...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: Hystou |
last post by:
Overview:
Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: 6302768590 |
last post by:
Hai team
i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |