regular expression for nested parentheses

I have been trying to write a regular expression that identifies a
block of text enclosed by (potentially nested) parentheses. I've found
solutions using other regular expression engines (for example, my text
editor, BBEdit, which uses the PCRE library), but have not been able
to replicate it using python's re module.

Here's a version that works using the PCRE syntax, along with the
Python error message. I'm hoping for this to identify the string
'(foo (bar) (baz))'.

% python -V
Python 2.5.1
% python
py> import re
py> text = 'buh (foo (bar) (baz)) blee'
py> no_ws = lambda s: ''.join(s.split())
py> rexp = r"""(?P<parens>
...     \(
...         (?>
...             (?> [^()]+ ) |
...             (?P>parens)
...         )*
...     \)
... )"""
py> print re.findall(no_ws(rexp), text)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 167, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: unexpected end of pattern

From what I understand of the PCRE syntax, the (?>) construct is a non-
capturing subpattern, and (?P>parens) is a recursive call to the
enclosing (named) pattern. So my best guess at a python equivalent is
this:

py> rexp2 = r"""(?P<parens>
...     \(
...         (?=
...             (?= [^()]+ ) |
...             (?P=parens)
...         )*
...     \)
... )"""
py> print re.findall(no_ws(rexp2), text)
[]

...which results in no match. I've played around quite a bit with
variations on this theme, but haven't been able to come up with one
that works.

Can anyone help me understand how to construct a regular expression
that does the job in python?

Thanks -

Dec 9 '07 #1
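
For readers trying this on a current Python, here is a minimal sketch (not part of the original post) of why the PCRE-style pattern is rejected. The standard-library re module has no recursive-call syntax, so `(?P>parens)` fails to compile. The atomic group `(?>...)` was also rejected by Python 2.5 and is only accepted from Python 3.11 onward. The helper name `compiles` is hypothetical and exists only for this illustration.

```python
import re

# The whitespace-stripped PCRE pattern from the post above.
PCRE_PATTERN = r"(?P<parens>\((?>(?>[^()]+)|(?P>parens))*\))"


def compiles(pattern):
    """Return True if the stdlib re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# The recursion syntax (?P>parens) is rejected by re on every Python version.
print(compiles(PCRE_PATTERN))

# An ordinary non-capturing group, by contrast, compiles without complaint.
print(compiles(r"\((?:[^()]+)\)"))
```

Note that `(?=...)` and `(?P=parens)` in the second attempt are a lookahead assertion and a backreference respectively, so they are not equivalents of the PCRE constructs.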
On Dec 10, 8:13 am, Noah Hoffman <noah.hoff...@gmail.com> wrote:
> I have been trying to write a regular expression that identifies a
> block of text enclosed by (potentially nested) parentheses. I've found
> solutions using other regular expression engines (for example, my text
> editor, BBEdit, which uses the PCRE library), but have not been able
> to replicate it using python's re module.

A pattern that can validly be described as a "regular expression"
cannot count and thus can't match balanced parentheses. Some "RE"
engines provide a method of tagging a sub-pattern so that a match must
include balanced () (or [] or {}); Python's doesn't.

Looks like you need a parser; try pyparsing.

[snip]
> py> rexp = r"""(?P<parens>
> ... \(
> ... (?>
> ... (?> [^()]+ ) |
> ... (?P>parens)
> ... )*
> ... \)
> ... )"""
> py> print re.findall(no_ws(rexp), text)

Ummm ... even if Python's re engine did do what you want, wouldn't you
need flags=re.VERBOSE in there?

Dec 9 '07 #2
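
The "cannot count" point in the reply above suggests a simple alternative: keep the count yourself. The following is a minimal sketch using a hand-written depth counter. It is not pyparsing (a third-party package) and not anything the posters wrote; the function name `balanced_blocks` and its parameters are assumptions made for illustration.

```python
def balanced_blocks(text, open_ch="(", close_ch=")"):
    """Return the outermost balanced open_ch...close_ch blocks found in text.

    Hypothetical helper: tracks nesting depth with a counter, which is
    exactly the "counting" a true regular expression cannot do.
    """
    blocks = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i              # an outermost block begins here
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:             # the outermost block just closed
                blocks.append(text[start:i + 1])
        # a stray close_ch at depth 0 is simply ignored
    return blocks


print(balanced_blocks('buh (foo (bar) (baz)) blee'))
```

On the text from the original post this returns the single block '(foo (bar) (baz))'. One limitation of this sketch: if an opening parenthesis is never closed, nothing nested inside it is reported.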
On Dec 9, 1:41 pm, John Machin <sjmac...@lexicon.net> wrote:
> A pattern that can validly be described as a "regular expression"
> cannot count and thus can't match balanced parentheses. Some "RE"
> engines provide a method of tagging a sub-pattern so that a match must
> include balanced () (or [] or {}); Python's doesn't.

Okay, thanks for the clarification. So recursion is not possible using
python regular expressions?

> Ummm ... even if Python's re engine did do what you want, wouldn't you
> need flags=re.VERBOSE in there?

Ah, thanks for letting me know about that flag; but removing
whitespace as I did with the no_ws lambda expression should also work,
no?
Dec 9 '07 #3
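
On the recursion question: when the maximum nesting depth is known in advance, one workaround is to build a non-recursive pattern by wrapping it around itself a fixed number of times. This is only a sketch of that idea under the bounded-depth assumption, not something proposed in the thread; `nested_parens_pattern` and `max_depth` are hypothetical names.

```python
import re


def nested_parens_pattern(max_depth):
    """Build a pattern for parentheses containing at most max_depth nested levels."""
    pattern = r"\([^()]*\)"            # innermost level: no parentheses inside
    for _ in range(max_depth):
        # each pass allows one more level of nesting around the previous pattern
        pattern = r"\((?:[^()]|" + pattern + r")*\)"
    return pattern


text = 'buh (foo (bar) (baz)) blee'

# One nested level is enough for the example text.
print(re.findall(nested_parens_pattern(1), text))

# With no nesting allowed, only the inner groups can match.
print(re.findall(nested_parens_pattern(0), text))
```

Input nested more deeply than `max_depth` will not be matched as a whole, so this is not a general substitute for a parser.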
On Dec 10, 8:53 am, Noah Hoffman <noah.hoff...@gmail.com> wrote:
> On Dec 9, 1:41 pm, John Machin <sjmac...@lexicon.net> wrote:
>> A pattern that can validly be described as a "regular expression"
>> cannot count and thus can't match balanced parentheses. Some "RE"
>> engines provide a method of tagging a sub-pattern so that a match must
>> include balanced () (or [] or {}); Python's doesn't.
>
> Okay, thanks for the clarification. So recursion is not possible using
> python regular expressions?
>
>> Ummm ... even if Python's re engine did do what you want, wouldn't you
>> need flags=re.VERBOSE in there?
>
> Ah, thanks for letting me know about that flag; but removing
> whitespace as I did with the no_ws lambda expression should also work,
> no?

Under a very limited definition of "work". That technique would not
produce correct answers on patterns that contain any *significant*
whitespace e.g. you want to match "foo" and "bar" separated by one or
more spaces (but not tabs, newlines etc) ....

pattern = r"""
foo
[ ]+
bar
"""
Dec 9 '07 #4
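
A minimal sketch of the failure described above, for anyone who wants to see it happen. It reuses the `no_ws` helper from the original post and the pattern from the reply; the variable names `stripped`, `stripped_ok`, and `verbose` are introduced here for illustration only.

```python
import re

pattern = r"""
    foo
    [ ]+
    bar
"""

# The whitespace-stripping helper from the original post.
no_ws = lambda s: ''.join(s.split())

# Stripping removes the significant space inside the character class too.
stripped = no_ws(pattern)
try:
    re.compile(stripped)
    stripped_ok = True
except re.error:
    stripped_ok = False                # the stripped pattern no longer compiles

# re.VERBOSE ignores layout whitespace but keeps the space inside [ ].
verbose = re.compile(pattern, re.VERBOSE)

print(repr(stripped), stripped_ok)
print(bool(verbose.search("foo   bar")), bool(verbose.search("foo\tbar")))
```

Here the stripped pattern becomes 'foo[]+bar', which the re module rejects, while the VERBOSE version matches runs of spaces and refuses a tab.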
On Dec 9, 10:12 pm, John Machin <sjmac...@lexicon.net> wrote:
> On Dec 10, 8:53 am, Noah Hoffman <noah.hoff...@gmail.com> wrote:
>> On Dec 9, 1:41 pm, John Machin <sjmac...@lexicon.net> wrote:
>>> A pattern that can validly be described as a "regular expression"
>>> cannot count and thus can't match balanced parentheses. Some "RE"
>>> engines provide a method of tagging a sub-pattern so that a match must
>>> include balanced () (or [] or {}); Python's doesn't.
>> Okay, thanks for the clarification. So recursion is not possible using
>> python regular expressions?
>>> Ummm ... even if Python's re engine did do what you want, wouldn't you
>>> need flags=re.VERBOSE in there?
>> Ah, thanks for letting me know about that flag; but removing
>> whitespace as I did with the no_ws lambda expression should also work,
>> no?
>
> Under a very limited definition of "work". That technique would not
> produce correct answers on patterns that contain any *significant*
> whitespace e.g. you want to match "foo" and "bar" separated by one or
> more spaces (but not tabs, newlines etc) ....
> pattern = r"""
> foo
> [ ]+
> bar
> """

You can also escape a literal space:

pattern = r"""
foo
\ +
bar
"""
Dec 10 '07 #5
On Dec 10, 12:22 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
> On Dec 9, 10:12 pm, John Machin <sjmac...@lexicon.net> wrote:
>> On Dec 10, 8:53 am, Noah Hoffman <noah.hoff...@gmail.com> wrote:
>>> On Dec 9, 1:41 pm, John Machin <sjmac...@lexicon.net> wrote:
>>>> A pattern that can validly be described as a "regular expression"
>>>> cannot count and thus can't match balanced parentheses. Some "RE"
>>>> engines provide a method of tagging a sub-pattern so that a match must
>>>> include balanced () (or [] or {}); Python's doesn't.
>>> Okay, thanks for the clarification. So recursion is not possible using
>>> python regular expressions?
>>>> Ummm ... even if Python's re engine did do what you want, wouldn't you
>>>> need flags=re.VERBOSE in there?
>>> Ah, thanks for letting me know about that flag; but removing
>>> whitespace as I did with the no_ws lambda expression should also work,
>>> no?
>> Under a very limited definition of "work". That technique would not
>> produce correct answers on patterns that contain any *significant*
>> whitespace e.g. you want to match "foo" and "bar" separated by one or
>> more spaces (but not tabs, newlines etc) ....
>> pattern = r"""
>> foo
>> [ ]+
>> bar
>> """
>
> You can also escape a literal space:
>
> pattern = r"""
> foo
> \ +
> bar
> """

I know that. *Any* method of putting in a literal significant space is
clobbered by the OP's "trick" of removing *all* whitespace instead of
using the VERBOSE flag, which also permits comments:

pattern = r"""
\ + # ugly
[ ]+ # not quite so ugly
"""
Dec 10 '07 #6
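
The escaped-space case is arguably worse than the character-class case, because stripping whitespace leaves a pattern that still compiles but means something else. A small sketch under the same assumptions as the posts above (the `no_ws` helper is from the original post; `verbose` and `stripped` are names introduced here):

```python
import re

pattern = r"""
    foo
    \ +     # one or more literal spaces
    bar
"""

# With re.VERBOSE the comment is ignored and the escaped space is kept.
verbose = re.compile(pattern, re.VERBOSE)

# Stripping all whitespace turns '\ +' into '\+', a literal plus sign.
# The comment is omitted here, since stripping would fold its words
# into the pattern itself.
no_ws = lambda s: ''.join(s.split())
stripped = re.compile(no_ws(r"""
    foo
    \ +
    bar
"""))

print(bool(verbose.search("foo  bar")), bool(verbose.search("foo+bar")))
print(bool(stripped.search("foo  bar")), bool(stripped.search("foo+bar")))
```

The VERBOSE pattern matches "foo  bar" and not "foo+bar"; the stripped pattern does the opposite, with no error to warn you.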
