473,385 Members | 1,890 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,385 software developers and data experts.

regex for balanced parentheses?

There's no regex that detects balanced parentheses,
or is there?

That is, search('and so ((x+y)+z) = (x+(y+z))')
should return '((x+y)+z)'.

Not just a theoretical question, I'm cleaning up
a large body of TeX code and a regex that did that
would be very convenient. (Yes, I know it's not
hard to detect balanced parentheses by hand...)

I don't know that stuff, but I seen to recall reading
that there's a theoretical notion of "regular expression"
in CS, and a regular expression in that sense cannot
do this. But in the same place I read that the actual
regexes in programming languages include things which
are not regular expressions in that theoretical sense.
David C. Ullrich
Jun 27 '08 #1
5 4916
On Jun 12, 6:06*am, David C. Ullrich <dullr...@sprynet.comwrote:
There's no regex that detects balanced parentheses,
or is there?

That is, search('and so ((x+y)+z) = (x+(y+z))')
should return '((x+y)+z)'.

Not just a theoretical question, I'm cleaning up
a large body of TeX code and a regex that did that
would be very convenient. (Yes, I know it's not
hard to detect balanced parentheses by hand...)

I don't know that stuff, but I seen to recall reading
that there's a theoretical notion of "regular expression"
in CS, and a regular expression in that sense cannot
do this. But in the same place I read that the actual
regexes in programming languages include things which
are not regular expressions in that theoretical sense.

David C. Ullrich
Pyparsing includes several helper methods for building common
expression patterns, such as delimitedList, oneOf, operatorPrecedence,
countedArray - and a fairly recent addition, nestedExpr. nestedExpr
creates an expression for matching nested text within opening and
closing delimiters, such as ()'s, []'s, {}'s, etc. The default
delimiters are ()'s. You can also specify a content expression, so
that pyparsing will look for and construct meaningful results. The
default is to return any text nested within the delimiters, broken up
by whitespace.

Here is your sample string parsed using the default nestedExpr:
>>from pyparsing import nestedExpr
for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
[['x+y'], '+z']
['x+', ['y+z']]

Pyparsing found 2 matches in your input string. Note that the parens
are gone from the results - nestedExpr returns a nested list
structure, with nesting corresponding to the ()'s found in the
original string.

Pyparsing supports parse-time callbacks, called 'parse actions', and
it comes with several commonly used methods, such as removeQuotes,
upcaseTokens, and keepOriginalText. The purpose of keepOriginalText
is to revert any structuring or parsing an expression or other parse
actions might do, and just return the originally matched text.

Here is how keepOriginalText gives you back just the nested
parenthetical expressions, without any additional processing or
grouping:
>>from pyparsing import keepOriginalText
matchedParens = nestedExpr().setParseAction(keepOriginalText)
for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
((x+y)+z)
(x+(y+z))

-- Paul
Jun 27 '08 #2
On Thu, 12 Jun 2008 06:38:16 -0700 (PDT), Paul McGuire
<pt***@austin.rr.comwrote:
>On Jun 12, 6:06*am, David C. Ullrich <dullr...@sprynet.comwrote:
>There's no regex that detects balanced parentheses,
or is there?

[...]

Pyparsing includes several helper methods for building common
expression patterns, such as delimitedList, oneOf, operatorPrecedence,
countedArray - and a fairly recent addition, nestedExpr. nestedExpr
creates an expression for matching nested text within opening and
closing delimiters, such as ()'s, []'s, {}'s, etc.
Keen. Howdya know I wanted that? Thanks.

TeX is one of the amazing things about free software. Knuth
is great in many ways. He totally blew it in one detail,
unfortunately one that comes up a lot: '$' is an opening
delimiter, for which the corresponding closing delimiter
is also '$'. Then better yet, '$$' is another double-duty
delimiter... what I've done with that is first split
on '$$', taking the odd-numbered bits to be the parts
enclosed in $$..$$, and then taking the remining
parts and splitting on $. Not hard but it gives me a
creepy feeling.

Hence the question: Can pyparsing tell the difference
between '$' and '$'? (heh-heh).
The default
delimiters are ()'s. You can also specify a content expression, so
that pyparsing will look for and construct meaningful results. The
default is to return any text nested within the delimiters, broken up
by whitespace.

Here is your sample string parsed using the default nestedExpr:
>>>from pyparsing import nestedExpr
for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
[['x+y'], '+z']
['x+', ['y+z']]

Pyparsing found 2 matches in your input string. Note that the parens
are gone from the results - nestedExpr returns a nested list
structure, with nesting corresponding to the ()'s found in the
original string.

Pyparsing supports parse-time callbacks, called 'parse actions', and
it comes with several commonly used methods, such as removeQuotes,
upcaseTokens, and keepOriginalText. The purpose of keepOriginalText
is to revert any structuring or parsing an expression or other parse
actions might do, and just return the originally matched text.

Here is how keepOriginalText gives you back just the nested
parenthetical expressions, without any additional processing or
grouping:
>>>from pyparsing import keepOriginalText
matchedParens = nestedExpr().setParseAction(keepOriginalText)
for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'):
... print e[0]
...
((x+y)+z)
(x+(y+z))

-- Paul
David C. Ullrich
Jun 27 '08 #3
Parsing TeX is definitely not for the faint-of-heart! You might try
something like QuotedString('$', escQuote='$$') in pyparsing. (I've
not poked at TeX or its ilk since the mid-80's so my TeXpertise is
long rusted away.)

I know of two projects that have taken on the problem using pyparsing
- one is the mathtext module in John Hunter's matplotlib, and Tim
Arnold posted some questions on the subject a while back - try
googling for "pyparsing tex" for further leads.

-- Paul
Jun 27 '08 #4
"Paul McGuire" <pt***@austin.rr.comwrote in message
news:20**********************************@a1g2000h sb.googlegroups.com...
Parsing TeX is definitely not for the faint-of-heart! You might try
something like QuotedString('$', escQuote='$$') in pyparsing. (I've
not poked at TeX or its ilk since the mid-80's so my TeXpertise is
long rusted away.)

I know of two projects that have taken on the problem using pyparsing
- one is the mathtext module in John Hunter's matplotlib, and Tim
Arnold posted some questions on the subject a while back - try
googling for "pyparsing tex" for further leads.

-- Paul
Definitely agree that TeX can get pretty complicated. My method (writing a
converter from one TeX tag system to another) was to pre-parse using string
match/replace for really simple stuff, regular expressions for the more
complex and pyparsing for the really tough stuff.

One thing that was surprisingly hard for me to figure out was filtering out
comments. I finally just looped through the file line by line, looking for a
'%' that wasn't in a verbatim environment and wasn't escaped, etc.
Funny how sometimes the simplest thing can be difficult to handle.

Definitely pyparsing made the job possible; I can't imagine that job without
it.

--Tim Arnold

Jun 27 '08 #5
In article
<20**********************************@a1g2000hsb.g ooglegroups.com>,
Paul McGuire <pt***@austin.rr.comwrote:
Parsing TeX is definitely not for the faint-of-heart! You might try
something like QuotedString('$', escQuote='$$') in pyparsing. (I've
not poked at TeX or its ilk since the mid-80's so my TeXpertise is
long rusted away.)
Thanks. Not actually parsing TeX, just going through and
making little changes to a few things. Easier since I wrote
the original crap so I know that various things don't come
up (for example _every_ % is the start of a comment and
_none_ of those comments is actually important, they're
all just old stuff commented out.)
I know of two projects that have taken on the problem using pyparsing
- one is the mathtext module in John Hunter's matplotlib, and Tim
Arnold posted some questions on the subject a while back - try
googling for "pyparsing tex" for further leads.

-- Paul
--
David C. Ullrich
Jun 27 '08 #6

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
by: aevans1108 | last post by:
expanding this message to microsoft.public.dotnet.xml Greetings Please direct me to the right group if this is an inappropriate place to post this question. Thanks. I want to format a...
7
by: alphatan | last post by:
Is there relative source or document for this purpose? I've searched the index of "Mastering Regular Expression", but cannot get the useful information for C. Thanks in advanced. -- Learning...
7
by: bill tie | last post by:
I'd appreciate it if you could advise. 1. How do I replace "\" (backslash) with anything? 2. Suppose I want to replace (a) every occurrence of characters "a", "b", "c", "d" with "x", (b)...
6
by: Dave | last post by:
I'm struggling with something that should be fairly simple. I just don't know the regext syntax very well, unfortunately. I'd like to parse words out of what is basically a boolean search...
10
by: Chance Hopkins | last post by:
I'm trying to match a set of matches after some initial text: mytext: "something" "somethingelse" "another thing" "maybe another" (?:mytext: )(?<mymatch>{1,1}+{1,1}+)+ I only get the last one...
15
by: Kay Schluehr | last post by:
I have a list of strings ls = and want to create a regular expression sx from it, such that sx.match(s) yields a SRE_Match object when s starts with an s_i for one i in . There might be...
7
by: Aek | last post by:
Hi everyone, I am trying to construct a regular expression and format string to use with a boost::regex_replace() In my file the sample text is: // .fx shader file FLOAT JOE 3545f; FLOAT...
1
by: vmoreau | last post by:
I have a text and I need to find a Word that are not enclosed in paranthesis. Can it be done with a regex? Is someone could help me? I am not familar with regex... Example looking for WORD:...
3
by: =?Utf-8?B?UmF5IE1pdGNoZWxs?= | last post by:
I'm trying to learn regex but since I've spent way too much time on the following "simple" case, there's obviously something I'm missing. I need to find all occurrences of a specific...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.