finding indices in a sequence of parentheses

Steven Bethard

I have a list of strings that looks something like:

lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']

The parentheses in the labels indicate where an "annotation" starts and
ends. So for example, the label '(*)' at index 2 of the list means that
I have an annotation at (2, 2), and the labels '(*', '*', '(*', '*))' at
indices 4 through 7 mean that I have an annotation at (4, 7) and an
annotation at (6, 7).

I'd like to determine all indices at which I have an annotation. So for
the data above, I want the indices:

(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

Here's what I'm doing now:

py> def indices(lst):
.... stack = []
.... for i, s in enumerate(lst):
.... if s == 'O':
.... continue
.... stack.extend([i]*s.count('('))
.... if '*' in s and not stack:
.... raise Exception('No start for %r at %i' % (s, i))
.... for _ in range(s.count(')')):
.... try:
.... yield stack.pop(), i
.... except IndexError:
.... raise Exception('No start for %r at %i' % (s, i))
.... if stack:
.... raise Exception('No ends for starts at %r' % stack)
....
py> list(indices(['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*',
'*)', '*)', '0']))
[(2, 2), (6, 7), (4, 7), (8, 9), (8, 10)]

I think that works right, but I'm not certain. So two questions:

(1) Can anyone see anything wrong with the code above? and
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

Thanks,

STeVe

[1] Yes, I know easier/clearer/simpler are subjective terms. It's okay,
I'm only looking for opinions here anyway. =)

Jul 19 '05 #1

Subscribe Post Reply

1582

tiissa

Steven Bethard wrote:

(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personnally extract the parenthesis then zip the lists of indices.
In short:

def indices(mylist): ... lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
... indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)]

Before returning, you can check if the lists have same size and if the
'(' has lower or equal index than ')' in each of these couples. If not
you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).

Jul 19 '05 #2

Steven Bethard

tiissa wrote:

I'd personnally extract the parenthesis then zip the lists of indices.
In short:
>>> def indices(mylist): ... lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
... >>> indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)] >>>

Thanks, that's a good idea. In case anyone else is reading this thread,
and had to mentally unwrap the reduce expressions, I believe they could
be written as:

lopen = [x for i, s in enumerate(lst) for x in [i]*s.count('(')]
lclose = [x for i, s in enumerate(lst) for x in [i]*s.count(')')]

or maybe:

lopen = [i for i, s in enumerate(lst) for _ in xrange(s.count('('))]
lclose = [i for i, s in enumerate(lst) for _ in xrange(s.count(')'))]

Sorry, I have an irrational fear of reduce. ;)

STeVe

Jul 19 '05 #3

Raymond Hettinger

[Steven Bethard]

I have a list of strings that looks something like:
lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)'] . . . I want the indices:
(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

opener_stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
opener_stack.append(i)
elif c == ')':
print opener_stack.pop(), i
To see something like this in production code, look at
Tools/scripts/texcheck.py

Raymond Hettinger

Jul 19 '05 #4

Peter Otten

tiissa wrote:

Steven Bethard wrote:
(2) Does anyone see an easier/clearer/simpler[1] way of doing this?

I'd personnally extract the parenthesis then zip the lists of indices.
In short:
>>> def indices(mylist): ... lopen=reduce(list.__add__, [[i]*s.count('(') for i,s in
enumerate(mylist)],[])
... lclose=reduce(list.__add__, [[i]*s.count(')') for i,s in
enumerate(mylist)],[])
... return zip(lopen,lclose)
... >>> indices(lst) [(2, 2), (4, 7), (6, 7), (8, 9), (8, 10)] >>>

Before returning, you can check if the lists have same size and if the
'(' has lower or equal index than ')' in each of these couples. If not
you can raise the appropriate exception.

Disclaimer: not tested further than example above (but confident).

Not tested but confident should be an oxymoron for a programmer. Some
examples:

lst: ['(', '(', ')', ')']
hettinger [(1, 2), (0, 3)]
bethard [(1, 2), (0, 3)]
tiissa [(0, 2), (1, 3)] oops (or am I just spoilt by the XML spec?)

lst: ['(', ')(', ')']
hettinger [(0, 1), (1, 2)]
bethard [(1, 1), (0, 2)] oops
tiissa [(0, 1), (1, 2)]

So far Raymond's solution is my favourite...

Peter

Jul 19 '05 #5

tiissa

Peter Otten wrote:

tiissa wrote:
Disclaimer: not tested further than example above (but confident).

Not tested but confident should be an oxymoron for a programmer.

Not tested but confident is an oxymoron for mathemtaticians.
Programmers know better than that, they leave bugs in their code to have
more work to do. ;)

OTOH, you're right. Matching parentheses cannot be done without a stack,
that should have rung a bell.

Jul 19 '05 #6

Peter Otten

tiissa wrote:

Not tested but confident is an oxymoron for mathemtaticians.

I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Peter

Jul 19 '05 #7

tiissa

Peter Otten wrote:

I think no amount of testing will give these strange people confidence.
"Proof" is the magic word here.

Some would maybe be satisfied if your tests cover the whole set of input.

When that's possible, that may be useless. But that's not a matter to
bother them with. ;)

(And of course, you just don't tell them how you generated their output
with the same program.)

Jul 19 '05 #8

Steven Bethard

Raymond Hettinger wrote:

[Steven Bethard]
I have a list of strings that looks something like:
lst = ['0', '0', '(*)', 'O', '(*', '*', '(*', '*))', '((*', '*)', '*)']
. . .
I want the indices:
(2, 2), (4, 7), (6, 7), (8, 9) and (8, 10)

opener_stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
opener_stack.append(i)
elif c == ')':
print opener_stack.pop(), i

Thanks Raymond, this is definitely an elegant solution. It was also easy
to add all the error checking I needed into this one. For the curious,
my final solution looks something like:

def indices(lst):
stack = []
for i, elem in enumerate(lst):
for c in elem:
if c == '(':
stack.append(i)
elif c == ')' and not stack:
raise Exception('")" at %i without "("' % i)
elif c == ')':
yield stack.pop(), i
elif c == 'O' and stack:
raise Exception('"O" at %i after "(" at %i' %
(i, stack[-1]))
elif c == '*' and not stack:
raise Exception('"*" at %i without "("' % i)
if stack:
raise Exception('"(" at %r without ")"' % stack)

Thanks again!

STeVe

Jul 19 '05 #9

Similar topics

negative indices for sequence types

by: dan | last post by:

I was recently surprised, and quite shocked in fact, to find that Python treats negative indices into sequence types as if they were mod(length-of-sequence), at least up to -len(seq). This fact...

Python

Find Items & Indices In A List...

by: andrea.gavana | last post by:

Hello NG, I was wondering if there is a faster/nicer method (than a for loop) that will allow me to find the elements (AND their indices) in a list that verify a certain condition. For example,...

Python

Problem: Generating random indices for a container

by: Ross MacGregor | last post by:

I have a very simple yet complicated problem. I want to generate a random list of indices (int's) for a container. Let's say I have a container with 10 items and I want a list of 3 random...

C / C++

Finding cookie name?

by: brettr | last post by:

When I reference document.cookie, there is a long string of key=value; pairs listed. I may have 100 hundred cookies on my hard drive. However, most only have one key=value pair. Does the...

Javascript

Extended slices and indices

by: GavinCrooks | last post by:

The indices method of slice doesn't seem to work quite how I would expect when reversing a sequence. For example : '43210' '43210' So a slice with a negative step (and nothing else) reverses...

Python

Finding Nonzero Elements in a Sparse Matrix

by: deLenn | last post by:

Hi, Does scipy have an equivalent to Matlab's 'find' function, to list the indices of all nonzero elements in a sparse matrix? Cheers.

Python

Array indices

by: mmu2643 | last post by:

Hi, I had a noobie question: i = 0; while (len i) && (SOME_VAL != arr ) i++; In the above snippet if len is the number of elements in arr, when i == len, since the first condition fails,...

C / C++

Finding and updating text between parentheses with PHP4

by: watershed | last post by:

In PHP4, how would I go about finding *every* instance of text (max of 14 characters) in a string that is placed within parentheses? And then loop through each instance and update the string...

PHP

Python Feature Request: Allow changing base of member indices to 1

by: samjnaa | last post by:

This is like the previous one. Please check for sanity and approve for posting at python-dev. I would like to have something like "option base" in Visual Basic. IIRC it used to allow me to...

Python

How to turn on java script in a villaon keypad mobile phone

by: Charles Arthur | last post by:

How do i turn on java script on a villaon, callus and itel keypad mobile phone

Java

Navigating the Data Structures and Algorithms (DSA)

by: BarryA | last post by:

What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...

Algorithms / Advanced Math

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...

C / C++

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

Microsoft Access / VBA