
# find all index positions

hi
say i have a string like this
astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
if i want to find which position '1234' is at, how can i achieve this? i
want to use index() but it only gives me the first occurrence. I want to
know the positions of both "1234"
thanks

May 11 '06 #1
> say i have a string like this
> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
> if i want to find which position '1234' is at, how can i
> achieve this? i want to use index() but it only gives
> me the first occurrence. I want to know the positions of
> both "1234"

Well, I'm not sure how efficient it is, but the following
seemed to do it for me:
a = 'abcd efgd 1234 fsdf gfds abcde 1234'
thing = '1234'
offsets = [i for i in range(len(a)) if a.startswith(thing, i)]
print offsets

[10, 31]
HTH,

-tkc

May 11 '06 #2

mi*******@hotmail.com wrote:
> hi
> say i have a string like this
> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
> if i want to find which position '1234' is at, how can i achieve this? i
> want to use index() but it only gives me the first occurrence. I want to
> know the positions of both "1234"
> thanks

==========
def getAllIndex(aString=None, aSub=None):
    t=dict()
    c=0
    ndx=0
    while True:
        try:
            ndx=aString.index(aSub, ndx)
            t[c]=ndx
            ndx += 1
            c += 1
        except ValueError:
            break
    return t
===========

This will return a dictionary of what was found; for example,
getAllIndex('abcd 1234 efgh 1234 ijkl', '1234')

{0: 5, 1: 15}

May 11 '06 #3
alisonken1 wrote:
> ==========
> def getAllIndex(aString=None, aSub=None):
>     t=dict()
>     c=0
>     ndx=0
>     while True:
>         try:
>             ndx=aString.index(aSub, ndx)
>             t[c]=ndx
>             ndx += 1
>             c += 1
>         except ValueError:
>             break
>     return t
> ===========
>
> This will return a dictionary of what was found; i.e.,
> getAllIndex('abcd 1234 efgh 1234 ijkl', '1234')
>
> {0: 5, 1: 15}

Consecutive integers starting at 0 as keys? Looks like you want a list.

Peter
May 11 '06 #4
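
Peter's suggestion can be sketched roughly as follows, in Python 3. This is the same `index()` loop as `getAllIndex`, only appending to a list instead of filling a dictionary keyed 0, 1, 2, ... The name `get_all_indexes` is made up for this illustration and does not appear in the thread.

```python
def get_all_indexes(text, sub):
    """Return a list of every position where sub starts in text."""
    found = []
    start = 0
    while True:
        try:
            start = text.index(sub, start)
        except ValueError:
            # str.index raises ValueError once there are no more matches
            return found
        found.append(start)
        start += 1  # resume one character later, so overlapping matches are kept


print(get_all_indexes('abcd 1234 efgh 1234 ijkl', '1234'))  # [5, 15]
```

A list keeps the matches in order and lets the caller write `for pos in get_all_indexes(...)` directly; an empty list means nothing was found.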
mi*******@hotmail.com wrote:
> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
> <paraphrase> i want to find all positions of '1234' in astring.</paraphrase>

def positions(target, source):
    '''Produce all positions of target in source'''
    pos = -1
    try:
        while True:
            pos = source.index(target, pos + 1)
            yield pos
    except ValueError:
        pass

print list(positions('1234', 'abcd efgd 1234 fsdf gfds abcde 1234'))

prints:
[10, 31]

--Scott David Daniels
sc***********@acm.org
May 11 '06 #5
mi*******@hotmail.com writes:
> say i have a string like this
> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
> if i want to find which position '1234' is at, how can i achieve this? i
> want to use index() but it only gives me the first occurrence. I want to
> know the positions of both "1234"

Most straightforwardly with re.findall -- see the docs.
May 11 '06 #6
mi*******@hotmail.com wrote:
> hi
> say i have a string like this
> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
> if i want to find which position '1234' is at, how can i achieve this? i
> want to use index() but it only gives me the first occurrence. I want to
> know the positions of both "1234"
> thanks

The regular expression module (called re) has a function (named
finditer) that gives you what you want here.

The finditer function will find all matches of a pattern in a string and
return an iterator for them. You can loop through the iterator and do
what you want with each occurrence.

>>> import re
>>> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
>>> pattern = '1234'

Perhaps just find the starting point of each match:

>>> for match in re.finditer(pattern, astring):
...     print match.start()
...
10
31

Or the span of each match:

>>> for match in re.finditer(pattern, astring):
...     print match.span()
...
(10, 14)
(31, 35)

Or use a list comprehension to build a list of starting positions:

>>> [match.start() for match in re.finditer(pattern, astring)]
[10, 31]

And so on ....

Of course, if you wish, the re module can work with vastly more complex
patterns than just a constant string like your '1234'.

Gary Herron
May 11 '06 #7
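
Two caveats apply when `re.finditer` is used for a constant string like this. A target containing regex metacharacters has to be escaped with `re.escape`, and `finditer` reports non-overlapping matches only. The sketch below is Python 3. The helper name `regex_positions` and its `overlapping` flag are invented for illustration, and the lookahead trick is one common way to get overlapping matches, not something proposed in the post above.

```python
import re


def regex_positions(target, source, overlapping=False):
    """Start offsets of a literal target in source, found with re.finditer."""
    literal = re.escape(target)  # treat characters such as '.' or '+' literally
    if overlapping:
        # A zero-width lookahead consumes nothing, so matches may overlap.
        literal = '(?=' + literal + ')'
    return [m.start() for m in re.finditer(literal, source)]


print(regex_positions('1234', 'abcd efgd 1234 fsdf gfds abcde 1234'))  # [10, 31]
print(regex_positions('aba', 'ababababa'))                    # [0, 4]
print(regex_positions('aba', 'ababababa', overlapping=True))  # [0, 2, 4, 6]
print(regex_positions('a.b', 'a.b axb'))                      # [0]
```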
Paul Rubin wrote:
> mi*******@hotmail.com writes:
>
>> say i have a string like this
>> astring = 'abcd efgd 1234 fsdf gfds abcde 1234'
>> if i want to find which position '1234' is at, how can i achieve this? i
>> want to use index() but it only gives me the first occurrence. I want to
>> know the positions of both "1234"
>
> Most straightforwardly with re.findall -- see the docs.

Not quite. findall will list the matching strings, not their
positions: he'll get ['1234', '1234']. The finditer function will
work for his requirements. See my other post to this thread.
May 11 '06 #8
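
The difference between the two calls, side by side, using the string from the original question (shown in Python 3 syntax):

```python
import re

astring = 'abcd efgd 1234 fsdf gfds abcde 1234'

# findall returns only the matched text, so the positions are lost
matches = re.findall('1234', astring)
print(matches)  # ['1234', '1234']

# finditer yields match objects, which carry the positions
starts = [m.start() for m in re.finditer('1234', astring)]
print(starts)  # [10, 31]
```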
I thought this to be a great exercise so I went the extra length to
turn it into a function for my little but growing library. I hope you
enjoy :)

def indexer(string, target):
    '''indexer(string, target) -> [list of target indexes]

    enter in a string and a target and indexer will either return a
    list of all targeted indexes if at least one target is found or
    indexer will return None if the target is not found in sequence.

    >>> indexer('a long long day is long', 'long')
    [2, 7, 19]
    >>> indexer('a long long day is long', 'day')
    [12]
    >>> indexer('a long long day is long', 'short')
    None
    '''

    res = []

    if string.count(target) >= 1:
        res.append(string.find(target))

        if string.count(target) >= 2:
            for item in xrange(string.count(target) - 1):
                res.append(string.find(target, res[-1] + 1))

        return res

if __name__ == '__main__':
    print indexer('a long long day is long', 'long') # -> [2, 7, 19]
    print indexer('a long long day is long', 'day') # -> [12]
    print indexer('a long long day is long', 'short') # -> None

May 11 '06 #9

Scott David Daniels wrote:
> mi*******@hotmail.com wrote:
> <SNIP>
> print list(positions('1234', 'abcd efgd 1234 fsdf gfds abcde 1234'))
>
> prints:
> [10, 31]

Nicer than mine ;)
Shows I need to get a job where I use python more!

May 11 '06 #10
On 12/05/2006 5:13 AM, vbgunz wrote:
> I thought this to be a great exercise so I went the extra length to
> turn it into a function for my little but growing library. I hope you
> enjoy :)

Oh, indeed ;-)

> def indexer(string, target):

Consider not shadowing the name of the string module.

>     '''indexer(string, target) -> [list of target indexes]
>
>     enter in a string and a target and indexer will either return a
>     list of all targeted indexes if at least one target is found or
>     indexer will return None if the target is not found in sequence.

Consider returning [] if the target is not found in the sequence. That
would enable callers to "do nothing gracefully":

for posn in indexer(...
    # do something

and it doesn't require a gross rewrite ... merely dedent the last line.

>     >>> indexer('a long long day is long', 'long')
>     [2, 7, 19]
>     >>> indexer('a long long day is long', 'day')
>     [12]
>     >>> indexer('a long long day is long', 'short')
>     None
>     '''
>
>     res = []

Consider evaluating string.count(target) *ONCE*.

>     if string.count(target) >= 1:
>         res.append(string.find(target))
>
>         if string.count(target) >= 2:
>             for item in xrange(string.count(target) - 1):
>                 res.append(string.find(target, res[-1] + 1))
>
>         return res
>
> if __name__ == '__main__':
>     print indexer('a long long day is long', 'long') # -> [2, 7, 19]
>     print indexer('a long long day is long', 'day') # -> [12]
>     print indexer('a long long day is long', 'short') # -> None

When called with the args('abababababababa', 'aba'), it returns [0, 2,
4, 6]. If you *intend* to allow for overlaps, it should return [0, 2, 4,
6, 8, 10, 12]; otherwise it should return [0, 4, 8, 12].

Consider doing something straightforward and understandable, like the
following (tested):

def findallstr(text, target, overlapping=0):
    result = []
    startpos = 0
    if overlapping:
        jump = 1
    else:
        jump = max(1, len(target))
    while 1:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return result
        result.append(newpos)
        startpos = newpos + jump

HTH,
John
May 11 '06 #11
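
A note on `jump = max(1, len(target))` in the function above. An empty target has length 0, so without the `max` the search would presumably restart at the same offset forever. The snippet below (Python 3, written for this note rather than taken from the post) shows the underlying `str.find` behaviour with an empty target.

```python
text = 'qwerty'

# An empty target is "found" at whatever start offset is given...
print(text.find('', 0))  # 0
print(text.find('', 3))  # 3
print(text.find('', 6))  # 6 (the end of the string still counts)

# ...until the start offset runs past the end of the string.
print(text.find('', 7))  # -1

# Advancing by len('') == 0 would never make progress; advancing by
# max(1, len('')) == 1 visits offsets 0..6 and then stops.
positions = []
start = 0
while True:
    pos = text.find('', start)
    if pos == -1:
        break
    positions.append(pos)
    start = pos + max(1, len(''))
print(positions)  # [0, 1, 2, 3, 4, 5, 6]
```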
Hello John,

Thank you very much for your pointers! I decided to redo it and try to
implement your suggestion. I think I did a fair job and because of your
suggestion have a better iterator. Thank you!

def indexer(string, substring, overlap=1):
    '''indexer(string, substring, [overlap=1]) -> int

    indexer takes a string and searches it to return all substring
    indexes. by default indexer is set to overlap all occurrences.
    to get the index to whole words only, set the overlap argument
    to the length of the substring. The only pitfall to indexer is
    it will return the substring whether it stands alone or not.

    >>> list(indexer('ababababa', 'aba'))
    [0, 2, 4, 6]
    >>> list(indexer('ababababa', 'aba', len('aba')))
    [0, 4]
    >>> list(indexer('ababababa', 'xxx'))
    []
    >>> list(indexer('show chow', 'how'))
    [1, 6]
    '''

    index = string.find(substring)
    if index != -1:
        yield index

    while index != -1:
        index = string.find(substring, index + overlap)
        if index == -1: continue
        yield index

if __name__ == '__main__':
    print list(indexer('ababababa', 'aba')) # -> [0, 2, 4, 6]
    print list(indexer('ababababa', 'aba', len('aba'))) # -> [0, 4]
    print list(indexer('ababababa', 'xxx')) # -> []
    print list(indexer('show chow', 'how')) # -> [1, 6]

May 12 '06 #12
I forgot to explain my reason for shadowing the 'string' built-in
within my iterator. To me, it doesn't matter because the string
identifier is temporary within the function and dies when the function
dies. Also, I personally don't use the string function and prefer
''.join('hi'), etc. Also, at least for me just starting out in Python,
I find 'string' to be as readable as possible :)

what do you think about that?

May 12 '06 #13
On 13/05/2006 1:45 AM, vbgunz wrote:
> Hello John,
>
> Thank you very much for your pointers! I decided to redo it and try to
> implement your suggestion. I think I did a fair job and because of your
> suggestion have a better iterator. Thank you!
>
> def indexer(string, substring, overlap=1):
>     '''indexer(string, substring, [overlap=1]) -> int
>
>     indexer takes a string and searches it to return all substring
>     indexes. by default indexer is set to overlap all occurrences.
>     to get the index to whole words only, set the overlap argument
>     to the length of the substring.

(1) Computing the length should be done inside the function, if
necessary, which (2) avoids the possibility of passing in the wrong
length. (3) "whole words only" does *NOT* mean the same as "substrings
don't overlap".

>     The only pitfall to indexer is
>     it will return the substring whether it stands alone or not.
>
>     >>> list(indexer('ababababa', 'aba'))
>     [0, 2, 4, 6]
>     >>> list(indexer('ababababa', 'aba', len('aba')))
>     [0, 4]
>     >>> list(indexer('ababababa', 'xxx'))
>     []
>     >>> list(indexer('show chow', 'how'))
>     [1, 6]
>     '''
>
>     index = string.find(substring)
>     if index != -1:
>         yield index
>
>     while index != -1:
>         index = string.find(substring, index + overlap)
>         if index == -1: continue
>         yield index

Quite apart from the fact that you are now using both 'string' *AND*
'index' outside their usual meaning, this is hard to follow. (1) You
*CAN* avoid doing the 'find' twice without losing readability and
elegance. (2) continue?? Somebody hits you if you use the 'return'
statement or the 'break' statement?

Sigh. I'll try once more. Here is the function I wrote, with the minimal
changes required to make it an iterator, plus changing from 0/1 to
False/True:

def findallstr(text, target, overlapping=False):
    startpos = 0
    if overlapping:
        jump = 1
    else:
        jump = max(1, len(target))
    while True:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return
        yield newpos
        startpos = newpos + jump

> if __name__ == '__main__':
>     print list(indexer('ababababa', 'aba')) # -> [0, 2, 4, 6]
>     print list(indexer('ababababa', 'aba', len('aba'))) # -> [0, 4]
>     print list(indexer('ababababa', 'xxx')) # -> []
>     print list(indexer('show chow', 'how')) # -> [1, 6]

Get yourself a self-checking testing mechanism, and a more rigorous set
of tests. Ultimately you will want to look at unittest or pytest, but
for a small library of functions, you can whip up your own very quickly.
Here is what I whipped up yesterday:

def indexer2(string, target):
    res = []
    if string.count(target) >= 1:
        res.append(string.find(target))
        if string.count(target) >= 2:
            for item in xrange(string.count(target) - 1):
                res.append(string.find(target, res[-1] + 1))
    return res # dedent fixed

if __name__ == '__main__':
    tests = [
        ('a long long day is long', 'long', [2, 7, 19], [2, 7, 19]),
        ('a long long day is long', 'day', [12], [12]),
        ('a long long day is long', 'short', [], []),
        ('abababababababa', 'aba', [0, 4, 8, 12], [0, 2, 4, 6, 8, 10, 12]),
        ('qwerty', '', range(7), range(7)),
        ('', 'qwerty', [], []),
        ]
    for test in tests:
        text, target = test[:2]
        results = test[2:]
        for olap in range(2):
            result = findallstr(text, target, olap)
            print (
                'FAS', text, target, olap,
                result, results[olap], result == results[olap],
                )
    for test in tests:
        text, target = test[:2]
        results = test[2:]
        result = indexer2(text, target)
        print (
            'INDXR2', text, target,
            result, result == results[0], result == results[1],
            )

Make sure your keyboard interrupt is not disabled before you run the
2nd-last test :-)

HTH,
John
May 12 '06 #14
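
For readers who want to take up the unittest suggestion, here is a minimal sketch of how the same table of cases might look there, in Python 3. `find_all` is a port of the `findallstr` generator from the post above; the names `find_all`, `FindAllTests` and `CASES` are invented for this example.

```python
import unittest


def find_all(text, target, overlapping=False):
    """Python 3 port of the findallstr generator from the post above."""
    jump = 1 if overlapping else max(1, len(target))
    startpos = 0
    while True:
        newpos = text.find(target, startpos)
        if newpos == -1:
            return
        yield newpos
        startpos = newpos + jump


class FindAllTests(unittest.TestCase):
    # (text, target, expected without overlaps, expected with overlaps)
    CASES = [
        ('a long long day is long', 'long', [2, 7, 19], [2, 7, 19]),
        ('a long long day is long', 'day', [12], [12]),
        ('a long long day is long', 'short', [], []),
        ('abababababababa', 'aba', [0, 4, 8, 12], [0, 2, 4, 6, 8, 10, 12]),
        ('qwerty', '', list(range(7)), list(range(7))),
        ('', 'qwerty', [], []),
    ]

    def test_non_overlapping(self):
        for text, target, expected, _ in self.CASES:
            self.assertEqual(list(find_all(text, target)), expected)

    def test_overlapping(self):
        for text, target, _, expected in self.CASES:
            self.assertEqual(
                list(find_all(text, target, overlapping=True)), expected)


if __name__ == '__main__':
    # Run just this test case; a failure is reported instead of being
    # left for someone to spot in printed True/False columns.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FindAllTests)
    unittest.TextTestRunner().run(suite)
```

The expected values are copied from the `tests` table in the post; `range(7)` becomes `list(range(7))` because a Python 3 `range` does not compare equal to a list.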
On 13/05/2006 1:55 AM, vbgunz wrote:
> I forgot to explain my reason for over shadowing the 'string' built-in
> within my iterator. To me, it doesn't matter because the string
> identifier is temporary within the function and dies when the function
> dies. Also, I personally don't use the string function and prefer
> ''.join('hi'), etc. Also, at least for me just starting out in Python,
> I find 'string' to be as readable as possible :)
>
> what do you think about that?

1. 'string' is not a function, it is a module. I'm glad you don't use
the non-existent 'string function'.

2.
>>> ''.join('hi')
'hi'

You prefer to do that instead of what?

Look at the docs on the .join() method carefully, and consider the
following:
>>> '-'.join('hello')
'h-e-l-l-o'

3. Back to the shadowing thing:

'string' is admittedly not a good example to pull you up on, as that
module is little used these days. However it is the thin end of the
wedge, and your adding 'index' in round 2 just drove the wedge in a
little further. Once you start using words like 'file' and 'list' as
It's not just the shadowing possibility, it's the confusion in the
reader's mind between the usual/normal Python meaning of a word and the
meaning that you have attached to it. As you are just starting out in
Python, now is the time to acquire good habits.

Cheers,
John
May 12 '06 #15
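
A small illustration of the practical side of the shadowing point, in Python 3. The function names are invented for this example, and it covers only the case where the shadowed built-in is needed inside the same function; the readability concern above is separate.

```python
def first_three(list):
    # 'list' now refers to the argument, not the built-in type
    try:
        return list(list[:3])
    except TypeError as exc:
        return 'TypeError: %s' % exc


def first_three_fixed(items):
    # A neutral parameter name leaves the built-in available
    return list(items[:3])


print(first_three([1, 2, 3, 4]))        # TypeError: 'list' object is not callable
print(first_three_fixed((1, 2, 3, 4)))  # [1, 2, 3]
```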
