Help | Site Map
Connecting Tech Pros Worldwide
 
 
LinkBack Thread Tools
  #1  
Old August 2nd, 2005, 03:05 PM
giampiero mu
Guest
 
Posts: n/a
Default finding sublist

hi everyone

my target is implement a function controlla(string - a binary string-)
that check if there is in a string two equal not overlapping
subsequences at least of length limitem:

my code:

def controlla(test):
# print test
limitem=4
lunghezz=len(test)

l1=lunghezz-limitem+1
l2=lunghezz-limitem+1
f=0

for ii in range(l1):
for kk in range(l2):
t1=test[ii:ii+limitem]
t2=test[kk:kk+limitem]
if abs(ii-kk)>=limitem :
if t1==t2 :
f=1
if f==1 :
break
break
break
if f==0 :
return test
else:
return 'no'


the question is: Is there a faster way doing that? I don't know,
changing string into list or array, using map, or module, using
different loop, regular expression,funcional programming , list
comprehensions , sets, different looping techniques, i dont
know.......(!)

  #2  
Old August 2nd, 2005, 04:05 PM
Kent Johnson
Guest
 
Posts: n/a
Default Re: finding sublist

giampiero mu wrote:[color=blue]
> hi everyone
>
> my target is implement a function controlla(string - a binary string-)
> that check if there is in a string two equal not overlapping
> subsequences at least of length limitem:[/color]

You can do this with a regular expression:
[color=blue][color=green][color=darkred]
>>> import re
>>> r=re.compile(r'(?P<seq>.{4,}).*(?P=seq)')
>>> r.match('abcaoeaeoudabc')
>>> r.match('abcdaeaeuabcd')[/color][/color][/color]
<_sre.SRE_Match object at 0x008D67E0>[color=blue][color=green][color=darkred]
>>> _.group(1)[/color][/color][/color]
'abcd'[color=blue][color=green][color=darkred]
>>> r.match('abcdefgaeaeuabcdefg')[/color][/color][/color]
<_sre.SRE_Match object at 0x008D68E0>[color=blue][color=green][color=darkred]
>>> _.group(1)[/color][/color][/color]
'abcdefg'

Kent

[color=blue]
>
> my code:
>
> def controlla(test):
> # print test
> limitem=4
> lunghezz=len(test)
>
> l1=lunghezz-limitem+1
> l2=lunghezz-limitem+1
> f=0
>
> for ii in range(l1):
> for kk in range(l2):
> t1=test[ii:ii+limitem]
> t2=test[kk:kk+limitem]
> if abs(ii-kk)>=limitem :
> if t1==t2 :
> f=1
> if f==1 :
> break
> break
> break
> if f==0 :
> return test
> else:
> return 'no'
>
>
> the question is: Is there a faster way doing that? I don't know,
> changing string into list or array, using map, or module, using
> different loop, regular expression,funcional programming , list
> comprehensions , sets, different looping techniques, i dont
> know.......(!)
>[/color]
  #3  
Old August 2nd, 2005, 05:35 PM
Ron Adam
Guest
 
Posts: n/a
Default Re: finding sublist

giampiero mu wrote:[color=blue]
> hi everyone[/color]

Hi, you appear to be fairly new to Python, so you might want to take a
look at the tutorial at Python.org

Kents suggestion to use RE is good.


This should be faster than your example by quite a bit if you don't want
to use RE.

def controlla(test, size=4):
# Return substring if a second substring is found.
# Return None if no match found.

for i in range(len(test)-size):
match=test[i:i+size]
left=test[:i]
right=test[i+size:]
if match in left or match in right:
return match

print controlla('qqqabcdrrabcd')

--> 'abcd'


Here's a few notes on your example for future reference:

Multiple breaks don't work. The first break will jump out of the loop
before the other breaks are reached.

Any function that ends without a return will return None. So you don't
need to return 'no'.

Cheers,
Ron


  #4  
Old August 2nd, 2005, 06:45 PM
Johan Lindberg
Guest
 
Posts: n/a
Default Re: finding sublist

Hello.
[color=blue]
> my target is implement a function controlla(string - a binary string-)
> that check if there is in a string two equal not overlapping
> subsequences at least of length limitem:
>
> my code:
> [snip]
>[/color]

I may have misunderstood it, but does your function work the way you
want it to?
[color=blue][color=green][color=darkred]
>>>controlla("testeststest")[/color][/color][/color]
no

I can't get it to print anything other than "no". But then again, I'm
reading and posting via Google and I guess all those break statements
shouldn't be on the same indent-level.
[color=blue]
> the question is: Is there a faster way doing that? I don't know,
> changing string into list or array, using map, or module, using
> different loop, regular expression,funcional programming , list
> comprehensions , sets, different looping techniques, i dont
> know.......(!)[/color]

Since you're using nested for loops when searching the string you
should make sure not to perform more iterations than neccessary. The
function below returns a list of all, non-overlapping, substrings in
text where len(substring)>= minLength. The outer loop is limited to
about half of the length of the text which is where most of the speed
comes from but I'm sure it can be tweaked for more.

def foo(text, minLength):
result= []
for length in range(minLength, len(text)/ 2+ 1):
for start in range(len(text)):
end= start+ length
if end< len(text):
part= text[start:end]
if text.find(part, end)!= -1:
if part not in result:
result.append(part)

return result
[color=blue][color=green][color=darkred]
>>>foo("testeststest", 4)[/color][/color][/color]
['test', 'stes', 'stest']
[color=blue][color=green][color=darkred]
>>>foo("testeststest", 3)[/color][/color][/color]
['tes', 'est', 'ste', 'test', 'stes', 'stest']

HTH
Johan Lindberg
johan@pulp.se

  #5  
Old August 2nd, 2005, 08:45 PM
giampiero mu
Guest
 
Posts: n/a
Default Re: finding sublist

controlla("12345678") -> "12345678"

thanks everyone. only a question. there is a way to advantage of binary
sequences?

  #6  
Old August 3rd, 2005, 06:05 PM
Johan Lindberg
Guest
 
Posts: n/a
Default Re: finding sublist

> thanks everyone. only a question. there is a way to advantage of binary[color=blue]
> sequences?[/color]

I doubt you'll find any way to optimize the code that somehow only
applies to binary sequences. You still have to find each possible
subsequence of minimum length within the sequence and compare it to all
other possible subsequences and that's what's going to take most of the
time.

If you haven't already, check out psyco
(http://psyco.sourceforge.net/). It will most definitely make your code
run faster.

BR
Johan Lindberg
johan@pulp.se

 

Bookmarks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over network members.
Post your question now . . .
It's fast and it's free

Popular Articles