473,466 Members | 1,401 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Generating all ordered substrings of a string

Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:
>>colocn = 'abcd'
k = 4
for i in range(k-1):
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>rules
[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
Any ideas??

TIA,
girish

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
Jul 11 '06 #1
8 5019
gi****@it.usyd.edu.au a écrit :
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm missing
some substrings:

>>>>colocn = 'abcd'
k = 4
for i in range(k-1):

for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>>>rules

[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
Any ideas??
Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]
print listsubstrings("abcd")












Jul 12 '06 #2
Quoting Bruno Desthuilliers <bd*****************@free.quelquepart.fr>:
gi****@it.usyd.edu.au a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',
'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm
missing
some substrings:

>>>colocn = 'abcd'
k = 4
for i in range(k-1):
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>>rules
[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]
print listsubstrings("abcd")
Thanks Bruno. I wanted to avoid permutations as it would take more time,
but i guess will have to deal with them now :((
Also this one gives each rule twice...i've to search for some double
counting...
>













--
http://mail.python.org/mailman/listinfo/python-list

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
Jul 12 '06 #3
Quoting Bruno Desthuilliers <bd*****************@free.quelquepart.fr>:
gi****@it.usyd.edu.au a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',
'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]

I've tried the following but i cant prevent duplicates and i'm
missing
some substrings:

>>>colocn = 'abcd'
k = 4
for i in range(k-1):
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>>>rules
[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc', 'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad', 'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd', 'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
Any ideas??

Algorithmic problem.

First, you need to get all permutations. This is a known algorithm, that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]
print listsubstrings("abcd")
thanks Bruno....wanted to avoid permute function but i guess i've no no
option :((...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
>












--
http://mail.python.org/mailman/listinfo/python-list

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
Jul 12 '06 #4
Quoting gi****@it.usyd.edu.au:
Quoting Bruno Desthuilliers <bd*****************@free.quelquepart.fr>:
gi****@it.usyd.edu.au a écrit :
Hi,
I want to generate all non-empty substrings of a string of length
>=2.
Also,
each substring is to be paired with 'string - substring' part and
vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'],
['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'],
['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd',
'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]
>
I've tried the following but i cant prevent duplicates and i'm
missing
some substrings:
>
>
>>>>colocn = 'abcd'
>>>>k = 4
>>>>for i in range(k-1):
>
for j in range(1,k):
rule1 = [colocn[i:i+j],colocn[:i]+colocn[i+j:]]
rule2 = [colocn[:i]+colocn[i+j:],colocn[i:i+j]]
rules.append(rule1)
rules.append(rule2)
>
>>>>rules
>
[['a', 'bcd'], ['bcd', 'a'], ['ab', 'cd'], ['cd', 'ab'], ['abc',
'd'],
['d', 'abc'], ['b', 'acd'], ['acd', 'b'], ['bc', 'ad'], ['ad',
'bc'],
['bcd', 'a'], ['a', 'bcd'], ['c', 'abd'], ['abd', 'c'], ['cd',
'ab'],
['ab', 'cd'], ['cd', 'ab'], ['ab', 'cd']]
>
>
Any ideas??
Algorithmic problem.

First, you need to get all permutations. This is a known algorithm,
that
you'll find examples of on the net. Then for each permutation, list
possibles 'pair-splits'.

Here's a (probably sub-optimal, but it's getting late here) possible
implementation in functional style:

def rotate(lst):
yield lst
max = len(lst)
for i in range(1, max):
yield lst[i:] + lst[:-(max-i)]

def permute(lst):
if len(lst) 2:
for rotated in rotate(lst):
head, tail = rotated[0], rotated[1:]
for permuted in permute(tail):
yield [head] + permuted
elif len(lst) == 2:
yield lst
yield lst[::-1]
else:
yield lst

def splits(lst):
for i in range(1, len(lst)):
yield lst[0:i], lst[i:]

def allsplits(lst):
for permuted in permute(lst):
for pair in splits(permuted):
yield pair

def listsubstrings(thestr):
format = lambda pair: (''.join(pair[0]), ''.join(pair[1]))
return [format(list(pair)) for pair in allsplits(list(thestr))]
print listsubstrings("abcd")

thanks Bruno....wanted to avoid permute function but i guess i've no no
option :((...
also there is some double counting in this one (all rules outputted
twice)...i've to find out where...
A Recursive Attempt:
def gen(s):
sList = [s[:i]+s[i+1:] for i in range(len(s))]
substringList = sList
s = sList
while len(s[0]) != 1:
substrings = []
for string in s:
sList = [string[:i]+string[i+1:] for i in range(len(string))]
substrings.extend(sList)
s = set(substrings)
s = list(s)
substringList.extend(s)
print substringList
return substringList













--
http://mail.python.org/mailman/listinfo/python-list


----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
--
http://mail.python.org/mailman/listinfo/python-list

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.
Jul 12 '06 #5

gi****@it.usyd.edu.au wrote:
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]
In your last example you have ['ac','bd'], but neither 'ac' nor 'bd' is
a _substring_ of 'abcd'.

If you want to compute all possible (non-empty) sub-groups of a group
(a group of characters, in your case), that's really quite a common
algorthmic problem and you should be able to Google for a solution.

Once you have all possible subgroups, just make your (weird) pairs,
remove doubles (either by using a set or by sorting and removing
identical neighboring objects), and you're done.

If you're looking for a more efficient solution, specialized for your
specific problem, you'll have to explain more precisely what you're
trying to do, as well as why existing solutions aren't good enough.

- Tal

Jul 12 '06 #6
* gi****@it.usyd.edu.au (2006-07-11 10:20 +0000)
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]
No, you don't want to generate all substrings, you want to generate
all partions of a given set with length 2:

filter(lambda x: len(x) == 2, part(['abcd']))

I've written an utility that generates all possible partions of a set;
the "pairing" as you call it, is trivial, so you can do it yourself

def part(seq):
import copy
partset = [[]]
for item in seq:
newpartset = []
for partition in partset:
for index in range(len(partition)):
newpartset.append(copy.deepcopy(partition))
newpartset[-1][index].append(item)
partition.append([item])
newpartset.append(partition)
partset = newpartset
return partset
Jul 12 '06 #7

gi****@it.usyd.edu.au wrote:
Hi,
I want to generate all non-empty substrings of a string of length >=2.
Also,
each substring is to be paired with 'string - substring' part and vice
versa.
Thus, ['abc'] gives me [['a', 'bc'], ['bc', 'a'], ['ab', 'c'], ['c',
'ab'], ['b', 'ac'], ['ac', 'b']] etc.
Similarly, 'abcd' should give me [['a', 'bcd'], ['bcd', 'a'], ['abc',
'd'], ['d', 'abc'], ['b', 'acd'], ['acd', 'b'],['c', 'abd'], ['abd', 'c'],
['ab', 'cd'], ['cd', 'ab'], ['bc', 'ad'], ['ad', 'bc'], ['ac',
'bd'],['bd','ac']]
from a previous post
(http://groups.google.com/group/comp....f5b578bb00e61b)

def nkRange(n,k):
m = n - k + 1
indexer = range(0, k)
vector = range(1, k+1)
last = range(m, n+1)
yield vector
while vector != last:
high_value = -1
high_index = -1
for i in indexer:
val = vector[i]
if val high_value and val < m + i:
high_value = val
high_index = i
for j in range(k - high_index):
vector[j+high_index] = high_value + j + 1
yield vector

def kSubsets(s, k):
for vector in nkRange(len(s),k):
yield ''.join( s[i-1] for i in vector )

print list( kSubsets('abcd',2) )

['ab', 'ac', 'ad', 'bc', 'bd', 'cd']

Jul 12 '06 #8
* Thorsten Kampe (2006-07-12 19:11 +0000)
filter(lambda x: len(x) == 2, part(['abcd']))
That's "filter(lambda x: len(x) == 2, part('abcd'))" of course...
Jul 13 '06 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Leandro Pardini | last post by:
Hello there, I'm trying to process a binary file and I really don't know how. The story: gPhoto2 downloads the images from my camera just fine, but small areas of 10x3 pixels are screwed up. I...
7
by: Chris Ritchey | last post by:
Hmmm I might scare people away from this one just by the title, or draw people in with a chalange :) I'm writting this program in c++, however I'm using char* instead of the string class, I am...
4
by: Justin Lebar | last post by:
Sorry about the huge post, but I think this is the amount of information necessary for someone to help me with a good answer. I'm writing a statistical analysis program in ASP.net and MSSQL7 that...
4
by: spam | last post by:
Is there a well-known algorithm for replacing many substrings in a string? For example, I'd like to take the string "abc def ghi jkl mno pqr" and replace, say, every instance of "abc", "ghi", and...
3
by: Will McGugan | last post by:
Hi, Is there a simple way of replacing a large number of substrings in a string? I was hoping that str.replace could take a dictionary and use it to replace the occurrences of the keys with the...
9
by: C3 | last post by:
I have to process some data in C that is given to me as a char * array. I have a fairly large number of substrings (well, they're not actually printable, but let's treat them as strings) that I...
4
by: Robert Dodier | last post by:
Hello all, I'm trying to find substrings that look like 'FOO blah blah blah' in a string. For example give 'blah FOO blah1a blah1b FOO blah2 FOO blah3a blah3b blah3b' I want to get three...
3
by: Girish Sahani | last post by:
Given a length k string,i want to search for 2 substrings (overlap possible) in a list consisting of length k-1 strings. These 2 substrings when 'united' give the original string. e.g given...
2
by: Pilcrow | last post by:
This problem was raised in comp.lang.perl.misc, and the poster was concerned, among other things, by the speed of execution. Since C is faster than perl, I wonder how a C coder would solve it? ...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated ...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.