469,934 Members | 1,509 Online

Sequences in list

Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.

Thanks for any help,

Malcolm

Jul 19 '05 #1
4 2136 te**@mclift.fsnet.co.uk wrote:
Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.

Thanks for any help,

Malcolm

I'm not sure I understand exactly what you want to do, but some of the following
may help

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n):
"""return groups of length n from sequence
Usage:
list(make_groups(s1,3)) [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
"""
length = len(sequence)
if length % n:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length, n):
yield sequence[i:i+n]

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example: list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']])) [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
"""
for group in groups:
base, basechr = , ord(group)
for char in group[1:]:
diff = ord(char) - basechr
# although not specified it seems that you want
# to compare diffs mod 7 i.e. -2 == +5
base.append(diff % 7)
yield tuple(base)

def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first
forms = {}
try:
for group in make_diffs(make_groups(sequence, length)):
forms[group] = forms.get(group,0)+1
print forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
compare_forms(s1) {(0, 1, 2): 3} # found three instances of (0,1,2) compare_forms(s2) {(0, 2, 5): 3} # found three instances compare_forms(s3) {(0, 5, 5, 3): 2} # found two instances compare_forms(b1) {(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples
{(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other compare_forms(b2) {(0, 5, 4): 1, (0, 2, 5): 2}

HTH
Michael

Jul 19 '05 #2
Michael Spencer wrote:

def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first

oops, make that range(4,1,-1) or, simply (4,3,2)

Michael

Jul 19 '05 #3
to my problem some how involved breaking down the list into groups and
that the notes would be easier to work with as numbers. However, I
think your âmake_diffsâ function is really a very cunning way to
go about things ï I would have never have thought of that!

As to the latest update, that is very welcome and will allow me to
analyse the results given in more detail.

Are your search sequences really limited to 4 notes? â No, just used
that as an example
How long are the pieces you are searching? â Anything from the length
of a phrase to perhaps 32 bars, though maybe longer at some later date.
Do you not need to care about note lengths, only pitches. â While
Iâve been trying to figure out a way to do this Iâve just
concentrated on pitches, but it would be great to be able to look at
the note lengths as well.
Are you really (as I inferred from your examples) looking only at
a 7-tone scale â Yes, for the time being

Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the âmake_diffsâ function into alist
and do a search for the sequence identified from the
âcompare_formsâ function using the âKnuth-Morris-Prattâ
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your âmake_diffsâ
it should be a simple matter to see by what interval the groups in the
sequence are transposed.

Hmmâ¦ those last two paragraphs come across as the ranting of a mad
man. Iâm off home to see if something like that worksâ¦

All the best and thanks again for the help,

Malcolm

Jul 19 '05 #4
te**@mclift.fsnet.co.uk wrote:
Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the âmake_diffsâ function into a list
and do a search for the sequence identified from the
âcompare_formsâ function using the âKnuth-Morris-Prattâ
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your âmake_diffsâ
it should be a simple matter to see by what interval the groups in the
sequence are transposed.

Hold on! Adding the index position and transposition offset is a small change to
the existing code:

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n, anywhere = False):
"""yields tuples of sub-sequences of length n, and their position
from sequence.

bool: anywhere
False: look only at positions n*i for i = 1,2,3...
True: return every (overlapping sequence) of length n

Usage:
list(make_groups(s1,3)) [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'],
(6, 9))]
"""
length = len(sequence)
if (length % n) and not anywhere:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length-n+1, anywhere or n):
yield sequence[i:i+n], (i, i+n)

def ord_note(note, note_base = "a"):
"""Return the ordinal value of a note, or the interval between two notes"""
return (ord(note)-ord(note_base)) % 7

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example: list(make_diffs(make_groups(s1,3))) [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))]
"""
for group, index in groups:
seq, base_note = , group
for note in group[1:]:
interval = ord_note(note, base_note)
seq.append(interval)
yield tuple(seq), index, base_note

def group_forms(sequence, anywhere = False, max_length = 4):
"""group sequences of similar form

yields {form_tuple: [index tuples where form occurs]}
starting with longer forms
"""

for length in range(max_length, 1, -1):
forms = {}
try:
for group, index, base_note in make_diffs(make_groups(sequence,
length, anywhere)):
forms.setdefault(group,[]).append((index,base_note))
yield forms
except ValueError:
# raised if the sequence is not a multiple of length
pass

def search(sequence, max_length = 4, min_repeats = 3):
"""print all sequences that occur least min_repeats times"""
found = 0
print "Finding forms length 2-%s, occuring at minimum %s times" \
% (max_length, min_repeats)
for form_dict in group_forms(sequence, anywhere = True):
for form, data in form_dict.iteritems():
if len(data) >= min_repeats:
print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset))
for index, offset in data])
found +=1
print "Found: %s" % found
search(s1)

Finding forms length 2-4, occuring at minimum 3 times
(0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2']
(0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3']
Found: 2
Given the increasingly complex tuples between the different stages it would
probably be worth defining a class to represent a form_match object

Michael

Jul 19 '05 #5

 6 posts views Thread by kartik | last post: by 1 post views Thread by CSN | last post: by 4 posts views Thread by Hemant Shah | last post: by 8 posts views Thread by elein | last post: by 18 posts views Thread by Bruno Baguette | last post: by 72 posts views Thread by Gregory Petrosyan | last post: by 5 posts views Thread by Anton81 | last post: by 18 posts views Thread by psbasha | last post: by 122 posts views Thread by C.L. | last post: by reply views Thread by eddparker01 | last post: by reply views Thread by lanliddd | last post: by 1 post views Thread by isladogs | last post: by reply views Thread by Trystan | last post: by 1 post views Thread by skydivetom | last post: by reply views Thread by Romlus | last post: by reply views Thread by MikeCant | last post: by 2 posts views Thread by Usman55 | last post: by 1 post views Thread by cloudytechi147 | last post: by