By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
428,829 Members | 1,798 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 428,829 IT Pros & Developers. It's quick & easy.

Sequences in list

P: n/a
Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.

Thanks for any help,

Malcolm

Jul 19 '05 #1
Share this Question
Share on Google+
4 Replies


P: n/a
te**@mclift.fsnet.co.uk wrote:
Hi All,

I wonder could someone help me with this?

What I want to do is search through a list of letters and look for
adjacent groups of letters that form sequences, not in the usual way of
matching say abc to another appearance later on in the list but to look
for transposed patterns.

The groups of letters can be made up of between 2 to 4 letters.
It is not know beforehand how the groups are formed, so I can't say
here is a group, look for a sequence formed from this.
There must be at least three appearances of the groups for the sequence
to be pass.
More than one sequence may show in a list, i.e. first abc, bcd and cde
followed by bd, df, fa, ac at some later point.
Ideally I would like to know the start and end position of the
sequence/sequences.

I'm looking for patterns such as these:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

But these would be wrong:

['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one
fails because of the 'jump' of the last group of three letters in
relation to the others. To pass the last group would need to have been
def. However, the previous three groups are ok and would be passed as a
sequence.

['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the
first and last groups have a similar sequence this would fail because
of the intervening group (afe) not following the pattern.

Thanks for any help,

Malcolm

I'm not sure I understand exactly what you want to do, but some of the following
may help

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n):
"""return groups of length n from sequence
Usage:
list(make_groups(s1,3)) [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']]
"""
length = len(sequence)
if length % n:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length, n):
yield sequence[i:i+n]

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example: list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']])) [(0, 1, 2), (0, 1, 2), (0, 1, 2)]
"""
for group in groups:
base, basechr = [0], ord(group[0])
for char in group[1:]:
diff = ord(char) - basechr
# although not specified it seems that you want
# to compare diffs mod 7 i.e. -2 == +5
base.append(diff % 7)
yield tuple(base)

def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first
forms = {}
try:
for group in make_diffs(make_groups(sequence, length)):
forms[group] = forms.get(group,0)+1
print forms
except ValueError:
# raised if the sequence is not a multiple of length
pass
compare_forms(s1) {(0, 1, 2): 3} # found three instances of (0,1,2) compare_forms(s2) {(0, 2, 5): 3} # found three instances compare_forms(s3) {(0, 5, 5, 3): 2} # found two instances compare_forms(b1) {(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples
{(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other compare_forms(b2) {(0, 5, 4): 1, (0, 2, 5): 2}


HTH
Michael

Jul 19 '05 #2

P: n/a
Michael Spencer wrote:

def compare_forms(sequence):
"""test whether all groups of a sequence have the same form"""
for length in range(4,2,-1): # look for longest match first


oops, make that range(4,1,-1) or, simply (4,3,2)

Michael

Jul 19 '05 #3

P: n/a
I’ve had chance to look at your code. I had an idea that I the answer
to my problem some how involved breaking down the list into groups and
that the notes would be easier to work with as numbers. However, I
think your ‘make_diffs’ function is really a very cunning way to
go about things  I would have never have thought of that!

As to the latest update, that is very welcome and will allow me to
analyse the results given in more detail.

In answer to your questions –

Are your search sequences really limited to 4 notes? – No, just used
that as an example
How long are the pieces you are searching? – Anything from the length
of a phrase to perhaps 32 bars, though maybe longer at some later date.
Do you not need to care about note lengths, only pitches. – While
I’ve been trying to figure out a way to do this I’ve just
concentrated on pitches, but it would be great to be able to look at
the note lengths as well.
Are you really (as I inferred from your examples) looking only at
a 7-tone scale – Yes, for the time being

Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the ‘make_diffs’ function into alist
and do a search for the sequence identified from the
‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your ‘make_diffs’
it should be a simple matter to see by what interval the groups in the
sequence are transposed.

Hmm… those last two paragraphs come across as the ranting of a mad
man. I’m off home to see if something like that works…

All the best and thanks again for the help,

Malcolm

Jul 19 '05 #4

P: n/a
te**@mclift.fsnet.co.uk wrote:
Your code will identify sequences in a list, but how to index them? I
have an idea, which seems ridiculously long-winded, but should work.
First, put the groups from the ‘make_diffs’ function into a list
and do a search for the sequence identified from the
‘compare_forms’ function using the ‘Knuth-Morris-Pratt’
function in the python cookbook. Then the positions in that list will
obviously be the same as those in the original.

Next is the problem of the interval between the individual groups of
the sequence. Using something along the lines of your ‘make_diffs’
it should be a simple matter to see by what interval the groups in the
sequence are transposed.


Hold on! Adding the index position and transposition offset is a small change to
the existing code:

# Good sequences
s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e']
s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c']
s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b']

# Bad sequences
b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd']
b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c']
def make_groups(sequence, n, anywhere = False):
"""yields tuples of sub-sequences of length n, and their position
from sequence.

bool: anywhere
False: look only at positions n*i for i = 1,2,3...
True: return every (overlapping sequence) of length n

Usage:
list(make_groups(s1,3)) [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'],
(6, 9))]
"""
length = len(sequence)
if (length % n) and not anywhere:
raise ValueError, "Sequence length %s is not multiple of %s"\
% (length, n)
for i in xrange(0, length-n+1, anywhere or n):
yield sequence[i:i+n], (i, i+n)

def ord_note(note, note_base = "a"):
"""Return the ordinal value of a note, or the interval between two notes"""
return (ord(note)-ord(note_base)) % 7

def make_diffs(groups):
"""return groups in canonical form where the first note
in each sequence is 0, and each subsequent note is expressed as
a positive interval from the first.
Example: list(make_diffs(make_groups(s1,3))) [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))]
"""
for group, index in groups:
seq, base_note = [0], group[0]
for note in group[1:]:
interval = ord_note(note, base_note)
seq.append(interval)
yield tuple(seq), index, base_note

def group_forms(sequence, anywhere = False, max_length = 4):
"""group sequences of similar form

yields {form_tuple: [index tuples where form occurs]}
starting with longer forms
"""

for length in range(max_length, 1, -1):
forms = {}
try:
for group, index, base_note in make_diffs(make_groups(sequence,
length, anywhere)):
forms.setdefault(group,[]).append((index,base_note))
yield forms
except ValueError:
# raised if the sequence is not a multiple of length
pass

def search(sequence, max_length = 4, min_repeats = 3):
"""print all sequences that occur least min_repeats times"""
found = 0
print "Finding forms length 2-%s, occuring at minimum %s times" \
% (max_length, min_repeats)
for form_dict in group_forms(sequence, anywhere = True):
for form, data in form_dict.iteritems():
if len(data) >= min_repeats:
print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset))
for index, offset in data])
found +=1
print "Found: %s" % found
search(s1)

Finding forms length 2-4, occuring at minimum 3 times
(0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2']
(0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3']
Found: 2
Given the increasingly complex tuples between the different stages it would
probably be worth defining a class to represent a form_match object

Michael

Jul 19 '05 #5

This discussion thread is closed

Replies have been disabled for this discussion.