By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
 428,829 Members | 1,798 Online + Ask a Question
Need help? Post your question and get tips & solutions from a community of 428,829 IT Pros & Developers. It's quick & easy.

# Sequences in list

 P: n/a Hi All, I wonder could someone help me with this? What I want to do is search through a list of letters and look for adjacent groups of letters that form sequences, not in the usual way of matching say abc to another appearance later on in the list but to look for transposed patterns. The groups of letters can be made up of between 2 to 4 letters. It is not know beforehand how the groups are formed, so I can't say here is a group, look for a sequence formed from this. There must be at least three appearances of the groups for the sequence to be pass. More than one sequence may show in a list, i.e. first abc, bcd and cde followed by bd, df, fa, ac at some later point. Ideally I would like to know the start and end position of the sequence/sequences. I'm looking for patterns such as these: ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] But these would be wrong: ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one fails because of the 'jump' of the last group of three letters in relation to the others. To pass the last group would need to have been def. However, the previous three groups are ok and would be passed as a sequence. ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the first and last groups have a similar sequence this would fail because of the intervening group (afe) not following the pattern. Thanks for any help, Malcolm Jul 19 '05 #1
Share this Question
4 Replies

 P: n/a te**@mclift.fsnet.co.uk wrote: Hi All, I wonder could someone help me with this? What I want to do is search through a list of letters and look for adjacent groups of letters that form sequences, not in the usual way of matching say abc to another appearance later on in the list but to look for transposed patterns. The groups of letters can be made up of between 2 to 4 letters. It is not know beforehand how the groups are formed, so I can't say here is a group, look for a sequence formed from this. There must be at least three appearances of the groups for the sequence to be pass. More than one sequence may show in a list, i.e. first abc, bcd and cde followed by bd, df, fa, ac at some later point. Ideally I would like to know the start and end position of the sequence/sequences. I'm looking for patterns such as these: ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] But these would be wrong: ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] - This one fails because of the 'jump' of the last group of three letters in relation to the others. To pass the last group would need to have been def. However, the previous three groups are ok and would be passed as a sequence. ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] - Here, although the first and last groups have a similar sequence this would fail because of the intervening group (afe) not following the pattern. Thanks for any help, Malcolm I'm not sure I understand exactly what you want to do, but some of the following may help # Good sequences s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] # Bad sequences b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] def make_groups(sequence, n): """return groups of length n from sequence Usage: list(make_groups(s1,3)) [['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']] """ length = len(sequence) if length % n: raise ValueError, "Sequence length %s is not multiple of %s"\ % (length, n) for i in xrange(0, length, n): yield sequence[i:i+n] def make_diffs(groups): """return groups in canonical form where the first note in each sequence is 0, and each subsequent note is expressed as a positive interval from the first. Example: list(make_diffs([['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'd', 'e']])) [(0, 1, 2), (0, 1, 2), (0, 1, 2)] """ for group in groups: base, basechr = , ord(group) for char in group[1:]: diff = ord(char) - basechr # although not specified it seems that you want # to compare diffs mod 7 i.e. -2 == +5 base.append(diff % 7) yield tuple(base) def compare_forms(sequence): """test whether all groups of a sequence have the same form""" for length in range(4,2,-1): # look for longest match first forms = {} try: for group in make_diffs(make_groups(sequence, length)): forms[group] = forms.get(group,0)+1 print forms except ValueError: # raised if the sequence is not a multiple of length pass compare_forms(s1) {(0, 1, 2): 3} # found three instances of (0,1,2) compare_forms(s2) {(0, 2, 5): 3} # found three instances compare_forms(s3) {(0, 5, 5, 3): 2} # found two instances compare_forms(b1) {(0, 1, 3, 6): 1, (0, 1, 2, 1): 1, (0, 1, 0, 1): 1} # found 3 4-tuples {(0, 1, 2): 3, (0, 2, 5): 1} # found 3 instances of (0,1,2) and 1 other compare_forms(b2) {(0, 5, 4): 1, (0, 2, 5): 2} HTH Michael Jul 19 '05 #2

 P: n/a Michael Spencer wrote: def compare_forms(sequence): """test whether all groups of a sequence have the same form""" for length in range(4,2,-1): # look for longest match first oops, make that range(4,1,-1) or, simply (4,3,2) Michael Jul 19 '05 #3

 P: n/a Iâve had chance to look at your code. I had an idea that I the answer to my problem some how involved breaking down the list into groups and that the notes would be easier to work with as numbers. However, I think your âmake_diffsâ function is really a very cunning way to go about things ï I would have never have thought of that! As to the latest update, that is very welcome and will allow me to analyse the results given in more detail. In answer to your questions â Are your search sequences really limited to 4 notes? â No, just used that as an example How long are the pieces you are searching? â Anything from the length of a phrase to perhaps 32 bars, though maybe longer at some later date. Do you not need to care about note lengths, only pitches. â While Iâve been trying to figure out a way to do this Iâve just concentrated on pitches, but it would be great to be able to look at the note lengths as well. Are you really (as I inferred from your examples) looking only at a 7-tone scale â Yes, for the time being Your code will identify sequences in a list, but how to index them? I have an idea, which seems ridiculously long-winded, but should work. First, put the groups from the âmake_diffsâ function into alist and do a search for the sequence identified from the âcompare_formsâ function using the âKnuth-Morris-Prattâ function in the python cookbook. Then the positions in that list will obviously be the same as those in the original. Next is the problem of the interval between the individual groups of the sequence. Using something along the lines of your âmake_diffsâ it should be a simple matter to see by what interval the groups in the sequence are transposed. Hmmâ¦ those last two paragraphs come across as the ranting of a mad man. Iâm off home to see if something like that worksâ¦ All the best and thanks again for the help, Malcolm Jul 19 '05 #4

 P: n/a te**@mclift.fsnet.co.uk wrote: Your code will identify sequences in a list, but how to index them? I have an idea, which seems ridiculously long-winded, but should work. First, put the groups from the âmake_diffsâ function into a list and do a search for the sequence identified from the âcompare_formsâ function using the âKnuth-Morris-Prattâ function in the python cookbook. Then the positions in that list will obviously be the same as those in the original. Next is the problem of the interval between the individual groups of the sequence. Using something along the lines of your âmake_diffsâ it should be a simple matter to see by what interval the groups in the sequence are transposed. Hold on! Adding the index position and transposition offset is a small change to the existing code: # Good sequences s1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e'] s2 = ['a', 'c', 'f', 'c', 'e', 'h', 'e', 'g', 'c'] s3 = ['c', 'a', 'a', 'f', 'f', 'd', 'd', 'b'] # Bad sequences b1 = ['a', 'b', 'c', 'b', 'c', 'd', 'c', 'd', 'e', 'f', 'a', 'd'] b2 = ['a', 'c', 'f', 'a', 'f', 'e', 'e', 'g', 'c'] def make_groups(sequence, n, anywhere = False): """yields tuples of sub-sequences of length n, and their position from sequence. bool: anywhere False: look only at positions n*i for i = 1,2,3... True: return every (overlapping sequence) of length n Usage: list(make_groups(s1,3)) [(['a', 'b', 'c'], (0, 3)), (['b', 'c', 'd'], (3, 6)), (['c', 'd', 'e'], (6, 9))] """ length = len(sequence) if (length % n) and not anywhere: raise ValueError, "Sequence length %s is not multiple of %s"\ % (length, n) for i in xrange(0, length-n+1, anywhere or n): yield sequence[i:i+n], (i, i+n) def ord_note(note, note_base = "a"): """Return the ordinal value of a note, or the interval between two notes""" return (ord(note)-ord(note_base)) % 7 def make_diffs(groups): """return groups in canonical form where the first note in each sequence is 0, and each subsequent note is expressed as a positive interval from the first. Example: list(make_diffs(make_groups(s1,3))) [((0, 1, 2), (0, 3)), ((0, 1, 2), (3, 6)), ((0, 1, 2), (6, 9))] """ for group, index in groups: seq, base_note = , group for note in group[1:]: interval = ord_note(note, base_note) seq.append(interval) yield tuple(seq), index, base_note def group_forms(sequence, anywhere = False, max_length = 4): """group sequences of similar form yields {form_tuple: [index tuples where form occurs]} starting with longer forms """ for length in range(max_length, 1, -1): forms = {} try: for group, index, base_note in make_diffs(make_groups(sequence, length, anywhere)): forms.setdefault(group,[]).append((index,base_note)) yield forms except ValueError: # raised if the sequence is not a multiple of length pass def search(sequence, max_length = 4, min_repeats = 3): """print all sequences that occur least min_repeats times""" found = 0 print "Finding forms length 2-%s, occuring at minimum %s times" \ % (max_length, min_repeats) for form_dict in group_forms(sequence, anywhere = True): for form, data in form_dict.iteritems(): if len(data) >= min_repeats: print "%s:%s" % (form, ["%s:%s" % (index, ord_note(offset)) for index, offset in data]) found +=1 print "Found: %s" % found search(s1) Finding forms length 2-4, occuring at minimum 3 times (0, 1, 2):['(0, 3):0', '(3, 6):1', '(6, 9):2'] (0, 1):['(0, 2):0', '(1, 3):1', '(3, 5):1', '(4, 6):2', '(6, 8):2', '(7, 9):3'] Found: 2 Given the increasingly complex tuples between the different stages it would probably be worth defining a class to represent a form_match object Michael Jul 19 '05 #5

### This discussion thread is closed

Replies have been disabled for this discussion. 