472,143 Members | 1,749 Online

# Best strategy for finding a pattern in a sequence of integers

Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Two paths I have considered is:
1. Convert the sequence of integers to a hex string, i.e., "...
16616616616616619330330330330A66166..." and use the re module to find
the patterns. Use the string positions to go back to the sequence
2. Put them in a list or an array and manually look for the patterns
by iterating and filtering the elements compare with sets.

I am not looking for a "solution" to this specific problem, just some
guidance

The rules for the sequence is:
1. The sequence may start in the middle of a pattern
2. There are one or two patterns, Pattern A and Pattern B in the
sequence
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored
6. The relative position of 9 or 10 in the patterns varies from case
to case, but is consistent throughout a sequence.
7. There is always one 9 or one 10 in a pattern
7. The beginning of a pattern is marked by the transision from oner
pattern to the other.
8. If there is only one pattern in the sequence, the pattern beginning
is marked by the first occurance of either 9 or 10
9. The pattern is repetitive in the sequence,
e.g., ...ABABABAB..., ...AAA..., or ...BBB...

Thank you,
-- Slaunger

Nov 21 '08 #1
8 2595
Slaunger wrote:
Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

.... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)
Maybe:

#-----------------------------------------------------------------
data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

from itertools import groupby

S1 = [0, 3, 9]

s = set()
for k, g in groupby(data, lambda x: x in S1):
seq = tuple(g)
# maybe the next line should be 'if 9 in seq or 10 in seq'?
if seq[0] in [9, 10]:

print s
#------------------------------------------------------------------
set(
[(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])

hth

G.

Nov 21 '08 #2
Slaunger <Sl******@gmail.comwrites:
I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

.... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.
Then it would be a good starting point to write some code. Then you
could post it and ask how it can be made more 'pythonic'.

HTH

--
Arnaud
Nov 21 '08 #3
On Nov 21, 9:13*am, Slaunger <Slaun...@gmail.comwrote:
Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

.... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Two paths I have considered is:
1. Convert the sequence of integers to a hex string, i.e., "...
16616616616616619330330330330A66166..." and use the re module to find
the patterns. Use the string positions to go back to the sequence
2. Put them in a list or an array and manually look for the patterns
by iterating and filtering the elements compare with sets.

I am not looking for a "solution" to this specific problem, just some
guidance
Your rules appear to be incomplete and inconsistent.
>
The rules for the sequence is:
1. The sequence may start in the middle of a pattern
2. There are one or two patterns, Pattern A and Pattern B in the
sequence
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0
But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
Can 0's occur without 3's, such as 0, 0, 0?
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored
So, I can have 3, 3, 0, 7, 3, 3, 0?

What if the 7 occurs after the pair of 3's? Is the number following
the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?
6. The relative position of 9 or 10 in the patterns varies from case
to case, but is consistent throughout a sequence.
7. There is always one 9 or one 10 in a pattern
7. The beginning of a pattern is marked by the transision from oner
pattern to the other.
Can there be an ignored number between the patterns? Is
9,3,3,0,7,10,6,6,1
legal? If NO, you violate Rule 5. If YES, you violate the second Rule
7.
8. If there is only one pattern in the sequence, the pattern beginning
is marked by the first occurance of either 9 or 10
9. The pattern is repetitive in the sequence,
e.g., ...ABABABAB..., ...AAA..., or ...BBB...

Thank you,
-- Slaunger
Nov 21 '08 #4
On Fri, 21 Nov 2008 18:10:02 +0100
Gerard flanagan <gr********@gmail.comwrote:
data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

from itertools import groupby
But groupby needs sorted data?

Suppose the rules do not conflict or overlap and between them divide
all the values, then maybe this would work:

class StateMachine:

def __init__(self,*rules):
self.rules = rules
self.state = len(rules) #deliberately unreachable
self.first = True

def change(self,x):
#check and/or change state
for i,rule in enumerate(self.rules):
if rule(x):
if i == self.state: #no state change
return False
else: #maybe state change
self.state = i
if self.first: #set initial state, no change
self.first = False
return False
else:
return True #state is changed
raise ValueError

def test():

data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = map(int, data.split())

def rule1(x):
return x in set((0, 3, 9))
def rule2(x):
return x in set((6, 1, 10))

state = StateMachine(rule1,rule2)
L = []
res = []
for x in data:
if state.change(x):
res.append(list(L))
L =[]
L.append(x)
res.append(list(L))
print res

if __name__=='__main__':
test()

Nov 21 '08 #5
>
I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Then it would be a good starting point to write some code. Then you
could post it and ask how it can be made more 'pythonic'.
That is actually a good point. I will do that.
-- ~~~~
Nov 22 '08 #6
On 21 Nov., 23:36, Mensanator <mensana...@aol.comwrote:
Your rules appear to be incomplete and inconsistent.
OK. Let me try to clarify then...
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0

But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
Can 0's occur without 3's, such as 0, 0, 0?
Yes, 3s always comes in pairs. So, 3, 0, 3, 0 is not allowed.
And of the numbers 0, 3, and 9; 0 will always be the first after the
pair of 3s
>
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6, is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored

So, I can have 3, 3, 0, 7, 3, 3, 0?
Yes, there is a point I did not mention propery in my first
description:
The number 7 for instance could appear in that position, but it would
not be repetitive;
as a matter of fact these other numbers can be filtered away before
looking for the pattern,
so let us just forgot about those.
>
What if the 7 occurs after the pair of 3's? Is the number following
the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?
No, it would have to be 3, 3, 0, 7, 3, 3, 0, it is sequeezed in - but
as mentioned they can be prefiltered out of the problem
>
7. The beginning of a pattern is marked by the transition from oner
pattern to the other.

Can there be an ignored number between the patterns? Is
9,3,3,0,7,10,6,6,1
legal? If NO, you violate Rule 5. If YES, you violate the second Rule
7.
Yes you are right. This complication is again eliminated by
prefiltering "other" numbers out

-- Slaunger
Nov 22 '08 #7
On 21 Nov., 18:10, Gerard flanagan <grflana...@gmail.comwrote:
Slaunger wrote:
Hi all,
I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example
.... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...
I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

Maybe:

#-----------------------------------------------------------------
data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

from itertools import groupby

S1 = [0, 3, 9]

s = set()
for k, g in groupby(data, lambda x: x in S1):
* * *seq = tuple(g)
* * *# maybe the next line should be 'if 9 in seq or 10 in seq'?
* * *if seq[0] in [9, 10]:

print s
#------------------------------------------------------------------
set(
[(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])

hth

G.
Hi Gerard,
This definitely looks like a path to walk along, and I think your code
does the trick, although I have to play a little around with the
groupby method, of which I had no prior knowledge. I think I will
write some unit test cases to stress test you concept (on Monday, when
I am back at work). I appreciate your almost full implementation - it
would have sufficed to point me to the itertools module, and then I
think I would have figured out.
-- ~~~~
Nov 22 '08 #8
So I think you just need to find the first two complete sequences of
1,6,10 and 0,3,9, remove any repetitions and then you're done.

data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 7 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''
data = [int(x) for x in data.split()]

s1 = frozenset([1,6,10])
s2 = frozenset([0,3,9])

diter = iter(data)

i = diter.next()
curset = (s1,s2)[i in s2]
otherset = lambda : (s1,s2)[curset is s1]
seq = { s1 : [], s2 : [] }

# read until there is the first change in state - discard
# these, since we may have started in the middle of a sequence
other = otherset()
while i not in other:
i = diter.next()

for _ in range(2):
other = curset
curset = otherset()
tmp = []
while i not in other:
if i in curset:
tmp.append(i)
i = diter.next()
seq[curset] = tmp[:]

# look for repeats in a seq, truncate
def truncateReps(s,sentinel):
if s.count(sentinel) 1:
loc1 = s.index(sentinel)
loc2 = s.index(sentinel,loc1+1)
s[:] = s[:loc2-loc1]

truncateReps(seq[s1],10)
truncateReps(seq[s2],9)

print seq[s1]
print seq[s2]

Prints:
[10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1]
[9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0]

Your original sample was only the nominal, most friendly case, so it
is hard to know if any submitted solutions will work will all of your
other conditions. Please try this with more challenging data, such as
starting a sequence in the middle, numbers not in the set
or 10, etc.

-- Paul
Nov 22 '08 #9

### This discussion thread is closed

Replies have been disabled for this discussion.