Best strategy for finding a pattern in a sequence of integers

Hi all,

I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example

..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 ...

I want to process this such that I get out two patterns, like:
(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
and
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

I am pretty sure I can figure out how to do that, but I would like to
have some guidance on the most pythonic approach to this.

Two paths I have considered are:
1. Convert the sequence of integers to a hex string, i.e., "...
16616616616616619330330330330A66166...", and use the re module to find
the patterns. Use the string positions to go back to the sequence.
2. Put them in a list or an array and manually look for the patterns
by iterating over and filtering the elements, comparing them with sets.
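
A minimal sketch of the first path, assuming every value fits in a single hex digit (0 to 15) so that string positions map one-to-one onto list positions. The helper name `to_hex_string` and the short sample `seq` are illustrative only.

```python
import re

def to_hex_string(seq):
    # One hex digit per value, so index i in the string is index i in seq.
    return "".join("%X" % x for x in seq)

seq = [6, 6, 1, 9, 3, 3, 0, 3, 3, 0, 10, 6, 6, 1, 6, 6, 1, 9, 3, 3, 0]
text = to_hex_string(seq)

# A run is a maximal stretch of Pattern A symbols (0, 3, 9)
# or of Pattern B symbols (1, 6, A).
runs = [(m.start(), m.end()) for m in re.finditer(r"[039]+|[16A]+", text)]

# Use the match positions to slice the original integer sequence.
slices = [seq[start:end] for start, end in runs]
```

The first and last slices may be cut off by the ends of the data, so only the interior ones can be trusted as complete patterns.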

I am not looking for a "solution" to this specific problem, just some
guidance

The rules for the sequence are:
1. The sequence may start in the middle of a pattern
2. There are one or two patterns, Pattern A and Pattern B, in the
sequence
3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
followed by 0
4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6 is
always followed by 1
5. There may be other numbers interspersed within the sequence, but
they can be ignored
6. The relative position of 9 or 10 in the patterns varies from case
to case, but is consistent throughout a sequence.
7. There is always one 9 or one 10 in a pattern
7. The beginning of a pattern is marked by the transition from one
pattern to the other.
8. If there is only one pattern in the sequence, the pattern beginning
is marked by the first occurrence of either 9 or 10
9. The pattern is repetitive in the sequence,
e.g., ...ABABABAB..., ...AAA..., or ...BBB...
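
For the single-pattern case (rule 8), one period can be cut out between two consecutive markers. This is only a sketch under the stated rules: it assumes the other numbers are already removed and that the marker (9 or 10) occurs at least twice. The name `pattern_from_marker` is hypothetical.

```python
def pattern_from_marker(seq, marker):
    """Return one period of seq, starting at the first marker.

    Raises ValueError if the marker occurs fewer than two times.
    """
    first = seq.index(marker)
    second = seq.index(marker, first + 1)
    return tuple(seq[first:second])

# An ...AAA... sequence that starts in the middle of a pattern.
only_a = [3, 3, 0, 9, 3, 3, 0, 3, 3, 0, 9, 3, 3, 0, 3, 3, 0, 9, 3]
pattern = pattern_from_marker(only_a, 9)
```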

Thank you,
-- Slaunger

Nov 21 '08 #1
Slaunger wrote:
> I want to process this such that I get out two patterns, like:
> (9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0)
> and
> (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)

Maybe:

#-----------------------------------------------------------------
data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

data = [int(x) for x in data.split()]

from itertools import groupby

S1 = [0, 3, 9]

s = set()
for k, g in groupby(data, lambda x: x in S1):
    seq = tuple(g)
    # maybe the next line should be 'if 9 in seq or 10 in seq'?
    if seq[0] in [9, 10]:
        s.add(seq)

print(s)
#------------------------------------------------------------------
set(
[(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
(10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])

hth

G.

Nov 21 '08 #2
Slaunger <Sl******@gmail.com> writes:
> I am pretty sure I can figure out how to do that, but I would like to
> have some guidance on the most pythonic approach to this.

Then it would be a good starting point to write some code. Then you
could post it and ask how it can be made more 'pythonic'.

HTH

--
Arnaud
Nov 21 '08 #3
On Nov 21, 9:13 am, Slaunger <Slaun...@gmail.com> wrote:
> I am not looking for a "solution" to this specific problem, just some
> guidance

Your rules appear to be incomplete and inconsistent.

> The rules for the sequence are:
> 1. The sequence may start in the middle of a pattern
> 2. There are one or two patterns, Pattern A and Pattern B, in the
> sequence
> 3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
> followed by 0

But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
Can 0's occur without 3's, such as 0, 0, 0?

> 4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6 is
> always followed by 1
> 5. There may be other numbers interspersed within the sequence, but
> they can be ignored

So, I can have 3, 3, 0, 7, 3, 3, 0?

What if the 7 occurs after the pair of 3's? Is the number following
the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?

> 6. The relative position of 9 or 10 in the patterns varies from case
> to case, but is consistent throughout a sequence.
> 7. There is always one 9 or one 10 in a pattern
> 7. The beginning of a pattern is marked by the transition from one
> pattern to the other.

Can there be an ignored number between the patterns? Is
9,3,3,0,7,10,6,6,1
legal? If NO, you violate Rule 5. If YES, you violate the second Rule
7.

> 8. If there is only one pattern in the sequence, the pattern beginning
> is marked by the first occurrence of either 9 or 10
> 9. The pattern is repetitive in the sequence,
> e.g., ...ABABABAB..., ...AAA..., or ...BBB...
>
> Thank you,
> -- Slaunger
Nov 21 '08 #4
On Fri, 21 Nov 2008 18:10:02 +0100
Gerard flanagan <gr********@gmail.com> wrote:
> from itertools import groupby

But groupby needs sorted data?
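
On the sorting question: `itertools.groupby` starts a new group each time the key value changes, so it groups consecutive runs and does not itself require sorted input. Sorting only matters when all items with the same key must land in a single group, which is not what is wanted here. A small check on illustrative data:

```python
from itertools import groupby

data = [6, 6, 1, 9, 3, 3, 0, 10, 6, 6, 1]   # unsorted on purpose

def in_pattern_a(x):
    return x in (0, 3, 9)

# One (key, run) pair per maximal stretch of equal key values.
runs = [(key, list(group)) for key, group in groupby(data, in_pattern_a)]
```

Here `runs` holds three entries, one per stretch, in their original order.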

Suppose the rules do not conflict or overlap and between them divide
all the values, then maybe this would work:

class StateMachine:

    def __init__(self, *rules):
        self.rules = rules
        self.state = len(rules)  # deliberately unreachable
        self.first = True

    def change(self, x):
        # check and/or change state
        for i, rule in enumerate(self.rules):
            if rule(x):
                if i == self.state:  # no state change
                    return False
                else:  # maybe state change
                    self.state = i
                    if self.first:  # set initial state, no change
                        self.first = False
                        return False
                    else:
                        return True  # state is changed
        # no rule matched x
        raise ValueError

def test():

    data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10
6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''

    data = map(int, data.split())

    def rule1(x):
        return x in set((0, 3, 9))
    def rule2(x):
        return x in set((6, 1, 10))

    state = StateMachine(rule1, rule2)
    L = []
    res = []
    for x in data:
        if state.change(x):
            # state changed: close the current run and start a new one
            res.append(list(L))
            L = []
        L.append(x)
    res.append(list(L))
    print(res)

if __name__ == '__main__':
    test()

Nov 21 '08 #5
>> I am pretty sure I can figure out how to do that, but I would like to
>> have some guidance on the most pythonic approach to this.
>
> Then it would be a good starting point to write some code. Then you
> could post it and ask how it can be made more 'pythonic'.

That is actually a good point. I will do that.
-- ~~~~
Nov 22 '08 #6
On 21 Nov., 23:36, Mensanator <mensana...@aol.com> wrote:
> Your rules appear to be incomplete and inconsistent.

OK. Let me try to clarify then...

>> 3. Pattern A only consists of the numbers 0, 3, and 9. 3, 3 is always
>> followed by 0
>
> But does a 3 always follow a 3? Can you have 3, 0, 3, 0?
> Can 0's occur without 3's, such as 0, 0, 0?

Yes, 3s always come in pairs. So, 3, 0, 3, 0 is not allowed.
And of the numbers 0, 3, and 9, 0 will always be the first after the
pair of 3s.

>> 4. Pattern B only consists of the numbers 1, 6, and 10. 6, 6 is
>> always followed by 1
>> 5. There may be other numbers interspersed within the sequence, but
>> they can be ignored
>
> So, I can have 3, 3, 0, 7, 3, 3, 0?

Yes, there is a point I did not mention properly in my first
description:
The number 7, for instance, could appear in that position, but it would
not be repetitive.
As a matter of fact, these other numbers can be filtered away before
looking for the pattern,
so let us just forget about those.

> What if the 7 occurs after the pair of 3's? Is the number following
> the 7 forced to be 0, i.e., is 3, 3, 7, 3, 3, 0 legal?

No, it would have to be 3, 3, 0, 7, 3, 3, 0; it is squeezed in, but
as mentioned they can be prefiltered out of the problem.

>> 7. The beginning of a pattern is marked by the transition from one
>> pattern to the other.
>
> Can there be an ignored number between the patterns? Is
> 9,3,3,0,7,10,6,6,1
> legal? If NO, you violate Rule 5. If YES, you violate the second Rule
> 7.

Yes, you are right. This complication is again eliminated by
prefiltering "other" numbers out.
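
The prefiltering step could look like the sketch below. It assumes the only numbers of interest are the six named in rules 3 and 4. The names `WANTED` and `prefilter` are illustrative.

```python
# Numbers that can belong to Pattern A (0, 3, 9) or Pattern B (1, 6, 10).
WANTED = frozenset((0, 1, 3, 6, 9, 10))

def prefilter(seq):
    """Drop every number that belongs to neither pattern."""
    return [x for x in seq if x in WANTED]

# A stray 7 between the patterns and another one inside Pattern B.
noisy = [9, 3, 3, 0, 7, 10, 6, 6, 1, 7, 6, 6, 1]
clean = prefilter(noisy)
```

After this step the transition from one pattern to the other is always between adjacent elements, so the second rule 7 applies without exceptions.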

-- Slaunger
Nov 22 '08 #7
On 21 Nov., 18:10, Gerard flanagan <grflana...@gmail.com> wrote:
> from itertools import groupby
>
> S1 = [0, 3, 9]
>
> s = set()
> for k, g in groupby(data, lambda x: x in S1):
>     seq = tuple(g)
>     # maybe the next line should be 'if 9 in seq or 10 in seq'?
>     if seq[0] in [9, 10]:
>         s.add(seq)
>
> print(s)
> #------------------------------------------------------------------
> set(
> [(9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0),
> (10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1)])
>
> hth
>
> G.

Hi Gerard,
This definitely looks like a path to walk along, and I think your code
does the trick, although I have to play around a little with the
groupby method, of which I had no prior knowledge. I think I will
write some unit test cases to stress test your concept (on Monday, when
I am back at work). I appreciate your almost full implementation, but it
would have sufficed to point me to the itertools module, and then I
think I would have figured it out.
-- ~~~~
Nov 22 '08 #8
So I think you just need to find the first two complete sequences of
1,6,10 and 0,3,9, remove any repetitions and then you're done.

data = '''
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 7 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6
6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1'''
data = [int(x) for x in data.split()]

s1 = frozenset([1,6,10])
s2 = frozenset([0,3,9])

diter = iter(data)

i = next(diter)
curset = (s1,s2)[i in s2]
otherset = lambda : (s1,s2)[curset is s1]
seq = { s1 : [], s2 : [] }

# read until there is the first change in state - discard
# these, since we may have started in the middle of a sequence
other = otherset()
while i not in other:
    i = next(diter)

# read in 2 sequences
for _ in range(2):
    other = curset
    curset = otherset()
    tmp = []
    while i not in other:
        # numbers in neither set are skipped
        if i in curset:
            tmp.append(i)
        i = next(diter)
    seq[curset] = tmp[:]

# look for repeats in a seq, truncate
def truncateReps(s,sentinel):
    if s.count(sentinel) > 1:
        loc1 = s.index(sentinel)
        loc2 = s.index(sentinel,loc1+1)
        # the distance between two sentinels is one pattern length
        s[:] = s[:loc2-loc1]

truncateReps(seq[s1],10)
truncateReps(seq[s2],9)

# the answer
print(seq[s1])
print(seq[s2])

Prints:
[10, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1, 6, 6, 1]
[9, 3, 3, 0, 3, 3, 0, 3, 3, 0, 3, 3, 0]

Your original sample was only the nominal, most friendly case, so it
is hard to know if any submitted solutions will work with all of your
other conditions. Please try this with more challenging data, such as
starting a sequence in the middle, numbers not in the set
(0,1,3,6,9,10), repeated patterns, sequences that don't start with 9
or 10, etc.
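
One way to build such a test case is sketched below. It assumes the groupby approach from earlier in the thread plus prefiltering. The helper `complete_runs` and the short patterns `pat_a` and `pat_b` are made up for illustration.

```python
from itertools import groupby

A = frozenset((0, 3, 9))
B = frozenset((1, 6, 10))

def complete_runs(seq):
    """Hypothetical helper: distinct runs with a transition on both sides."""
    kept = [x for x in seq if x in A or x in B]    # drop "other" numbers
    runs = [tuple(g) for _, g in groupby(kept, lambda x: x in A)]
    return set(runs[1:-1])    # first and last runs may be cut off

pat_a = (3, 3, 0, 9, 3, 3, 0)     # marker 9 in the middle of the pattern
pat_b = (6, 6, 1, 10, 6, 6, 1)    # marker 10 in the middle of the pattern

# Starts and ends mid-pattern, with a stray 7 between two patterns.
data = list(pat_a[4:] + pat_b + pat_a + (7,) + pat_b + pat_a[:2])
found = complete_runs(data)
```

Repeated patterns such as ...AAB... would still come out as one long run here, so a truncation step like truncateReps above remains necessary for that case.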

-- Paul
Nov 22 '08 #9
