473,787 Members | 2,931 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Sequence and/or pattern matching

Hi everyone,

I'm relatively new to python and I want to write a piece of code who do
the following work for data mining purpose :

1) I have a list of connexion between some computers. This list has
this format :

Ip A Date Ip B
.... ... ...
192.168.0.1 19.10.2005 192.168.0.2
192.168.0.3 19.10.2005 192.168.0.1
192.168.0.4 19.10.2005 192.168.0.6
.... ... ...

2) I want to find if there are unknown sequences of connexions in my
data and if these sequences are repeated along the file :

For example :

Computer A connects to Computer B then
Computer B connects to Computer C then
Computer C connects to Computer A

3) Then, the software gives the sequences it has found and how many
times they appear...

I hope this is clear, point 2) is where I have my main problem. Has
someone an idea where to start and if there's already something coded ?

Thanks

Séb

Oct 19 '05 #1
7 2808
Séb <sc***@unil.c h> wrote:
Hi everyone,

I'm relatively new to python and I want to write a piece of code who do
the following work for data mining purpose :
Essentially, if I understand correctly, you want to detect LOOPS given a
sequence of directed connections A->B. "loop detection" and "graph"
would then be the keywords to search for, in this case.
2) I want to find if there are unknown sequences of connexions in my
data and if these sequences are repeated along the file :

For example :

Computer A connects to Computer B then
Computer B connects to Computer C then
Computer C connects to Computer A


Does this "then" imply you're only interested in loops occurring in this
*sequence*, i.e., is order of connections important? If the sequence of
directed connections was, say, in the different order:

B->C
A->B
C->A

would you want this detected as a loop, or not?
Alex
Oct 19 '05 #2
Essentially, if I understand correctly, you want to detect LOOPS given a
sequence of directed connections A->B. "loop detection" and "graph"
would then be the keywords to search for, in this case.
Exactly, but the sequence has to be discovered by the piece of code !
Does this "then" imply you're only interested in loops occurring in this
*sequence*, i.e., is order of connections important? If the sequence of
directed connections was, say, in the different order:

B->C
A->B
C->A

would you want this detected as a loop, or not?


Yes, it would be nice to detect it as a loop, with for example a
threshold. Btw, it would be nice to ignore additional connections in
such a way :

B->C # Normal connection
D->E # Additional connection to ignore
A->B # Normal connection
C->A # Normal connection

Would it be possible ?

Thank you very much

Oct 19 '05 #3
Séb wrote:
1) I have a list of connexion between some computers. This list has
this format :
It looks like you want graph theory.
Ip A Date Ip B
... ... ...
192.168.0.1 19.10.2005 192.168.0.2
192.168.0.3 19.10.2005 192.168.0.1
192.168.0.4 19.10.2005 192.168.0.6
That looks like a list of edges between graph nodes, you see. Each node
corresponds to an address and each edge is a connection between two
nodes - ip addresses, in your case.
2) I want to find if there are unknown sequences of connexions in my
data and if these sequences are repeated along the file :

For example :

Computer A connects to Computer B then
Computer B connects to Computer C then
Computer C connects to Computer A
That looks like finding a path between Node A and Node C. This is a
common application of graph theory, and especially when finding routes
(eg. for train journeys, or for AI characters in computer games).
3) Then, the software gives the sequences it has found and how many
times they appear...
You can find all the routes between 1 node and others by using
depth-first search (or indeed, any other simple graph search
algorithm). Basically, this says that, for any node, examine all the
nodes it leads to. Then examine those nodes... repeat until you run out
of nodes or find where you're looking for. The only complication is
remembering the route you took.
I hope this is clear, point 2) is where I have my main problem. Has
someone an idea where to start and if there's already something coded ?


I found a couple of Python graph libraries on Google but I can't vouch
for their quality. However, writing your own simple graph traversal
algorithm should be quite trivial. I would start by populating a
dictionary where the keys are instances of address A and the values are
lists of address B values for address A. Add each link again with
address B as the key and address A as the value if you need to
represent bidirectional connections. Then you can perform search on
that as required. The only slight complication is avoiding infinite
loops - I might use a dictionary of address->boolean values here and
check off each address as the algorithm progresses.

--
Ben Sizer

Oct 19 '05 #4
"Séb" <sc***@unil.c h> writes:
Essentially, if I understand correctly, you want to detect LOOPS given a
sequence of directed connections A->B. "loop detection" and "graph"
would then be the keywords to search for, in this case.


Exactly, but the sequence has to be discovered by the piece of code !
Does this "then" imply you're only interested in loops occurring in this
*sequence*, i.e., is order of connections important? If the sequence of
directed connections was, say, in the different order:

B->C
A->B
C->A

would you want this detected as a loop, or not?


Yes, it would be nice to detect it as a loop, with for example a
threshold. Btw, it would be nice to ignore additional connections in
such a way :

B->C # Normal connection
D->E # Additional connection to ignore
A->B # Normal connection
C->A # Normal connection

Would it be possible ?


As Ben Sizer pointed out, this is a fairly well-known graph
algorithm. You just want to add information to add ordering
information to each edge, so you can verify that the an edge that is
further down the path is "older" than the last edge added. Given your
ordering requirement, simply numbering the edges and checking that the
"older" edge has a larger number than the previous edge should do.

With my admin hat on, I have to wonder - don't you really want to deal
with duration? I.e - each connection has a "start" and "end" time, and
you're really only interested in paths that happen while all the
connections actually exist? Just a wild ass guess, mind you. However,
it doesn't radically change the test. You now tag each edge with a
start and stop time, and check that all previous edges on the path
exist while the one you are adding exists.

<mike
--
Mike Meyer <mw*@mired.or g> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 19 '05 #5
Hi everybody,

Thanks for the time taken to answer my question. Unfortunatly, it seems
that there's a little confusion about what I want to do.

In fact, I don't want to search for a particular path between
computers. What I really want is to detect sequences of connection that
are repeated along the log. Is it clearer, if not, I will put another
exmample ;-)

Thank you !

Ps Python community is very nice, I'm glad I learn this language !

Oct 20 '05 #6
"Séb" <sc***@unil.c h> writes:
Hi everybody,

Thanks for the time taken to answer my question. Unfortunatly, it seems
that there's a little confusion about what I want to do.

In fact, I don't want to search for a particular path between
computers. What I really want is to detect sequences of connection that
are repeated along the log. Is it clearer, if not, I will put another
exmample ;-)


You're right - I was confused. That's why I asked about the timing
issue. But I also indicated that I might be way off base with that.

Everyone may be confused as well - we thought you were looking for
*any* series of connections that started at one computer and ended at
a second computer. That problem is the well-understood problem of
finding a path through a graph, where "path" is used in a
graph-theoretic sense and not a network sense.

From what you say above, it looks like you're looking for a specfic
sequence of connections in your connetions record. That's a much
easier problem. Roughly, you'd do:

def matches(desired , history):
"""Determin es if history contains the desired path.
desired = # List of connections we're looking for.
history = # List of connections that were actually made."""

for con in desired:
try:
while not con.matches(his tory[hp]):
hp += 1
except IndexError:
return False
return True
<mike
--
Mike Meyer <mw*@mired.or g> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Oct 20 '05 #7
Sorry for the confusion, I think my example was unclear. Thank you Mike
for this piece of code who solves a part of my problem. In fact, the
sequences are unknown at the beginning, so the first part of the code
has to find possible sequences and if those sequences are repeated,
counts how many time they appear (as your code does).

I have found this morning that there's a software produced by i2
software who does this kind of job, but for telephone call analysis.
Maybe the description could help to better understand my goal :

http://www.i2.co.uk/products/Pattern_Tracer/default.asp

Séb

Oct 21 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
5495
by: | last post by:
Hi, I'm fairly new to regular expressions, and this may be a rather dumb question, but so far I haven't found the answer in any tutorial or reference yet... If I have f.i. the string "The {{{{power of {{{{regular expressions}}}} comes from}}}} the ability to include alternatives and repetitions in the pattern." from which I want to extract chunks starting with "{{{{" and ending with "}}}}".
8
6982
by: gsv2com | last post by:
One of my weaknesses has always been pattern matching. Something I definitely need to study up on and maybe you guys can give me a pointer here. I'm looking to remove all of this code and just use pattern matching to determine if the proper amount of numeric characters has been met. Here is the function I've already done. Any help you can give in a pattern matching solution would be much appreciated and very educational.
176
8189
by: Thomas Reichelt | last post by:
Moin, short question: is there any language combining the syntax, flexibility and great programming experience of Python with static typing? Is there a project to add static typing to Python? Thank you, -- greetz tom
9
3218
by: Xah Lee | last post by:
# -*- coding: utf-8 -*- # Python # Matching string patterns # # Sometimes you want to know if a string is of # particular pattern. Let's say in your website # you have converted all images files from gif # format to png format. Now you need to change the # html code to use the .png files. So, essentially
5
5759
by: olaufr | last post by:
Hi, I'd need to perform simple pattern matching within a string using a list of possible patterns. For example, I want to know if the substring starting at position n matches any of the string I have a list, as below: sentence = "the color is $red" patterns = pos = sentence.find($)
43
2814
by: Roger L. Cauvin | last post by:
Say I have some string that begins with an arbitrary sequence of characters and then alternates repeating the letters 'a' and 'b' any number of times, e.g. "xyz123aaabbaabbbbababbbbaaabb" I'm looking for a regular expression that matches the first, and only the first, sequence of the letter 'a', and only if the length of the sequence is exactly 3.
8
3762
by: regis | last post by:
Greetings, about scanf matching nonempty sequences using the "%" matches a nonempty sequence of anything except '-' "%" matches a nonempty sequence of anything except ']" matches a nonempty sequence of anything except ']' "%" matches a nonempty sequence of anything except '^' "%" matches a nonempty sequence of '-' "%" matches a nonempty sequence of ']" matches a nonempty sequence of ']' ....but how to match a nonempty sequence of '^'...
8
5660
by: Daneel | last post by:
Hello! I'm looking for an algorithm which finds all occurences of a bit sequence (e.g., "0001") in a file. This sequence can start at any bit in the file (it is not byte aligned). I have some ideas of how to approach the problem (1) reading file into unsigned char buffer, 2) defining bit structure, 3) comparing the first 4 bits of the buffer with the bit structure, 4) shifting the char buffer one left, 5) repeat at step 3)) but I'm...
8
2779
by: Slaunger | last post by:
Hi all, I am a Python novice, and I have run into a problem in a project I am working on, which boils down to identifying the patterns in a sequence of integers, for example ..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0...
0
9655
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...
0
9498
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10363
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10172
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
0
8993
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
0
5398
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5535
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
3670
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
2894
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.