Hi everyone,
I'm relatively new to python and I want to write a piece of code who do
the following work for data mining purpose :
1) I have a list of connexion between some computers. This list has
this format :
Ip A Date Ip B
.... ... ...
192.168.0.1 19.10.2005 192.168.0.2
192.168.0.3 19.10.2005 192.168.0.1
192.168.0.4 19.10.2005 192.168.0.6
.... ... ...
2) I want to find if there are unknown sequences of connexions in my
data and if these sequences are repeated along the file :
For example :
Computer A connects to Computer B then
Computer B connects to Computer C then
Computer C connects to Computer A
3) Then, the software gives the sequences it has found and how many
times they appear...
I hope this is clear, point 2) is where I have my main problem. Has
someone an idea where to start and if there's already something coded ?
Thanks
Séb 7 2808
Séb <sc***@unil.c h> wrote: Hi everyone,
I'm relatively new to python and I want to write a piece of code who do the following work for data mining purpose :
Essentially, if I understand correctly, you want to detect LOOPS given a
sequence of directed connections A->B. "loop detection" and "graph"
would then be the keywords to search for, in this case.
2) I want to find if there are unknown sequences of connexions in my data and if these sequences are repeated along the file :
For example :
Computer A connects to Computer B then Computer B connects to Computer C then Computer C connects to Computer A
Does this "then" imply you're only interested in loops occurring in this
*sequence*, i.e., is order of connections important? If the sequence of
directed connections was, say, in the different order:
B->C
A->B
C->A
would you want this detected as a loop, or not?
Alex Essentially, if I understand correctly, you want to detect LOOPS given a sequence of directed connections A->B. "loop detection" and "graph" would then be the keywords to search for, in this case.
Exactly, but the sequence has to be discovered by the piece of code !
Does this "then" imply you're only interested in loops occurring in this *sequence*, i.e., is order of connections important? If the sequence of directed connections was, say, in the different order:
B->C A->B C->A
would you want this detected as a loop, or not?
Yes, it would be nice to detect it as a loop, with for example a
threshold. Btw, it would be nice to ignore additional connections in
such a way :
B->C # Normal connection
D->E # Additional connection to ignore
A->B # Normal connection
C->A # Normal connection
Would it be possible ?
Thank you very much
Séb wrote: 1) I have a list of connexion between some computers. This list has this format :
It looks like you want graph theory.
Ip A Date Ip B ... ... ... 192.168.0.1 19.10.2005 192.168.0.2 192.168.0.3 19.10.2005 192.168.0.1 192.168.0.4 19.10.2005 192.168.0.6
That looks like a list of edges between graph nodes, you see. Each node
corresponds to an address and each edge is a connection between two
nodes - ip addresses, in your case.
2) I want to find if there are unknown sequences of connexions in my data and if these sequences are repeated along the file :
For example :
Computer A connects to Computer B then Computer B connects to Computer C then Computer C connects to Computer A
That looks like finding a path between Node A and Node C. This is a
common application of graph theory, and especially when finding routes
(eg. for train journeys, or for AI characters in computer games).
3) Then, the software gives the sequences it has found and how many times they appear...
You can find all the routes between 1 node and others by using
depth-first search (or indeed, any other simple graph search
algorithm). Basically, this says that, for any node, examine all the
nodes it leads to. Then examine those nodes... repeat until you run out
of nodes or find where you're looking for. The only complication is
remembering the route you took.
I hope this is clear, point 2) is where I have my main problem. Has someone an idea where to start and if there's already something coded ?
I found a couple of Python graph libraries on Google but I can't vouch
for their quality. However, writing your own simple graph traversal
algorithm should be quite trivial. I would start by populating a
dictionary where the keys are instances of address A and the values are
lists of address B values for address A. Add each link again with
address B as the key and address A as the value if you need to
represent bidirectional connections. Then you can perform search on
that as required. The only slight complication is avoiding infinite
loops - I might use a dictionary of address->boolean values here and
check off each address as the algorithm progresses.
--
Ben Sizer
"Séb" <sc***@unil.c h> writes: Essentially, if I understand correctly, you want to detect LOOPS given a sequence of directed connections A->B. "loop detection" and "graph" would then be the keywords to search for, in this case.
Exactly, but the sequence has to be discovered by the piece of code !
Does this "then" imply you're only interested in loops occurring in this *sequence*, i.e., is order of connections important? If the sequence of directed connections was, say, in the different order:
B->C A->B C->A
would you want this detected as a loop, or not?
Yes, it would be nice to detect it as a loop, with for example a threshold. Btw, it would be nice to ignore additional connections in such a way :
B->C # Normal connection D->E # Additional connection to ignore A->B # Normal connection C->A # Normal connection
Would it be possible ?
As Ben Sizer pointed out, this is a fairly well-known graph
algorithm. You just want to add information to add ordering
information to each edge, so you can verify that the an edge that is
further down the path is "older" than the last edge added. Given your
ordering requirement, simply numbering the edges and checking that the
"older" edge has a larger number than the previous edge should do.
With my admin hat on, I have to wonder - don't you really want to deal
with duration? I.e - each connection has a "start" and "end" time, and
you're really only interested in paths that happen while all the
connections actually exist? Just a wild ass guess, mind you. However,
it doesn't radically change the test. You now tag each edge with a
start and stop time, and check that all previous edges on the path
exist while the one you are adding exists.
<mike
--
Mike Meyer <mw*@mired.or g> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Hi everybody,
Thanks for the time taken to answer my question. Unfortunatly, it seems
that there's a little confusion about what I want to do.
In fact, I don't want to search for a particular path between
computers. What I really want is to detect sequences of connection that
are repeated along the log. Is it clearer, if not, I will put another
exmample ;-)
Thank you !
Ps Python community is very nice, I'm glad I learn this language !
"Séb" <sc***@unil.c h> writes: Hi everybody,
Thanks for the time taken to answer my question. Unfortunatly, it seems that there's a little confusion about what I want to do.
In fact, I don't want to search for a particular path between computers. What I really want is to detect sequences of connection that are repeated along the log. Is it clearer, if not, I will put another exmample ;-)
You're right - I was confused. That's why I asked about the timing
issue. But I also indicated that I might be way off base with that.
Everyone may be confused as well - we thought you were looking for
*any* series of connections that started at one computer and ended at
a second computer. That problem is the well-understood problem of
finding a path through a graph, where "path" is used in a
graph-theoretic sense and not a network sense.
From what you say above, it looks like you're looking for a specfic
sequence of connections in your connetions record. That's a much
easier problem. Roughly, you'd do:
def matches(desired , history):
"""Determin es if history contains the desired path.
desired = # List of connections we're looking for.
history = # List of connections that were actually made."""
for con in desired:
try:
while not con.matches(his tory[hp]):
hp += 1
except IndexError:
return False
return True
<mike
--
Mike Meyer <mw*@mired.or g> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Sorry for the confusion, I think my example was unclear. Thank you Mike
for this piece of code who solves a part of my problem. In fact, the
sequences are unknown at the beginning, so the first part of the code
has to find possible sequences and if those sequences are repeated,
counts how many time they appear (as your code does).
I have found this morning that there's a software produced by i2
software who does this kind of job, but for telephone call analysis.
Maybe the description could help to better understand my goal : http://www.i2.co.uk/products/Pattern_Tracer/default.asp
Séb This thread has been closed and replies have been disabled. Please start a new discussion. Similar topics |
by: |
last post by:
Hi,
I'm fairly new to regular expressions, and this may be a rather dumb
question, but so far I haven't found the answer in any tutorial or reference
yet...
If I have f.i. the string "The {{{{power of {{{{regular expressions}}}}
comes from}}}} the ability to include alternatives and repetitions in the
pattern." from which I want to extract chunks starting with "{{{{" and
ending with "}}}}".
|
by: gsv2com |
last post by:
One of my weaknesses has always been pattern matching. Something I
definitely need to study up on and maybe you guys can give me a pointer
here.
I'm looking to remove all of this code and just use pattern matching to
determine if the proper amount of numeric characters has been met. Here is
the function I've already done. Any help you can give in a pattern matching
solution would be much appreciated and very educational.
|
by: Thomas Reichelt |
last post by:
Moin,
short question: is there any language combining the syntax, flexibility and
great programming experience of Python with static typing? Is there a
project to add static typing to Python?
Thank you,
--
greetz tom
|
by: Xah Lee |
last post by:
# -*- coding: utf-8 -*-
# Python
# Matching string patterns
#
# Sometimes you want to know if a string is of
# particular pattern. Let's say in your website
# you have converted all images files from gif
# format to png format. Now you need to change the
# html code to use the .png files. So, essentially
|
by: olaufr |
last post by:
Hi,
I'd need to perform simple pattern matching within a string using a
list of possible patterns. For example, I want to know if the substring
starting at position n matches any of the string I have a list, as
below:
sentence = "the color is $red"
patterns =
pos = sentence.find($)
| |
by: Roger L. Cauvin |
last post by:
Say I have some string that begins with an arbitrary sequence of characters
and then alternates repeating the letters 'a' and 'b' any number of times,
e.g.
"xyz123aaabbaabbbbababbbbaaabb"
I'm looking for a regular expression that matches the first, and only the
first, sequence of the letter 'a', and only if the length of the sequence is
exactly 3.
|
by: regis |
last post by:
Greetings,
about scanf matching nonempty sequences using the "%" matches a nonempty sequence of anything except '-'
"%" matches a nonempty sequence of anything except ']" matches a nonempty sequence of anything except ']'
"%" matches a nonempty sequence of anything except '^'
"%" matches a nonempty sequence of '-'
"%" matches a nonempty sequence of ']" matches a nonempty sequence of ']'
....but how to match a nonempty sequence of '^'...
|
by: Daneel |
last post by:
Hello!
I'm looking for an algorithm which finds all occurences of a bit
sequence (e.g., "0001") in a file. This sequence can start at any bit
in the file (it is not byte aligned).
I have some ideas of how to approach the problem (1) reading file into
unsigned char buffer, 2) defining bit structure, 3) comparing the
first 4 bits of the buffer with the bit structure, 4) shifting the
char buffer one left, 5) repeat at step 3)) but I'm...
|
by: Slaunger |
last post by:
Hi all,
I am a Python novice, and I have run into a problem in a project I am
working on, which boils down to identifying the patterns in a sequence
of integers, for example
..... 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0 3 3 0 3 3 0 10 6 6
1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 6 6 1 9 3 3 0 3 3 0...
|
by: marktang |
last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look !
Part I. Meaning of...
|
by: Hystou |
last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it.
First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
| |
by: Oralloy |
last post by:
Hello folks,
I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>".
The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed.
This is as boiled down as I can make it.
Here is my compilation command:
g++-12 -std=c++20 -Wnarrowing bit_field.cpp
Here is the code in...
|
by: jinu1996 |
last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth.
The Art of Business Website Design
Your website is...
|
by: agi2029 |
last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own....
Now, this would greatly impact the work of software developers. The idea...
|
by: TSSRALBI |
last post by:
Hello
I'm a network technician in training and I need your help.
I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs.
The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols.
I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
|
by: adsilva |
last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
|
by: muto222 |
last post by:
How can i add a mobile payment intergratation into php mysql website.
| |
by: bsmnconsultancy |
last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...
| |