473,395 Members | 1,368 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,395 software developers and data experts.

Graph Data Structures

Hi All,

Currently I am working on a generic graph library so I can do various
graph based analysis for various projects I have ideas for. Currently
I am implementing Graph as a wrapper around a dictionary. Currently my
implementation works like this:

t = Graph()
n1 = Node("Node1")
n2 = Node("Test2")
edge1 = Edge("Test3")
t += n1 { n1:{}}
t[n1][n2] = edge1 { n1:{n2:edge1}

However this isnt actually ending up with the structure I want. I want
it to finally end up as ...... { n1:{n2:edge1}, n2:{}}. Is
there anyway I can do this simply????

Also I am looking at having a large graph and was wondering if anyone
knew of anyway I could reduce the memory requirements of this
structure and improve the speed of queries on it. I m thinking writing
a C extension for it....is this a good idea and where would I start?
Or does Python have some kind of transparent memory access module I
can implement.

Many Thanks in advance,

Nathan

PS.....Please find my code below:

class Graph(object):
def __init__(self, g= { } ):
self.graph = g
def __iadd__(self, p):
if p not in self.graph:
self.graph[p] = PathsDict()
return self
def __getitem__(self, p):
try:
return self.graph[p]
except KeyError:
raise KeyError( "%s not in graph" %(repr(p)) )
def __str__(self):
return str(self.graph)
def filter(self, filter):
pass

class PathsDict(object):
def __init__(self):
self.paths = { }
def __setitem__(self, p, val):
if p not in self.paths:
self.paths[p] = val
def __getitem__(self, p):
return self.paths[p]
# catch exception here
def paths(self):
for k, v in self.paths:
yield (k, v)
def edges(self):
return self.paths.values()
def __str__(self):
return str(self.paths)
def __len__(self):
return len(self.paths)

class Node(object):
def __init__(self, name):
self.name = name
def __str__(self):
return self.name

class Edge(dict):
def __init__(self, name, weight = 1):
self["name"] = name
self["weight"] = weight
def __str__(self):
return self["name"]
Nov 25 '06 #1
12 2850
i haven't read your code, but there are many graph implementations in
python.
in case you haven't found these yet:
http://wiki.python.org/moin/PythonGraphApi

if you only want to do some analysis i think you need this one (as it's
pretty complete and simple):
https://networkx.lanl.gov/

i also recommend Guido's essay to read:
http://www.python.org/doc/essays/graphs.html

Nov 25 '06 #2
On Sat, 25 Nov 2006 14:05:27 +0000, Nathan Harmston wrote:
Hi All,

Currently I am working on a generic graph library so I can do various
graph based analysis for various projects I have ideas for. Currently
I am implementing Graph as a wrapper around a dictionary. Currently my
implementation works like this:
[snip]

http://www.python.org/doc/essays/graphs.html

http://mail.python.org/pipermail/pyt...il/137593.html
Hope this helps.
--
Steven.

Nov 25 '06 #3
Szabolcs Nagy wrote:
.........
if you only want to do some analysis i think you need this one (as it's
pretty complete and simple):
https://networkx.lanl.gov/
.........

seems to be broken at present with a python traceback coming out; not a
good advert for python and/or trac
--
Robin Becker
Nov 25 '06 #4
Szabolcs Nagy:
i haven't read your code, but there are many graph implementations in
python.
in case you haven't found these yet:
http://wiki.python.org/moin/PythonGraphApi

if you only want to do some analysis i think you need this one (as it's
pretty complete and simple):
https://networkx.lanl.gov/

i also recommend Guido's essay to read:
http://www.python.org/doc/essays/graphs.html
I can also suggest my one:
http://sourceforge.net/projects/pynetwork/

And boost graph bindings for Python, quite fast:
http://www.osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile

Nov 25 '06 #5
https://networkx.lanl.gov/

This was working for me earlier, I managed to get everything from
there earlier. It seems a very good package. It seems theres more out
there than what I had thought, which unfortunately makes it harder for
me to decide what to use (pynetwork and bgl look useful aswell). I m
going to do some testing on it later and see what happens with it.
Thanks a lot for your help.

Has anyone got an idea how I could split the contents of a node and
its representation (to save memory in my graph). ie.... the nodes
contain the start and end coordinates and id and the actual
representation contains the string. I was going to have :

class Node(object):
pass

class Section(Node):
pass

class Item(object):
pass

Where section contains a slice of the Item which im interested. I m
just not sure how I can access the contents of item without storing
it. ---If u get what I mean???

Many Thanks in advance

Nathan
Nov 25 '06 #6

Nathan Harmston wrote:
https://networkx.lanl.gov/

This was working for me earlier, I managed to get everything from
there earlier. It seems a very good package. It seems theres more out
there than what I had thought, which unfortunately makes it harder for
me to decide what to use (pynetwork and bgl look useful aswell). I m
going to do some testing on it later and see what happens with it.
Thanks a lot for your help.

Has anyone got an idea how I could split the contents of a node and
its representation (to save memory in my graph). ie.... the nodes
contain the start and end coordinates and id and the actual
representation contains the string. I was going to have :

class Node(object):
pass

class Section(Node):
pass

class Item(object):
pass

Where section contains a slice of the Item which im interested. I m
just not sure how I can access the contents of item without storing
it. ---If u get what I mean???
No. Not at all. "pass" is not very informative. Neither are
"representation" and "the string". Please tell us what you mean by
"slice". What is an "item", if it's not a "node"? Try listing out the
attributes of a node, with a couple of sample values for each, and then
we might get a clue.

What makes you think that you need to save memory?

What makes you think that you could save memory by splitting whatever
it is?

HTH,
John

Nov 25 '06 #7
Nathan Harmston wrote:
https://networkx.lanl.gov/
.......

I got it back just once, but when I clicked again I see this

RuntimeError Python 2.4.4c1: /usr/bin/python
Sat Nov 25 16:21:16 2006

A problem occurred in a Python script. Here is the sequence of function
calls leading up to the error, in the order they occurred.
/build/bdist.linux-x86_64/egg/tracrst/macro.py in
render_macro(self=<tracrst.macro.TracReSTMacro object>, req=<trac.web.api

.......

782 self.__dict__["_parent_pool"] = \
783 parent_pool or libsvn.core.application_pool;
784 if self.__dict__["_parent_pool"]:
self = <libsvn.repos.svn_repos_t; proxy of C svn_repos_t instance>,
self.__dict__ = {'this': <Swig Object of type 'svn_repos_t *'>},
parent_pool = <libsvn.core.apr_pool_t; proxy of C apr_pool_t instance>,
libsvn = <module 'libsvn' from
'/usr/lib/python2.4/site-packages/libsvn/__init__.pyc'>, libsvn.core =
<module 'libsvn.core' from
'/usr/lib/python2.4/site-packages/libsvn/core.pyc'>,
libsvn.core.application_pool = <libsvn.core.apr_pool_t; proxy of C
apr_pool_t instance>

RuntimeError: instance.__dict__ not accessible in restricted mode
args = ('instance.__dict__ not accessible in restricted mode',)
perhaps I'm seeing different apache processes or something
--
Robin Becker
Nov 25 '06 #8
Hi,

The idea is that I m going to use it to build graphs for sequence
alignment (at the moment), I read a discussion on the corebio
(reimplementation of biopython) group about using intervals to
represent sequence slices. The idea being that, my graph may contain
millions of alignments and storing the sequence (the actual ATGC) is
not required.

class Node(object):
pass

class Interval(Node):
_id = "gene1"
_start = 50
_end = 200
_strand = 1

class Sequence(object):
_sequence = "atgtcgtgagagagagttgtgag................."

So one interval on one sequence would align to another interval from
another sequence, but I want changes I make to the interval to be
reflected in the representation later. If I reverse complement it i
want the interval to store this information but the Sequence only
shows this later on when I call use it calling repr or str.

Do you get what I mean.
Many Thanks

Nathan
Nov 25 '06 #9

Nathan Harmston wrote:
Hi,

The idea is that I m going to use it to build graphs for sequence
alignment (at the moment), I read a discussion on the corebio
(reimplementation of biopython) group about using intervals to
represent sequence slices. The idea being that, my graph may contain
millions of alignments and storing the sequence (the actual ATGC) is
not required.

class Node(object):
pass

class Interval(Node):
_id = "gene1"
_start = 50
_end = 200
_strand = 1
What is the point of subclassing Node if it's just a dummy?
>
class Sequence(object):
_sequence = "atgtcgtgagagagagttgtgag................."

So one interval on one sequence would align to another interval from
another sequence, but I want changes I make to the interval to be
reflected in the representation later. If I reverse complement it i
want the interval to store this information but the Sequence only
shows this later on when I call use it calling repr or str.

Do you get what I mean.
Only vaguely. You use several terms which appear to be from your trade
jargon as they are not understandable when interpreted in either the
context of Python-speak or ordinary English e.g. "sequence",
"alignment", "ATGC", "reverse complement", "interval".

Two options:
(a) communicate understandably
(b) wait till your wontoks are back from holidays.

Nov 25 '06 #10
Hi,

It seems that by just going through the problem writing out a better
explanation for the reply I have figured out a solution and the
problem isnt as difficult as I thought it would be.

What is a wontok?

Thanks

Nathan

PS --the start of my reply:

class Interval(object):
_id = "gene1"
_start = 50
_end = 200
_strand = 1

class Sequence(object):
_sequence = "atgtcgtgagagagagttgtgag................."
Only vaguely. You use several terms which appear to be from your trade
jargon
Sequence is a string made from a restricted alphabet (A,T,G,C...).
Sequences can be aligned: 1 ATGCTGCAT
2 TAGCTGTTA
-------
2 5

I m trying to represent this as a graph Interval(id=1, start=2, end=6,
strand=1) ---edge------Interval(id=2, start=2, end=6, strand=1)

The problem is I was planning on storing the sequences in a dictionary
{id:Seq}, however each dictionary would represent a different source
of sequences. File1, File2....... (
STORE THE SOURCES AS A DICT AND HAVE SOURCE IN INTERVAL ASWELL
Nov 26 '06 #11

Nathan Harmston wrote:
Hi,

It seems that by just going through the problem writing out a better
explanation for the reply I have figured out a solution and the
problem isnt as difficult as I thought it would be.
Often happens.
>
What is a wontok?
It's Melanesian Pidgin (from the English "one talk") meaning a person
who speaks the same language as you, a member of your clan, ... the
context being that [at least in Papua New Guinea] there are relatively
many languages each with relatively not many speakers :-)
>
Thanks

Nathan

PS --the start of my reply:

class Interval(object):
_id = "gene1"
_start = 50
_end = 200
_strand = 1

class Sequence(object):
_sequence = "atgtcgtgagagagagttgtgag................."
Only vaguely. You use several terms which appear to be from your trade
jargon

Sequence is a string made from a restricted alphabet (A,T,G,C...).
Sequences can be aligned: 1 ATGCTGCAT
2 TAGCTGTTA
-------
2 5
I'm sure they can be, but appearances can be deceptive when you mix
tabs and spaces -- or whatever caused the above 4 lines to be not
vertically aligned but staggered diagonally like a flight of ducks
heading equatorwards for winter.

Sometimes a line of code (e.g. str1[2:6] == str2[2:6]) is worth a
thousand pictures :-)
>
I m trying to represent this as a graph Interval(id=1, start=2, end=6,
strand=1) ---edge------Interval(id=2, start=2, end=6, strand=1)

The problem is I was planning on storing the sequences in a dictionary
{id:Seq}, however each dictionary would represent a different source
of sequences. File1, File2....... (
STORE THE SOURCES AS A DICT
Mapping what keys to what values?
AND HAVE SOURCE IN INTERVAL ASWELL
So you had a data modelling problem. These are often better solved as a
separate step before you think about implementation details like
dictionaries.

Good luck with your project.

Cheers,
John

Nov 26 '06 #12
Nathan Harmston wrote:
Currently I am working on a generic graph library so I can do various
graph based analysis for various projects I have ideas for. Currently
I am implementing Graph as a wrapper around a dictionary. Currently my
implementation works like this:

t = Graph()
n1 = Node("Node1")
n2 = Node("Test2")
edge1 = Edge("Test3")
t += n1 { n1:{}}
t[n1][n2] = edge1 { n1:{n2:edge1}

However this isnt actually ending up with the structure I want. I want
it to finally end up as ...... { n1:{n2:edge1}, n2:{}}. Is
there anyway I can do this simply????
Nathan

By now you probably discovered that the networkx package can handle
this.
If I have this right, you want to create a digraph with
a directed edge from "Node1" to "Node2" and this edge
has the string "Test3" attached to it. In networkx, this is exacty what
the XDiGraph class was designed to do. Here DiGraph means
directed graph and the X means you are allowed to add (any)
data to the edge,for example:
>>import networkx as nx
t = nx.XDiGraph()
t.add_edge( "Node1", "Node2", "Test3")
Also I am looking at having a large graph and was wondering if anyone
knew of anyway I could reduce the memory requirements of this
structure and improve the speed of queries on it. I m thinking writing
a C extension for it....is this a good idea and where would I start?
Or does Python have some kind of transparent memory access module I
can implement.
Networkx was designed so that you can hook your own
C extension in. However, making it ispeed or memory efficient
is quite application dependent. I am still not clear as to exactly what

class of algorithms you want to implement via a string-interval
representation, and whether you demand exact alignment or whether
missing/incorrect data etc. is allowed as part of the alignment
problem.

HTH
Pieter Swart

Nov 26 '06 #13

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

25
by: Magnus Lie Hetland | last post by:
Is there any interest in a (hypothetical) standard graph API (with 'graph' meaning a network, consisting of nodes and edges)? Yes, we have the standard ways of implementing graphs through (e.g.)...
2
by: Christian Christmann | last post by:
Hi, I need to write a graph which provides at least the following functions: 1) stores nodes and edges (both store further information which can be of any type) 2) manipulations on nodes and...
4
by: Sameer | last post by:
Hello, A data structure for the implementation of graph can be struct node { int index; struct node *next; };
1
by: Gregor Rot | last post by:
Hi, i am interested in the fastest possible representation of an undirected graph G=(V,E) (vertex, edge representation) in connection with the Kruskal's algorithm (finding a minimal spanning...
3
by: alice | last post by:
hi all, I am very new to graphs.So can somebody please give me a link to some graphs tutorial as well as their assignment. Thanks, Alice Walls
10
by: andrea | last post by:
I'm studying some graphs algorithm (minumum spanning tree, breath search first topological sort etc etc...) and to understand better the theory I'm implementing them with python... I made my own...
2
by: tmuldner | last post by:
Hi, I am looking for existing software/description of a directed graph representation of an XML Schema. Any help will be appreciated.
2
by: sriniwas | last post by:
Hi Frnd's, m using prefuse visulation,it's have one display class and this class have one saveImage(outPutStream, String jpg,double size);. now graph is converting ia jpg image properly.now my...
5
by: chrispoliquin | last post by:
I need to represent the hyperlinks between a large number of HTML files as a graph. My non-directed graph will have about 63,000 nodes and and probably close to 500,000 edges. I have looked...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.