
Graph Data Structures

Hi All,

I am working on a generic graph library so I can do various
graph-based analyses for projects I have ideas for. At the moment
I am implementing Graph as a wrapper around a dictionary, and my
implementation works like this:

t = Graph()
n1 = Node("Node1")
n2 = Node("Test2")
edge1 = Edge("Test3")
t += n1            # graph is now {n1: {}}
t[n1][n2] = edge1  # graph is now {n1: {n2: edge1}}

However, this isn't actually ending up with the structure I want. I want
it to finally end up as {n1: {n2: edge1}, n2: {}}. Is
there any way I can do this simply?
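
One possible way to get that structure, sketched under some assumptions: in the posted code, t[n1] hands back a PathsDict that holds no reference to the Graph, so assigning t[n1][n2] has no way to register n2. An explicit method on the graph avoids that. The names add_node and add_edge are hypothetical (they are not in the posted code), and plain strings stand in for the Node and Edge objects.

```python
class Graph(object):
    """Adjacency-dict graph of the form {node: {neighbour: edge}}."""

    def __init__(self):
        self.graph = {}

    def add_node(self, node):
        # setdefault leaves an existing adjacency dict untouched
        self.graph.setdefault(node, {})

    def add_edge(self, source, target, edge):
        # Register both endpoints so the target also appears as a key
        self.add_node(source)
        self.add_node(target)
        self.graph[source][target] = edge


t = Graph()
t.add_edge("n1", "n2", "edge1")
print(t.graph)  # {'n1': {'n2': 'edge1'}, 'n2': {}}
```

The same idea would also work with the t[n1][n2] syntax if each inner dict kept a reference to its parent graph, at the cost of one extra pointer per node.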

Also, I am looking at having a large graph and was wondering if anyone
knew of any way I could reduce the memory requirements of this
structure and improve the speed of queries on it. I'm thinking of writing
a C extension for it. Is this a good idea, and where would I start?
Or does Python have some kind of transparent memory access module I
can use?
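
Before reaching for a C extension, one pure-Python option worth measuring is __slots__, which removes the per-instance __dict__ from small objects such as nodes. The sketch below applies it to the Node class from this post. How much memory it saves depends on the Python version and the number of attributes, so treat it as something to profile, not a guaranteed fix.

```python
class Node(object):
    # __slots__ replaces the per-instance __dict__ with fixed attribute slots
    __slots__ = ("name",)

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


n = Node("Node1")
print(hasattr(n, "__dict__"))  # False
```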

Many Thanks in advance,

Nathan

PS.....Please find my code below:

class Graph(object):
    def __init__(self, g=None):
        # A mutable default argument ({}) would be shared by every Graph
        self.graph = {} if g is None else g

    def __iadd__(self, p):
        if p not in self.graph:
            self.graph[p] = PathsDict()
        return self

    def __getitem__(self, p):
        try:
            return self.graph[p]
        except KeyError:
            raise KeyError("%s not in graph" % repr(p))

    def __str__(self):
        return str(self.graph)

    def filter(self, filter):
        pass


class PathsDict(object):
    def __init__(self):
        # Named _paths so it does not shadow the paths() method below
        self._paths = {}

    def __setitem__(self, p, val):
        # An existing edge is kept; assigning again is a no-op
        if p not in self._paths:
            self._paths[p] = val

    def __getitem__(self, p):
        # TODO: catch KeyError here
        return self._paths[p]

    def paths(self):
        for k, v in self._paths.items():
            yield (k, v)

    def edges(self):
        return self._paths.values()

    def __str__(self):
        return str(self._paths)

    def __len__(self):
        return len(self._paths)


class Node(object):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class Edge(dict):
    def __init__(self, name, weight=1):
        self["name"] = name
        self["weight"] = weight

    def __str__(self):
        return self["name"]
Nov 25 '06 #1
12 Replies


I haven't read your code, but there are many graph implementations in
Python.
In case you haven't found these yet:
http://wiki.python.org/moin/PythonGraphApi

If you only want to do some analysis, I think you need this one (as it's
pretty complete and simple):
https://networkx.lanl.gov/

I also recommend reading Guido's essay:
http://www.python.org/doc/essays/graphs.html

Nov 25 '06 #2

On Sat, 25 Nov 2006 14:05:27 +0000, Nathan Harmston wrote:
> Hi All,
>
> Currently I am working on a generic graph library so I can do various
> graph based analysis for various projects I have ideas for. Currently
> I am implementing Graph as a wrapper around a dictionary. Currently my
> implementation works like this:

[snip]

http://www.python.org/doc/essays/graphs.html

http://mail.python.org/pipermail/pyt...il/137593.html
Hope this helps.
--
Steven.

Nov 25 '06 #3

Szabolcs Nagy wrote:
> .........
> if you only want to do some analysis i think you need this one (as it's
> pretty complete and simple):
> https://networkx.lanl.gov/
> .........

This seems to be broken at present, with a Python traceback coming out; not a
good advert for Python and/or Trac.
--
Robin Becker
Nov 25 '06 #4

Szabolcs Nagy:
> i haven't read your code, but there are many graph implementations in
> python.
> in case you haven't found these yet:
> http://wiki.python.org/moin/PythonGraphApi
>
> if you only want to do some analysis i think you need this one (as it's
> pretty complete and simple):
> https://networkx.lanl.gov/
>
> i also recommend Guido's essay to read:
> http://www.python.org/doc/essays/graphs.html

I can also suggest my own:
http://sourceforge.net/projects/pynetwork/

And the Boost graph bindings for Python, which are quite fast:
http://www.osl.iu.edu/~dgregor/bgl-python/

Bye,
bearophile

Nov 25 '06 #5

> https://networkx.lanl.gov/

This was working for me earlier; I managed to get everything from
there. It seems a very good package. It seems there's more out
there than I had thought, which unfortunately makes it harder for
me to decide what to use (pynetwork and bgl look useful as well). I'm
going to do some testing on it later and see what happens with it.
Thanks a lot for your help.

Has anyone got an idea how I could split the contents of a node from
its representation (to save memory in my graph)? I.e. the nodes
contain the start and end coordinates and id, and the actual
representation contains the string. I was going to have:

class Node(object):
    pass

class Section(Node):
    pass

class Item(object):
    pass

Where Section contains a slice of the Item which I'm interested in. I'm
just not sure how I can access the contents of Item without storing
it. If you get what I mean?

Many Thanks in advance

Nathan
Nov 25 '06 #6


Nathan Harmston wrote:
>> https://networkx.lanl.gov/
>
> This was working for me earlier; I managed to get everything from
> there. It seems a very good package. It seems there's more out
> there than I had thought, which unfortunately makes it harder for
> me to decide what to use (pynetwork and bgl look useful as well). I'm
> going to do some testing on it later and see what happens with it.
> Thanks a lot for your help.
>
> Has anyone got an idea how I could split the contents of a node from
> its representation (to save memory in my graph)? I.e. the nodes
> contain the start and end coordinates and id, and the actual
> representation contains the string. I was going to have:
>
> class Node(object):
>     pass
>
> class Section(Node):
>     pass
>
> class Item(object):
>     pass
>
> Where Section contains a slice of the Item which I'm interested in. I'm
> just not sure how I can access the contents of Item without storing
> it. If you get what I mean?

No. Not at all. "pass" is not very informative. Neither are
"representation" and "the string". Please tell us what you mean by
"slice". What is an "item", if it's not a "node"? Try listing out the
attributes of a node, with a couple of sample values for each, and then
we might get a clue.

What makes you think that you need to save memory?

What makes you think that you could save memory by splitting whatever
it is?

HTH,
John

Nov 25 '06 #7

Nathan Harmston wrote:
>> https://networkx.lanl.gov/
> .......

I got it back just once, but when I clicked again I see this

RuntimeError Python 2.4.4c1: /usr/bin/python
Sat Nov 25 16:21:16 2006

A problem occurred in a Python script. Here is the sequence of function
calls leading up to the error, in the order they occurred.
/build/bdist.linux-x86_64/egg/tracrst/macro.py in
render_macro(self=<tracrst.macro.TracReSTMacro object>, req=<trac.web.api

.......

782 self.__dict__["_parent_pool"] = \
783 parent_pool or libsvn.core.application_pool;
784 if self.__dict__["_parent_pool"]:
self = <libsvn.repos.svn_repos_t; proxy of C svn_repos_t instance>,
self.__dict__ = {'this': <Swig Object of type 'svn_repos_t *'>},
parent_pool = <libsvn.core.apr_pool_t; proxy of C apr_pool_t instance>,
libsvn = <module 'libsvn' from
'/usr/lib/python2.4/site-packages/libsvn/__init__.pyc'>, libsvn.core =
<module 'libsvn.core' from
'/usr/lib/python2.4/site-packages/libsvn/core.pyc'>,
libsvn.core.application_pool = <libsvn.core.apr_pool_t; proxy of C
apr_pool_t instance>

RuntimeError: instance.__dict__ not accessible in restricted mode
args = ('instance.__dict__ not accessible in restricted mode',)
perhaps I'm seeing different apache processes or something
--
Robin Becker
Nov 25 '06 #8

Hi,

The idea is that I'm going to use it to build graphs for sequence
alignment (at the moment). I read a discussion on the corebio
(a reimplementation of biopython) group about using intervals to
represent sequence slices. The idea is that my graph may contain
millions of alignments, and storing the sequence (the actual ATGC) is
not required.

class Node(object):
    pass

class Interval(Node):
    _id = "gene1"
    _start = 50
    _end = 200
    _strand = 1

class Sequence(object):
    _sequence = "atgtcgtgagagagagttgtgag................."

So one interval on one sequence would align to another interval from
another sequence, but I want changes I make to the interval to be
reflected in the representation later. If I reverse complement it, I
want the interval to store this information, but the Sequence should only
show it later on, when I use it by calling repr or str.
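
A minimal sketch of that idea, under stated assumptions: the interval keeps only coordinates and a strand flag, the sequence text lives once in a shared dictionary, and the reverse complement is applied only when the text is asked for. The sequence() method, the seq_id attribute, and the COMPLEMENT table are hypothetical names for illustration, and the code assumes Python 3 (str.maketrans).

```python
# Translation table for complementing DNA bases (lower and upper case)
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


class Interval(object):
    """Coordinates only; the sequence text lives in a shared dict."""
    __slots__ = ("seq_id", "start", "end", "strand")

    def __init__(self, seq_id, start, end, strand=1):
        self.seq_id = seq_id
        self.start = start
        self.end = end
        self.strand = strand  # 1 = forward, -1 = reverse complement

    def reverse_complement(self):
        # Only the flag changes; no sequence data is copied or rewritten
        self.strand = -self.strand

    def sequence(self, sequences):
        # Slice the shared string on demand, applying the strand lazily
        text = sequences[self.seq_id][self.start:self.end]
        if self.strand == -1:
            text = text.translate(COMPLEMENT)[::-1]
        return text


sequences = {"gene1": "atgtcgtgagagagagttgtgag"}
iv = Interval("gene1", 0, 6)
print(iv.sequence(sequences))  # atgtcg
iv.reverse_complement()
print(iv.sequence(sequences))  # cgacat
```

Each interval then costs a few integers and one reference, however long the underlying sequence is.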

Do you get what I mean?
Many Thanks

Nathan
Nov 25 '06 #9


Nathan Harmston wrote:
> Hi,
>
> The idea is that I'm going to use it to build graphs for sequence
> alignment (at the moment). I read a discussion on the corebio
> (a reimplementation of biopython) group about using intervals to
> represent sequence slices. The idea is that my graph may contain
> millions of alignments, and storing the sequence (the actual ATGC) is
> not required.
>
> class Node(object):
>     pass
>
> class Interval(Node):
>     _id = "gene1"
>     _start = 50
>     _end = 200
>     _strand = 1

What is the point of subclassing Node if it's just a dummy?

> class Sequence(object):
>     _sequence = "atgtcgtgagagagagttgtgag................."
>
> So one interval on one sequence would align to another interval from
> another sequence, but I want changes I make to the interval to be
> reflected in the representation later. If I reverse complement it, I
> want the interval to store this information, but the Sequence should only
> show it later on, when I use it by calling repr or str.
>
> Do you get what I mean?

Only vaguely. You use several terms which appear to be from your trade
jargon as they are not understandable when interpreted in either the
context of Python-speak or ordinary English e.g. "sequence",
"alignment", "ATGC", "reverse complement", "interval".

Two options:
(a) communicate understandably
(b) wait till your wontoks are back from holidays.

Nov 25 '06 #10

Hi,

It seems that by just going through the problem and writing out a better
explanation for the reply, I have figured out a solution, and the
problem isn't as difficult as I thought it would be.

What is a wontok?

Thanks

Nathan

PS --the start of my reply:

class Interval(object):
    _id = "gene1"
    _start = 50
    _end = 200
    _strand = 1

class Sequence(object):
    _sequence = "atgtcgtgagagagagttgtgag................."

> Only vaguely. You use several terms which appear to be from your trade
> jargon

Sequence is a string made from a restricted alphabet (A,T,G,C...).
Sequences can be aligned: 1 ATGCTGCAT
2 TAGCTGTTA
-------
2 5

I'm trying to represent this as a graph: Interval(id=1, start=2, end=6,
strand=1) ---edge--- Interval(id=2, start=2, end=6, strand=1)

The problem is I was planning on storing the sequences in a dictionary
{id:Seq}, however each dictionary would represent a different source
of sequences. File1, File2....... (
STORE THE SOURCES AS A DICT AND HAVE SOURCE IN INTERVAL AS WELL
Nov 26 '06 #11


Nathan Harmston wrote:
> Hi,
>
> It seems that by just going through the problem and writing out a better
> explanation for the reply, I have figured out a solution, and the
> problem isn't as difficult as I thought it would be.

Often happens.

> What is a wontok?

It's Melanesian Pidgin (from the English "one talk") meaning a person
who speaks the same language as you, a member of your clan, ... the
context being that [at least in Papua New Guinea] there are relatively
many languages each with relatively not many speakers :-)

> Thanks
>
> Nathan
>
> PS --the start of my reply:
>
> class Interval(object):
>     _id = "gene1"
>     _start = 50
>     _end = 200
>     _strand = 1
>
> class Sequence(object):
>     _sequence = "atgtcgtgagagagagttgtgag................."
>
>> Only vaguely. You use several terms which appear to be from your trade
>> jargon
>
> Sequence is a string made from a restricted alphabet (A,T,G,C...).
> Sequences can be aligned: 1 ATGCTGCAT
> 2 TAGCTGTTA
> -------
> 2 5

I'm sure they can be, but appearances can be deceptive when you mix
tabs and spaces -- or whatever caused the above 4 lines to be not
vertically aligned but staggered diagonally like a flight of ducks
heading equatorwards for winter.

Sometimes a line of code (e.g. str1[2:6] == str2[2:6]) is worth a
thousand pictures :-)
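
Spelled out with the two example sequences from the quoted message, and assuming the positions 2 and 5 are 0-based and inclusive, that line of code looks like this:

```python
str1 = "ATGCTGCAT"
str2 = "TAGCTGTTA"

# Python slices are half-open: [2:6] covers indices 2, 3, 4 and 5
print(str1[2:6])               # GCTG
print(str1[2:6] == str2[2:6])  # True
```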

> I'm trying to represent this as a graph: Interval(id=1, start=2, end=6,
> strand=1) ---edge--- Interval(id=2, start=2, end=6, strand=1)
>
> The problem is I was planning on storing the sequences in a dictionary
> {id:Seq}, however each dictionary would represent a different source
> of sequences. File1, File2....... (
> STORE THE SOURCES AS A DICT

Mapping what keys to what values?

> AND HAVE SOURCE IN INTERVAL AS WELL

So you had a data modelling problem. These are often better solved as a
separate step before you think about implementation details like
dictionaries.

Good luck with your project.

Cheers,
John

Nov 26 '06 #12

Nathan Harmston wrote:
> Currently I am working on a generic graph library so I can do various
> graph based analysis for various projects I have ideas for. Currently
> I am implementing Graph as a wrapper around a dictionary. Currently my
> implementation works like this:
>
> t = Graph()
> n1 = Node("Node1")
> n2 = Node("Test2")
> edge1 = Edge("Test3")
> t += n1            # graph is now {n1: {}}
> t[n1][n2] = edge1  # graph is now {n1: {n2: edge1}}
>
> However, this isn't actually ending up with the structure I want. I want
> it to finally end up as {n1: {n2: edge1}, n2: {}}. Is
> there any way I can do this simply?
> Nathan

By now you have probably discovered that the networkx package can handle
this.
If I have this right, you want to create a digraph with
a directed edge from "Node1" to "Node2", and this edge
has the string "Test3" attached to it. In networkx, this is exactly what
the XDiGraph class was designed to do. Here DiGraph means
directed graph, and the X means you are allowed to add (any)
data to the edge, for example:

>>> import networkx as nx
>>> t = nx.XDiGraph()
>>> t.add_edge("Node1", "Node2", "Test3")

> Also, I am looking at having a large graph and was wondering if anyone
> knew of any way I could reduce the memory requirements of this
> structure and improve the speed of queries on it. I'm thinking of writing
> a C extension for it. Is this a good idea, and where would I start?
> Or does Python have some kind of transparent memory access module I
> can use?

Networkx was designed so that you can hook your own
C extension in. However, making it speed or memory efficient
is quite application dependent. I am still not clear as to exactly what
class of algorithms you want to implement via a string-interval
representation, and whether you demand exact alignment or whether
missing/incorrect data etc. is allowed as part of the alignment
problem.

HTH
Pieter Swart

Nov 26 '06 #13
