By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
445,645 Members | 1,048 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 445,645 IT Pros & Developers. It's quick & easy.

Standard graph API?

P: n/a
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)? Yes, we
have the standard ways of implementing graphs through (e.g.) dicts
mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient (or
abstract) interface to emulate. It might be nice to have the kind of
polymorphic freedom that one has with, e.g, with the DB-API. One could
always develop factories or adaptors (such as for PyProtocols) to/from
the dict-of-sets version...

So, any interest? Or am I just a lone nut in wanting this?

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #1
Share this Question
Share on Google+
25 Replies


P: n/a
Magnus Lie Hetland <mlh <at> furu.idi.ntnu.no> writes:
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)?


I don't need one right now, but I know I have a few times in the past.
Certainly seems like a good idea to me. We've got sets as builtins now, no
reason we shouldn't have a simple graph API, at least in the library.

Steve

Jul 18 '05 #2

P: n/a
Magnus Lie Hetland wrote:
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)? Yes, we
have the standard ways of implementing graphs through (e.g.) dicts
mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient (or
abstract) interface to emulate. It might be nice to have the kind of
polymorphic freedom that one has with, e.g, with the DB-API. One could
always develop factories or adaptors (such as for PyProtocols) to/from
the dict-of-sets version...

So, any interest? Or am I just a lone nut in wanting this?

Magnus,
A know I'd appreciate it. It could be used to configure
neural nets and logic networks; where this api would make
it easy to build an abstraction then "compile" it into a
faster representation for execution - or just run the
tree/graph in "interpreted" mode.
I don't think it would get a lot of use, but the use
would be high end.
wes

Jul 18 '05 #3

P: n/a
In article <sl***************@furu.idi.ntnu.no>,
ml*@furu.idi.ntnu.no (Magnus Lie Hetland) wrote:
Yes, we have the standard ways of implementing graphs through (e.g.)
dicts mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient
(or abstract) interface to emulate.


Actually, my interpretation of this standard way is as a fairly abstract
interface, rather than a specific instantiation such as dict-of-sets:
Most of the time, I merely require that iter(G) produces a sequence of
the vertices of graph G, and iter(G[v]) produces a sequence of neighbors
of vertex v. I also sometimes use "v in G" and "w in G[v]" to test
existence of vertices or edges.

Pros and cons of this approach:

- You can use a list instead of a set in the adjacency list part of the
representation. This may be faster and more space efficient when the
vertex degrees are small.

- It's easy to create test graphs as code literals

G1 = {
0: [1,2,5],
1: [0,5],
2: [0,3,4],
3: [2,4,5,6],
4: [2,3,5,6],
5: [0,1,3,4],
6: [3,4],
}
G2 = {
0: [2,5],
1: [3,8],
2: [0,3,5],
3: [1,2,6,8],
4: [7],
5: [0,2],
6: [3,8],
7: [4],
8: [1,3,6],
}

- Any indexable object can be a vertex. The vertex identities can be
something meaningful to your program. On the other hand, that means
(unless you know where your graph came from) you can't rely on the
vertices being special vertex objects with nice properties and you can't
use objects like None as flag values unless you're sure they won't be
vertices.

- It doesn't provide an abstract way of changing the graph (although
that's relatively easy if G is e.g. a dict of sets)

- It doesn't directly represent multigraphs

- It doesn't directly represent undirected graphs (instead you have to
replace an undirected edge by two directed edges and hope your callers
don't give you a directed graph by mistake).

- There isn't an explicit object representing an edge, although you can
create one by using a tuple (v,w) or (for undirected edges) a set. This
can be an advantage in terms of memory usage but a disadvantage in terms
of number of object creations. Also it means that if you want to store
information on the edges you have to use a dict indexed by the edge
instead of attributes on an edge object (probably better style anyway
since it prevents different algorithms on the same graph from colliding
with each other's attributes).

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #4

P: n/a
+1 for standard graph API!

I don't have a "high-end" use for it, but I did write a program which
graphs the revision history of a software repository. It would have been
nice to have most of that code in a library, and if such a library
existed, it would probably implement operations I was too lazy to
implement, such as coloring.

On Mon, Aug 23, 2004 at 06:43:35PM +0000, wes weston wrote:
Magnus Lie Hetland wrote:
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)? Yes, we
have the standard ways of implementing graphs through (e.g.) dicts
mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient (or
abstract) interface to emulate. It might be nice to have the kind of
polymorphic freedom that one has with, e.g, with the DB-API. One could
always develop factories or adaptors (such as for PyProtocols) to/from
the dict-of-sets version...

So, any interest? Or am I just a lone nut in wanting this?

Magnus,
A know I'd appreciate it. It could be used to configure
neural nets and logic networks; where this api would make
it easy to build an abstraction then "compile" it into a
faster representation for execution - or just run the
tree/graph in "interpreted" mode.
I don't think it would get a lot of use, but the use
would be high end.
wes

--
http://mail.python.org/mailman/listinfo/python-list

Jul 18 '05 #5

P: n/a
In article <ma**************************************@python.o rg>,
Phil Frost <in****@bitglue.com> wrote:
+1 for standard graph API!

I don't have a "high-end" use for it, but I did write a program which
graphs the revision history of a software repository. It would have been
nice to have most of that code in a library, and if such a library
existed, it would probably implement operations I was too lazy to
implement, such as coloring.


I have a random sample of graph algorithms implemented in
http://www.ics.uci.edu/~eppstein/PADS/

I use the existing Guido-standard graph representation, that is:
iter(G) and iter(G[v]) list vertices of G and neighbors of v in G
v in G and w in G[v] test existence of vertices and edges in G

It includes both simple basic graph algorithm stuff (copying a graph, a
DFS implementation that works non-recursively so it doesn't run into
Python's recursion limit) and some much more advanced algorithms (e.g.
non-bipartite maximum matching).

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #6

P: n/a
"Magnus Lie Hetland" <ml*@furu.idi.ntnu.no> wrote in message
news:slrncik8tm.4g.ml*@furu.idi.ntnu.no...
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)? Yes, we
have the standard ways of implementing graphs through (e.g.) dicts
mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient (or
abstract) interface to emulate. It might be nice to have the kind of
polymorphic freedom that one has with, e.g, with the DB-API. One could
always develop factories or adaptors (such as for PyProtocols) to/from
the dict-of-sets version...

So, any interest? Or am I just a lone nut in wanting this?

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]


Not sure if this falls under the category of an API, but it may be relevant
to what you are doing.

This is a Python API to the Graphviz DOT language:
http://dkbza.org/pydot.html

So I think this is evidence you are not alone.

-- Paul
Jul 18 '05 #7

P: n/a
Magnus Lie Hetland wrote:
So, any interest? Or am I just a lone nut in wanting this?


I have often needed to use simple graph concepts and wrote a bunch
of code, then at some point I have started to unify it and (slowly)
put together a consistent model. When my research brings me back to
graphs I'll try to finish it.

http://www.personal.psu.edu/staff/i/...graphlib/html/

I do have a lot functionality working, you can associate arbitrary
data with the nodes and edges, bfs, dfs, topological sort,graph
generation, David Epstein's python dijkstra algorithm, graphviz
visualization.

Istvan.
Jul 18 '05 #8

P: n/a
On Mon, 23 Aug 2004 11:58:15 -0700, David Eppstein wrote:
- It doesn't directly represent multigraphs

- It doesn't directly represent undirected graphs (instead you have to
replace an undirected edge by two directed edges and hope your callers
don't give you a directed graph by mistake).

- There isn't an explicit object representing an edge, although you can
create one by using a tuple (v,w) or (for undirected edges) a set.


I think these three things speak to why there isn't a graph type and
probably won't be one any time soon; unlike "Sets", there are just too
many types of "graphs" in use, all fundamentally different in
implementation, and with all differences having massive performance
implications. As you indirectly point out, each of the following is an
independent dimension:

* Directed, undirected
* Multi or non-multi
* Explicit edges/explicit nodes with links/node and edge objects
* Simple and fast implementation of nodes/nodes and edges that take
attributes

That's a good 24 possible types of graph library, each with implications
w.r.t. algorithms and performance.

While the abstract idea of a standard graph library is appealing to some
people, any actual concrete implementation will likely leave the majority
of people who want to use it out in the cold, resulting either in
something only useful in the simplest of cases, or suffering from major
feeping creaturitis as it tries to cover too many bases at once.

Jul 18 '05 #9

P: n/a
Magnus Lie Hetland wrote:
Is there any interest in a (hypothetical) standard graph API (with
'graph' meaning a network, consisting of nodes and edges)? Yes, we
have the standard ways of implementing graphs through (e.g.) dicts
mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient (or
abstract) interface to emulate. It might be nice to have the kind of
polymorphic freedom that one has with, e.g, with the DB-API. One could
always develop factories or adaptors (such as for PyProtocols) to/from
the dict-of-sets version...

So, any interest? Or am I just a lone nut in wanting this?


Magnus,
Do you have any design thoughts. It would be good to have weighted,
directed graphs and depth first traversal.
wes

Jul 18 '05 #10

P: n/a
In article <ep****************************@news.service.uci.e du>,
David Eppstein wrote:
In article <sl***************@furu.idi.ntnu.no>,
ml*@furu.idi.ntnu.no (Magnus Lie Hetland) wrote:
Yes, we have the standard ways of implementing graphs through (e.g.)
dicts mapping nodes to neighbor-sets, but if one wants a graph that's
implemented in some other way, this may not be the most convenient
(or abstract) interface to emulate.
Actually, my interpretation of this standard way is as a fairly abstract
interface, rather than a specific instantiation such as dict-of-sets:
Most of the time, I merely require that iter(G) produces a sequence of
the vertices of graph G, and iter(G[v]) produces a sequence of neighbors
of vertex v. I also sometimes use "v in G" and "w in G[v]" to test
existence of vertices or edges.


Yes, I agree, to some extent. I guess the problems start when you want
to manipulate the graph. I think it would be nice to be able to use an
empty graph object to build a given graph without knowing the
implementation. I guess you could do that in this implementation too
(if all the neighbor sets were initialized).

But if this does turn out to be an acceptable API, I'm all for it. I
just think it would be nice to have a Recommended Standard(tm), to
create interoperability.

[snip]- It doesn't provide an abstract way of changing the graph (although
that's relatively easy if G is e.g. a dict of sets)
Right.
- It doesn't directly represent multigraphs


Unless you insist on having neighbor-sets, it does, doesn't it?
Neighbor-lists can be used for this...?

[snip]

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #11

P: n/a
In article <pa****************************@jerf.org>, Jeremy Bowers
wrote:
[snip]

As I tried to state in the original post (I probably wasn't clear
enough) I'm not talking about a standard *implementation*, just a
standard *API*, like the DB-API. This could easily cover all kinds of
strange beasts such as directed or undirected, weighted or unweighted
(etc.) graphs; multigraphs, chain graphs, hypergraphs, who knows.

I'm basically just suggesting that it might be useful to have a
"standard" interface for these things. It may be that the simple de
facto standard that David cites is sufficient (although it certainly
doesn't cover hypergraphs -- but that's possibly going a bit too far
anyway.)

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #12

P: n/a
In article <iG*******************@fe2.texas.rr.com>, Paul McGuire wrote:
[snip]
Not sure if this falls under the category of an API, but it may be
relevant to what you are doing.

This is a Python API to the Graphviz DOT language:
http://dkbza.org/pydot.html
Yes, I found that -- probably quite useful.
So I think this is evidence you are not alone.
:)
-- Paul


--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #13

P: n/a
In article <hI********************@giganews.com>, Istvan Albert wrote:
[snip]
http://www.personal.psu.edu/staff/i/...graphlib/html/


I've only looked at it briefly, but it looks quite nice indeed.

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #14

P: n/a
In article
<Ys*********************@bgtnsc04-news.ops.worldnet.att.net>, wes
weston wrote:
[snip]

Magnus,
Do you have any design thoughts. It would be good to have weighted,
directed graphs and depth first traversal.
I've thought of several alternatives; basically, I just thought about
defining the "standard" API for the basic abstract data type
(including weights, direction, labels, colours etc.). Concrete
implementations and algorithms would be a separate issue.
wes


--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #15

P: n/a
In article <sl****************@furu.idi.ntnu.no>,
ml*@furu.idi.ntnu.no (Magnus Lie Hetland) wrote:
- It doesn't directly represent multigraphs


Unless you insist on having neighbor-sets, it does, doesn't it?
Neighbor-lists can be used for this...?


If you're doing anything serious with a multigraph you need to have some
way of distinguishing different edges between the same pair of vertices.
For instance, an edge object for each edge, that you can use as an index
to store information about that edge. A neighbor list that has multiple
copies of the same neighbor won't let you do that, you can iterate
through the edges but not distinguish one from another.

Another possibility, which fits into the same general abstract API but
is more specialized, would be to represent a multigraph by a dict of
dicts, where the outer dict maps each vertex to its neighbors and the
inner dict maps each neighbor to the number of edges; then you could
represent each edge by a tuple (v,w,index) with index in range(G[v][w]).

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #16

P: n/a
In article <sl****************@furu.idi.ntnu.no>,
ml*@furu.idi.ntnu.no (Magnus Lie Hetland) wrote:
Do you have any design thoughts. It would be good to have weighted,
directed graphs and depth first traversal.


I've thought of several alternatives; basically, I just thought about
defining the "standard" API for the basic abstract data type
(including weights, direction, labels, colours etc.). Concrete
implementations and algorithms would be a separate issue.


I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm. The main reason
is that I might want the same algorithm to be applied to the same graph
with a different set of weights.

A secondary reason is that we already have in Python a good general
mechanism (dicts) for associating arbitrary information with objects, I
don't see a need for reinventing a more specific mechanism for doing so
when the objects are pieces of graphs and the information is some list
of weight, label, etc that some graph API designer thinks is sufficient.

I think this may contradict some things I said a year or two ago about
using a dict-of-dicts representation in which G[v][w] provides the
weight; I've changed my mind.

--
David Eppstein
Computer Science Dept., Univ. of California, Irvine
http://www.ics.uci.edu/~eppstein/
Jul 18 '05 #17

P: n/a
On Mon, 23 Aug 2004 20:41:53 +0000, Magnus Lie Hetland wrote:
enough) I'm not talking about a standard *implementation*, just a
standard *API*, like the DB-API. This could easily cover all kinds of
strange beasts such as directed or undirected, weighted or unweighted
(etc.) graphs; multigraphs, chain graphs, hypergraphs, who knows.


I believe the equivalent thing in the C++ world is the Boost Graph Library
described here:

http://www.boost.org/libs/graph/doc/..._contents.html
http://www.awprofessional.com/bookst...sbn=0201729148

I tried using it once, and it was so horrendously complicated that I gave
up. Some of the complexity came from having to abstract all the different
kinds of graphs that were supported, but a lot of it was also a result of
the static nature of C++.

Still, it may be useful as a source of ideas and/or warning of problems.

-param

Jul 18 '05 #18

P: n/a
On Mon, 23 Aug 2004 20:41:53 +0000, Magnus Lie Hetland wrote:
In article <pa****************************@jerf.org>, Jeremy Bowers wrote:
[snip]

As I tried to state in the original post (I probably wasn't clear enough)
I'm not talking about a standard *implementation*, just a standard *API*,
like the DB-API. This could easily cover all kinds of strange beasts such
as directed or undirected, weighted or unweighted (etc.) graphs;
multigraphs, chain graphs, hypergraphs, who knows.


Point conceded about API and not a library, but I'm not sure that changes
my point much. Your API is going to assume something about how edges are
represented (which will conflict with somebody), *or* it will be so vague
as to not have any particular advantage over the nothing we have now. And
so on for most of the other dimensions.
Jul 18 '05 #19

P: n/a
In article <ep****************************@news.service.uci.e du>,
David Eppstein wrote:
[snip]
I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm. The main reason
is that I might want the same algorithm to be applied to the same graph
with a different set of weights.
I can see that.

[snip stuff about using dicts]

This can be said about all objects, really; no reason to have
attributes as long as we can associate values with the objects using
dicts. This is where things start to look implementation-specific,
even though it is *possible* to keep it abstract using this interface.

One of my motivations is allowing arbitrary structures behind the
scenes, e.g. the graph may be a front-end for something that is
computed on-the-fly using specialized hardware (in fact a very real
possibilit in my case). I could have something like this be
represented by several distinct objects (e.g. one for the topological
structure, one for the weights), of course, but I'd certainly have to
implement the weight mapping myself, and not use built-in dicts.

I do think your API is nice in that it is simple, but I also have the
feeling that using it with other implementations would be sort of
unnatural; one would be trying to emulate the "dict-of-lists with
dicts for properties" implementation, because that implementation
*would* have been simple -- had one used it.

Also, again, this doesn't lend itself very well to manipulating
graphs. If I set one weight to infinity, I might expect (perhaps) the
corresponding edge to disappear (otherwise the graph would have to be
complete in the first place) or similar things; there may also be
other dependencies between properties. Not easily handled with a
separate object for each property. And using functions for everything
that needs calculating doesn't easily lead to polymorphism...

It's not horribly inconvenient, of course (just a matter of defining a
few objects referring to the same underlying mechanism, each with
a different __getitem__ method). I'm just airing my thoughts about why
it *might* be useful to have something else -- possibly in addition.

Perhaps one could have something like two levels? The Level 1 Graph
API would support the "graph as mapping from nodes to neighbors" with
"properties as separate mappings" and the Level 2 Graph API could add
some convenience methods/properties for encapsulation and
manipulation?

[snip] I think this may contradict some things I said a year or two ago
about using a dict-of-dicts representation in which G[v][w] provides
the weight;
Yeah, I remember you saying that :)
I've changed my mind.


Fair enough. FWIW, I agree with your new position when it comes to the
simple dict-based implementation; this is basically how it's done in
pseudocode, usually (e.g. having pi[v] represent the predecessor of v
and so forth).

--
Magnus Lie Hetland "Canned Bread: The greatest thing since sliced
http://hetland.org bread!" [from a can in Spongebob Squarepants]
Jul 18 '05 #20

P: n/a
I've been off-line for a couple months so a bit late
following up to this thread...

I too would like some standard collection of graph
algorithms, but not necessarily a standard API. I
work with a lot of molecular graphs which means I
would prefer using names like "atoms" and "bonds"
instead of "nodes" and "edges".

David Eppstein wrote:
I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm. The main reason
is that I might want the same algorithm to be applied to the same graph
with a different set of weights.


An alternative, which is both intriguing and sends alarm
bells ringing in my head is to have the algorithm
collection instead generate code as needed, so that
I can ask for, say, "depth first search using '.atoms'
to get a list of neighboring nodes." The result could
exec'ed to generate usable Python dynamically, or
written to a file to be used as a normal Python module.

In this case if the DFS generates callback events then
it could include options to only create the code for
events that are needed.

There would need to be some standards on how the
graph is used, like "two nodes are the same iff
'node1 is node2'" or "the result of getting the list
of edges is iterable."

I believe this design philosophy is similar to Boost's.

Andrew
da***@dalkescientific.com
Jul 18 '05 #21

P: n/a
In article <vC*****************@newsread1.news.pas.earthlink. net>,
Andrew Dalke wrote:
I've been off-line for a couple months so a bit late
following up to this thread...
Well, I'm still looking for replies, so... ;)
I too would like some standard collection of graph
algorithms, but not necessarily a standard API.
Hm. How would the algorithms work without a standard API?

To clarify, by API I here mean a protocol, i.e. a definition of how a
graph is supposed to behave -- *not* a standard implementation.
I work with a lot of molecular graphs which means I would prefer
using names like "atoms" and "bonds" instead of "nodes" and "edges".
Sure -- but unless we have some standard protocol/API, it would be
hard to get the algorithms to work with your graphs...

I've been thinking a bit about the use of object adaptation here; I
think it would be quite perfect. One possibility would be to use the
functionality of PyProtocols, but that's hardly standard... But it
would allow you to do stuff like

graph = AdjacencyMap(someStrangeMolecularGraph)
# or graph = adapt(someStrange..., AdjacencyMap)
graphAlgorithm(graph)

or the like...

We could then have a few different perspectives on graphs, such as
adjmap, incmap, adjarray and edgelist, for example.

Just loose thoughts.

An adjacency map would work just like a dict of lists (or something
equivalent) and a dict of list could be used as one -- except,
perhaps, for some "advanced" features which would be optional.
(Modifying the graph is a bit awkward with this syntax, especially as
you have to know whether the adjacency collection is a list or a set,
for example.)
David Eppstein wrote:
I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm. The main reason
is that I might want the same algorithm to be applied to the same graph
with a different set of weights.
An alternative, which is both intriguing and sends alarm bells
ringing


Sounds like a fun combination ;)
in my head is to have the algorithm
collection instead generate code as needed, so that
I can ask for, say, "depth first search using '.atoms'
to get a list of neighboring nodes."
Wouldn't it be *much* better to use the established (but not standard)
mechanism of object adaptation, as championed in PEP 246 and as
implemented and extended upon in PyProtocols?

Note that we could certainly do with *only* the PEP 246 mechanism
(perhaps with some minor updates to the PEP) *without* PyProtocols, if
we wanted a more minimalist solution.

(If only adapt() could become a standard library function... <sigh> ;)
The result could exec'ed to generate usable Python dynamically, or
written to a file to be used as a normal Python module.
See above -- adapt() would be much better IMO, and would fill the same
need (as I see it).

[snip]
There would need to be some standards on how the graph is used, like
"two nodes are the same iff 'node1 is node2'"
Yeah. Or one might want to allow equality to determine this, so that
the implementer could decide how to do it (nodes might be dynamically
created front-end objects after all).
or "the result of getting the list of edges is iterable."


Right -- that's typically the kind of API definition I'm after.

Basically what I was proposing is a sort of "Python Graph API" in the
same vein as the "Python DB API" (PEP 249), that is, simply an
informative PEP about how graphs should behave to ensure
interoperability, possibly with some standard wrapper functions (or
maybe not).

[snip]

It seems there are at least a few people who are interested in the
general concept, though. In case there is any merit in the idea, I've
set up a temporary wiki at

http://www.idi.ntnu.no/~mlh/python-graph-wiki.cgi

I'll post a separate announcement about it.

--
Magnus Lie Hetland The time you enjoy wasting is not wasted time
http://hetland.org -- Bertrand Russel
Jul 18 '05 #22

P: n/a
Magnus Lie Hetland wrote:
Hm. How would the algorithms work without a standard API?
There are certain things the different graphs have in
common. For example,
1. "test if two nodes/edges are the same"
2. "test if two nodes/edges are identically colored"
3. "list of all nodes adjacent to a node"
4. "list of all edges from a node" (either directed or
undirected)
5. "get the appropriate weight"

Different graphs have different ways to do this.
My molecular graphs do this with

1. atom1 is atom2
2. atom_compare(atom1, atom2)
3. atom.xatoms
4. atom.bonds
5. atom.weight # though that's the atomic weight and
# nothing to do with flow :)

An adjacency graph might do this with

1. node1 is node2
2. node1 == node2
3. edge_table[node]
4. -- not defined --
5. weights[node]

The ways to get the properties differ but the things
you do with them do not change.

I can concieve then some way to generate code
based on a template, like this

dfs = make_code(dfs_template,
args = "node, handler",
bond_neighbors = "node.xatoms",
on_enter = "handler.enter(node)")

.. make the graph ...
class Handler:
def enter(self, node):
print "Hi", node

dfs(graph, Handler())

or for an adjacency graph, something like

dfs = make_code(dfs_template,
args = "bond_table, node, handler",
get_neighbors = "bond_table[node]",
on_enter = "handler.enter(node)")
...
dfs(bond_table, start_node, handler)

Obviously it would need some sort of templating language.
Wonder if anyone's ever make one of those before. ;)
To clarify, by API I here mean a protocol, i.e. a definition of how a
graph is supposed to behave -- *not* a standard implementation.
I think we're talking about the same thing -- the sorts
of things you can do with nodes and edges, and not a
statement that a node or edge has a given property or method.
I've been thinking a bit about the use of object adaptation here; I
think it would be quite perfect. One possibility would be to use the
functionality of PyProtocols, but that's hardly standard... But it
would allow you to do stuff like

graph = AdjacencyMap(someStrangeMolecularGraph)
# or graph = adapt(someStrange..., AdjacencyMap)
graphAlgorithm(graph)
The problem is all the conversion from/to my graph
form to the standard graph form. Either the adapters
have to be live (so the modification to the standard
form get propogated back to my graph) or there needs
to be conversions between the two. Both sound slow.
David Eppstein wrote:
I would strongly prefer not to have weights or similar attributes as
part of a graph API. I would rather have the weights be a separate dict
or function or whatever passed to the graph algorithm.

Me:An alternative, which is both intriguing and sends alarm bells
ringing
Magnus Sounds like a fun combination ;)
There the idea for me would be

find_flow = make_code(flow_template,
args = "source_nodes, sink_nodes, weight_table",
edges = "node.out_edges",
weight = "weight_table[edge]")

or to use a weight function

find_flow = make_code(flow_template,
args = "source_nodes, sink_nodes, weight_func",
edges = "node.out_edges",
weight = "weight_func(edge)")
Wouldn't it be *much* better to use the established (but not standard)
mechanism of object adaptation, as championed in PEP 246 and as
implemented and extended upon in PyProtocols?
Not really. Consider the events that could occur in a
DFS. There's
- on enter
- on exit
- on backtrack
and probably a few more that could be used in a general
purpose DFS. But I might need only one of them. With
a PE 246 adapter I can adapt my graph to fit the algorithm,
but if I don't need all of those events there's no way
to adapt the algorithm to fit my needs.

(Yes, even though I'm using Python I'm still worried
about efficiency.)

(If only adapt() could become a standard library function... <sigh> ;)
Perhaps someday, when people get more experience with it.
I've not yet used it.
There would need to be some standards on how the graph is used, like
"two nodes are the same iff 'node1 is node2'"

Yeah. Or one might want to allow equality to determine this, so that
the implementer could decide how to do it (nodes might be dynamically
created front-end objects after all).


I found that 'is' testing for my graphs is much better.
At the very least, it's a lot faster (no method call overhead).
It seems there are at least a few people who are interested in the
general concept, though. In case there is any merit in the idea, I've
set up a temporary wiki at

http://www.idi.ntnu.no/~mlh/python-graph-wiki.cgi

I'll post a separate announcement about it.


It said "warning, in use, you might want to wait about 10
minutes before editing."

I think it's been about 10 minutes now. :)

Andrew
da***@dalkescientific.com
Jul 18 '05 #23

P: n/a
In article <Sj******************@newsread1.news.pas.earthlink .net>,
Andrew Dalke wrote:
Magnus Lie Hetland wrote:
Hm. How would the algorithms work without a standard API?
There are certain things the different graphs have in
common. For example,
1. "test if two nodes/edges are the same"
2. "test if two nodes/edges are identically colored"
3. "list of all nodes adjacent to a node"
4. "list of all edges from a node" (either directed or
undirected)
5. "get the appropriate weight">


Right. My idea was that we could define a standard API for this
functionality, as in (e.g.) the DB API. Your code generating idea is
an alternative.
Different graphs have different ways to do this.
Yes.

[snip]The ways to get the properties differ but the things
you do with them do not change.
Indeed.
I can concieve then some way to generate code
based on a template, like this [code generating template example snipped]

Hm. What you're basically is proposing is (as you said, basically ;) a
C++-like approach (as used in Boost)?

Or -- this goes beyond the standard C++ approach, of course, as that
would basically be a way of getting "standard Python functionality" in
a static language.

Your idea is interesting -- but quite a way from what I had
envisioned... Would there be a way to combine the ideas? I.e. define
an (abstract) interface to standard graph functionality and in
addition have this sort of template stuff? It might be a bit far
fetched, though, as it seems your template idea obviates the need for
a standard interface.

I need the interface for when I write my own algorithms -- I don't
have as much need for templated stock algorithms; so it seems our
needs may be somewhat complementary, perhaps?

[snip]Obviously it would need some sort of templating language.
Wonder if anyone's ever make one of those before. ;)
Hehe.
To clarify, by API I here mean a protocol, i.e. a definition of how a
graph is supposed to behave -- *not* a standard implementation.


I think we're talking about the same thing -- the sorts
of things you can do with nodes and edges, and not a
statement that a node or edge has a given property or method.


Well -- I was rather aiming for a definition of a graph protocol
(similar to e.g. the sequence protocol or the mapping protocol), and
that would entail fixing the methods and attributes involved.

[snip]The problem is all the conversion from/to my graph
form to the standard graph form. Either the adapters
have to be live
Sure -- that wouldn't be a problem, would it?
(so the modification to the standard
form get propogated back to my graph) or there needs
to be conversions between the two. Both sound slow.
Maybe. I guess it depends on how you implement things. For example, if
it's only a matter of renaming methods, that wouldn't entail any
performance loss at all (you could just have another attribute bound
to the same method).

[snip]There the idea for me would be

find_flow = make_code(flow_template,
args = "source_nodes, sink_nodes, weight_table",
edges = "node.out_edges",
weight = "weight_table[edge]")
But coding this doesn't seem like much fun.

My hope was that I could write new graph algorithms so they looked
somewhat pretty and readable. Having to write them in some new
template langage doesn't seem very tempting.

(As I said, stock algorithms aren't of that much interest to me.)
Wouldn't it be *much* better to use the established (but not standard)
mechanism of object adaptation, as championed in PEP 246 and as
implemented and extended upon in PyProtocols?


Not really. Consider the events that could occur in a
DFS. There's
- on enter
- on exit
- on backtrack
and probably a few more that could be used in a general
purpose DFS. But I might need only one of them. With
a PE 246 adapter I can adapt my graph to fit the algorithm,
but if I don't need all of those events there's no way
to adapt the algorithm to fit my needs.


I don't see how this has anything to do with the issue at hand,
really.

The person who implemented the DFS could deal with this issue (by
parametrizing it) or something. (*Or* using adaptation, for that
matter.) I'm sure there are many other ways of dealing with this...
IMO this is orthogonal. Whether or not you can tell the DFS how you
access the neighbors of a node isn't directly related to what you want
the DFS to do...
(Yes, even though I'm using Python I'm still worried
about efficiency.)
Well -- you could always make a C library for your graph structures,
with an optional interface (without any performance loss) that
conformed to the Hypothetical Graph API(tm), somewhat reminiscent of
how the DB API works...
(If only adapt() could become a standard library function... <sigh>
;)


Perhaps someday, when people get more experience with it.
I've not yet used it.


I agree with you -- one ought to get more experience with it. I have
not used it seriously myself. I think it might have quite a bit of
potential in this sort of situation, though.
There would need to be some standards on how the graph is used, like
"two nodes are the same iff 'node1 is node2'"


Yeah. Or one might want to allow equality to determine this, so that
the implementer could decide how to do it (nodes might be dynamically
created front-end objects after all).


I found that 'is' testing for my graphs is much better.


The problem is that it is impossible to override.

If I were to use some special hardware as the source of my graph
(which is a realistic example for me) I might have to create node
wrappers on the fly. Unless I use the Singleton pattern (the Borg
pattern wouldn't work) I could easily end up with equal but
non-identical nodes, and no way of overriding this.
At the very least, it's a lot faster (no method call overhead).
Right.

[snip wiki stuff]It said "warning, in use, you might want to wait about 10
minutes before editing."
Right -- I was making some changes ;)
I think it's been about 10 minutes now. :)


Yup. Just go ahead and add stuff.

--
Magnus Lie Hetland The time you enjoy wasting is not wasted time
http://hetland.org -- Bertrand Russel
Jul 18 '05 #24

P: n/a
Magnus Lie Hetland wrote:
Yup. Just go ahead and add stuff.
I tried. I can view the page and get to the edit page
but I can neither save it nor preview it. My browser
times out. Here are my two sets of comments
Brian Kelley's "frowns" packages
(http://staffa.wi.mit.edu/people/kelley/ ) has graph code appropriate
for a chemical informatics library, including a wrapper to the VFlib
isomorphism graph library. Andrew Dalke has some clique finding
code at http://starship.python.net/crew/dalke/clique/ .
and

(Andrew Dalke speaking here.)

It's pretty uniformly agreed that there is no standard graph
representation, if for no other reason than that my nodes are called
"atoms" and my edges are called "bonds" and I don't care about neither
multigraphs nor directed edges. ;)

Magnus suggests that adapters is the right approach. My concern with
that is two-fold. First, the adapter layer will cause a non-trivial
overhead -- at least an extra function call for every graph-specific
action. Second, the algorithm can't be specialized for the graph at
hand. (Eg, in a DFS the algorithm may generate events for node enter,
exit, and backtrack, while I may only need one of those.)

My thought is to use some sort of templating system. If done well the
actual code might even be Python, with a naming scheme to allow
replacements appropriate to the data structure (eg, to use
"weights[edge]" or "edge.size" for a flow algorithm) and remove code
blocks not needed for a given algorithm (eg, don't include a callback
code if it isn't used.)

This would depend on having standard ideas of what you can do with a
graph. Some include "iterate over all nodes", "compare two nodes for
equivalence", "get all edges for a given node."


Now responding to your post
Your idea is interesting -- but quite a way from what I had
envisioned... Would there be a way to combine the ideas? I.e. define
an (abstract) interface to standard graph functionality and in
addition have this sort of template stuff? It might be a bit far
fetched, though, as it seems your template idea obviates the need for
a standard interface.
As I outlined above, I believe it should be possible to build a
templating system on top of working Python code. The code
uses the 'standard' API and the templating system gives a way
to conform the algorithm to the graph at hand.

I think many people will just use the standard scheme,
but that enough people have different needs (like chemistry)
that it's worth the effort to be malleable that way.
Well -- I was rather aiming for a definition of a graph protocol
(similar to e.g. the sequence protocol or the mapping protocol), and
that would entail fixing the methods and attributes involved.
Ahh, then we are talking about different but related things.
I don't think it's possible to have a graph protocol like
mention which is good enough for all graph systems. Eg,
does the weight for a max flow calculation come from a
property of the edge, from a table passed into the function,
or from a callable? If a table, is it indexed by id(node),
by the node itself (is the node hashable?) or by some unique
id given to all nodes?

I know there are lessons to learn from both LEDA and
the Boost Graph Library. Sadly, I don't know them.
Maybe. I guess it depends on how you implement things. For example, if
it's only a matter of renaming methods, that wouldn't entail any
performance loss at all (you could just have another attribute bound
to the same method).
I do not want that. Which will users of my library use,
"edges" or "bonds"? This sort of choice is bad.
But coding this doesn't seem like much fun.

My hope was that I could write new graph algorithms so they looked
somewhat pretty and readable. Having to write them in some new
template langage doesn't seem very tempting.

(As I said, stock algorithms aren't of that much interest to me.)
What I need to do is take a look at, say, David Eppstein's
set of algorithms and see if my templating idea can work
without being too onerous.
I don't see how this has anything to do with the issue at hand,
really.

The person who implemented the DFS could deal with this issue (by
parametrizing it) or something. (*Or* using adaptation, for that
matter.) I'm sure there are many other ways of dealing with this...
IMO this is orthogonal. Whether or not you can tell the DFS how you
access the neighbors of a node isn't directly related to what you want
the DFS to do...
Mmm, in some way you are right. You've been talking
about adapting the data structure to work with the
algorithm. I've been talking about adapting the algorithm
to deal with the data structure.
I found that 'is' testing for my graphs is much better.

The problem is that it is impossible to override.


That's why it's faster. :)
If I were to use some special hardware as the source of my graph
(which is a realistic example for me) I might have to create node
wrappers on the fly. Unless I use the Singleton pattern (the Borg
pattern wouldn't work) I could easily end up with equal but
non-identical nodes, and no way of overriding this.


I had experience using a C library for the underlying
graph data. It returned opaque integers for object
references. Every time I got one I wrapped it inside
a new Python object. So I could have different Python
objects for the same underlying object.

It turned out the some algorithms used so many "have
I seen this atom/bond before?" tests that the switch
from == to 'is' made a big difference.

On the other hand, it did mean I needed to convert from
my library's natural style of graph into the form that
allowed 'is' tests.

Andrew
da***@dalkescientific.com
Jul 18 '05 #25

P: n/a
Andrew Dalke <ad****@mindspring.com> writes:
Magnus Lie Hetland wrote:
Hm. How would the algorithms work without a standard API?
There are certain things the different graphs have in
common. For example,
1. "test if two nodes/edges are the same"
2. "test if two nodes/edges are identically colored"
3. "list of all nodes adjacent to a node"
4. "list of all edges from a node" (either directed or
undirected)
5. "get the appropriate weight"

Different graphs have different ways to do this.


[...]
The ways to get the properties differ but the things
you do with them do not change.

I can concieve then some way to generate code
based on a template, like this

dfs = make_code(dfs_template,
args = "node, handler",
bond_neighbors = "node.xatoms",
on_enter = "handler.enter(node)")

.. make the graph ...
class Handler:
def enter(self, node):
print "Hi", node

dfs(graph, Handler())
Yuk.

This is *exactly* the type of thing that adaptation (PEP 246,
PyProtocols) is designed to support:

# pseudo-code here, I don't know the exact PEP 246 code form off
# the top of my head.
class IGraph(Protocol):
def nodes_same(n1, n2):
"No implementation here - this is just a protocol"
# etc

class IDFSVisitor(Protocol):
# mode protocol defns...

def dfs(g, h):
g = adapt(g, IGraph)
h = adapt(h, IDFSVisitor)
# code, using standard method names

Now, for your molecular graphs, you just write an adapter to make the
graph conform to the IGraph protocol. In theory, the adaptation
should be maximally efficient (so that you don't have to worry about
overheads of unnecessary wrappers), in practice I don't know how
close PyProtocols comes (although I believe it's good).
To clarify, by API I here mean a protocol, i.e. a definition of how a
graph is supposed to behave -- *not* a standard implementation.


I think we're talking about the same thing -- the sorts
of things you can do with nodes and edges, and not a
statement that a node or edge has a given property or method.
I've been thinking a bit about the use of object adaptation here; I
think it would be quite perfect. One possibility would be to use the
functionality of PyProtocols, but that's hardly standard... But it
would allow you to do stuff like
graph = AdjacencyMap(someStrangeMolecularGraph)
# or graph = adapt(someStrange..., AdjacencyMap)
graphAlgorithm(graph)
Yes, this (to me) would be ideal. The issue of not being standard is
a bit circular - it's not standard because there aren't enough
examples of where it is needed, but people don't use it because it's
not standard.

I believe Guido also has some concerns over how interfaces "should"
work - but he's unlikely to do anything about that unless someone
forces the issue by championing PEP 246. It might be worth asking
Philip Eby if he would be happy to see PyProtocols added to the
standard library.
The problem is all the conversion from/to my graph
form to the standard graph form. Either the adapters
have to be live (so the modification to the standard
form get propogated back to my graph) or there needs
to be conversions between the two. Both sound slow.
My understanding is that a well-written adapter can be very
efficient. But I don't know in practice how that is achieved. And
obviously, a particular operation will never be efficient if the
underlying representation can't support it efficiently.
Wouldn't it be *much* better to use the established (but not standard)
mechanism of object adaptation, as championed in PEP 246 and as
implemented and extended upon in PyProtocols?


Not really. Consider the events that could occur in a
DFS. There's
- on enter
- on exit
- on backtrack
and probably a few more that could be used in a general
purpose DFS. But I might need only one of them. With
a PE 246 adapter I can adapt my graph to fit the algorithm,
but if I don't need all of those events there's no way
to adapt the algorithm to fit my needs.


You could adapt a visitor class to fit a standard DFS-visitor
protocol.
(Yes, even though I'm using Python I'm still worried
about efficiency.)
I worry about efficiency, as well. But my experiments showed call
overhead as the killer - adding an "on backtrack" callback to the
algorithm, and calling it with a visitor which had a null
implementation of the callback still added a chunk of overhead. Caling
a "do-nothing" callback for each node in a 10,000 node graph isn't
free. Of course, this argues for the template "build code on the fly"
approach, which I don't relish. Anyone know another good way of
avoiding this overhead?
(If only adapt() could become a standard library function... <sigh> ;)


Perhaps someday, when people get more experience with it.
I've not yet used it.


It needs a champion. Someone to do the work to get it into the
library. We have a PEP, and an implementation (PyProtocols) so I
suspect that it's a job that wouldn't require huge technical skills.
I found that 'is' testing for my graphs is much better.
At the very least, it's a lot faster (no method call overhead).


You're never going to avoid a method call in any standard - there's
going to be *someone* for whom "is" is inappropriate. So the standard
will have to cater for that.

Paul.
--
Home computers are being called upon to perform many new functions,
including the consumption of homework formerly eaten by the dog --
Doug Larson
Jul 18 '05 #26

This discussion thread is closed

Replies have been disabled for this discussion.