Hi,
I've been scouring the net and reading the PostgreSQL docs for a while
now trying to learn how to create a recursive function in PL/pgSQL
that will return a whole subtree given a starting node. I wanted to
share my summary results here. Maybe this will help someone and save
them from doing a bunch of research (like I had to do :). For the
record, this is the first thing I've ever written in PL/pgSQL,
although I do have significant SQL experience. If any PostgreSQL
developers are reading this, you should implement a "CONNECT BY /
PRIOR" SQL command like Oracle does. It turns all this stuff into a
2-line SQL query. No extra functions needed.
In this proof-of-concept, I just did a lame table that has a node and
a parent which points back to another node in the same table. No
actual content. Also note the function overloading for the helper to
make it take just one argument. I'm passing a depth argument in this
example even though I don't use it. Again...proof-of-concept. The
depth of recursion is often useful to know. You can do LPAD()s to
indent your tree, for example.
One thing that's not immediately obvious if you've never done this:
RETURN NEXT is used to build up a return set, one row at a time. When
you're done, you RETURN and it returns the built-up set.
I did a few benchmarks on this data set. I wrote an equivalent
recursive function in PHP such that a new SQL query was issued on each
level of recursion. I "SELECT node, parent FROM tree WHERE parent =
$root" in PHP, and for each row I recurse on the PHP function. Bear in
mind this is a very small dataset (9 rows) and also that the database
and PHP are running on the same Linux 2.4.20 box. It would be
interesting to re-benchmark this with a much, much larger dataset.
Below is the ratio of execution time of PHP to SQL recursion on a few
trials.
Hope this helps someone,
Travis
PHP / SQL: 1.5631495206638
PHP / SQL: 1.488893156209
PHP / SQL: 1.8263854296389
PHP / SQL: 2.6210461029564
CREATE TABLE tree (node INTEGER, parent INTEGER);
INSERT INTO tree(node,parent) VALUES(1,0);
INSERT INTO tree(node,parent) VALUES(2,0);
INSERT INTO tree(node,parent) VALUES(3,1);
INSERT INTO tree(node,parent) VALUES(4,3);
INSERT INTO tree(node,parent) VALUES(5,4);
INSERT INTO tree(node,parent) VALUES(6,2);
INSERT INTO tree(node,parent) VALUES(7,6);
INSERT INTO tree(node,parent) VALUES(8,6);
INSERT INTO tree(node,parent) VALUES(9,2);
CREATE OR REPLACE FUNCTION getTree(INTEGER, INTEGER) RETURNS SETOF
tree AS '
DECLARE
root ALIAS FOR $1;
depth ALIAS FOR $2;
tempRow1 tree%ROWTYPE;
tempRow2 tree%ROWTYPE;
BEGIN
-- Using PostgreSQL 7.3.4.
-- Docs: http://www.postgresql.org/docs/7.3/static/plpgsql.html
-- See chapter 19, especially 19.6
FOR tempRow1 IN SELECT node, parent FROM tree WHERE parent = root
LOOP
RETURN NEXT tempRow1;
FOR tempRow2 IN SELECT node, parent FROM
getTree(tempRow1.node, depth+1) LOOP
RETURN NEXT tempRow2;
END LOOP;
END LOOP;
RETURN;
END;
' LANGUAGE 'plpgsql';
CREATE OR REPLACE FUNCTION getTree(INTEGER) RETURNS SETOF tree AS '
DECLARE
root ALIAS FOR $1;
tempRow tree%ROWTYPE;
BEGIN
FOR tempRow IN SELECT node, parent FROM getTree(root, 0) LOOP
RETURN NEXT tempRow;
END LOOP;
RETURN;
END;
' LANGUAGE 'plpgsql';
SELECT * FROM getTree(0);
node | parent
------+--------
1 | 0
3 | 1
4 | 3
5 | 4
2 | 0
6 | 2
7 | 6
8 | 6
9 | 2
(9 rows)
SELECT * FROM getTree(1);
node | parent
------+--------
3 | 1
4 | 3
5 | 4
(3 rows)
SELECT * FROM getTree(2);
node | parent
------+--------
6 | 2
7 | 6
8 | 6
9 | 2
(4 rows)