By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
449,247 Members | 1,223 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 449,247 IT Pros & Developers. It's quick & easy.

Find root node in tree structure

P: n/a
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil
Jul 23 '05 #1
Share this Question
Share on Google+
12 Replies


P: n/a
pillepop2003 wrote:
What I want to do is finding
the way from an arbitrary node in the tree.


This is a frequent question for SQL.
I usually suggest implementing a "path enumeration" table
to store each ancestor-descendant path in your tree.
Do this instead of storing simply the superiod_id for each user.

See also:
http://groups-beta.google.com/group/...51ff05edbbe71/
http://groups-beta.google.com/group/...b6d55c69e32ff/

Regards,
Bill K.
Jul 23 '05 #2

P: n/a
pillepop2003 wrote:
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil


Phil,

The structure you are using is an adjacent list which has limitations, as
you've found already, a better model is that of nested sets. These may
appear more complicated at first however if you take the time to understand
it you will be amazed by it's power and it makes queries such as the one
above a piece of cake...

http://www.intelligententerprise.com...questid=327517

--
Thanks

Murph
Jul 23 '05 #3

P: n/a
It depends on how large a table you have and how it is used. Is it
prone to heavy updating?

I built a mechanism to store forests of trees, tied to web pages. My
needs involved slight denormalization, and I wanted to do my recursive
logic in PHP -- to avoid costly recursive queries whenever possible.

It worked like this:

page_id
element_id
parent_id (another element_id)
element_order (1-N, for the number of elements on the page)
gen_level (generational level)
{node data fields}

With this structure I'm actually able to do quite a bit of logic in PHP,
and avoid costly recursive queries a number of times. Also, I'm using
mySQL 3.23 which lacks subselects, so that informed my design as well.
Paul Bramscher

pillepop2003 wrote:
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil

Jul 23 '05 #4

P: n/a
Paul Bramscher wrote:
It depends on how large a table you have and how it is used. Is it
prone to heavy updating?

I built a mechanism to store forests of trees, tied to web pages. My
needs involved slight denormalization, and I wanted to do my recursive
logic in PHP -- to avoid costly recursive queries whenever possible.

It worked like this:

page_id
element_id
parent_id (another element_id)
element_order (1-N, for the number of elements on the page)
gen_level (generational level)
{node data fields}

With this structure I'm actually able to do quite a bit of logic in PHP,
and avoid costly recursive queries a number of times. Also, I'm using
mySQL 3.23 which lacks subselects, so that informed my design as well.
Paul Bramscher

pillepop2003 wrote:
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil


Have a look at this link:

http://www.intelligententerprise.com...questid=327517

I think here is the solution for your problem. It is a very clear
implementation of how to solve trees and getting information about them
in (my)SQL. It is more complex than your solution, but it has some big
advantages over your model, especially the easy way to get a lot of
information from the tree, not only all the levels above from somewhere
in the tree.

The method you use is called the adjacency list model, after the graph
theory technique of the same name; the pairs of nodes are adjacent to
each other, and is the textbook solution for creating trees. The author
of this article gives a better solution based on nested sets.

In example 1 the tree nodes of all the above levels are returned from
the database. If I were you I would go this way in stead of using the
adjacency model.

I hope this helps.

Jonathan

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Jul 23 '05 #5

P: n/a
Jonathan wrote:
Paul Bramscher wrote:
It depends on how large a table you have and how it is used. Is it
prone to heavy updating?

I built a mechanism to store forests of trees, tied to web pages. My
needs involved slight denormalization, and I wanted to do my recursive
logic in PHP -- to avoid costly recursive queries whenever possible.

It worked like this:

page_id
element_id
parent_id (another element_id)
element_order (1-N, for the number of elements on the page)
gen_level (generational level)
{node data fields}

With this structure I'm actually able to do quite a bit of logic in
PHP, and avoid costly recursive queries a number of times. Also, I'm
using mySQL 3.23 which lacks subselects, so that informed my design as
well.
Paul Bramscher

pillepop2003 wrote:
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil

Have a look at this link:

http://www.intelligententerprise.com...questid=327517

I think here is the solution for your problem. It is a very clear
implementation of how to solve trees and getting information about them
in (my)SQL. It is more complex than your solution, but it has some big
advantages over your model, especially the easy way to get a lot of
information from the tree, not only all the levels above from somewhere
in the tree.

The method you use is called the adjacency list model, after the graph
theory technique of the same name; the pairs of nodes are adjacent to
each other, and is the textbook solution for creating trees. The author
of this article gives a better solution based on nested sets.

In example 1 the tree nodes of all the above levels are returned from
the database. If I were you I would go this way in stead of using the
adjacency model.

I hope this helps.

Jonathan


Not quite, my model is better than a straight adjaceny list in a few
ways (offering denormalized values of generational level and overall
cardinal order). Though I agree, the nested method seems to be best of
all from a back-end perspective.

But let's say your application is about building websites that look like
newsgroup threads. You have root-level elements, child element,
grandchild element, etc. Each root-level element is a tree in itself,
so the page is a forest of trees.

In this nested model, what's a non-recursive way to quickly pull out the
trees and their elements for display in that fashion.

Imagine you've got a web site with 40,000 elements that gets hit 1-5
times/second. You're thinking more about denormalization and less about
perfection.

But I'll study your reference more. It looks like this traversal is
anticipated in the design, and a simple order by the left value will
accomplish this. Good article, wish I'd seen it 3-4 years ago.
Jul 23 '05 #6

P: n/a
Jonathan wrote:
Paul Bramscher wrote:
It depends on how large a table you have and how it is used. Is it
prone to heavy updating?

I built a mechanism to store forests of trees, tied to web pages. My
needs involved slight denormalization, and I wanted to do my recursive
logic in PHP -- to avoid costly recursive queries whenever possible.

It worked like this:

page_id
element_id
parent_id (another element_id)
element_order (1-N, for the number of elements on the page)
gen_level (generational level)
{node data fields}

With this structure I'm actually able to do quite a bit of logic in
PHP, and avoid costly recursive queries a number of times. Also, I'm
using mySQL 3.23 which lacks subselects, so that informed my design as
well.
Paul Bramscher

pillepop2003 wrote:
Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous
tree structure. The root node can have whatever value you prefer, I
suppose NULL would be good for a start. What I want to do is finding
the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for
example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several
queries? I mean I could send a query for each 'get-parent' lookup, but
that seams to be rather inefficient on large trees, no?

Thank you very much
Phil

Have a look at this link:

http://www.intelligententerprise.com...questid=327517

I think here is the solution for your problem. It is a very clear
implementation of how to solve trees and getting information about them
in (my)SQL. It is more complex than your solution, but it has some big
advantages over your model, especially the easy way to get a lot of
information from the tree, not only all the levels above from somewhere
in the tree.

The method you use is called the adjacency list model, after the graph
theory technique of the same name; the pairs of nodes are adjacent to
each other, and is the textbook solution for creating trees. The author
of this article gives a better solution based on nested sets.

In example 1 the tree nodes of all the above levels are returned from
the database. If I were you I would go this way in stead of using the
adjacency model.

I hope this helps.

Jonathan


I've looked at this article again, and it commits a little dishonesty
when it produces an adjacency implementation which is not normalized.
That's a problem with that implementation, not adjacency models in
general. This portion of my example is fully normalized:

element_id
parent_id (another element_id)
{node data fields}

el_id name parent_id (another el_id)
1 bob NULL
2 george 1
3 bill 1

etc. This is a normalized adjacency implementation. If bob decided to
change his name to "Robert", then we don't have to update multiple
columns of his subordinates -- only one. So the example provided in the
article is a deliberately poor one.

Only one other concern comes to mind, and that's performance of
implementations in terms of where new nodes are added. Behaviorally,
are they generally added at the bottom? How many rows in a nested model
do you need to adjust to make room for a new node at the bottom? In the
middle somewhere? The top?

My implementation involves the least cost for nodes at the bottom, the
greatest for new nodes at the top. I'm not sure yet, but it seems that
the nested performance hit for new nodes works a little differently, and
is more complex.
Jul 23 '05 #7

P: n/a
Hey!

I'm so sorry it took me soooo long to answer my one thread. First of
all I want to thank you all for your extensive replies. I might be busy
for the next few days to work through all your tips, but as soon as I'm
done with that, I'll answer ;-)

So long
Philipp

Paul Bramscher wrote:
Jonathan wrote:
Paul Bramscher wrote:
It depends on how large a table you have and how it is used. Is it prone to heavy updating?

I built a mechanism to store forests of trees, tied to web pages. My needs involved slight denormalization, and I wanted to do my recursive logic in PHP -- to avoid costly recursive queries whenever possible.
It worked like this:

page_id
element_id
parent_id (another element_id)
element_order (1-N, for the number of elements on the page)
gen_level (generational level)
{node data fields}

With this structure I'm actually able to do quite a bit of logic in PHP, and avoid costly recursive queries a number of times. Also, I'm using mySQL 3.23 which lacks subselects, so that informed my design as well.
Paul Bramscher

pillepop2003 wrote:

Hey!

Can anyone give me a hint, how this problem is best implemented:
I have a table of users (see below), where every user has one
"superior user" (= parent node), this should be a fully unambigous tree structure. The root node can have whatever value you prefer, I suppose NULL would be good for a start. What I want to do is finding the way from an arbitrary node in the tree.

Example:

ROOT
|
----------
A B
---------
C D
-------
E F G
The query should return C, B, ROOT if i give it the input of G for example.

This is my users table:

users
=====================
user_id (primary key)
name
superior_id

Do you think there's any possibility to do this without firing several queries? I mean I could send a query for each 'get-parent' lookup, but that seams to be rather inefficient on large trees, no?

Thank you very much
Phil

Have a look at this link:

http://www.intelligententerprise.com...questid=327517
I think here is the solution for your problem. It is a very clear
implementation of how to solve trees and getting information about them in (my)SQL. It is more complex than your solution, but it has some big advantages over your model, especially the easy way to get a lot of information from the tree, not only all the levels above from somewhere in the tree.

The method you use is called the adjacency list model, after the graph theory technique of the same name; the pairs of nodes are adjacent to each other, and is the textbook solution for creating trees. The author of this article gives a better solution based on nested sets.

In example 1 the tree nodes of all the above levels are returned from the database. If I were you I would go this way in stead of using the adjacency model.

I hope this helps.

Jonathan


I've looked at this article again, and it commits a little dishonesty

when it produces an adjacency implementation which is not normalized. That's a problem with that implementation, not adjacency models in
general. This portion of my example is fully normalized:

element_id
parent_id (another element_id)
{node data fields}

el_id name parent_id (another el_id)
1 bob NULL
2 george 1
3 bill 1

etc. This is a normalized adjacency implementation. If bob decided to change his name to "Robert", then we don't have to update multiple
columns of his subordinates -- only one. So the example provided in the article is a deliberately poor one.

Only one other concern comes to mind, and that's performance of
implementations in terms of where new nodes are added. Behaviorally, are they generally added at the bottom? How many rows in a nested model do you need to adjust to make room for a new node at the bottom? In the middle somewhere? The top?

My implementation involves the least cost for nodes at the bottom, the greatest for new nodes at the top. I'm not sure yet, but it seems that the nested performance hit for new nodes works a little differently, and is more complex.


Jul 23 '05 #8

P: n/a
Paul,

I'm not sure if your "path enumeration" model is appropriate for my
problem: I was looking for a way to query *the_way* to the root node
from an arbitrary node. If you review my "graphical" example above, the
relation is transitive, so i want to yield "C, B, ROOT" as a result set
when I provide 'G'. Your solution can't handle this in a single query,
can it? (at least as long as I don't have a database that supports
recursive queries - mysql 4.0.x does not!).

I am expecting the whole tree to grow to about 50.000+ nodes and there
will be almost no updates but mostly select which should gain
high-performance.

Am I left to implement the recursion in php? Suggestions?
(I assume that nested sets are to complex in this case, what do you
think?)

Jul 23 '05 #9

P: n/a
Here's my method. Given a table structure like this:

page_id
element_id
parent_id (another element id)
gen_level
element_order
{node data}

Problem: I need to get a the root R of an arbitrary element E which has
cardinal order D on page P. I do it quite cleanly and simply like this:

SELECT element_id FROM element WHERE page_id = P AND element_order < D
AND gen_level = 0 ORDER BY element_order DESC LIMIT 1

This automatically gives you the answer, since we know that the root
element *must* occur in cardinal order prior to arbitrary element E, and
that it must have a generational level of 0. Indeed, it will always be
the first such element encountered if we reverse the resultset order.

Far simpler than the nested method, and quite elegant I might add. :-)

But don't get me wrong, I think the nested set method might still offer
something. I'll need to look more carefully into it. I still have to
do recursion at times, but I generally do all of that in PHP, or with
crafty cursor manipulations. You don't want to do that with queries
against a table if at all possible.
Paul Bramscher
pi**********@yahoo.de wrote:
Paul,

I'm not sure if your "path enumeration" model is appropriate for my
problem: I was looking for a way to query *the_way* to the root node
from an arbitrary node. If you review my "graphical" example above, the
relation is transitive, so i want to yield "C, B, ROOT" as a result set
when I provide 'G'. Your solution can't handle this in a single query,
can it? (at least as long as I don't have a database that supports
recursive queries - mysql 4.0.x does not!).

I am expecting the whole tree to grow to about 50.000+ nodes and there
will be almost no updates but mostly select which should gain
high-performance.

Am I left to implement the recursion in php? Suggestions?
(I assume that nested sets are to complex in this case, what do you
think?)

Jul 23 '05 #10

P: n/a
Paul,

your query only finds _THE_ root node, i need to find the _WAY_ to the
root node. This cannot be done with your model imho... at least i can't
see how - correct me if I'm wrong.

Jul 23 '05 #11

P: n/a
pi**********@yahoo.de wrote:
Paul,

your query only finds _THE_ root node, i need to find the _WAY_ to the
root node. This cannot be done with your model imho... at least i can't
see how - correct me if I'm wrong.


I do this a couple different ways. Here's the simplist: a single SQL
and walking through the recordset with PHP. Given an arbitrary element
with order of D and generation level of G on page P, do this:

resultset = SELECT element_id FROM element WHERE page_id = P AND
element_order < D AND gen_level < G ORDER BY G DESC

This produces more than what we need in most cases, especially if you
have a forest of trees for a given page P. So we can do it this way:

Pseudocode in PHP, C, Perl, etc.:
path_stack = empty array to store parentage of element id's

// When you hit 0, you've completed the path
while (gen_level > 0) {
push element_id onto path_stack.
move to next row
}
// exit condition is reached when root element is hit

You could write this in a SQL loop also, but I prefer to do this sort of
logic in a procedural language. This doesn't use the parent_id, which
is still there and essential for doing this recursively -- and that's
another option, either doing it recursively with SQL calls (not wise at
all) or recursively in code, based on a single SQL (much more elegant).
Cheers,

-Paul
Jul 23 '05 #12

P: n/a
Paul Bramscher wrote:
pi**********@yahoo.de wrote:
Paul,

your query only finds _THE_ root node, i need to find the _WAY_ to the
root node. This cannot be done with your model imho... at least i can't
see how - correct me if I'm wrong.


I do this a couple different ways. Here's the simplist: a single SQL
and walking through the recordset with PHP. Given an arbitrary element
with order of D and generation level of G on page P, do this:

resultset = SELECT element_id FROM element WHERE page_id = P AND
element_order < D AND gen_level < G ORDER BY G DESC

This produces more than what we need in most cases, especially if you
have a forest of trees for a given page P. So we can do it this way:

Pseudocode in PHP, C, Perl, etc.:
path_stack = empty array to store parentage of element id's

// When you hit 0, you've completed the path
while (gen_level > 0) {
push element_id onto path_stack.
move to next row
}
// exit condition is reached when root element is hit

You could write this in a SQL loop also, but I prefer to do this sort of
logic in a procedural language. This doesn't use the parent_id, which
is still there and essential for doing this recursively -- and that's
another option, either doing it recursively with SQL calls (not wise at
all) or recursively in code, based on a single SQL (much more elegant).
Cheers,

-Paul


Oops, make that query ORDER BY D, G DESC. There's a couple different
ways to do it, and you might need to set up a little flag in code as you
walk backwards through it.
Jul 23 '05 #13

This discussion thread is closed

Replies have been disabled for this discussion.