We are currently in the process of developing a client/server based
generic tree application with which we want to be able to model and
render any size, shape and depth of tree.
Our primary objective is to provide the ability to model large data
trees with some 20 million+ nodes whilst still allowing adequate
rendering and query times from the client.
We have just bought Joe Celko's "Tree's and Hierarchies in SQL fo
Smarties" and have picked up some very useful tips.
We are definitely looking at using some hybrid model but are undecided
as to what combination would suit us best.
Whilst constructing the tree we have the following information
available to us, that we could/may include in our database structure:
1) The Node's parent.
2) The Node's depth/level.
3) The Node's path to the root node.
4) The Node's child nodes count.
5) The Node's descendant nodes count.
6) The Node's position relative to it's sibling nodes.
This infomation would allow us to combine many of the models Joe Celko
discusses, including:
1) The Nested Sets/Interval model - allows us to quickly find
subtrees.
2) The Adjacency List model - allows us to quickly find immediate
child nodes.
3) The Materialized Path model - allows us to quickly find the
ancestors of the node.
4) The Depth Model - allows us to quickly find nodes related to their
levels/depth.
It is also quite likely that we will want to change the structure of
the tree fairly frequently. As an indication, I can see update rates
of every 15 minutes or less.
To us, disk space and hardware are cheap, so we are not concerned with
storage overhead. We just need optimal query performance.
Types of queries we need are for example:
Given a node...
1) Get it's immediate children.
2) Get it's ancestors.
3) Get all of it's descendants.
4) Get it's descendants to a certain depth.
5) Get it's siblings
etc. etc.
Any suggestions/foresight/tips that may help us in the database
modelling would be most appreciated?