[Followups set to c.p.]

On Mon, 24 Apr 2006, Mountain wrote:

I would like opinions on whether this solution could be important and

whether anyone knows if it is already solved. (I would also like to

know if anyone thinks it cannot be solved.)

First, we start with a general tree (any of the trees that my data

structures book calls rooted trees or unordered trees). Then we are

given two specific nodes (Na and Nb) that belong to that tree.

We would like to know if the second node (Nb) is an ancestor of the

first node (Na).

Here's my attempt at a summary of what's been said in the various

subthreads, as much for my benefit as yours. :)

The problem doesn't seem terribly useful, although it is useful for

hierarchies, such as the example given in

http://www.barneyb.com/blog/archives/000570.jsp
food

/ \

meat fruit

| / \

beef apple pear

If we can solve the "is-ancestor-of" problem in O(1), then we can answer

questions such as "Is beef food?" and "Is an apple a fruit?" This could

be useful in typechecking object-oriented languages that use a lot of

subtyping and inheritance.

So, how can we implement this? Joe Celko's "nested sets" solution

looks like this:

1.food.12

/ \

2.meat.5 6.fruit.11

| / \

3.beef.4 7.apple.8 9.pear.10

Each node has two fields "left" and "right", such that

parent.left < child.left and child.right < parent.right. Then checking

that Na is a descendant of Nb simply means checking that

(Nb.left < Na.left && Na.right < Nb.right)

This solution is easy to implement, and deleting a node is easy, but

adding new nodes is naively an O(n) operation, since the "left" and

"right" fields of about half the tree's nodes need to be updated.

However, for the OO-typechecking application, lookups are going to be

much more common than inserts, so we win.

Googling "Joe Celko nested sets" does indeed turn up useful information

on this method.

Another method was proposed by Alf Steinbach in this thread: In each

node, instead of a parent pointer, store the path from the root itself,

represented as a sequence of "left" and "right" directions. For example,

the path stored in the "apple" node above would be "right,left". (Note

that this method only really works for binary trees, or with extensions,

N-ary trees. It doesn't seem applicable to general rooted trees.)

Then, checking whether Na is a descendant of Nb boils down to checking

whether Nb.path is a prefix of Na.path, which can be done in O(lg N) time

naively, and is probably possible in much less time using bitwise

arithmetic. (See below.)

As Alf points out, O(lg N) is basically O(1) on a machine with N-bit

addressing. (lg 32 is just a constant.) This means that the real-world

solution of simply storing a parent pointer in each node is basically

constant-time, too --- but the stored-path solution saves time if

dereferencing is expensive.

A third approach, which boils down to Alf's solution, is to notice that

iff Nb is an ancestor of Na, then the lowest common ancestor of Na and Nb

is Nb itself. Let's number the nodes in the order of an in-order traversal

on a complete binary tree, like this:

food.100

/ \

meat.010 fruit.110

/ / \

beef.001 apple.101 pear.111

Notice that Nx.number can also be described as "the path from the root

to Nx, followed by a 1 and right-padded with zeros." Then the paths of Na

and Nb diverge at the leftmost 1 bit of (Na.number XOR Nb.number), and

we can get their lowest common ancestor by computing

/* suppose Na = apple, Nb = fruit */

xor = Na.number ^ Nb.number; /* xor = 011 */

left = find_leftmost_one(xor); /* left = 010 */

pathmask = ~(2*left-1); /* pathmask = 100 */

path = (Na.number & pathmask) | left; /* path = 110 */

Now, since path == Nb.number, we see that indeed, Nb is the lowest common

ancestor of Na and Nb.

Unfortunately, actually /finding/ the leftmost 1 bit in a word isn't

possible in constant time using only the standard C operators. I think

Java provides a library function to do it quickly; and there is a hack

involving the FPU (convert to 'double', examine the exponent field) that

would work if you /really/ needed speed at the cost of portability.

More information on the "lowest common ancestor" algorithm is available

here:

http://homepage.usask.ca/~ctl271/810...n_ancestor.pdf
The find-lowest-one FPU hack is documented here:

http://www.diku.dk/hjemmesider/ansat...ineering-1998/
HTH,

-Arthur