Bytes | Software Development & Data Engineering Community

Making a tree with "millions and millions" of dynamic nodes



I have a VERY LARGE pile of geographic data that I am importing into a database (db of choice is postgres,
though I may hop to oracle if necessary). The data is strictly hierarchical - each node has one, and only one,
parent. The depth should not exceed 6 or 7 levels. The initial import will have about 6 million leaves and
3 million branches. I would expect the leaves to grow significantly, easily tripling in number. However, the
branches will likely stay very constant in number, but I expect their locations to shift around somewhat
(affecting up to a thousand children at a time).

For selection, it is crucial for me to get:

1. path generation speed
2. immediate sibling speed
3. children count speed
I have implemented a first run using Joe Celko's Nested Sets (with a mod to store tree level for speed). The
update propagation issue is the Achilles heel of this approach for me. I have read Vadim Tropashko's Nested
Interval concept ( http://www.dbazine.com/tropashko4.html ), and my brain is painfully stretched enough to
get the general idea. I have a feeling I will hit the arithmetic issues others have reported.

So it seems Materialized Path is my only option. However, I am concerned about LIKE performance for the right-hand
side of the tree, where the path is 8 digits x 6 levels = 48 chars. Should I be concerned? I need
split-second real-time performance, and can't imagine it will be as fast as the Nested Set arithmetic approach.
I can flatten out the import to ensure the upper tree has the smallest numbers; however, it will save at
most 8 chars on the path.

I've been googling through USENET archives watching the big debates of years gone by. I've gleaned much
knowledge, but now am curious if anyone has any thoughts/advice/war stories about a data set this large.

(and if any fellow postgres fans have some reassuring words about real-time performance of a 20 million row
tree, I'd love to hear ;-)
--
[ \ /
[ >X< Christian Fowler | http://www.steelsun.com/
[ / \
Nov 12 '05 #1

"Christian Fowler" <go****@NOSPAM.gravesweeper.com> wrote in message
news:6b********************@speakeasy.net...
For selection, it is crucial for me to get:

1. path generation speed
Path to the root? It is coded explicitly in the materialized path. All you
need to do is parsing the path and generating a list of prefixes that you
use within your query like this:

select * from hierarchy where path in ('1', '1.2', '1.2.1', '1.2.1.1')

If you have an index, and your rdbms optimiser is smart enough, the query
should execute fast.
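The prefix list for a given node's path can be generated mechanically; here is a minimal Python sketch (dot-separated paths assumed, as in the query above):

```python
def path_prefixes(path: str) -> list[str]:
    """Return every ancestor path (all prefixes) of a dot-separated materialized path."""
    parts = path.split(".")
    return [".".join(parts[:i + 1]) for i in range(len(parts))]

# Build the IN-list for the "path to root" query.
prefixes = path_prefixes("1.2.1.1")
# ['1', '1.2', '1.2.1', '1.2.1.1']
sql = "select * from hierarchy where path in (%s)" % ", ".join("'%s'" % p for p in prefixes)
```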
2. immediate sibling speed
Here is the kludge:

select * from hierarchy where path in ('1.2.1.1', '1.2.1.2', '1.2.1.3',
'1.2.1.4', '1.2.1.5')

Again, I assume that your query would use an index here.

If you get exactly 5 records, then you can't be sure that your node has that
many children. You have to repeat the query like this:

select * from hierarchy where path in (
'1.2.1.1', '1.2.1.2', '1.2.1.3', '1.2.1.4', '1.2.1.5'
, '1.2.1.6', '1.2.1.7', '1.2.1.8', '1.2.1.9', '1.2.1.10'
, '1.2.1.11', '1.2.1.12', '1.2.1.13', '1.2.1.14', '1.2.1.15'
, '1.2.1.16', '1.2.1.17', '1.2.1.18', '1.2.1.19', '1.2.1.20'
, '1.2.1.21', '1.2.1.22', '1.2.1.23', '1.2.1.24', '1.2.1.25')

Yet again, there might be more than 25 children, so you'll have to run yet
another, more "generous" query.

The pitfall here is that at some point the optimiser may give up on
"or" expansion, so that at some point you might end up with a full table scan.
But this is implementation dependent, so you might be able to
influence the optimizer's decision.
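The batching above can be scripted; a hypothetical sketch that yields successively wider candidate lists, to be probed until a query no longer fills its IN-list:

```python
def sibling_batches(parent_path: str, batch_size: int = 5):
    """Yield growing lists of candidate child paths ('parent.1', 'parent.2', ...)
    for probing with IN-list queries, as described in the post."""
    upper = 0
    while True:
        upper += batch_size
        yield ["%s.%d" % (parent_path, i) for i in range(1, upper + 1)]

# Usage: keep querying with the next batch while every candidate matched.
gen = sibling_batches("1.2.1")
first = next(gen)   # 5 candidates: '1.2.1.1' .. '1.2.1.5'
second = next(gen)  # 10 candidates
```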

As you see, I'm not worried about how many bytes the materialized path has, or whether
parsing it takes more time than multiplying 2 integers. My concern is whether
your query can leverage an index or not.
3. children count speed


Children, or descendants? I guess no method can beat Celko's descendants
formula:
#descendants = (rgt - lft + 1) / 2
The count is implicitly stored at the root node, so that even for a hierarchy of
1M nodes we can answer how many nodes there are instantly. This is also an
Achilles' heel of the method: any update to the hierarchy triggers a refresh
of the "counter". One aggregate is never enough, though: it's often useful
to know total salary too :-)
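Under the nested set encoding, the subtree size falls straight out of the left/right numbers; a small sketch of the formula above:

```python
def subtree_size(lft: int, rgt: int) -> int:
    """Number of nodes in a nested-set subtree, the enclosing node included:
    (rgt - lft + 1) / 2."""
    return (rgt - lft + 1) // 2

# A node spanning (1, 14) encloses 7 nodes: itself plus 6 descendants.
print(subtree_size(1, 14))  # 7
# A leaf always spans two consecutive numbers, e.g. (2, 3).
print(subtree_size(2, 3))   # 1
```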

If you meant not descendants, but children, then use a method similar to
bullet #2.



Nov 12 '05 #2
"Mikito Harakiri" <mi*********@iahu.com> wrote in message <news:3k**************@news.oracle.com>...

[Mikito's prefix-list and batched IN-list sibling queries, quoted in full
above, snipped]

Yet again, there might be more than 25 children, so you'll have to run yet
another, more "generous" query.


Here's an alternative which may not perform very well but may
still be better than risking a full table-scan...

select * from hierarchy
where path like '1.2.1.%'
and path not like '1.2.1.%.%'

What if you use a fixed number of chars per level? The sibling
and immediate children queries then become something more like:

select * from hierarchy
where path like '0001.0002.0001.____'

select count(*) from hierarchy
where path like '0001.0002.0001.1234.____'

Since the number of characters per level is fixed, you can get rid of the
dots too, and parsing becomes a matter of length and substring.
Finding all descendants could be implemented using BETWEEN in
either materialized path scheme:

select * from hierarchy
where path between '1.2.1.1234.' and '1.2.1.1234~'

select * from hierarchy
where path between '0001.0002.0001.1234.' and '0001.0002.0001.1234~'

select * from hierarchy
where path between '0001000200011234.' and '0001000200011234~'

The '.' and '~' may need changing depending on collating order.
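The fixed-width scheme can be sketched in a few lines (4 digits per level, dots dropped, as suggested above; the width is an illustrative choice):

```python
WIDTH = 4  # fixed digits per level; an illustrative assumption

def encode_path(levels: list) -> str:
    """Encode the ordinal position at each level into a fixed-width, dot-free path."""
    return "".join("%0*d" % (WIDTH, n) for n in levels)

def decode_path(path: str) -> list:
    """Parsing becomes a matter of length and substring once the width is fixed."""
    return [int(path[i:i + WIDTH]) for i in range(0, len(path), WIDTH)]

p = encode_path([1, 2, 1, 1234])
print(p)               # 0001000200011234
print(decode_path(p))  # [1, 2, 1, 1234]
```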

--
Joe Foster <mailto:jlfoster%40znet.com> DC8s in Spaace: <http://www.xenu.net/>
WARNING: I cannot be held responsible for the above They're coming to
because my cats have apparently learned to type. take me away, ha ha!
Nov 12 '05 #3
Hi,

- -- "Joe \"Nuke Me Xemu\" Foster" <bf********@news.hub.org> wrote:
Here's an alternative which may not perform very well but may
still be better than risking a full table-scan...


I use exactly this method in a forum software, and it performs VERY well. I
tested it with about a million rows, and it is really very fast. Even
with deep trees (1000 sub-branches).

The only difference is that I use a base-255 encoded text column instead of
only 0-9. But beware: the character set must be ASCII (ordering!) ...

I want to change this to bytea to avoid the base-255 encoding and the character
set problems, but there are still some bugs with bytea and the LIKE
operator. :-(
Ciao
Alvar

- --
** Alvar C.H. Freude -- http://alvar.a-blast.org/ -- http://odem.org/
** Berufsverbot? http://odem.org/aktuelles/staatsanwalt.de.html
** ODEM.org-Tour: http://tour.odem.org/
** Informationsgesellschaft: http://www.wsis-koordinierungskreis.de/

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Nov 12 '05 #4
Alvar Freude wrote:
I want to change this to bytea to avoid base255 encoding and the character
set problems, but there are still some Bugs with bytea and the like
operator. :-(


Be patient, I'm working on it. Apparently you are the first person to
ever really try to *use* the like operator for bytea, because the bug
has been there since the feature was originally committed.

Joe


Nov 12 '05 #5
> Christian Fowler wrote:

So it seems Materialized Path is my only option, however I am
concerned about LIKE performance for the right hand side of
the tree, where the path is 8digits x 6 levels = 48 chars.
Should I be concerned? I need split-second real-time
performance, and can't imagine it will be as fast the Nested
Set arithmatic approach.


Perhaps the contrib/ltree stuff is interesting if you're going to do
materialized paths.
It's an indexable tree format for postgresql, based on materialized
paths (or so it seems at least).

I haven't really tested it, but wanted to point it out to you.

Regards,

Arjen van der Meijden



Nov 12 '05 #6
Christian,

you may try our contrib/ltree module
see http://www.sai.msu.su/~megera/postgres/gist/ltree
With tuned postgresql and reasonable hardware you might get what you're
looking for.

Oleg

On Tue, 2 Dec 2003, Christian Fowler wrote:

[Christian's original question about the 6-million-leaf geographic
hierarchy, quoted in full above, snipped]


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: ol**@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Nov 12 '05 #7
I was glad to see this topic come up on the list, as I was about to start
asking about some of these issues myself. I would like to discuss each of
the methods I have researched so far for doing trees in SQL and see if
anyone has some experience or insight into the topic that could help me.

That being said the four methods that seem to be the prevailing ideas are
the following:

1) Adjacency List - Just use a parent field, nothing interesting or
non-obvious here.
2) Materialized Path - Basically you store the path up to the node in a
field called path
3) Nested Sets - uh, read the articles below.
4) Nested Intervals - definitely read the first article

Here are some articles explaining these concepts:

http://www.dbazine.com/tropashko4.html (this is from an earlier posting in
this thread)
http://www.intelligententerprise.com/001020/celko.shtml (the nested set or
Celko method as it seems it should be called)
http://www.dbmsmag.com/9603d06.html (more on the celko method)
http://www.dbmsmag.com/9604d06.html
http://www.dbmsmag.com/9605d06.html
http://www.dbmsmag.com/9606d06.html
I have a lot of thoughts on this, but here is my first question:

In the first article, at the end of the section on materialized paths and the
beginning of the nested set section, Tropashko basically says that neither
one really does a good job at answering "who are all of my ancestors", but
that the materialized path method has a nice kludge that you can do with
some funky string manipulation.

Is this true? Celko in his articles seems to make it sound like this query
will be very fast. But Tropashko it seems is saying that it will be slow.
Am I reading this right? If so is Tropashko right? Any ideas on this? Any
articles / papers that might shed some light on this specific question or
the topic in general?

thanks in advance

rg

Nov 12 '05 #8
> Rick Gigger wrote:

I was glad to see this topic come up on the list as I was
about to start asking about some of these issues myself. I
would like to discuss each of the methods I have researched
so far for doing trees in sql and see if anyone has some
experience or insite into the topic that could help me.
Have a look here: http://www.scenic-route.com/program/db/lists.htm
It is quite a nice overview with some advantages and disadvantages of
each method.
I have a lot of thouhts on this but here is my first question:

In the first article at the end of the section on materialized paths
and the beginning of the nested set section Tropashko basically says
that neither one really does a good job at answering "who are all of
my ancestors" but that the materialized path method has a nice kludge
that you can do with some funky string manipulation.
I don't see how the nested intervals-query is better/faster than the
nested set approach, but both seem to be possibly using indices.
The nested set uses a simple "where left between parent.left and
parent.right", if you happen to have those numbers in your application's
memory, you could even build your query directly like that and save
yourself a self join.
Another approach, I used, is something like:
Select * from nset n1, (select (distinct) * from nset where nset.id =
$parentid) as n2 where n1.left between n2.left and n2.right
(the distinct will force a materialization, which might improve the
performance or not, in my tests it did)

With the materialized path you'll have to use a like-search (child.path
like parent.path || '%'), where hopefully the optimiser notices that it
can chop off the end of the string and just look at the first part.
Is this true? Celko in his articles seems to make it sound
like this query will be very fast. But Tropashko it seems
is saying that it will be slow. Am I reading this right? If
so is Tropashko right? Any ideas on this? Any articles / papers
that might shed some light on this specific question or the
topic in general?

It'll use an index range-search, so I don't see how it can be faster,
except for a hash/index lookup or so (which is possible with the
transitive closure version of an adjacency list).

The above article is a nice one to read I think.

I'll try to summarize my thoughts on them, since I'm looking into an
efficient tree approach myself as well:
All methods have both advantages and disadvantages, and it'll depend on
your situation whether that is a showstopper or not. All are bad when
you want to remove a node from the tree, but the insert/update penalties
depend on the situation.

* The adjacency list is lightning fast with inserts and updates, but a bit
clumsy with deletions (you need to connect the direct children to
another element, that's all). It'll be very fast with the two simple
queries "who are my direct children" and "who is my parent", but you
need either a recursive approach or some enhancement like a transitive
closure-graph to enhance queries like "who are my predecessors" and "who
are all my ancestors".

So if you'll have a large tree, only a few levels deep, and a lot of
inserts and/or subtree movements, this seems to be the best approach.
And you can store "max int"-number of nodes.
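The "recursive approach" mentioned above can be sketched outside the database; a hypothetical in-memory version using a parent-to-children map (in SQL this is what a recursive query or repeated self-joins would do):

```python
def descendants(children_of: dict, node: int) -> list:
    """Collect all descendants of `node` from an adjacency (parent -> children) map
    by iterative depth-first traversal."""
    out = []
    stack = list(children_of.get(node, []))
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(children_of.get(n, []))
    return out

# parent -> children edges; an illustrative toy tree rooted at 1
edges = {1: [2, 3], 2: [4, 5], 3: [6]}
print(sorted(descendants(edges, 1)))  # [2, 3, 4, 5, 6]
```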

* The materialized path is not so very efficient in terms of storage; it
does inserts fast, but updates and deletions are slow. Retrieval of all
ancestors is fast, retrieval of the predecessors is extremely trivial,
but retrieval of the direct parent and/or children is somewhat more
difficult (you'll need to look at the depth; if you do that often, a
functional index might be handy).
Perhaps the ltree implementation in contrib/ltree is worth a look, and
encoding the trees much more densely (like the base-255 encoding
I saw mentioned earlier, skipping the dot in favour of a single-byte
terminator) makes it less inefficient.

So if you do a lot of inserts, not so many updates and deletes and have
much need for the entire list of predecessors/ancestors, I suppose this
one is quite nice. But it is not so efficient in terms of storage, so
very large/deep trees impose a huge overhead since each node stores its
entire path.

* The nested set is inefficient with insertions, updates and deletions
but much more efficient with storage than the MP. To speed up the
process of selecting only the direct parent/children you can use a depth
field (redundant, but handy) which'll save you one or two extra
selfjoins.
For relatively static trees the nested set is good, but actively
changing trees are very bad for your performance. You can store "max
int"/2 nodes in this approach.

* The nested intervals use the same approach as the nested set, but with
a static encoding. This allows you to do very efficient inserts, or so the
author claims. I have my doubts though: if you don't know the number of
children of your parent and just want to do "connect this child to that
parent", it'll be more difficult; you'll need to find that parent, count
its children, decide the encoding-number of this new child, and then you
can calculate its new numbers.
Still more efficient than the nested set, but not so efficient, and you
DO need to look into your database, while the author claimed you didn't
need to.
I haven't worked with this approach yet, but I saw the author use a lot
of functions in his queries; I'm not sure whether that is a good idea,
it might affect the usability of your indices...
Another disadvantage is its very inefficient use of the integer values
available: it can only have a depth of 32 with normal integers, afaik,
and not so many children on a level as you'd like perhaps. The author
suggests defining an unlimited integer, which is not so good for your
performance at all (search for '"nested interval" sql' in google or so
and you'll find his messages on the dbforums).

* His other approach, the single-integer version, is even worse in terms
of efficient integer use: you can have only 32 levels in the best case
(the left-most branch only!) and only about 30 nodes next to each other
in the same best case (since the numbers increase exponentially: level 1 =
5, 11, 23, 47, 95, etc). The worst case (the right-most branch) can
possibly store only one or two levels with only one or two nodes...

So the claimed advantage of the better performance is a bit limited,
you'll either need *very* large integers to store large trees, or you
are simply limited in the size of your tree by some relatively small
numbers. And that is a bit sad, since bad performance is only a
problem/visible with large(r) trees.

I'm not sure whether the above summary is correct, anyone any idea? I have
to use it in a research paper for my study anyway, so expanding my
thoughts in an email is a nice test :)
Please comment.

Best regards,

Arjen van der Meijden



Nov 12 '05 #9
Here is another approach (only the first half of the page applies here)
I was emailed personally in response to my original post:

http://fungus.teststation.com/~jon/t...eeHandling.htm

Basically, the (too obvious for me to think of ;-) concept appears to
be storing a mapping of Node / Parent for every parent in the
chain. Basically it splits up the materialized path concept into
multiple rows, or one can think of it as the adjacency list with all
parents stored. Perhaps call this the "Materialized List".

Personally, this is my likely favorite since I tend to "trust" the
performance of giant tables that are nothing more than 2 columns of
integers. Seems like this approach would handle most of the desired
tree cases (select path & child count, insert, delete, move) very
well.

Hopefully postgres can handle a very active, very narrow 50 million
row table without breaking a sweat.
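This "Materialized List" is what is now often called a closure table: one (ancestor, descendant) row per pair. A minimal sketch of generating those two-integer rows from plain parent links (names are illustrative):

```python
def closure_rows(parent_of: dict) -> set:
    """Build the set of (ancestor, node) pairs for every ancestor in each
    node's chain, from a node -> parent map (the root has no entry)."""
    rows = set()
    for node in parent_of:
        anc = parent_of[node]
        while anc is not None:
            rows.add((anc, node))
            anc = parent_of.get(anc)  # walk up; None once past the root
    return rows

# node -> parent; node 1 is the root
parents = {2: 1, 3: 1, 4: 2}
print(sorted(closure_rows(parents)))  # [(1, 2), (1, 3), (1, 4), (2, 4)]
```

Each pair becomes one row in the narrow two-integer table, so "all ancestors" and "all descendants" are both single indexed lookups.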

"Arjen van der Meijden" <ac********@vulcanus.its.tudelft.nl> wrote:

[Arjen's full survey of the adjacency list, materialized path, nested set,
and nested interval approaches, quoted in full above, snipped]

--
[ \ /
[ >X< cf*****@steelsun.com | http://www.steelsun.com/
[ / \
Nov 12 '05 #10
This thread has been closed and replies have been disabled. Please start a new discussion.
1
9335
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
9256
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6801
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
6079
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4881
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
3320
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
3
2218
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...