473,394 Members | 2,031 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

Concurrency issues with "Tree" structures.

Hi,

I'm currently implementing a database with a tree structure in a table. The
nodes in the tree are stored as records with a column called "Parent". The
root of the tree has a "NULL" parent. The path to each node is stored in
the column "Path" and is of the form "\000001\000002\000003\" etc. The
latter enabling me to fetch subtrees using the "LIKE" predicate. I also
have created the relation "ID" <-> "ID_Parent, effectively the table is
related to itself. I did this so that an attempt to remove a parent when
that parent still has children will fail due to referential integrity.
Unfortunately, in order to delete subtrees, I have to first set all of the
nodes in the subtree to point to the "NULL" parent so that I don't get
caught with integrity errors during deletes.

Unlike a typical linear system, any given node record is related to more
than one of the other records. What I mean is it is possible to follow a
chain from any given node back to the root (ancestors) or collect a series
of branches and leaves (descendants) and so it is not reasonable to consider
any given node in isolation. Changes to any given descendant can trigger
changes which are propagated to it's ancestors. I am not using recursion to
do this, rather, I am using an iterative approach to select and update the
parent, then the parents parent, etc. according to it's new state (given by
the state of it's immediate descendants), right back up to the root.

Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of the
tree structure is correct. I think I need to lock all ancestors of the tree
(HOLDLOCK) before performing any operation on a given node. Is this
reasonable? Also, consider the "delete" problem given above. I really
should HOLDLOCK on the entire subtree of any node I wish to delete as I am
going to set the entire subtrees parent values to NULL. I don't want
another client to perform a read on part of the subtree while the nodes are
"parentless" pending deletion.

Secondly, I am not sure how to handle synchronization of the tree for each
client. How does each client know when a change has been made to the tree?
Consider a client holding a copy of the tree in memory. Another client
deletes a subtree. The first client attempts to update one of the deleted
subtree nodes and fails because the node no longer exists. At this point,
in the eyes of the first client, all of the ancestors and all of the
descendants of the node in question must become suspect. Should the
software now attempt to re-build this part of the tree? It seems that any
operation on any of the nodes in the tree will potentially make a lot of
other nodes suspect and so my application will be doing a lot of updating.

I would be interested to hear any insights people have on these issues,
particularly those implementing a "file system" structure in their database
or similar. How do you deal with these concurrency issues when manipulating
trees?

Thanks


Robin

Jul 23 '05 #1
4 1967
I believe that Joe Celko has a book that is devoted to tree structures
in a SQl environment. Do a search on Amazon for Celko and you should
see it. I'm sure that Joe will probably chime in here as well. My
thoughts on the subject though...
The path to each node is stored in the column "Path" and is of the form "\000001\000002\000003\" Ack, yuck, ick! I hate that method of storing tree information myself.
Check Joe's book for several different ways to store trees in a RDBMS.
also have created the relation "ID" <-> "ID_Parent, effectively the table is related to itself. I did this so that an attempt to remove a parent when that parent still has children will fail due to referential integrity. Unfortunately, in order to delete subtrees, I have to first set all of the nodes in the subtree to point to the "NULL" parent so that I don't get caught with integrity errors during deletes. So, what you seem to be saying is that you want RI so that you can't
delete any rows by mistake, but you want to be able to easily delete
rows. You can't have it both ways... you either need to explicitly
delete the children (or set the FK's to NULL) or you need to remove the
RI.
Changes to any given descendant can trigger changes which are propagated to it's ancestors

This sounds like a design problem with regards to normalization to me,
but without knowing the specifics I really can't say.

As for the client application tracking changes to the tree, that is
really dependent on the business requirements for your application. You
can reload the client every X seconds or you could wait for the client
to take an action then check to see if the action is still valid on the
tree in its current form. To limit the number of client refreshes that
you have to perform, each node could use a last_updated datetime column
to track changes and you could retrieve only those nodes and their
subtrees where the last_updated column has a value that is greater than
the last time that you refreshed your tree.

My suggestion would be to check out Joe's book (or Google past posts by
Joe) to find alternative tree representations. Some of them are rather
ingenious and easy to work with.

HTH,
-Tom.

Jul 23 '05 #2
Robin Tucker (id*************************@reallyidont.com) writes:
Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of
the tree structure is correct. I think I need to lock all ancestors of
the tree (HOLDLOCK) before performing any operation on a given node. Is
this reasonable?
UPDLOCK would be better. Else you could run into conversion deadlocks.
An UPDLOCK is a shared lock, so other processes can read. But only one
can have an UPDLOCK.
Also, consider the "delete" problem given above. I
really should HOLDLOCK on the entire subtree of any node I wish to
delete as I am going to set the entire subtrees parent values to NULL.
I don't want another client to perform a read on part of the subtree
while the nodes are "parentless" pending deletion.
I'm not really sure why you need to do this set NULL thing. In fact
that is something I would avoid like the plague. But then I know
very little of your actual business problem.

I think Joe Celko's trick for tress is to number each node in a way so
that a subtree is a contiguous range. Then you can blow away to whole
subtree in one delete.

Then again, you already had that path. Would not:

DELETE tbl WHERE path = 'a/b/c' or path LIKE 'a/b/c/%'

work?

Of course, even with set-based statements, you can have interesting
effects if two clients are it at the same time.
Secondly, I am not sure how to handle synchronization of the tree for
each client. How does each client know when a change has been made to
the tree?


You will have to ask you tech lead about that. :-) Seriously, with
no knowledge of the requirements etc, it is very difficult to answer.
As Thomas said, you can refresh automatically with some frequency.
For more bells and whistle you could push the change by activating
some signaling mechanism from a trigger. But you make have to ask for
a bigger budget to do this.

--
Erland Sommarskog, SQL Server MVP, es****@sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3
>> I would be interested to hear any insights people have on these
issues, .. <<

Get a copy of TREES & HIERARCHIES IN SQL and you can see several
different approaches.
in order to delete subtrees, I have to first set all of the nodes in

the subtree to point to the "NULL" parent so that I don't get caught
with integrity errors during deletes. <<

Look up the nested sets model; this one atomic DELETE FROM statement.

Jul 23 '05 #4

Hi Tom,
Ack, yuck, ick! I hate that method of storing tree information myself.
Check Joe's book for several different ways to store trees in a RDBMS.

Ok, I don't want to get into a religious war about the best representation
of trees. I am aware of Celkos nested sets model but prefer the clarify of
the path string approach (my system isn't going to be operating with 1,000
users or anything, so I don't mind if it's performance isn't so good); of
course when I refer to clarity, I mean my own personal ability to visualise
whats going on and write code to perform the various manipulations required!
However, I will look once again at Celkos methods because I rejected them a
year ago for various reasons (the potential renumbering of large numbers of
nodes during inserts for example) and am now a bit wiser. I also tried out
nested sets with Tropashenkos' (apologies if I misspelled the name)
materialised path (binary fractions) and found this to be unbelievably slow
and restrictive (for various reasons mainly based on numeric accuracy).
So, what you seem to be saying is that you want RI so that you can't
delete any rows by mistake, but you want to be able to easily delete
rows. You can't have it both ways... you either need to explicitly
delete the children (or set the FK's to NULL) or you need to remove the
RI.
You are right of course, I need to throw away the RI on this, then deletion
of subtrees will be atomic and I will not have to set parents to NULL.
Changes to any given descendant can trigger changes which are propagated to it's
ancestors

This sounds like a design problem with regards to normalization to me,
but without knowing the specifics I really can't say.


I will solve this by locking the ancestors (as another person says in his
post, using UPDLOCK rather than HOLDLOCK) and then processing them in any
transaction that changes state. Thinking about it though, it will still be
possible for a client to read a node that has not yet been processed while
another client is currently processing. So that first client may well
receive some processed and some unprocessed nodes back after a read
operation (is this true though? If the processing occurs within a
transaction, can other users read the uncommitted data?). The reason for
this is because the algorithm for state propagation cannot be atomic (as
fetching "ancestors" of any given node requires iteration).

However, I suppose it is possible for the client to detect problems like
this, by checking ancestors itself for consistency and forcing a refresh if
it detects an inconsistency when it reads back parts of the tree. Extra
work I think but bearing in mind the above (Celko nested sets), I suppose it
may be possible to UPDATE all ancestors atomically. If this is indeed the
case, then I might not need an iterative approach. But I doubt this is
possible, as I would have to ensure updates on the set of ancestors were
performed in order of their "depth" (leaves first of course). I don't think
SQL has such a mechanism (tables are sets after all, not sequences). I
would love to know how, using a path approach it is possible to fetch all
ancestors of a given node. I don't think it can be done with just a path
string and ID_Parent relation.

The specifics of this is each node has a "condition" (green, yellow, red,
undefined) according to the condition of it's children. For nodes of type
0, it's condition is the highest condition of each of it's children, for
nodes of type 1, it's condition is the condition of it's "most recently
added child". For nodes of type 2, the condition is fixed by the client
when the record is created. Condition propagation basically is tasked with
ensuring these conditions are consistent after deletes, updates or inserts.
As for the client application tracking changes to the tree, that is
really dependent on the business requirements for your application. You
can reload the client every X seconds or you could wait for the client
to take an action then check to see if the action is still valid on the
tree in its current form. To limit the number of client refreshes that
you have to perform, each node could use a last_updated datetime column
to track changes and you could retrieve only those nodes and their
subtrees where the last_updated column has a value that is greater than
the last time that you refreshed your tree.

I have another idea about this (I will timestamp the nodes). I have a
worker thread that serializes access to the database from the various client
controls, firing callbacks when it performs an operation. This thread can
check through the tree in it's idle time to ensure it is synchronized. Of
course, if I execute an update and fail (ROWCOUNT = 0) or a delete come to
think of it, I can force a refresh of the entire subtree and it's ancestors.
But I would consider the entire subtree and all it's ancestors to be suspect
if I detected any given node had a later timestamp than that expected.
My suggestion would be to check out Joe's book (or Google past posts by
Joe) to find alternative tree representations. Some of them are rather
ingenious and easy to work with.

HTH,
-Tom.


Thanks for all of your ideas.

Robin
Jul 23 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
by: imani_technology_spam | last post by:
We need to present hierarchical data on a web page, the same way the tree view shows files in Windows Explorer. Here's the catch: that tree view needs to be bound to a SQL Server database. How...
1
by: Jonathan Wilson | last post by:
I am looking for a C++ class library that can store data in a tree. The class library needs to be: 1.Available under a licence like GPL, LGPL or BSD so I can use it in my GPL program and 2.Usable...
2
by: Robin Tucker | last post by:
With respect to my (now not so recent) thread on Concurrency, I would like to run my idea past you gurus to see if its a runner. First, a brief recap: I have a single user system (one user, one...
1
by: Richard | last post by:
http://dynamicdrive.com/dynamicindex1/switchmenu.htm I want to add a second level menu item to the existing design. Currently only one level is possible. Item 1 ......link ......link item 2...
19
by: Christian Fowler | last post by:
I have a VERY LARGE pile of geographic data that I am importing into a database (db of choice is postgres, though may hop to oracle if necessary). The data is strictly hierarchical - each node has...
21
by: Helge Jensen | last post by:
I've got some data that has Set structure, that is membership, insert and delete is fast (O(1), hashing). I can't find a System.Collections interface that matches the operations naturally offered...
0
by: Tom Bower | last post by:
In the Windows Task Manager if I select a Process and right-click, I can choose to "End Process" or "End Process Tree." Is there a VB equivalent for "End Process Tree" if you have a handle to a...
77
by: M.B | last post by:
Guys, Need some of your opinion on an oft beaten track We have an option of using "goto" in C language, but most testbooks (even K&R) advice against use of it. My personal experience was that...
30
by: josh | last post by:
Hi all, what does it meaning that strange sintax (look at the object :) ? if I have i.e. array.length I can use array. and is it IE/Firefox compatible??
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.