473,837 Members | 1,963 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Concurrency issues with "Tree" structures.

Hi,

I'm currently implementing a database with a tree structure in a table. The
nodes in the tree are stored as records with a column called "Parent". The
root of the tree has a "NULL" parent. The path to each node is stored in
the column "Path" and is of the form "\000001\000002 \000003\" etc. The
latter enabling me to fetch subtrees using the "LIKE" predicate. I also
have created the relation "ID" <-> "ID_Parent, effectively the table is
related to itself. I did this so that an attempt to remove a parent when
that parent still has children will fail due to referential integrity.
Unfortunately, in order to delete subtrees, I have to first set all of the
nodes in the subtree to point to the "NULL" parent so that I don't get
caught with integrity errors during deletes.

Unlike a typical linear system, any given node record is related to more
than one of the other records. What I mean is it is possible to follow a
chain from any given node back to the root (ancestors) or collect a series
of branches and leaves (descendants) and so it is not reasonable to consider
any given node in isolation. Changes to any given descendant can trigger
changes which are propagated to it's ancestors. I am not using recursion to
do this, rather, I am using an iterative approach to select and update the
parent, then the parents parent, etc. according to it's new state (given by
the state of it's immediate descendants), right back up to the root.

Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of the
tree structure is correct. I think I need to lock all ancestors of the tree
(HOLDLOCK) before performing any operation on a given node. Is this
reasonable? Also, consider the "delete" problem given above. I really
should HOLDLOCK on the entire subtree of any node I wish to delete as I am
going to set the entire subtrees parent values to NULL. I don't want
another client to perform a read on part of the subtree while the nodes are
"parentless " pending deletion.

Secondly, I am not sure how to handle synchronization of the tree for each
client. How does each client know when a change has been made to the tree?
Consider a client holding a copy of the tree in memory. Another client
deletes a subtree. The first client attempts to update one of the deleted
subtree nodes and fails because the node no longer exists. At this point,
in the eyes of the first client, all of the ancestors and all of the
descendants of the node in question must become suspect. Should the
software now attempt to re-build this part of the tree? It seems that any
operation on any of the nodes in the tree will potentially make a lot of
other nodes suspect and so my application will be doing a lot of updating.

I would be interested to hear any insights people have on these issues,
particularly those implementing a "file system" structure in their database
or similar. How do you deal with these concurrency issues when manipulating
trees?

Thanks


Robin

Jul 23 '05 #1
4 1987
I believe that Joe Celko has a book that is devoted to tree structures
in a SQl environment. Do a search on Amazon for Celko and you should
see it. I'm sure that Joe will probably chime in here as well. My
thoughts on the subject though...
The path to each node is stored in the column "Path" and is of the form "\000001\000002 \000003\" Ack, yuck, ick! I hate that method of storing tree information myself.
Check Joe's book for several different ways to store trees in a RDBMS.
also have created the relation "ID" <-> "ID_Parent, effectively the table is related to itself. I did this so that an attempt to remove a parent when that parent still has children will fail due to referential integrity. Unfortunately, in order to delete subtrees, I have to first set all of the nodes in the subtree to point to the "NULL" parent so that I don't get caught with integrity errors during deletes. So, what you seem to be saying is that you want RI so that you can't
delete any rows by mistake, but you want to be able to easily delete
rows. You can't have it both ways... you either need to explicitly
delete the children (or set the FK's to NULL) or you need to remove the
RI.
Changes to any given descendant can trigger changes which are propagated to it's ancestors

This sounds like a design problem with regards to normalization to me,
but without knowing the specifics I really can't say.

As for the client application tracking changes to the tree, that is
really dependent on the business requirements for your application. You
can reload the client every X seconds or you could wait for the client
to take an action then check to see if the action is still valid on the
tree in its current form. To limit the number of client refreshes that
you have to perform, each node could use a last_updated datetime column
to track changes and you could retrieve only those nodes and their
subtrees where the last_updated column has a value that is greater than
the last time that you refreshed your tree.

My suggestion would be to check out Joe's book (or Google past posts by
Joe) to find alternative tree representations . Some of them are rather
ingenious and easy to work with.

HTH,
-Tom.

Jul 23 '05 #2
Robin Tucker (id************ *************@r eallyidont.com) writes:
Now, the problem comes when I consider concurrency with respect to this
scheme. It seems to me, that locking the record I am updating is not
sufficient to ensure clients are kept synchronised or the integrity of
the tree structure is correct. I think I need to lock all ancestors of
the tree (HOLDLOCK) before performing any operation on a given node. Is
this reasonable?
UPDLOCK would be better. Else you could run into conversion deadlocks.
An UPDLOCK is a shared lock, so other processes can read. But only one
can have an UPDLOCK.
Also, consider the "delete" problem given above. I
really should HOLDLOCK on the entire subtree of any node I wish to
delete as I am going to set the entire subtrees parent values to NULL.
I don't want another client to perform a read on part of the subtree
while the nodes are "parentless " pending deletion.
I'm not really sure why you need to do this set NULL thing. In fact
that is something I would avoid like the plague. But then I know
very little of your actual business problem.

I think Joe Celko's trick for tress is to number each node in a way so
that a subtree is a contiguous range. Then you can blow away to whole
subtree in one delete.

Then again, you already had that path. Would not:

DELETE tbl WHERE path = 'a/b/c' or path LIKE 'a/b/c/%'

work?

Of course, even with set-based statements, you can have interesting
effects if two clients are it at the same time.
Secondly, I am not sure how to handle synchronization of the tree for
each client. How does each client know when a change has been made to
the tree?


You will have to ask you tech lead about that. :-) Seriously, with
no knowledge of the requirements etc, it is very difficult to answer.
As Thomas said, you can refresh automatically with some frequency.
For more bells and whistle you could push the change by activating
some signaling mechanism from a trigger. But you make have to ask for
a bigger budget to do this.

--
Erland Sommarskog, SQL Server MVP, es****@sommarsk og.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techinf...2000/books.asp
Jul 23 '05 #3
>> I would be interested to hear any insights people have on these
issues, .. <<

Get a copy of TREES & HIERARCHIES IN SQL and you can see several
different approaches.
in order to delete subtrees, I have to first set all of the nodes in

the subtree to point to the "NULL" parent so that I don't get caught
with integrity errors during deletes. <<

Look up the nested sets model; this one atomic DELETE FROM statement.

Jul 23 '05 #4

Hi Tom,
Ack, yuck, ick! I hate that method of storing tree information myself.
Check Joe's book for several different ways to store trees in a RDBMS.

Ok, I don't want to get into a religious war about the best representation
of trees. I am aware of Celkos nested sets model but prefer the clarify of
the path string approach (my system isn't going to be operating with 1,000
users or anything, so I don't mind if it's performance isn't so good); of
course when I refer to clarity, I mean my own personal ability to visualise
whats going on and write code to perform the various manipulations required!
However, I will look once again at Celkos methods because I rejected them a
year ago for various reasons (the potential renumbering of large numbers of
nodes during inserts for example) and am now a bit wiser. I also tried out
nested sets with Tropashenkos' (apologies if I misspelled the name)
materialised path (binary fractions) and found this to be unbelievably slow
and restrictive (for various reasons mainly based on numeric accuracy).
So, what you seem to be saying is that you want RI so that you can't
delete any rows by mistake, but you want to be able to easily delete
rows. You can't have it both ways... you either need to explicitly
delete the children (or set the FK's to NULL) or you need to remove the
RI.
You are right of course, I need to throw away the RI on this, then deletion
of subtrees will be atomic and I will not have to set parents to NULL.
Changes to any given descendant can trigger changes which are propagated to it's
ancestors

This sounds like a design problem with regards to normalization to me,
but without knowing the specifics I really can't say.


I will solve this by locking the ancestors (as another person says in his
post, using UPDLOCK rather than HOLDLOCK) and then processing them in any
transaction that changes state. Thinking about it though, it will still be
possible for a client to read a node that has not yet been processed while
another client is currently processing. So that first client may well
receive some processed and some unprocessed nodes back after a read
operation (is this true though? If the processing occurs within a
transaction, can other users read the uncommitted data?). The reason for
this is because the algorithm for state propagation cannot be atomic (as
fetching "ancestors" of any given node requires iteration).

However, I suppose it is possible for the client to detect problems like
this, by checking ancestors itself for consistency and forcing a refresh if
it detects an inconsistency when it reads back parts of the tree. Extra
work I think but bearing in mind the above (Celko nested sets), I suppose it
may be possible to UPDATE all ancestors atomically. If this is indeed the
case, then I might not need an iterative approach. But I doubt this is
possible, as I would have to ensure updates on the set of ancestors were
performed in order of their "depth" (leaves first of course). I don't think
SQL has such a mechanism (tables are sets after all, not sequences). I
would love to know how, using a path approach it is possible to fetch all
ancestors of a given node. I don't think it can be done with just a path
string and ID_Parent relation.

The specifics of this is each node has a "condition" (green, yellow, red,
undefined) according to the condition of it's children. For nodes of type
0, it's condition is the highest condition of each of it's children, for
nodes of type 1, it's condition is the condition of it's "most recently
added child". For nodes of type 2, the condition is fixed by the client
when the record is created. Condition propagation basically is tasked with
ensuring these conditions are consistent after deletes, updates or inserts.
As for the client application tracking changes to the tree, that is
really dependent on the business requirements for your application. You
can reload the client every X seconds or you could wait for the client
to take an action then check to see if the action is still valid on the
tree in its current form. To limit the number of client refreshes that
you have to perform, each node could use a last_updated datetime column
to track changes and you could retrieve only those nodes and their
subtrees where the last_updated column has a value that is greater than
the last time that you refreshed your tree.

I have another idea about this (I will timestamp the nodes). I have a
worker thread that serializes access to the database from the various client
controls, firing callbacks when it performs an operation. This thread can
check through the tree in it's idle time to ensure it is synchronized. Of
course, if I execute an update and fail (ROWCOUNT = 0) or a delete come to
think of it, I can force a refresh of the entire subtree and it's ancestors.
But I would consider the entire subtree and all it's ancestors to be suspect
if I detected any given node had a later timestamp than that expected.
My suggestion would be to check out Joe's book (or Google past posts by
Joe) to find alternative tree representations . Some of them are rather
ingenious and easy to work with.

HTH,
-Tom.


Thanks for all of your ideas.

Robin
Jul 23 '05 #5

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

3
10828
by: imani_technology_spam | last post by:
We need to present hierarchical data on a web page, the same way the tree view shows files in Windows Explorer. Here's the catch: that tree view needs to be bound to a SQL Server database. How can this be done?
1
2169
by: Jonathan Wilson | last post by:
I am looking for a C++ class library that can store data in a tree. The class library needs to be: 1.Available under a licence like GPL, LGPL or BSD so I can use it in my GPL program and 2.Usable on multiple compilers (specificly, Visual C++ on windows and GCC 3.x MingW on windows. If it works on GCC on Linux, thats even better but not essential) What I am storing in this tree is data for a directory tree (its going to represent the...
2
1276
by: Robin Tucker | last post by:
With respect to my (now not so recent) thread on Concurrency, I would like to run my idea past you gurus to see if its a runner. First, a brief recap: I have a single user system (one user, one copy of the software, one copy of MSDE, one machine) that I wish to convert into a multi-user/single database networked system. The problem I had was that a lot of information is fetched from the database and cached in the client program (the...
1
2761
by: Richard | last post by:
http://dynamicdrive.com/dynamicindex1/switchmenu.htm I want to add a second level menu item to the existing design. Currently only one level is possible. Item 1 ......link ......link item 2 item 3
19
6793
by: Christian Fowler | last post by:
I have a VERY LARGE pile of geographic data that I am importing into a database (db of choice is postgres, though may hop to oracle if necessary). The data is strictly hierarchical - each node has one, and only one parent. The depth should not exceed 6 or 7 levels. The initial import will have about 6 million leaves, and 3 million branches. I would expect the leaves to grow significantly, in number easily tripling. However, the branches will...
21
13858
by: Helge Jensen | last post by:
I've got some data that has Set structure, that is membership, insert and delete is fast (O(1), hashing). I can't find a System.Collections interface that matches the operations naturally offered by Sets. - ICollection cannot decide containment - IList promises indexability by the natural numbers, which is not achievable (since i hash elements, not sort them). - IDictionary is definatly not setlike. Although I can, of course, define...
0
3747
by: Tom Bower | last post by:
In the Windows Task Manager if I select a Process and right-click, I can choose to "End Process" or "End Process Tree." Is there a VB equivalent for "End Process Tree" if you have a handle to a process? I know about process.Kill and process.CloseMainWindow, but how would I do the equivalent of "End Process Tree" where any child or associated processes are also killed? How do you know what processes are considered to be part of the...
77
4046
by: M.B | last post by:
Guys, Need some of your opinion on an oft beaten track We have an option of using "goto" in C language, but most testbooks (even K&R) advice against use of it. My personal experience was that goto sometimes makes program some more cleaner and easy to understand and also quite useful (in error handling cases). So why goto is outlawed from civilized c programmers community. is there any technical inefficiency in that.
30
2960
by: josh | last post by:
Hi all, what does it meaning that strange sintax (look at the object :) ? if I have i.e. array.length I can use array. and is it IE/Firefox compatible??
0
9846
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, weíll explore What is ONU, What Is Router, ONU & Routerís main usage, and What is the difference between ONU and Router. Letís take a closer look ! Part I. Meaning of...
0
10890
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
10581
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
10634
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
10279
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
0
7007
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
5675
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5855
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
3
3127
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.