Searching in tree structures

Gordon

I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).

I developed a fulltext search script for the CMS that will return
pages ordered by relevance.

This is fine, but I've now had a change request whereby searches can
be restricted to specific sites. This is a problem because the only
way to know what site an item belongs to is to fetch it, walk up it's
tree and look at it's root node.

I can see 2 possible solutions, but both have drawbacks. The first is
to just fetch all matches and then discard ones that don't belong to
the specified site. This means increased DB load and traffic between
the DB and the webserver and could potentially cause quite a
performance hit when the number of pages being dealt with climbs into
the thousands.

The other solution is to, in addition to the parent node, also store
the root node in the database. This would give an extra field to
match against and make eliminating items with an SQL query, thus
reducing load and traffic, quite simple. However the big drawback is
that there are now two fields determining where an item fits into the
tree structure instead of just one, making it an obvious source of
potential error. It would also mean updating code related to item
creation updating that wouldn't have to be touched with the other
approach.

If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

Jun 2 '08 #1

Subscribe Reply

1435

Jerry Stuckle

Gordon wrote:

I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).

I developed a fulltext search script for the CMS that will return
pages ordered by relevance.

This is fine, but I've now had a change request whereby searches can
be restricted to specific sites. This is a problem because the only
way to know what site an item belongs to is to fetch it, walk up it's
tree and look at it's root node.

I can see 2 possible solutions, but both have drawbacks. The first is
to just fetch all matches and then discard ones that don't belong to
the specified site. This means increased DB load and traffic between
the DB and the webserver and could potentially cause quite a
performance hit when the number of pages being dealt with climbs into
the thousands.

The other solution is to, in addition to the parent node, also store
the root node in the database. This would give an extra field to
match against and make eliminating items with an SQL query, thus
reducing load and traffic, quite simple. However the big drawback is
that there are now two fields determining where an item fits into the
tree structure instead of just one, making it an obvious source of
potential error. It would also mean updating code related to item
creation updating that wouldn't have to be touched with the other
approach.

If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

Not from the PHP end. You will have to retrieve the data and throw away
what you don't want.

If you're looking for a database solution, you should try the
appropriate database newsgroup.

--
=============== ===
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attgl obal.net
=============== ===

Jun 2 '08 #2

sheldonlg

Gordon wrote:

I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).

I developed a fulltext search script for the CMS that will return
pages ordered by relevance.

This is fine, but I've now had a change request whereby searches can
be restricted to specific sites. This is a problem because the only
way to know what site an item belongs to is to fetch it, walk up it's
tree and look at it's root node.

I can see 2 possible solutions, but both have drawbacks. The first is
to just fetch all matches and then discard ones that don't belong to
the specified site. This means increased DB load and traffic between
the DB and the webserver and could potentially cause quite a
performance hit when the number of pages being dealt with climbs into
the thousands.

The other solution is to, in addition to the parent node, also store
the root node in the database. This would give an extra field to
match against and make eliminating items with an SQL query, thus
reducing load and traffic, quite simple. However the big drawback is
that there are now two fields determining where an item fits into the
tree structure instead of just one, making it an obvious source of
potential error. It would also mean updating code related to item
creation updating that wouldn't have to be touched with the other
approach.

If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

I opt for the second. It falls into the realm of code once, thoroughly
test, forget about it. A severe performance hit, OTOH, is always present.

Jun 2 '08 #3

=?ISO-8859-1?Q?=22=C1lvaro_G=2E_Vicario=22?=

Gordon escribió:

I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).

[...]

If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

There are many DBMS-specific solutions to handle hierarchical data. What
database types does your CMS need to support?

In any case, unless your tree is too big you can:

1. Fetch the complete tree structure (1 DB query and not much data, just
IDs)
2. Find the IDs of the right nodes in PHP
3. Fetch the desired info filtering by node: WHERE NODE_ID IN (....)

A couple of links I've gathered:

[MySQL+InnoDB] http://jan.kneschke.de/projects/mysql/sp
[Oracle] http://philip.greenspun.com/sql/trees.html
--
-- http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web: http://bits.demogracia.com
-- Mi web de humor al baño María: http://www.demogracia.com
--

Jun 2 '08 #4

Gordon

On May 20, 1:00 pm, sheldonlg <sheldonlgwrote :

Gordon wrote:
I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).

I developed a fulltext search script for the CMS that will return
pages ordered by relevance.

This is fine, but I've now had a change request whereby searches can
be restricted to specific sites. This is a problem because the only
way to know what site an item belongs to is to fetch it, walk up it's
tree and look at it's root node.

I can see 2 possible solutions, but both have drawbacks. The first is
to just fetch all matches and then discard ones that don't belong to
the specified site. This means increased DB load and traffic between
the DB and the webserver and could potentially cause quite a
performance hit when the number of pages being dealt with climbs into
the thousands.

The other solution is to, in addition to the parent node, also store
the root node in the database. This would give an extra field to
match against and make eliminating items with an SQL query, thus
reducing load and traffic, quite simple. However the big drawback is
that there are now two fields determining where an item fits into the
tree structure instead of just one, making it an obvious source of
potential error. It would also mean updating code related to item
creation updating that wouldn't have to be touched with the other
approach.

If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

I opt for the second. It falls into the realm of code once, thoroughly
test, forget about it. A severe performance hit, OTOH, is always present.

That's a good point. I suppose if I'm sufficiantly careful then the 2
pointers solution offers almost no performance hit as opposed to the
other approach. I just get a bit paranoid about that kind of thing, I
had it drilled into me during my education on databases you don't
store redundant data because of the problems it can cause if it
desyncs. In this case the tradeoff is probably worth it.

Jun 2 '08 #5

Gordon

On May 20, 1:06 pm, "Álvaro G. Vicario"
<alvaroNOSPAMTH A...@demogracia .comwrote:

Gordon escribió:

I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).
[...]
If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?

There are many DBMS-specific solutions to handle hierarchical data. What
database types does your CMS need to support?

In any case, unless your tree is too big you can:

1. Fetch the complete tree structure (1 DB query and not much data, just
IDs)
2. Find the IDs of the right nodes in PHP
3. Fetch the desired info filtering by node: WHERE NODE_ID IN (....)

A couple of links I've gathered:

[MySQL+InnoDB] http://jan.kneschke.de/projects/mysql/sp
[Oracle]http://philip.greenspu n.com/sql/trees.html

--
--http://alvaro.es- Álvaro G. Vicario - Burgos, Spain
-- Mi sitio sobre programación web:http://bits.demogracia.com
-- Mi web de humor al baño María:http://www.demogracia.com
--

Thanks for the pointers. The CMS I'm working on is using Postgres
8.3.1 as the backend. Very good fulltext search once you've gotten
used to its quirks. :)

Jun 2 '08 #6

Jerry Stuckle

Gordon wrote:

On May 20, 1:00 pm, sheldonlg <sheldonlgwrote :
>Gordon wrote:
>>I've developed a CMS that manages content across several sites. The
content is represented in a tree structure maintained in a database,
where each item has an id and a parent that points to the containing
item or 0 if it's a root item (a site).
I developed a fulltext search script for the CMS that will return
pages ordered by relevance.
This is fine, but I've now had a change request whereby searches can
be restricted to specific sites. This is a problem because the only
way to know what site an item belongs to is to fetch it, walk up it's
tree and look at it's root node.
I can see 2 possible solutions, but both have drawbacks. The first is
to just fetch all matches and then discard ones that don't belong to
the specified site. This means increased DB load and traffic between
the DB and the webserver and could potentially cause quite a
performance hit when the number of pages being dealt with climbs into
the thousands.
The other solution is to, in addition to the parent node, also store
the root node in the database. This would give an extra field to
match against and make eliminating items with an SQL query, thus
reducing load and traffic, quite simple. However the big drawback is
that there are now two fields determining where an item fits into the
tree structure instead of just one, making it an obvious source of
potential error. It would also mean updating code related to item
creation updating that wouldn't have to be touched with the other
approach.
If any of you guys out there have another potential solution that I
might have not thought of I'd like to hear what they are. Failing
that, which of the options I do have would you recommend?
I opt for the second. It falls into the realm of code once, thoroughly
test, forget about it. A severe performance hit, OTOH, is always present.

That's a good point. I suppose if I'm sufficiantly careful then the 2
pointers solution offers almost no performance hit as opposed to the
other approach. I just get a bit paranoid about that kind of thing, I
had it drilled into me during my education on databases you don't
store redundant data because of the problems it can cause if it
desyncs. In this case the tradeoff is probably worth it.

Depending on the database, it might be possible with the structure you
have now. But you need to be asking in a newsgroup for your database.

--
=============== ===
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
js*******@attgl obal.net
=============== ===

Jun 2 '08 #7

Similar topics

5494

Fastest ip address searching method/implementation

by: Jonathan | last post by:

I am hoping that someone more experienced than myself can point me towards what might be the fastest data lookup method to use for storing ip addresses. My situation is that I will need to maintain a list of perhaps 50 ip addresses to be used in a packet sniffing application. For each packet that goes through the application (which will be monitoring all traffic through a switch), I need to see if an entry for the source ip of that packet...

C / C++

1987

Concurrency issues with "Tree" structures.

by: Robin Tucker | last post by:

Hi, I'm currently implementing a database with a tree structure in a table. The nodes in the tree are stored as records with a column called "Parent". The root of the tree has a "NULL" parent. The path to each node is stored in the column "Path" and is of the form "\000001\000002\000003\" etc. The latter enabling me to fetch subtrees using the "LIKE" predicate. I also have created the relation "ID" <-> "ID_Parent, effectively the...

Microsoft SQL Server

2697

A doubt about C's tree data struct

by: dam_fool_2003 | last post by:

I am just a beginner in tree data – struct. I have this little doubt. Left node ‘weights' lesser than the right one. I have seen, so far it is algorithm implementations. But why not vice-versa that is right node ‘weights' lesser than the left one? Why the trees are implemented in that way? Can any body clarify? Thanks in advance

C / C++

5369

convert a list to tree

by: prabhat143 | last post by:

Hi, Given a singly linked, null terminated list, how can it be converted to tree? Each node in the list has three attributes: it's ID, it's parent ID and of course, the next node it's pointing to. The parent id of root of the tree is 0. The length of list is not known. What will be the optimal solution? Node* convertToTree(Node* listHead);

C / C++

8674

Radix Tree

by: Foodbank | last post by:

Hi, I'm currently teaching myself C data structures in preparation for a job transition and I've just passed the point in which radix sorting was covered in my book. I've recently finished a radix sorting program and was surprised by its speed. Therefore, I'd like help with a radix tree program in order to compare it with a prior binary tree program that I finished. The parts I'm having trouble with are collecting statistics and...

C / C++

2138

searching algorithm

by: Gigs_ | last post by:

Hi all! I have text file (english-croatian dictionary) with words in it in alphabetical order. This file contains 179999 words in this format: english word: croatian word I want to make instant search for my gui Instant search, i mean that my program search words and show words to user as user type letters.

Python

2602

Main for binary search tree

by: Defected | last post by:

Hi, How i can implement a main function with this Binary Search Tree. thanks for help. is this code corrected ? #include<iostream>

C / C++

1605

Searching for a file - in a tree?

by: jodleren | last post by:

Hi! I wonder, which way to do this fastest. I have a disk, where I need to search for a file or directory. I do it recursively, meaning that I start from the top dir, then I add all directories to an array, and by a counter I work my way through that array. And, while doing that I add the directory or file (name) to and result

PHP

1556

tree

by: Bill Cunningham | last post by:

pgs 140-1 of kandr2 talk about recursion. I tried typing in the code char by char and couldn't find the bugs so I just read the code and a new concept to me jumped out. Maybe this is what it means by recursion. struct tnode *addtree (struct tnode *, char *); is declared on p 140. But I noticed this on p 141. struct tnode *addtree (struct tnode *p, char *w);

C / C++

9672

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

10214

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...

Online Marketing

10001

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...

General

7540

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

6780

Couldn’t get equations in html when convert word .docx file to html file in C#.

by: conductexam | last post by:

I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...

C# / C Sharp

5437

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

5563

Windows Forms - .Net 8.0

by: adsilva | last post by:

A Windows Forms form does not have the event Unload, like VB6. What one acts like?

Visual Basic .NET

3727

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

2920

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General