Searching My XML File Using Keyword Searches?

Hi.

I am somewhat new to this and would like some advice.
I want to search my XML file using a "keyword" search and
return results based on "proximity matching". In other words,
since the search string will often not produce a direct match,
the results will be ranked by proximity (50%, 20%, 100%, etc.).

Are there any good examples out there of how to do keyword
searches on XML data? How should I set up my XML file so
as to make a tag as likely as possible to match a related
search term?

And finally, how is proximity determined?

I know that this is a heady question, but I am hoping that some
answers will at least put me on the right track, possibly with
links, sample code, or examples.

thanks in advance.

Nov 10 '06 #1
pbd22 wrote:
> I want to search my xml file using "keyword" search and
> return results based on "proximity matching"

I don't know of any off-the-shelf code for this purpose, so you may be
stuck with implementing it yourself based on basic XML APIs and/or as a
complicated stylesheet.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Nov 10 '06 #2

Hi.

Thanks.

I figured this. I have done a bit of searching and it seems that XPath,
XML/XSLT and CSS are the way to go.

I am very new to this and have a follow-up question. I am just trying to
get going and am having trouble getting an example to work.

I was attempting to load categories.xml from my server using the
commented-out code below (at the bottom of the page). It looks right, but it
fails at xmlDoc = new ActiveXObject(...

So it seems that the ActiveXObject part is causing it to fail. I then
added the "testing" code below from another web site to explore what to use
for my xmlhttp variable, and I get the categories.xml file from the server
just fine.

But I need to be able to use the MSXML API as in the commented-out code.
How do I access "Msxml2.DOMDocument.6.0"? What am I doing wrong?

Thanks.

<script type="text/javascript">

function BuildDocument() {

    var xmlhttp = false;

    try {
        xmlhttp = new ActiveXObject("Msxml2.DOMDocument.6.0");
    } catch (e) {
        try {
            xmlhttp = new ActiveXObject("Msxml2.DOMDocument.6.0");
        } catch (E) {
            xmlhttp = false;
        }
    }

    if (!xmlhttp && typeof XMLHttpRequest != 'undefined') {
        try {
            xmlhttp = new XMLHttpRequest();
        } catch (e) {
            xmlhttp = false;
        }
    }
    if (!xmlhttp && window.createRequest) {
        try {
            xmlhttp = window.createRequest();
        } catch (e) {
            xmlhttp = false;
        }
    }

    xmlhttp.open("GET", "categories.xml", true);
    xmlhttp.onreadystatechange = function() {

        if (xmlhttp.readyState == 4) {
            document.getElementById('results').innerHTML =
                xmlhttp.responseText;
        }

    }

    xmlhttp.send(null)

    // ___________________________________
    // COMMENTED OUT CODE:
    // ___________________________________

    /************************************************

    // Load XML

    var xmlDoc = new ActiveXObject("Msxml2.DOMDocument.6.0");
    xmlDoc.async = false;
    xmlDoc.validateOnParse = false;
    xmlDoc.load("categories.xml");
    xml.async = false;
    xml.load("categories.xml");

    // Load XSL
    var xsl = new ActiveXObject("Msxml2.DOMDocument.6.0");
    xsl.async = false;
    xsl.load("categories.xsl");

    // Transform
    document.write(xml.transformNode(xsl));

    ************************************************/

}
</script>
Nov 10 '06 #3
pbd22 wrote:
> Hi.
>
> Thanks.
>
> I figured this. I have done a bit of searching and it seems that XPath,
> XML/XSLT and CSS are the way to go.

For a single file this will probably work, but for anything bigger
(e.g. a folder-full) you really need an indexing engine, otherwise it
will take forever.

The problem with proximity search in marked text is to decide what
"proximate" means. If you allow proximity to bleed over markup
boundaries, you increase the number of hits but you risk them being
inaccurate or misleading. For example if you search for "character
function" with proximity set to more than 12 words, the text

<para>...stuff...and his character was by far the strongest
in the play.</para>
</section>
</chapter>
<chapter>
<head>Set Design</head>
<para>The function of set design in Restoration drama...</para>

will produce a hit which computer scientists may not expect. IMHE
the acceptable limit is to allow proximity to bleed across markup
in mixed content plus the first higher level of element content.
This would allow it to operate across (for example) adjacent
paragraphs, but not across adjacent sections or chapters.

This has implications for the indexing engine, as it needs to store
not only the character offsets of words but also their markup depth
and adjacency. Very few manage to do this correctly, despite the
original technique having been implemented a long time ago (PAT).

///Peter
--
XML FAQ: http://xml.silmaril.ie/
Nov 11 '06 #4
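
The bookkeeping described in the post above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the paragraphs are assumed to be already extracted from the XML, each tagged with a made-up section id, and "markup depth and adjacency" is reduced to "same enclosing section or not". The names buildIndex, findNear and the section ids are hypothetical and do not come from any indexing engine.

```javascript
// Hypothetical input shape: paragraphs already pulled out of the XML,
// each tagged with the id of its enclosing section.
function buildIndex(paragraphs) {
  const index = new Map(); // word -> [{ pos, section }]
  let pos = 0;             // running word offset across the whole document
  for (const { section, text } of paragraphs) {
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
      if (!index.has(word)) index.set(word, []);
      index.get(word).push({ pos, section });
      pos += 1;
    }
  }
  return index;
}

// Return pairs of occurrences that lie within maxDistance words of each other.
// With sameSectionOnly set, pairs that straddle a section boundary are dropped,
// so proximity may cross adjacent paragraphs but not adjacent sections.
function findNear(index, wordA, wordB, maxDistance, sameSectionOnly) {
  const hits = [];
  for (const a of index.get(wordA.toLowerCase()) || []) {
    for (const b of index.get(wordB.toLowerCase()) || []) {
      const closeEnough = Math.abs(a.pos - b.pos) <= maxDistance;
      const allowed = !sameSectionOnly || a.section === b.section;
      if (closeEnough && allowed) hits.push({ a, b });
    }
  }
  return hits;
}

// The text from the example, with invented section ids.
const paragraphs = [
  { section: "ch1-s3", text: "and his character was by far the strongest in the play." },
  { section: "ch2-s1", text: "Set Design" },
  { section: "ch2-s1", text: "The function of set design in Restoration drama" },
];
const index = buildIndex(paragraphs);

console.log(findNear(index, "character", "function", 13, false).length); // 1
console.log(findNear(index, "character", "function", 13, true).length);  // 0
```

Ignoring markup gives the cross-chapter hit; respecting the section boundary suppresses it while still allowing matches between the heading and paragraph of the same section. A real engine would need the full element path per word rather than a single section id.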
Hi Peter.

OK, thanks. Well, I guess then I am in luck (kind of).
I am only doing a search on a single file (categories.xml).
The file, however, is very large and quite detailed: there are
subcategories of subcategories of subcategories and so on.

The good news is that the file does not take user input or,
for that matter, any text at all. It simply serves as a way for
users to search for a term, say "Hard Drive", and find which
of the many available categories match that term. The response
from the server should be as many (even remotely) related paths
as possible, each with its relevancy rank:

1) Technology Hardware Hard Drives          100%
2) Cinema Movies Features "Hard Drive"       93%
3) Books Politics Elections "Hard Drive"     90%
4) Books Sports Swimming Biography           36%
5) Media News International Art              30%
6) Music New Age                              4%
So my example doesn't really match yours, where paragraphs with
massive contextual differences could produce very misleading
results.

What I do need to understand is how to rank such a search. How would
the logic work for scoring number (2) as 93% and (3) as 90%, say?
Should I be including a series of related words in the XML for each
topic, so that those with "more" related words get a higher rank? That
seems very crude. I'll do some research on indexing engines but, based
on what you said, it seems like that may be overkill, since I am working
with a single file (categories.xml and categories.xsl) and am not
dealing with wordy paragraphs.

thanks again.
Nov 12 '06 #5
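
One possible way to get a percentage per category, sketched under stated assumptions rather than offered as the answer: each category carries its path plus an optional keyword list, a query word found in the path counts fully, and one found only in the keywords counts half. The weights, the crude plural stripping, the "Backup" category and the function names are all invented for illustration, and the numbers it produces will not reproduce the 93% and 90% figures in the post above.

```javascript
// Crude normalisation: lower-case, split on non-alphanumerics,
// and strip a trailing "s" from longer words ("Drives" -> "drive").
function tokens(text) {
  return (text.toLowerCase().match(/[a-z0-9]+/g) || [])
    .map((w) => (w.length > 3 && w.endsWith("s") ? w.slice(0, -1) : w));
}

// Hypothetical weights: a hit in the category path counts fully,
// a hit in the attached keyword list counts half.
const PATH_WEIGHT = 1.0;
const KEYWORD_WEIGHT = 0.5;

// Percentage of the query "covered" by one category.
function score(query, category) {
  const wanted = tokens(query);
  if (wanted.length === 0) return 0;
  const inPath = new Set(tokens(category.path.join(" ")));
  const inKeywords = new Set(tokens((category.keywords || []).join(" ")));
  let total = 0;
  for (const word of wanted) {
    if (inPath.has(word)) total += PATH_WEIGHT;
    else if (inKeywords.has(word)) total += KEYWORD_WEIGHT;
  }
  return Math.round((100 * total) / wanted.length);
}

// Score every category, drop the zero scores, best match first.
function rank(query, categories) {
  return categories
    .map((c) => ({ path: c.path.join(" > "), percent: score(query, c) }))
    .filter((r) => r.percent > 0)
    .sort((x, y) => y.percent - x.percent);
}

// Invented sample data standing in for categories.xml.
const categories = [
  { path: ["Technology", "Hardware", "Hard Drives"], keywords: ["disk", "storage"] },
  { path: ["Technology", "Software", "Backup"], keywords: ["hard drive", "restore"] },
  { path: ["Music", "New Age"], keywords: ["ambient"] },
];

console.log(rank("Hard Drive", categories));
```

With this sample data the first category scores 100, the second scores 50 through its keywords alone, and the third is dropped. Finer gradations would need more signals than word overlap, for example the depth at which a match occurs or hand-assigned weights per keyword.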
pbd22 wrote:
> What i do need to understand is how to rank such a search. How would
> the logic work for scoring number (2) as 93% and (3) as 90%, say?

That's an application design issue, not an XML issue per se.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Nov 12 '06 #6

OK, fair enough.

I was just hoping that somebody could give me some ideas
about how to structure my categories.xml file for the kind of
search I am trying to do.

Another poster provided some useful code for the XSL file (below).
What I still need is for somebody to show me how to pass the
user's search string from the client to the XSL file, and how to
structure the XML file for the kind of "proximity searching" that
I was discussing with Peter. Should each node contain a string
of keywords?

If this is an application design issue and not an XML problem, fair
enough. Otherwise, advice appreciated.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <!-- To pass the search string in from outside, change
       <xsl:variable name="data" select="'met sport baseball'"/>
       to
       <xsl:param name="data"/> -->
  <xsl:variable name="data" select="'met sport baseball'"/>
  <xsl:variable name="upperCase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <xsl:variable name="lowerCase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="test" select="translate($data,$upperCase,$lowerCase)"/>

  <xsl:template match="/">
    <xsl:apply-templates select="*/*">
      <xsl:with-param name="search" select="$test"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Elements without a title: match the search words against the text content. -->
  <xsl:template match="*">
    <xsl:param name="search"/>
    <xsl:variable name="result">
      <xsl:call-template name="searching">
        <xsl:with-param name="Sdeb" select="$search"/>
        <xsl:with-param name="Send"/>
        <xsl:with-param name="val" select="."/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:if test="string($result)=''">
      found <xsl:value-of select="."/>
    </xsl:if>
  </xsl:template>

  <!-- Elements with a title: match against @title, then pass any words
       still unmatched down to the child elements. -->
  <xsl:template match="*[@title]">
    <xsl:param name="search"/>
    <xsl:variable name="result">
      <xsl:call-template name="searching">
        <xsl:with-param name="Sdeb" select="$search"/>
        <xsl:with-param name="Send"/>
        <xsl:with-param name="val" select="@title"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:choose>
      <xsl:when test="string($result)=''">
        found <xsl:value-of select="@title"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="*">
          <xsl:with-param name="search" select="string($result)"/>
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Walks the space-separated words in $Sdeb and returns those NOT found
       in $val (accumulated in $Send); an empty result means every word matched. -->
  <xsl:template name="searching">
    <xsl:param name="Sdeb"/>
    <xsl:param name="Send"/>
    <xsl:param name="val"/>
    <xsl:variable name="trans">
      <xsl:choose>
        <xsl:when test="contains($Sdeb,' ')">
          <xsl:value-of select="substring-before($Sdeb,' ')"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$Sdeb"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:variable name="word" select="string($trans)"/>
    <xsl:choose>
      <xsl:when test="contains(translate($val,$upperCase,$lowerCase),$word)">
        <xsl:choose>
          <xsl:when test="$Sdeb=$word">
            <xsl:value-of select="$Send"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="searching">
              <xsl:with-param name="Sdeb" select="substring-after($Sdeb,' ')"/>
              <xsl:with-param name="Send" select="$Send"/>
              <xsl:with-param name="val" select="$val"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:choose>
          <xsl:when test="$Sdeb=$word">
            <xsl:value-of select="concat($Send,' ',$word)"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="searching">
              <xsl:with-param name="Sdeb" select="substring-after($Sdeb,' ')"/>
              <xsl:with-param name="Send" select="concat($Send,' ',$word)"/>
              <xsl:with-param name="val" select="$val"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

Nov 12 '06 #7
pbd22 wrote:
> but now, if somebody could show me how to pass the value from
> the user's search string on the client to the XSL file

Look up "stylesheet parameters". The exact syntax for passing them in
varies from one XSLT processor to another, but the XSL syntax is the
same in all processors.

Getting it from the client to a server is, presumably, standard client
forms and server programming.

> and, how to
> structure the XML file for the kind of "proximity searching" that
> i was discussing with Peter.

As I say, I think that's drifting off from XML into basic programming
and data-structure design. Others may, of course, disagree.

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
Nov 12 '06 #8
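
As one concrete instance of "stylesheet parameters", here is a hedged sketch for browsers that provide the XSLTProcessor object; other processors, MSXML included, use different calls. It assumes the stylesheet posted earlier has been changed to declare <xsl:param name="data"/> as its own comment suggests. The helper name setSearchParameter and the searchBox element are hypothetical.

```javascript
// Normalise the user's text and hand it to the stylesheet as the top-level
// parameter "data" (the name used in the stylesheet posted in this thread).
// `processor` is any object exposing the XSLTProcessor-style method
// setParameter(namespaceURI, localName, value).
function setSearchParameter(processor, rawSearch) {
  // The posted stylesheet splits words on single spaces, so collapse
  // runs of whitespace and trim the ends before passing the value in.
  const cleaned = rawSearch.trim().replace(/\s+/g, " ");
  processor.setParameter(null, "data", cleaned);
  return cleaned;
}

// Browser-only usage, assuming xmlDoc and xslDoc are already-parsed
// Documents and searchBox is a text input (both hypothetical here):
//
//   const processor = new XSLTProcessor();
//   processor.importStylesheet(xslDoc);
//   setSearchParameter(processor, searchBox.value);
//   const fragment = processor.transformToFragment(xmlDoc, document);
//   document.getElementById("results").appendChild(fragment);
```

Taking the processor as an argument keeps the helper independent of where the transformation runs; if it runs on the server instead, the same cleaned string would travel as an ordinary form field or query parameter.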
