Search - stop words - array/database/text file?

Lo all,

OK - I'm adding site search functionality to a database-driven website.

I have a list of 390 stop/ignore words. Having looked at ASPFAQ already, I
see that the example uses an array. What I was wondering was whether this
would still be best practice for this quantity of stop words?

There is a larger overhead in defining the array initially, as I will
have to hard-code them all in. Alternatively, I thought I could import them
into a SQL Server table from the Excel file they are currently in and then
query that, but I believe the ASPFAQ article gave a good reason for not
doing that. My last thought was to read them in from a text file...

Anyone got any thoughts? Would an array be as efficient for 390 stop
words as it is for 10-20? Is it better to hard-code them rather than grab
them from a database?

Any help / advice would be appreciated.

Regards

Rob
Jul 19 '05 #1
If the list will be static, I would store it in an Application variable,
making the decision as to whether to store it in a database or a textfile
superfluous. If the list has no relationship to any of your database data,
then a text file on your web server seems to be indicated.

Moreover, I would suggest storing it as an XML DOMDocument, allowing you to
use the XML Parser DOM methods to easily search for values in the list.

Bob Barrows

--
Microsoft MVP - ASP/ASP.NET
Please reply to the newsgroup. This email account is my spam trap so I
don't check it very often. If you must reply off-line, then remove the
"NO SPAM"
Jul 19 '05 #2
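A minimal sketch of the Application-variable idea from the post above, assuming the stop words live in a plain text file with one word per line. The file name /stopwords.txt and the Application key "StopWords" are made up for illustration; nothing in the thread names them.

```vbscript
<SCRIPT LANGUAGE="VBScript" RUNAT="Server">
' global.asa - Application_OnStart runs once, when the application starts.
' ASSUMPTION: /stopwords.txt is a hypothetical file, one stop word per line.
Sub Application_OnStart
    Const ForReading = 1
    Dim fso, ts, strWords

    Set fso = Server.CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(Server.MapPath("/stopwords.txt"), ForReading)
    strWords = ts.ReadAll
    ts.Close

    ' Split on line breaks and cache the array for every later request
    Application.Lock
    Application("StopWords") = Split(strWords, vbCrLf)
    Application.UnLock
End Sub
</SCRIPT>
```

A page would then pick the list up with something like aIgnoreWords = Application("StopWords"). Note that a trailing blank line in the file would end up as an empty element in the array.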
"Bob Barrows" wrote ...
> If the list will be static, I would store it in an Application variable,
> making the decision as to whether to store it in a database or a textfile
> superfluous.

Hi Bob,

Yes, initially this list will definitely be static. I do not plan to add to
the list dynamically at this stage.

> If the list has no relationship to any of your database data,
> then a text file on your web server seems to be indicated.

OK.

> Moreover, I would suggest storing it as an XML DOMDocument, allowing you to
> use the XML Parser DOM methods to easily search for values in the list.

Hmmm... hadn't thought of XML for this.

Wouldn't this be quite a bit of extra code, considering all I want to do is
iterate through the list and chop those words out of the original search
string? Maybe not, I'm not sure... I can't see the advantages of this method.

Any further info appreciated..

Regards

Rob
Jul 19 '05 #3
Ah! I see. I was thinking you would need to do the opposite: find specific
words in the list.

To find a word in an array (strWord stands for the word being looked up):
for i = 0 to ubound(ar)
    if ar(i) = strWord then
        'found at index i
        exit for
    end if
next

To find a word in an XML Document:
set oNode = xmldoc.selectsinglenode("/root/node[value='" & strWord & "']")

There is no extra code involved in looping through a DOM Document:

for each oNode in xmldoc.documentelement.childnodes
    'do something with oNode.Text
next

Given the comparative sizes of the array and xml document, if I did not need
search capabilities, I would go with the array.

Bob Barrows

Jul 19 '05 #4
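For anyone wanting to try the XML route from the post above, here is a rough sketch. The document layout (root, node, value) is only a guess made to fit the XPath shown in the post, and the two sample words are invented; the thread never shows the actual XML.

```vbscript
Dim xmldoc, oNode, strWord

' ASSUMPTION: this layout is invented to match "/root/node[value='...']"
Set xmldoc = Server.CreateObject("MSXML2.DOMDocument")
xmldoc.async = False
xmldoc.setProperty "SelectionLanguage", "XPath"
xmldoc.loadXML "<root>" & _
    "<node><value>and</value></node>" & _
    "<node><value>the</value></node>" & _
    "</root>"

' Look up a single word; selectSingleNode returns Nothing when no node matches
strWord = "the"
Set oNode = xmldoc.selectSingleNode("/root/node[value='" & strWord & "']")
If oNode Is Nothing Then
    Response.Write strWord & " is not in the list<br />"
Else
    Response.Write strWord & " is in the list<br />"
End If

' Walk every entry, much as you would walk an array
For Each oNode In xmldoc.documentElement.childNodes
    Response.Write oNode.text & "<br />"
Next
```

A word containing an apostrophe would break the quoted XPath literal, so real input would need checking before being concatenated into the expression.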
"Bob Barrows" wrote ...
> Given the comparative sizes of the array and xml document, if I did not
> need search capabilities, I would go with the array.

Hi Bob,

Many thanks for the reply and the examples. I will use the array method for
now. If you have time, I've another question - see Search (part 2) :oD

Cheers

Rob
Jul 19 '05 #5
Or you could use Filter, so the For...Next loop only runs over the candidate
matches:

<%@ Language=VBScript %>
<%
Option Explicit
Response.Buffer = True

' Write a message followed by a line break
sub lPrt(strMsg)
    Response.Write(strMsg & "<br />" & vbCrLf)
end sub

' Write a message with no line break
sub Prt(strMsg)
    Response.Write(strMsg)
end sub

sub findWord(arr, fWord)
    if isFound(arr, fWord) = fWord then
        Response.Write(fWord & " found in array.<br />" & vbCrLf)
    else
        Response.Write(fWord & " not found in array.<br />" & vbCrLf)
    end if
end sub

' Returns fWord if it is an element of arr, otherwise "".
' Filter matches substrings ("even" matches "seven"), so each
' candidate it returns is still compared against the whole word.
function isFound(arr, fWord)
    dim f, i
    isFound = ""
    f = Filter(arr, fWord)
    for i = 0 to ubound(f)
        if f(i) = fWord then
            isFound = f(i)
            exit for
        end if
    next
end function

dim str, myarray
str = "one two three four five six seven eight nine ten"
myarray = Split(str)

lPrt("Using Filter to find words in an array")
lPrt("Array elements: " & str)
Prt("Testing eleven: ")
findWord myarray, "eleven"
Prt("Testing five: ")
findWord myarray, "five"
%>

http://kiddanger.com/lab/filter.asp

--
Roland Hall
/* This information is distributed in the hope that it will be useful, but
without any warranty; without even the implied warranty of merchantability
or fitness for a particular purpose. */
Technet Script Center - http://www.microsoft.com/technet/scriptcenter/
WSH 5.6 Documentation - http://msdn.microsoft.com/downloads/list/webdev.asp
MSDN Library - http://msdn.microsoft.com/library/default.asp
Jul 19 '05 #6
Hah! I had forgotten about that. Thanks for the heads-up.

Bob Barrows
Jul 19 '05 #7
No problem. I was workin' on it a couple of days ago so it was fresh in my
mind.

Roland
Jul 19 '05 #8
Thanks for that Roland,

Tell me, is using the Filter method more efficient than what I
bashed out with two pencils stuck to my head yesterday? (I'm guessing so, but
figured I would ask.)

' INSERT VERY LARGE ARRAY UP HERE

intMatch = 0

For intLoop = 0 To UBound(aSearchCriteria)
    ' Compare this search word against every ignore word, ignoring case
    For intLoop2 = 0 To UBound(aIgnoreWords)
        If UCase(aSearchCriteria(intLoop)) = UCase(aIgnoreWords(intLoop2)) Then
            intMatch = 1
            strIgnoredWords = strIgnoredWords & aIgnoreWords(intLoop2) & ", "
            Exit For
        End If
    Next

    ' Keep the word only if it matched no ignore word
    If intMatch = 0 Then
        strTempSearchCriteria = strTempSearchCriteria & aSearchCriteria(intLoop) & " "
    End If

    intMatch = 0
Next

strSearchCriteria = Trim(strTempSearchCriteria)

In this I'm obviously iterating through the entire array of ignore words for
each word in the search criteria. I then build a new string of the words
that are not found, which eventually gets used as the criteria, and I also
build a string of 'ignored' words, which then gets dumped on the page to make
it look really clevaaarrr :oD

Your example has fewer lines of code, so I suspect it's far more efficient and
probably the preferred way. Mine was bashed out whilst drinking Stella :o)

Regards

Rob
Jul 19 '05 #9
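On the efficiency question: nothing in the thread was timed, so take this as a sketch and not a benchmark. One alternative that nobody above suggested is a Scripting.Dictionary, which looks keys up by hash instead of scanning the ignore list once per search word. The sample arrays below are invented for illustration; the variable names other than aIgnoreWords and aSearchCriteria are hypothetical too.

```vbscript
Dim dictStop, aIgnoreWords, aSearchCriteria, i, strWord
Dim strKept, strIgnored

' ASSUMPTION: tiny sample data standing in for the real 390-word list
aIgnoreWords = Array("a", "and", "the", "of")
aSearchCriteria = Split("The history of ASP and VBScript", " ")

' Load the stop words once; CompareMode must be set while the dictionary is empty
Set dictStop = Server.CreateObject("Scripting.Dictionary")
dictStop.CompareMode = vbTextCompare   ' case-insensitive keys
For i = 0 To UBound(aIgnoreWords)
    dictStop(aIgnoreWords(i)) = True
Next

' One Exists call per search word replaces the inner loop
For i = 0 To UBound(aSearchCriteria)
    strWord = aSearchCriteria(i)
    If dictStop.Exists(strWord) Then
        strIgnored = strIgnored & strWord & ", "
    Else
        strKept = strKept & strWord & " "
    End If
Next

strKept = Trim(strKept)   ' "history ASP VBScript" for the sample data
```

Whether this is measurably faster for 390 words is untested here; for a list that size the nested loops may well be fast enough.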
