473,386 Members | 1,819 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,386 software developers and data experts.

Searching 18,000 strings of an average size of 10Kb

What would be the fastest way to search 18,000 strings of an average size of
10Kb, I can have all the strings in memory, should I simply do a instr on
all of the strings? Or is there a faster way? I would like to have a kind of
search like google where you can enter several words to search for, guess
that calls for a regular expression "word1|word2|word3|...".
Is there any kind of indexing tools available for this kind of thing, I have
my data in an MSSQL 2000 database, and it's not updated very often, some
hundred new records added a day spread over the whole day. and some hundred
records removed a day in one go.

I don't mind that I'm using 200MB memory to hold everything in memory at all
times. But if there is a better more efficient way to do it I would like som
pointers.

Kind Regards,
Allan Ebdrup.
Jan 11 '07 #1
8 1844
this seems interesting:
http://www.codeproject.com/useritems/damnfastsearch.asp

"Allan Ebdrup" <eb****@noemail.noemailwrote in message
news:OW*************@TK2MSFTNGP06.phx.gbl...
What would be the fastest way to search 18,000 strings of an average size
of 10Kb, I can have all the strings in memory, should I simply do a instr
on all of the strings? Or is there a faster way? I would like to have a
kind of search like google where you can enter several words to search
for, guess that calls for a regular expression "word1|word2|word3|...".
Is there any kind of indexing tools available for this kind of thing, I
have my data in an MSSQL 2000 database, and it's not updated very often,
some hundred new records added a day spread over the whole day. and some
hundred records removed a day in one go.

I don't mind that I'm using 200MB memory to hold everything in memory at
all times. But if there is a better more efficient way to do it I would
like som pointers.

Kind Regards,
Allan Ebdrup.

Jan 11 '07 #2
If you can sort the list of strings...use a Binary Search. A Binary
Search is capable of finding for instance the string required in about
16 steps of say 500,000 strings. Don't quote me on that but if you
doubled your size to 1,000,000 it would only require 1 more step to
find it.

Essentially a binary search halves the data it needs to search but
again this requires you to work from an array that you can garuntee to
be sorted.

..net has this functionality built in and it's wicked fast.

-R

Allan Ebdrup wrote:
What would be the fastest way to search 18,000 strings of an average size of
10Kb, I can have all the strings in memory, should I simply do a instr on
all of the strings? Or is there a faster way? I would like to have a kind of
search like google where you can enter several words to search for, guess
that calls for a regular expression "word1|word2|word3|...".
Is there any kind of indexing tools available for this kind of thing, I have
my data in an MSSQL 2000 database, and it's not updated very often, some
hundred new records added a day spread over the whole day. and some hundred
records removed a day in one go.

I don't mind that I'm using 200MB memory to hold everything in memory at all
times. But if there is a better more efficient way to do it I would like som
pointers.

Kind Regards,
Allan Ebdrup.
Jan 12 '07 #3
_DD
On Thu, 11 Jan 2007 13:31:16 +0100, "Allan Ebdrup"
<eb****@noemail.noemailwrote:
>What would be the fastest way to search 18,000 strings of an average size of
10Kb, I can have all the strings in memory, should I simply do a instr on
all of the strings? Or is there a faster way? I would like to have a kind of
search like google where you can enter several words to search for, guess
that calls for a regular expression "word1|word2|word3|...".
Is there any kind of indexing tools available for this kind of thing, I have
my data in an MSSQL 2000 database, and it's not updated very often, some
hundred new records added a day spread over the whole day. and some hundred
records removed a day in one go.

I don't mind that I'm using 200MB memory to hold everything in memory at all
times. But if there is a better more efficient way to do it I would like som
pointers.

Kind Regards,
Allan Ebdrup.
Tough to say from your description whether you are looking for
something fairly simple, fast, and direct, or whether you need to
search on multiple fields.

If the latter, and if the data is already in a SQL db, then a SQL
query may be the best way to do that, no? People spend years trying to
optimize db queries, so it would be hard to get something running
faster, especially as you are probably doing a query to get the data
in the first place. You would have no appreciable memory hit. Of
course you would not be asking if it were that simple.

So...if it's the former, and you need blazing speed at the expense of
loading everything into ram, then check into Boyer-Moore. I believe
that is still the basis of the fastest linear string search
algorithms.

Boyer Moore:
http://www.cs.utexas.edu/users/moore...ing/index.html

That should get you started.
Jan 12 '07 #4
"Deckarep" <de******@gmail.comwrote in message
news:11**********************@v45g2000cwv.googlegr oups.com...
If you can sort the list of strings...use a Binary Search. A Binary
Search is capable of finding for instance the string required in about
16 steps of say 500,000 strings. Don't quote me on that but if you
doubled your size to 1,000,000 it would only require 1 more step to
find it.

Essentially a binary search halves the data it needs to search but
again this requires you to work from an array that you can garuntee to
be sorted.

.net has this functionality built in and it's wicked fast.
I know about binary searches.
However I'm searching for the sting to be somewhere inside the 10Kb of the
string in 18.000 instances, so I can't sort the data, I need some kind of
indexing.

Kind Regards,
Allan Ebdrup
Jan 12 '07 #5
"_DD" <_D*@nospam.comwrote in message
news:e2********************************@4ax.com...
On Thu, 11 Jan 2007 13:31:16 +0100, "Allan Ebdrup"
<eb****@noemail.noemailwrote:
Tough to say from your description whether you are looking for
something fairly simple, fast, and direct, or whether you need to
search on multiple fields.
Simple fast and direct, just one field.
If the latter, and if the data is already in a SQL db, then a SQL
query may be the best way to do that, no? People spend years trying to
optimize db queries, so it would be hard to get something running
faster, especially as you are probably doing a query to get the data
in the first place. You would have no appreciable memory hit. Of
course you would not be asking if it were that simple.
LIKE runs to slow, and I can't use a fulltext index because it splits up
into whole words, and I want a search where searching for "pædagog" also
retuns results with "børnehavepædagog" or "pædagogmedhjælper"
So...if it's the former, and you need blazing speed at the expense of
loading everything into ram, then check into Boyer-Moore. I believe
that is still the basis of the fastest linear string search
algorithms.

Boyer Moore:
http://www.cs.utexas.edu/users/moore...ing/index.html
This is probably the kind of thing used in RegEx, it's blasing fast when
compared to .IndexOf, witch has a surprisingly poor performance.

I found this plugin for MSSQL:
http://www.surfray.com/en/index.php/products/overloook
Seems to be just the kind of thing I want. Ad an index and have fast
searching. I'll try it out and let you know.

Kind Regards,
Allan Ebdrup
Jan 12 '07 #6
_DD
On Fri, 12 Jan 2007 09:30:25 +0100, "Allan Ebdrup"
<eb****@noemail.noemailwrote:
>"_DD" <_D*@nospam.comwrote in message
news:e2********************************@4ax.com.. .
.. I want a search where searching for "pædagog" also
retuns results with "børnehavepædagog" or "pædagogmedhjælper"
Gotta appreciate anyone who knows how to do that combined 'ae' thing!
>So...if it's the former, and you need blazing speed at the expense of
loading everything into ram, then check into Boyer-Moore. I believe
that is still the basis of the fastest linear string search
algorithms.

Boyer Moore:
http://www.cs.utexas.edu/users/moore...ing/index.html

This is probably the kind of thing used in RegEx, it's blasing fast when
compared to .IndexOf, witch has a surprisingly poor performance.
String functions in general do not seem to be optimal performers. The
reason that I suggested a raw Boyer-Moore algorithm is that if you are
indeed doing a linear search as you describe, then it will probably be
the most direct method. You may be able to optimize for your
particular data format.

I don't doubt that Regex often uses Boyer-Moore, but I'm not sure how
much overhead a Regex implementation would add due to the fact that
they are more generalized.
>I found this plugin for MSSQL:
http://www.surfray.com/en/index.php/products/overloook
Seems to be just the kind of thing I want. Ad an index and have fast
searching. I'll try it out and let you know.
1800 times faster than an MS SQL 2005 search? I take back what I said
about the SQL DB team working around the clock to optimize queries.
Seriously, I guess I'd be skeptical about that, but who knows? Please
do report back if you use it.

Jan 12 '07 #7
I don't have a specific answer, but you might want to take a look at
Bioinformatics code or ask a Bioinformatics group. This sort of search is
pretty common and I am sure it has been optimized to death.
Ethan
"Allan Ebdrup" <eb****@noemail.noemailwrote in message
news:OW*************@TK2MSFTNGP06.phx.gbl...
What would be the fastest way to search 18,000 strings of an average size
of 10Kb, I can have all the strings in memory, should I simply do a instr
on all of the strings? Or is there a faster way? I would like to have a
kind of search like google where you can enter several words to search
for, guess that calls for a regular expression "word1|word2|word3|...".
Is there any kind of indexing tools available for this kind of thing, I
have my data in an MSSQL 2000 database, and it's not updated very often,
some hundred new records added a day spread over the whole day. and some
hundred records removed a day in one go.

I don't mind that I'm using 200MB memory to hold everything in memory at
all times. But if there is a better more efficient way to do it I would
like som pointers.

Kind Regards,
Allan Ebdrup.

Jan 12 '07 #8
_DD
On Fri, 12 Jan 2007 13:28:12 -0600, "Ethan Strauss" <ethan dot strauss
at Promega dot comwrote:
>I don't have a specific answer, but you might want to take a look at
Bioinformatics code or ask a Bioinformatics group. This sort of search is
pretty common and I am sure it has been optimized to death.
Ethan
Gene-matching algorithms--That's certainly an interesting thought. Not
only optimized; they build custom hardware for that. But I thought
there would be a fuzzy aspect to most of their matching algorithms. Is
that not the case?
Jan 12 '07 #9

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

18
by: jblazi | last post by:
I should like to search certain characters in a string and when they are found, I want to replace other characters in other strings that are at the same position (for a very simply mastermind game)...
3
by: Edg Bamyasi | last post by:
This Is A Late Cross Post from comp.lang.python. It seems the mistery is deeper then i expected. What is the running time of conactination on character strings. i.e. >> joe="123" >>...
25
by: Rainmaker | last post by:
Hi, Can anyone tell me an efficient algorithm to sort an array of strings? Keep in mind that this array is HUGE and so the algorithm should me efficient enough to deal with it. Thanks
2
by: Macca | last post by:
Hi, My application uses an asynchronous socket server. The question I have is what i should set my socket server buffer size to. I will know the size of each data packet sent across the...
3
by: C++Geek | last post by:
I need to get this program to average the salaries. What am I doing wrong? //Program to read in employee data and calculate the average salaries of the emplyees.
89
by: scroopy | last post by:
Hi, I've always used std::string but I'm having to use a 3rd party library that returns const char*s. Given: char* pString1 = "Blah "; const char* pString2 = "Blah Blah"; How do I append...
5
by: Kelth.Raptor | last post by:
Im having some difficulty with strings here, I hope someone is kind enough to help, I do appreciate it. Im working on a grade point average calculator for my intro to programming class and I...
2
by: Walter Kerelitsch | last post by:
hi folks, i need an algorithm for comparing two strings for equivalence, but levenshtein is just working for strings up to 255 characters, i have to process longer strings (lets say, about...
2
by: jac130 | last post by:
I have an array, populated with student grades. there is a list of student names, but their grades are stored in a class level array. there is a calculate average button,that is supposed to calculate...
0
by: taylorcarr | last post by:
A Canon printer is a smart device known for being advanced, efficient, and reliable. It is designed for home, office, and hybrid workspace use and can also be used for a variety of purposes. However,...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
If we have dozens or hundreds of excel to import into the database, if we use the excel import function provided by database editors such as navicat, it will be extremely tedious and time-consuming...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.