How to search files for text string most efficiently?

Jim
Hello,

I am working on a small Windows application for a client. One of the
functions they want is a search: they enter a search string, and the
application searches a directory for all files that contain that string AND
displays the lines that contain it.

They have Windows ME, XP and 2000 systems.

Does anyone have any ideas as to the most efficient way to do this?

Also, if multiple directories are chosen, should threads be used for the
search operation?

Thanks!

Jim
Nov 21 '05 #1
On Thu, 28 Oct 2004 12:24:54 -0500, "Jim" <jr@nospam.wi.rr.com> wrote:
[...]

Personally, I would do it this way:

Query each directory and load all of its filenames into one array (I've
never done that part, so...).

Then, for each filename in that array, read the file with OPEN AS BINARY,
loading its whole content into a single string buffer sized from the file's
length. You can then run the "InStr" function on that buffer.

Error checking would be needed all the way through.
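
Editor's sketch, not code from the thread: the approach above (list the files, read each one whole in binary mode, then do a substring test) might look roughly like this in Python. The function name `files_containing`, the UTF-8 encoding of the search string, and the choice to skip unreadable files are all assumptions made for illustration.

```python
import os


def files_containing(directory, needle):
    """Return the names of files in `directory` whose contents include `needle`."""
    needle_bytes = needle.encode("utf-8")  # assumption: files are ASCII/UTF-8 text
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        try:
            with open(path, "rb") as handle:  # binary mode, no newline translation
                data = handle.read()          # whole file in one buffer
        except OSError:
            continue  # unreadable file: skip it instead of aborting the search
        if needle_bytes in data:              # plays the role of InStr(...) > 0
            matches.append(name)
    return matches
```

This only works comfortably when every file fits in memory, which is the limitation discussed in the replies.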
************************************************** ********************
Richard Jalbert Programmer-Analyst Ri******@sympatico.ca

Dogs have owners, cats have staff.

http://www3.sympatico.ca/richmann/
************************************************** ********************
Nov 21 '05 #2
"Richard Jalbert" <ri******@sympatico.ca> wrote:
[...]
Then using that array, get each filename, get and read by OPEN AS
BINARY each file, loading its content in a single one string buffer,
sized after getting the file size. on which you could do a "instr"
function.


If the files are "small", that's a good approach. If the files are large,
it's trickier: you'll have to read the file in chunks of a certain size and
then perform 'InStr' on each chunk. Note that you will also have to check
separately for occurrences that overlap the ends of two chunks.
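
Editor's sketch, not code from the thread: one way to catch occurrences that straddle two chunks is to carry the last `len(needle) - 1` bytes of each chunk over to the front of the next one. The function name `find_in_chunks` and the idea of passing the chunks as an iterable of byte strings are assumptions for illustration.

```python
def find_in_chunks(chunks, needle):
    """Return the absolute offsets of every occurrence of `needle` in the
    byte stream formed by concatenating `chunks`."""
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1   # bytes carried over between chunks
    tail = b""               # end of the data seen so far
    consumed = 0             # total length of the chunks already processed
    offsets = []
    for chunk in chunks:
        window = tail + chunk
        base = consumed - len(tail)  # absolute offset of window[0]
        pos = window.find(needle)
        while pos != -1:
            offsets.append(base + pos)
            pos = window.find(needle, pos + 1)
        consumed += len(chunk)
        # The tail is shorter than the needle, so no match is reported twice.
        tail = window[-keep:] if keep else b""
    return offsets
```

Because the carried tail is always shorter than the search string, a match can never lie entirely inside it, so each occurrence is reported once.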

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>

Nov 21 '05 #3
Jim
What do you define as a "small" file?

How would you get the line that the occurrence is on, in order to show it?
Use InStr to find the string, then find the prior CRLF and the next CRLF
from that position?
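
Editor's sketch, not code from the thread: the idea in the question (find the match, then look backwards and forwards for the nearest CRLF) can be written as below. The function name `matching_lines` is hypothetical, and the sketch assumes CRLF line endings and a search string that contains no line break itself.

```python
def matching_lines(text, needle):
    """Return each line of `text` that contains `needle`, once per line."""
    if not needle:
        raise ValueError("needle must not be empty")
    lines = []
    pos = text.find(needle)
    while pos != -1:
        # The previous CRLF marks the start of the line (or the text start).
        start = text.rfind("\r\n", 0, pos)
        start = 0 if start == -1 else start + 2
        # The next CRLF marks the end of the line (or the text end).
        end = text.find("\r\n", pos)
        if end == -1:
            end = len(text)
        lines.append(text[start:end])
        # Resume after this line so a line with two hits is listed only once.
        pos = text.find(needle, end)
    return lines
```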

And, what about the threading portion of the question?

Jim

Nov 21 '05 #4
On Thu, 28 Oct 2004 20:52:17 +0200, "Herfried K. Wagner [MVP]"
<hi***************@gmx.at> wrote:
If the files are "small", that's a good approach.


Isn't the maximum size of a string buffer something like 2 billion
characters?

If the files are large,

What would be a large file?
I have one that is 214 MB (pi to a million places, concatenated from 20
smaller files), and I cannot open it on my machine.

it's trickier: you'll have to read the file in chunks of a certain size and
then perform 'InStr' on each chunk. Note that you will also have to check
separately for occurrences that overlap the ends of two chunks.


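Overlap is easily handled: after reading the first buffer, move the byte
pointer back by at least the length of the search string before reading the
second. (Backing up by one byte less than that length is enough, and it
avoids finding the same occurrence twice.)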
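
Editor's sketch, not code from the thread: the "back up the byte pointer" idea can be tried out as follows. The function name `find_offsets_in_file` and the default chunk size are assumptions; the sketch backs up by `len(needle) - 1` bytes so that the overlapping region is too short to hold a complete match on its own.

```python
def find_offsets_in_file(path, needle, chunk_size=64 * 1024):
    """Return the byte offsets of every occurrence of `needle` in the file."""
    if not needle:
        raise ValueError("needle must not be empty")
    if chunk_size < len(needle):
        raise ValueError("chunk_size must be at least len(needle)")
    back = len(needle) - 1  # how far to rewind before the next read
    offsets = []
    with open(path, "rb") as handle:
        while True:
            base = handle.tell()            # absolute offset of this chunk
            chunk = handle.read(chunk_size)
            pos = chunk.find(needle)
            while pos != -1:
                offsets.append(base + pos)
                pos = chunk.find(needle, pos + 1)
            if len(chunk) < chunk_size:
                break                       # short read: end of file reached
            handle.seek(-back, 1)           # rewind relative to current position
    return offsets
```

Each pass still advances by at least one byte, because the chunk size is required to be at least the length of the search string.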

One detail that was not stated: what if the substring is split by a
vbCrLf? That would mean the line breaks have to be removed from the file
before doing the search, no?
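
Editor's sketch, not code from the thread: if the goal is only to display the lines that contain the search string, as in the original question, then a match split across a line break would not appear on any single line anyway. Under that assumption, a plain line-by-line scan avoids both the line-break question and the need to hold a large file in memory. This is a different technique from the buffer approach discussed above; the function name `grep_file` and the Latin-1 default encoding are assumptions for illustration.

```python
def grep_file(path, needle, encoding="latin-1"):
    """Return (line_number, line) for every line of the file containing `needle`."""
    hits = []
    # newline="" keeps the original line endings so they can be stripped explicitly.
    with open(path, "r", encoding=encoding, newline="") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\r\n")))
    return hits
```

The file object reads one line at a time, so memory use depends on the longest line rather than on the file size.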
Richard Jalbert Programmer-Analyst Ri******@sympatico.ca
Nov 21 '05 #5
"Richard Jalbert" <ri******@sympatico.ca> wrote:
Is not the maximum size for a string buffer something like 0 to 2
billion characters?


... but your physical RAM is limited ;-).

If the files are large,

What would be a large file?
I have one that is 214 MB (pi to a million places, concatenated from 20
smaller files), and I cannot open it on my machine.


That's a "large" file.

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>
Nov 21 '05 #6
