Bytes | Software Development & Data Engineering Community

How to search files for text string most efficiently?

Jim
Hello,

I am working on a small Windows application for a client, and as one of the
functions they want a search that will let them enter a search string, then
search a directory for all files that contain that search string AND display
the lines that contain the search string.

They have Windows ME, XP and 2000 systems.

Does anyone have any ideas as to the most efficient way to do this?

Also, if multiple directories are chosen, should threads be used for the
search operation?

Thanks!

Jim
Nov 21 '05 #1
On Thu, 28 Oct 2004 12:24:54 -0500, "Jim" <jr@nospam.wi.rr.com> wrote:


Personally I would do it this way:

I would query each directory and load all of its filenames into one
array (I've never done that part myself, so...)
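For the directory-listing part, a minimal sketch (in Python rather than the VB the thread assumes) that loads one directory's filenames into a list. The helper name `list_files` is hypothetical, and this sketch skips subdirectories instead of recursing into them.

```python
import os


def list_files(directory):
    """Return the full paths of the regular files directly inside `directory`.

    Subdirectories are skipped, not searched (an assumption of this sketch).
    """
    paths = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() is False for directories, so they are left out.
            if entry.is_file():
                paths.append(entry.path)
    # Sort so the results come back in a predictable order.
    return sorted(paths)
```

Calling `list_files(r"C:\Data")` would return something like `["C:\\Data\\a.txt", ...]`; recursing into subfolders would need `os.walk` or a recursive call instead.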

Then, using that array, take each filename and read the file with Open ...
For Binary, loading its entire contents into a single string buffer sized
from the file length, on which you could run the InStr function.

Error checking would be needed all the way through.
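A sketch of this whole-file approach, again in Python for illustration: the buffer is sized from the file length, `bytes.find` plays the role of InStr, and an unreadable file is treated as a non-match as a minimal form of error checking. `file_contains` is a hypothetical name, and the search string is assumed to be already encoded to bytes in the files' own encoding (for ANSI text, something like `"word".encode("cp1252")`).

```python
import os


def file_contains(path, needle):
    """Return True if the bytes `needle` occur anywhere in the file at `path`.

    Reads the whole file into one buffer, so it only suits files small enough
    to fit comfortably in memory.
    """
    try:
        size = os.path.getsize(path)   # size the buffer from the file length
        with open(path, "rb") as f:    # binary mode, like Open ... For Binary
            data = f.read(size)
    except OSError:
        # Missing, locked or unreadable file: treat it as "no match".
        return False
    # bytes.find is the analogue of InStr here: -1 means "not found".
    return data.find(needle) != -1
```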
**********************************************************************
Richard Jalbert Programmer-Analyst Ri******@sympatico.ca

Dogs have owners, cats have staff.

http://www3.sympatico.ca/richmann/
**********************************************************************
Nov 21 '05 #2
"Richard Jalbert" <ri******@sympatico.ca> wrote:
> Then, using that array, take each filename and read the file with Open ...
> For Binary, loading its entire contents into a single string buffer sized
> from the file length, on which you could run the InStr function.


If the files are "small", that's a good approach. If the files are large,
it's trickier: you'll have to read the file in chunks of a certain size and
then perform 'InStr'. Note that you will have to check separately for
occurrences that overlap the boundary between two chunks.
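One way to handle those chunk boundaries, sketched in Python: carry the last len(needle) - 1 bytes of each buffer over into the next one, so a match that straddles two chunks is still seen in full. The function name `find_in_chunks` and the 64 KB default chunk size are arbitrary assumptions, not anything from the thread.

```python
def find_in_chunks(f, needle, chunk_size=64 * 1024):
    """Return the offset of the first `needle` in binary file object `f`, or -1.

    Reads at most `chunk_size` bytes at a time. The last len(needle) - 1 bytes
    of each buffer are carried into the next one, so a match that straddles a
    chunk boundary is still seen in full.
    """
    if not needle:
        return 0
    keep = len(needle) - 1
    tail = b""
    tail_offset = 0  # file offset of the first byte of `tail`
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return -1  # end of file, no match
        buf = tail + chunk
        pos = buf.find(needle)
        if pos != -1:
            return tail_offset + pos
        # Keep only the bytes that could still begin a boundary-spanning match.
        new_tail = buf[len(buf) - keep:] if keep < len(buf) else buf
        tail_offset += len(buf) - len(new_tail)
        tail = new_tail
```

Typical use would be `with open(path, "rb") as f: offset = find_in_chunks(f, b"text")`; memory use stays around one chunk regardless of file size.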

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>

Nov 21 '05 #3
Jim
What do you define as a "small" file?

How would you get the line containing the occurrence so you can display it?
Use InStr to find the string, then search backward from that position for
the preceding CRLF and forward for the next one?
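That approach works; a Python sketch of it, assuming the file has already been read into a string with Windows CRLF line endings and that the search string contains no CR or LF characters. `rfind`/`find` stand in for InStrRev/InStr, and a line with several hits is reported once. `matching_lines` is a hypothetical name.

```python
def matching_lines(text, needle):
    """Return (line_number, line) pairs for each line of `text` containing `needle`.

    Assumes CRLF line endings and a `needle` with no CR or LF in it.
    Line numbers start at 1.
    """
    if not needle:
        return []
    results = []
    line_number = 1
    counted_to = 0  # CRLFs before this offset are already counted
    pos = text.find(needle)
    while pos != -1:
        # Search backward for the CRLF before the match (like InStrRev) ...
        start = text.rfind("\r\n", 0, pos)
        start = 0 if start == -1 else start + 2
        # ... and forward for the CRLF after it (like InStr).
        end = text.find("\r\n", pos)
        if end == -1:
            end = len(text)
        line_number += text.count("\r\n", counted_to, start)
        counted_to = start
        results.append((line_number, text[start:end]))
        # Resume after this line so a line with several hits is listed once.
        pos = text.find(needle, end)
    return results
```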

And what about the threading part of the question?
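One possible answer, sketched in Python with a thread pool: search each selected directory on a worker thread. In a Windows Forms application the main benefit is usually keeping the UI responsive; on a single physical disk, parallel searches may not finish faster and can even be slower because of extra seeking, so the worker count of 2 is only a guess to tune. The function names are hypothetical, and subdirectories are not searched.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def search_directory(directory, needle):
    """Return sorted paths of files directly in `directory` containing `needle`."""
    hits = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            try:
                with open(entry.path, "rb") as f:
                    if needle in f.read():
                        hits.append(entry.path)
            except OSError:
                pass  # skip files that cannot be opened
    return sorted(hits)


def search_directories(directories, needle, max_workers=2):
    """Search each directory on a worker thread; return {directory: [paths]}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `directories`.
        results = pool.map(lambda d: search_directory(d, needle), directories)
        return dict(zip(directories, results))
```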

Jim
Nov 21 '05 #4
On Thu, 28 Oct 2004 20:52:17 +0200, "Herfried K. Wagner [MVP]"
<hi************ ***@gmx.at> wrote:
> "Richard Jalbert" <ri******@sympatico.ca> wrote:
> > Then, using that array, take each filename and read the file with Open ...
> > For Binary, loading its entire contents into a single string buffer sized
> > from the file length, on which you could run the InStr function.


> If the files are "small", that's a good approach.


Isn't the maximum size of a string buffer something like 0 to 2
billion characters?
> If the files are large,
What would be a large file?
I have one that is 214 MB (pi to a million places) and I cannot open
it on my machine (I concatenated it from 20 smaller files).
> it's trickier: you'll have to read the file in chunks of a certain size and
> then perform 'InStr'. Note that you will have to check separately for
> occurrences that overlap the boundary between two chunks.


Overlap is easily handled: read the first buffer, then, when reading the
second, move the byte pointer back by at least the length of the substring
to be found minus one (backing up further is harmless, but the same match
may then be found twice).
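The seek-back variant, sketched in Python against a seekable binary file object: after a chunk without a match, move the pointer back len(needle) - 1 bytes before the next read. The function name and default chunk size are assumptions; the chunk size must be at least the needle length, otherwise the pointer would never advance.

```python
def find_with_seek_back(f, needle, chunk_size=64 * 1024):
    """Return the offset of the first `needle` in seekable binary file `f`, or -1.

    Searches from the start of the file. After each chunk with no match, the
    file pointer is moved back len(needle) - 1 bytes so the next read
    re-covers the boundary region.
    """
    if not needle:
        return 0
    if chunk_size < len(needle):
        raise ValueError("chunk_size must be at least len(needle)")
    back = len(needle) - 1
    f.seek(0)
    while True:
        start = f.tell()
        chunk = f.read(chunk_size)
        pos = chunk.find(needle)
        if pos != -1:
            return start + pos
        if len(chunk) < chunk_size:
            return -1  # short read: end of file reached without a match
        # Back up so a match split across the boundary lands in the next read.
        f.seek(start + chunk_size - back)
```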

One detail that was not stated: what if the substring is split by a
vbCrLf? That would mean the line breaks have to be removed from the
file before doing the search, no?
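Removing the line breaks works, but it loses the positions needed to display the matching line. An alternative, a different technique from stripping, is a regular expression that allows an optional CRLF between any two characters of the search string. A Python sketch follows; `split_tolerant_pattern` is a hypothetical name, it allows at most one CRLF per gap, and it ignores bare CR or LF line endings.

```python
import re


def split_tolerant_pattern(needle):
    """Compile a regex that matches `needle` even if a single CRLF falls
    between any two of its characters. The file text is left untouched, so
    match offsets still point into the original content."""
    pieces = [re.escape(ch) for ch in needle]
    # r"(?:\r\n)?" is an optional, non-capturing CR LF pair.
    return re.compile(r"(?:\r\n)?".join(pieces))
```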
Richard Jalbert
Nov 21 '05 #5
"Richard Jalbert" <ri******@sympatico.ca> wrote:
> > > Then, using that array, take each filename and read the file with Open ...
> > > For Binary, loading its entire contents into a single string buffer sized
> > > from the file length, on which you could run the InStr function.
> >
> > If the files are "small", that's a good approach.
>
> Isn't the maximum size of a string buffer something like 0 to 2
> billion characters?

... but your physical RAM is limited ;-).

> > If the files are large,
>
> What would be a large file?
> I have one that is 214 MB (pi to a million places) and I cannot open
> it on my machine (I concatenated it from 20 smaller files).

That's a "large" file.

--
Herfried K. Wagner [MVP]
<URL:http://dotnet.mvps.org/>
Nov 21 '05 #6
