473,412 Members | 2,003 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,412 software developers and data experts.

searching a substring

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
Nov 14 '05 #1
7 2183

"junky_fellow" <ju**********@yahoo.co.in> wrote in message
news:8c**************************@posting.google.c om...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....

It is a fact that you must check all characters (except the last two). The
simplest method would be to search for 'a' and if you find it, test for 'b'
then 'c'.

The fastest method in this case (byte data[4] = a,b,c) would be to cast data
to an int (if you work with a 32-bit processor) then "xor" it with every
third byte of the data string, if the result is 0 you have found a sub
string. This can be optimized in assembler as well, and I think this is the
fastest method if combined with a suitable buffering method.
Nov 14 '05 #2
junky_fellow wrote:

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.


Use srstr(), from <string.h>.

--
pete
Nov 14 '05 #3
Martin Johansen wrote:

"junky_fellow" <ju**********@yahoo.co.in> wrote in message
news:8c**************************@posting.google.c om...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?
third byte of the data string,
if the result is 0 you have found a sub string.


Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.

--
pete
Nov 14 '05 #4
On Thu, 12 Feb 2004, junky_fellow wrote:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.


You have to define efficient. If primary memory is a concern then
efficient would be reading one character at a time and seeing if it
matches the first letter of the string. If it does then compare the
second, third, etc. characters of the string.

If efficient is as fast as possible without concern for primary memory
then determine file size, allocate a block of memory the size of the file,
read the entire contents into memory then search the memory for the string
(maybe using strstr() or just using the search for the first letter, if
you find it search the the entire string).

Or you could try a combination. Allocate a large chunk of memory, read in
part of the file and search the memory for the string you are looking for.
Only danger here is if the last letter of block #1 is 'a' and the first
letters of block #2 are "bc". You'll never to figure out how to handle
that situations.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vi************@whitehouse.gov
Nov 14 '05 #5

"pete" <pf*****@mindspring.com> wrote in message
news:40***********@mindspring.com...
Martin Johansen wrote:

"junky_fellow" <ju**********@yahoo.co.in> wrote in message
news:8c**************************@posting.google.c om...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?


Oh, sorry, I ment xor every third byte with data[3] casted to int, three
byte from a file could be read into an array in the same manner as a,b,c in
data[3].
third byte of the data string,
if the result is 0 you have found a sub string.


Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.

--
pete

Nov 14 '05 #6
Martin Johansen wrote:

"pete" <pf*****@mindspring.com> wrote in message
news:40***********@mindspring.com...
Martin Johansen wrote:

"junky_fellow" <ju**********@yahoo.co.in> wrote in message
news:8c**************************@posting.google.c om...
> Can anyone suggest some efficient way to search a substring in
> a text file. For eg. suppose i want to search the pattern "abc"
> in a text file, then which algorithm should i use.
> I just want some hints, not the complete program.
>
> thanx in advance.....

It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?


Oh, sorry, I ment xor every third byte with data[3]
casted to int, three byte from a file could be read
into an array in the same manner as a,b,c in data[3].


I think you mean data[2], if you mean 'c'.
third byte of the data string,
if the result is 0 you have found a sub string.

I don't understand why you prefer to check the result of an xor,
instead of checking equality directly.
Your described procedure,
only checks for one letter, not a substring.
Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.


--
pete
Nov 14 '05 #7
Yes, I was a bit sloppy, simple character comparizon is the fastest.
Nov 14 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Adrian | last post by:
This script works well for searching thru a SELECT list but doesnt work so well with the MULTIPLE option. I need to modify it to work for a SELECT MULTIPLE but I have no idea where to begin, any...
9
by: C3 | last post by:
I have to process some data in C that is given to me as a char * array. I have a fairly large number of substrings (well, they're not actually printable, but let's treat them as strings) that I...
5
by: Andy | last post by:
I am having trouble using the ComboBox in my app. I am attempting to use the text from the ComboBox, to search the list box portion. However: In my "TextChanged" event: I cannot get the value...
35
by: Cor | last post by:
Hallo, I have promised Jay B yesterday to do some tests. The subject was a string evaluation that Jon had send in. Jay B was in doubt what was better because there was a discussion in the C#...
0
by: Emily | last post by:
I am new to VB.NET and I need help finding and updating values in an arraylist. My arraylist is called arrVarInfo and holds data in this structure type: Public Structure VarInfo Dim Name As...
2
by: chernetsov | last post by:
I am creating a server indexing files in my local area network, in order to provide a searching feature. So i want to make it possible to searchsuch rows where the 'name' (VARCHAR) column contains...
7
by: pbd22 | last post by:
Hi. I am somewhat new to this and would like some advice. I want to search my xml file using "keyword" search and return results based on "proximity matching" - in other words, since the search...
3
by: Aaron | last post by:
I'm trying to parse a table on a webpage to pull down some data I need. The page is based off of information entered into a form. when you submit the data from the form it displays a...
2
by: jhansi | last post by:
I need c program to display the possible names of the persons in a record when there starting letter or some substring is given. example: if we give "v" it has to display the person names...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...
0
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.