473,669 Members | 2,492 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

searching a substring

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
Nov 14 '05 #1
7 2198

"junky_fell ow" <ju**********@y ahoo.co.in> wrote in message
news:8c******** *************** ***@posting.goo gle.com...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....

It is a fact that you must check all characters (except the last two). The
simplest method would be to search for 'a' and if you find it, test for 'b'
then 'c'.

The fastest method in this case (byte data[4] = a,b,c) would be to cast data
to an int (if you work with a 32-bit processor) then "xor" it with every
third byte of the data string, if the result is 0 you have found a sub
string. This can be optimized in assembler as well, and I think this is the
fastest method if combined with a suitable buffering method.
Nov 14 '05 #2
junky_fellow wrote:

Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.


Use srstr(), from <string.h>.

--
pete
Nov 14 '05 #3
Martin Johansen wrote:

"junky_fell ow" <ju**********@y ahoo.co.in> wrote in message
news:8c******** *************** ***@posting.goo gle.com...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?
third byte of the data string,
if the result is 0 you have found a sub string.


Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.

--
pete
Nov 14 '05 #4
On Thu, 12 Feb 2004, junky_fellow wrote:
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.


You have to define efficient. If primary memory is a concern then
efficient would be reading one character at a time and seeing if it
matches the first letter of the string. If it does then compare the
second, third, etc. characters of the string.

If efficient is as fast as possible without concern for primary memory
then determine file size, allocate a block of memory the size of the file,
read the entire contents into memory then search the memory for the string
(maybe using strstr() or just using the search for the first letter, if
you find it search the the entire string).

Or you could try a combination. Allocate a large chunk of memory, read in
part of the file and search the memory for the string you are looking for.
Only danger here is if the last letter of block #1 is 'a' and the first
letters of block #2 are "bc". You'll never to figure out how to handle
that situations.

--
Send e-mail to: darrell at cs dot toronto dot edu
Don't send e-mail to vi************@ whitehouse.gov
Nov 14 '05 #5

"pete" <pf*****@mindsp ring.com> wrote in message
news:40******** ***@mindspring. com...
Martin Johansen wrote:

"junky_fell ow" <ju**********@y ahoo.co.in> wrote in message
news:8c******** *************** ***@posting.goo gle.com...
Can anyone suggest some efficient way to search a substring in
a text file. For eg. suppose i want to search the pattern "abc"
in a text file, then which algorithm should i use.
I just want some hints, not the complete program.

thanx in advance.....
It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?


Oh, sorry, I ment xor every third byte with data[3] casted to int, three
byte from a file could be read into an array in the same manner as a,b,c in
data[3].
third byte of the data string,
if the result is 0 you have found a sub string.


Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.

--
pete

Nov 14 '05 #6
Martin Johansen wrote:

"pete" <pf*****@mindsp ring.com> wrote in message
news:40******** ***@mindspring. com...
Martin Johansen wrote:

"junky_fell ow" <ju**********@y ahoo.co.in> wrote in message
news:8c******** *************** ***@posting.goo gle.com...
> Can anyone suggest some efficient way to search a substring in
> a text file. For eg. suppose i want to search the pattern "abc"
> in a text file, then which algorithm should i use.
> I just want some hints, not the complete program.
>
> thanx in advance.....

It is a fact that you must check all characters
(except the last two). The simplest method would be to
search for 'a' and if you find it, test for 'b' then 'c'.

The fastest method in this case (byte data[4] = a,b,c)
would be to cast data to an int
(if you work with a 32-bit processor) then "xor" it with every


xor "what" with every third byte ?


Oh, sorry, I ment xor every third byte with data[3]
casted to int, three byte from a file could be read
into an array in the same manner as a,b,c in data[3].


I think you mean data[2], if you mean 'c'.
third byte of the data string,
if the result is 0 you have found a sub string.

I don't understand why you prefer to check the result of an xor,
instead of checking equality directly.
Your described procedure,
only checks for one letter, not a substring.
Even assuming CHAR_BIT equals 8 and sizeof(int) equals 4, and
ignoring alignment problems,
I have no idea of what you think you're saying.


--
pete
Nov 14 '05 #7
Yes, I was a bit sloppy, simple character comparizon is the fastest.
Nov 14 '05 #8

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
6172
by: Adrian | last post by:
This script works well for searching thru a SELECT list but doesnt work so well with the MULTIPLE option. I need to modify it to work for a SELECT MULTIPLE but I have no idea where to begin, any tips? <script> var searchString = ""; function clearSearchString() {
9
1801
by: C3 | last post by:
I have to process some data in C that is given to me as a char * array. I have a fairly large number of substrings (well, they're not actually printable, but let's treat them as strings) that I have to search for in the data. I need to keep a count of how often each of these substrings occurs in my original data and then print it out at the end. This is a fairly mundane task, but since I have so many substrings, it's a pain having to...
5
4138
by: Andy | last post by:
I am having trouble using the ComboBox in my app. I am attempting to use the text from the ComboBox, to search the list box portion. However: In my "TextChanged" event: I cannot get the value of the entered text. Below is my code. private void cbEdit_TextChanged(object sender, System.EventArgs e) { bool bFound = false; string sText = cbEdit.Text; int iPosition = cbEdit.SelectionStart;
35
2562
by: Cor | last post by:
Hallo, I have promised Jay B yesterday to do some tests. The subject was a string evaluation that Jon had send in. Jay B was in doubt what was better because there was a discussion in the C# newsgroup on 25 September. The regular expressions where in that newsgroup too involved. I told yesterday night, to Jay that I would test all 4 methods and the stupid method I was thinking of the first time that night when I saw Jon's
0
952
by: Emily | last post by:
I am new to VB.NET and I need help finding and updating values in an arraylist. My arraylist is called arrVarInfo and holds data in this structure type: Public Structure VarInfo Dim Name As String Dim Format As String Dim Desc As String
2
7972
by: chernetsov | last post by:
I am creating a server indexing files in my local area network, in order to provide a searching feature. So i want to make it possible to searchsuch rows where the 'name' (VARCHAR) column contains a substring submitted by user. an i use a full-text index to make it faster? from my first sight i noticed that this index can be used to search for words in a string. So the example below would not go: searching 'cool' in 'uncoolness' Am i...
7
2614
by: pbd22 | last post by:
Hi. I am somewhat new to this and would like some advice. I want to search my xml file using "keyword" search and return results based on "proximity matching" - in other words, since the search string will often not produce a direct match, the results will be based on proximity (50%, 20% 100%, etc). are there any good examples out there on how to do keyword searches on XML data? How should i set up my xml file so
3
3279
by: Aaron | last post by:
I'm trying to parse a table on a webpage to pull down some data I need. The page is based off of information entered into a form. when you submit the data from the form it displays a "Searching..." page then, refreshes and displays the table I want. I have code that grabs data from the page using cURL but when I look at the data it contains the "Searching..." page and not the table that I want. below is the code i have so far....Thanks...
2
1329
by: jhansi | last post by:
I need c program to display the possible names of the persons in a record when there starting letter or some substring is given. example: if we give "v" it has to display the person names starting with "v" in a record or if i give a substring like "vij" it has to give me the possible names with that substring. plzzzzzzzz i need the solution very urgently
0
8895
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
0
8809
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
1
8588
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
0
8658
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
1
6210
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5682
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
0
4386
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
2
2032
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
2
1788
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.