Search in a file

I'm trying to do something relatively simple: find the offset of the
first, or next, occurrence of a string in a file, ideally in a
case-insensitive way. I've seen a few solutions that load the whole
file into a string, but that limits the size of the file that can be
handled.

Is there a way to search using just the stream? Or do I need to
load the file into string buffers of limited size and search those?
I'd rather not, since handling matches that overlap two buffers would
be a pain, but...

Anyway, hope to see help here soon.

Thanks.

--
Chris R.
=======
http://offlineblog.com/
Nov 15 '05 #1
Simply read a buffer from the file into a string and look for your string
with buffer.IndexOf(searchString). If it isn't found, seek
searchString.Length-1 backwards in the file, so that a match straddling two
buffers isn't missed, and repeat the procedure.
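
The buffered approach above could be sketched in C# roughly as follows. This is a minimal sketch under stated assumptions, not the poster's own code: instead of seeking backwards in the file, it carries the last pattern.Length-1 characters over into the next buffer, which has the same effect and also works on readers that cannot seek. The names ChunkedSearch, IndexOf and bufferSize are made up for illustration, and the result is a character offset, not a byte offset.

```csharp
using System;
using System.IO;

public static class ChunkedSearch
{
    // Returns the character offset (not the byte offset) of the first
    // case-insensitive match at or after the reader's current position,
    // or -1 if the pattern does not occur.
    public static long IndexOf(TextReader reader, string pattern, int bufferSize = 4096)
    {
        if (string.IsNullOrEmpty(pattern))
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));

        // A match that straddles two chunks can have at most
        // pattern.Length - 1 characters in the earlier chunk.
        int overlap = pattern.Length - 1;
        char[] buffer = new char[Math.Max(bufferSize, pattern.Length) + overlap];
        int carried = 0;      // characters kept from the previous chunk
        long baseOffset = 0;  // character offset of buffer[0] in the text

        while (true)
        {
            int read = reader.Read(buffer, carried, buffer.Length - carried);
            if (read <= 0) return -1; // end of input, nothing found

            int valid = carried + read;
            string window = new string(buffer, 0, valid);
            int hit = window.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
            if (hit >= 0) return baseOffset + hit;

            // Keep the tail so a match split across chunks is still seen.
            int keep = Math.Min(overlap, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            baseOffset += valid - keep;
            carried = keep;
        }
    }
}
```

With a file, this could be called as ChunkedSearch.IndexOf(new StreamReader(path), "BEGIN:VCARD"), where path is whatever file you are searching. Only one buffer is in memory at a time, so the file size is not limited by the size of a string.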

--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk
Nov 15 '05 #2
If you are working with text files, you should use "readers" (TextReader,
StreamReader) rather than bare "streams". Then you can read line by line,
which should solve your problem if the patterns that you are looking for
don't span lines.

Readers/Writers are designed for text I/O.
Streams are designed for binary I/O.
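
A minimal sketch of the line-by-line search, assuming the pattern never spans a line break. LineSearch and Find are illustrative names, not part of the framework; the position is reported as a zero-based line number and column, not as a byte offset.

```csharp
using System;
using System.IO;

public static class LineSearch
{
    // Returns the zero-based (line, column) of the first case-insensitive
    // match, or null if the pattern does not occur on any single line.
    public static (int Line, int Column)? Find(TextReader reader, string pattern)
    {
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            int column = line.IndexOf(pattern, StringComparison.OrdinalIgnoreCase);
            if (column >= 0) return (lineNumber, column);
            lineNumber++;
        }
        return null;
    }
}
```

For a file, wrap the call in something like using (var reader = new StreamReader(path)) { var hit = LineSearch.Find(reader, "BEGIN:VCARD"); }, where path is a placeholder for your own file name.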

Bruno.
Nov 15 '05 #3
I am. So, basically, what I want to do is read the file line by line
looking for the string. Then, to get to the string in the stream, I
just keep track of the lengths of the lines and the offset into the
matching line, seek that many bytes into the stream, and that should put
me at the start of the substring I'm searching for?

I need to find this string so that I can pass its offset to another
method that will be working on the stream. Or is it possible to mark
the stream?

Heck, even better: once I have found the substring in the stream, can I
just push the line back onto it and then work with another reference to
the same stream?

--
Chris R.
=======
http://offlineblog.com/
Nov 15 '05 #4
Well, you have to be careful if you want to use byte offsets to identify
substrings in a file:

* You have to know how lines are terminated (CRLF or LF alone).
* You have to know the encoding. If your text contains only ASCII you are on
the safe side, but if it contains accented characters and you save it in
UTF-8 (the default for .NET), some characters will take 2 bytes (even 3 if
you have Chinese characters).
* You will have to mix binary (stream) and text (reader) I/O, because the
reader API won't let you seek to a given byte offset.

I don't know what you are really trying to achieve, but it may be easier for
you to keep track of locations by line number + character offset in the line
than by byte offset, or to isolate the reader I/O somewhere and pass strings
(rather than the reader itself) to your other methods. Mixing text readers
and byte offsets is just awkward.

Text readers won't let you "push the line back". If you want to do this, you
have to open the file in binary mode (as a stream), save byte offsets and
seek back to them later, but then you have to deal with line separators
and encoding yourself.
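
To illustrate the encoding point, here is a small sketch of why a character offset and a byte offset differ. OffsetDemo and ByteOffset are hypothetical names; the sketch assumes you already hold the preceding text as a string, that you know the file's encoding, and that charIndex does not fall in the middle of a surrogate pair.

```csharp
using System;
using System.Text;

public static class OffsetDemo
{
    // Number of bytes that precede the character at charIndex when the
    // text is written with the given encoding. A byte order mark at the
    // start of a file (3 bytes for UTF-8) is NOT included here and would
    // have to be added by the caller.
    public static int ByteOffset(string text, int charIndex, Encoding encoding)
    {
        return encoding.GetByteCount(text.Substring(0, charIndex));
    }
}
```

For "caf\u00e9 BEGIN", the 'B' is at character index 5 but at byte offset 6 in UTF-8, because the accented character takes 2 bytes. Line terminators count as well: every CRLF in the file contributes 2 bytes that ReadLine strips from the strings it returns. Note too that StreamReader reads ahead into its own buffer, so the position of the underlying stream does not tell you where the line you just read began.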

Bruno.
Nov 15 '05 #5
What I'm trying to do is look for the first occurrence of a string,
"BEGIN:VCARD" in this case, in a stream. Then I want to parse the
vCard at that point, and then search for every subsequent occurrence of
a vCard in the file after the end of this one.

as per the official spec.

I'll admit, I'm at a bit of a loss with this one.

Any suggestions?

--
Chris R.
=======
http://offlineblog.com/
Nov 15 '05 #6
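
One possible way to approach the vCard case without byte offsets, following the earlier advice to keep the reader I/O in one place and pass strings on: read line by line and hand each card's lines to the parser. This is only a sketch under assumptions: BEGIN:VCARD and END:VCARD each sit on their own line, cards are not nested, and folded (continued) lines are passed through untouched. VCardSplitter and ReadCards are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class VCardSplitter
{
    // Yields the lines of each vCard in turn, from BEGIN:VCARD through
    // END:VCARD inclusive. Text outside a card is skipped. A card that is
    // never closed by END:VCARD is silently dropped.
    public static IEnumerable<List<string>> ReadCards(TextReader reader)
    {
        List<string> current = null; // null while we are between cards
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string trimmed = line.Trim();
            if (current == null)
            {
                if (trimmed.Equals("BEGIN:VCARD", StringComparison.OrdinalIgnoreCase))
                    current = new List<string> { line };
            }
            else
            {
                current.Add(line);
                if (trimmed.Equals("END:VCARD", StringComparison.OrdinalIgnoreCase))
                {
                    yield return current;
                    current = null;
                }
            }
        }
    }
}
```

Each yielded list can then be given to whatever parsing method you already have, so that method never needs to seek in the stream or know where in the file the card started.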
