
Fastest way to search for a string in a large text file (75 to 100 MB)

Hi,
I am trying to find the fastest way to search a txt file for a
particular string and return the line that contains the string. I have
so far just used the most basic method: initialize a variable as an
IO.StreamReader, read each line, and perform an If-Then to see whether
var.Contains(mystring) is true or false. If true, I have my string; if
false, it reads the next line. This takes forever. Is there anything I
can do to speed this up?
Thanks.
Feb 28 '08 #1
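
For reference, the line-by-line approach described in the question looks roughly like the sketch below. The module name, function name, and filePath parameter are hypothetical, and the match is assumed to be case-sensitive, which is how String.Contains behaves.

```vbnet
Imports System
Imports System.IO

Module StreamReaderSearch

    ' Returns the first line containing searchText, or Nothing if no line matches.
    Function FindFirstLine(filePath As String, searchText As String) As String
        Using reader As New StreamReader(filePath)
            Dim line As String = reader.ReadLine()
            ' ReadLine returns Nothing once the end of the file is reached.
            While line IsNot Nothing
                If line.Contains(searchText) Then Return line
                line = reader.ReadLine()
            End While
        End Using
        Return Nothing
    End Function

End Module
```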
9 Replies


How big are these files?

"Clinto" <my*************@hotmail.com> wrote in message
news:83**********************************@d5g2000hsc.googlegroups.com...
> Hi,
> I am trying to find the fastest way to search a txt file for a
> particular string and return the line that contains the string. I have
> so far just used the most basic method: initialize a variable as an
> IO.StreamReader, read each line, and perform an If-Then to see whether
> var.Contains(mystring) is true or false. If true, I have my string; if
> false, it reads the next line. This takes forever. Is there anything I
> can do to speed this up?
> Thanks.
Feb 28 '08 #2

Clinto wrote:
> Hi,
> I am trying to find the fastest way to search a txt file for a
> particular string and return the line that contains the string. I have
> so far just used the most basic method: initialize a variable as an
> IO.StreamReader, read each line, and perform an If-Then to see whether
> var.Contains(mystring) is true or false. If true, I have my string; if
> false, it reads the next line. This takes forever. Is there anything I
> can do to speed this up?
> Thanks.

If the file is only 100 MB, then, seriously, I would just read the
entire file at once and process it in memory. If you read line by
line, you are going to hit the disk a lot, and that will really slow
you down...

If you don't want to do it that way, you might want to read the file
in chunks as binary data, then convert your bytes to strings and do
your compares. Of course, that is going to make it a little tricky,
because a chunk might end in the middle of a line...

--
Tom Shelton
Feb 28 '08 #3
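
The chunked approach mentioned above could be sketched as follows. This is only one way to handle the "middle of a line" problem: the incomplete tail of each chunk is carried over and prepended to the next one. The names (FindLineInChunks, LineAt, filePath, chunkSize) are hypothetical, and the sketch assumes UTF-8 text and a non-empty search string that contains no line breaks.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ChunkedSearch

    ' Returns the first line containing searchText, or Nothing if there is none.
    ' Assumes searchText is non-empty and contains no line breaks.
    Function FindLineInChunks(filePath As String, searchText As String,
                              Optional chunkSize As Integer = 1048576) As String
        Dim utf8Decoder As Decoder = Encoding.UTF8.GetDecoder()
        Dim byteBuf(chunkSize - 1) As Byte
        Dim charBuf(Encoding.UTF8.GetMaxCharCount(chunkSize) - 1) As Char
        Dim carry As String = ""   ' incomplete last line of the previous chunk

        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Dim bytesRead As Integer = fs.Read(byteBuf, 0, byteBuf.Length)
            While bytesRead > 0
                ' The decoder holds back any split multi-byte character for the next call.
                Dim charCount As Integer = utf8Decoder.GetChars(byteBuf, 0, bytesRead, charBuf, 0)
                Dim chunkText As String = carry & New String(charBuf, 0, charCount)

                Dim lastBreak As Integer = chunkText.LastIndexOf(ControlChars.Lf)
                If lastBreak >= 0 Then
                    ' Search only the complete lines; keep the tail for the next round.
                    Dim complete As String = chunkText.Substring(0, lastBreak)
                    carry = chunkText.Substring(lastBreak + 1)
                    Dim hit As Integer = complete.IndexOf(searchText, StringComparison.Ordinal)
                    If hit >= 0 Then Return LineAt(complete, hit)
                Else
                    carry = chunkText
                End If

                bytesRead = fs.Read(byteBuf, 0, byteBuf.Length)
            End While
        End Using

        ' The file may end without a trailing line break.
        If carry.IndexOf(searchText, StringComparison.Ordinal) >= 0 Then
            Return carry.TrimEnd(ControlChars.Cr)
        End If
        Return Nothing
    End Function

    ' Extracts the line that surrounds position hit.
    Private Function LineAt(content As String, hit As Integer) As String
        Dim lineStart As Integer = content.LastIndexOf(ControlChars.Lf, hit) + 1
        Dim lineEnd As Integer = content.IndexOf(ControlChars.Lf, hit)
        If lineEnd < 0 Then lineEnd = content.Length
        Return content.Substring(lineStart, lineEnd - lineStart).TrimEnd(ControlChars.Cr)
    End Function

End Module
```

The 1 MB default chunk size is an arbitrary starting point, not a measured optimum.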

On Feb 28, 7:49 am, Tom Shelton <tom_shel...@comcast.net> wrote:
> If the file is only 100 MB, then, seriously, I would just read the
> entire file at once and process it in memory. If you read line by
> line, you are going to hit the disk a lot, and that will really slow
> you down...
>
> If you don't want to do it that way, you might want to read the file
> in chunks as binary data, then convert your bytes to strings and do
> your compares. Of course, that is going to make it a little tricky,
> because a chunk might end in the middle of a line...

I agree, Tom. About 100 MB is a huge size for a text file. Reading it
as raw bytes and then converting them to strings is a good idea, but
the key point is how to do it programmatically :-)
Feb 28 '08 #4

kimiraikkonen wrote:
> I agree, Tom. About 100 MB is a huge size for a text file. Reading it
> as raw bytes and then converting them to strings is a good idea, but
> the key point is how to do it programmatically :-)

The most efficient way would presumably be to read the entire file into a
single string using IO.File.ReadAllText and see whether your search string
is contained within the file at all (which you can then do using a single
call to .Contains). If it's not there then there's no point trying to work
out which line it's on, and you can stop looking any further straight away.

If you do find the search string, you can count the line breaks that appear
before the search string to work out which line it's on.

HTH,

--

(O)enone
Feb 28 '08 #5
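
A minimal sketch of the idea above: read everything with IO.File.ReadAllText, bail out early if the string is absent, and otherwise count the line breaks before the match. IndexOf is used here instead of Contains only because the match position is needed for the counting step. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module WholeFileSearch

    ' Returns the 1-based line number of the first match, or -1 if the text is absent.
    Function FindLineNumber(filePath As String, searchText As String) As Integer
        Dim content As String = File.ReadAllText(filePath)
        Dim hit As Integer = content.IndexOf(searchText, StringComparison.Ordinal)
        If hit < 0 Then Return -1   ' not in the file at all, so stop here

        ' Count the line feeds that occur before the match position.
        Dim lineNumber As Integer = 1
        Dim pos As Integer = content.IndexOf(ControlChars.Lf, 0)
        While pos >= 0 AndAlso pos < hit
            lineNumber += 1
            pos = content.IndexOf(ControlChars.Lf, pos + 1)
        End While
        Return lineNumber
    End Function

End Module
```

Counting line feeds works for both LF and CR+LF line endings.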

"(O)enone" wrote:
> The most efficient way would presumably be to read the entire file into a
> single string using IO.File.ReadAllText and see whether your search string
> is contained within the file at all (which you can then do using a single
> call to .Contains). If it's not there then there's no point trying to work
> out which line it's on, and you can stop looking any further straight away.
>
> If you do find the search string, you can count the line breaks that appear
> before the search string to work out which line it's on.

I would use System.IO.File.ReadAllLines(Filename), because this returns the
lines already split out for you. You then just loop through the array of
individual lines.
Feb 28 '08 #6
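
The ReadAllLines suggestion might look like the sketch below. The module and function names are hypothetical. Note that the whole file is held in memory as an array of strings before the loop starts.

```vbnet
Imports System
Imports System.IO

Module LineArraySearch

    ' Returns the first line containing searchText, or Nothing if no line matches.
    Function FindFirstLine(filePath As String, searchText As String) As String
        ' ReadAllLines loads the file and splits it at line breaks.
        For Each line As String In File.ReadAllLines(filePath)
            If line.Contains(searchText) Then Return line
        Next
        Return Nothing
    End Function

End Module
```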

Family Tree Mike wrote:
> I would use System.IO.File.ReadAllLines(Filename), because this
> returns the lines split out for you. You just loop through the array
> of individual lines in the array.

I did originally write the same thing in my message but then chose to remove
it before I posted it. I think the ReadAllText approach may be quicker
because you can check whether the string exists at all without having to
loop... You could then possibly determine the line by calling Replace() on
the part of the string before the search result position, replacing the
two-character line break with a one-character string, and then seeing how
much smaller the string has got; the number of characters it shrinks by
will be the number of lines before the match.

Maybe it needs someone to try it to see which is more efficient.

--

(O)enone
Feb 28 '08 #7
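
The Replace() trick described above can be sketched like this. It assumes the file uses two-character CR+LF line endings throughout; with bare LF endings the length difference would be zero. The module and function names are hypothetical, and whether this is faster than looping is untested here.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ReplaceLineCount

    ' Returns the 1-based line number of the first match, or -1 if the text is absent.
    ' Assumes Windows (CR+LF) line endings throughout the file.
    Function FindLineNumberByReplace(filePath As String, searchText As String) As Integer
        Dim content As String = File.ReadAllText(filePath)
        Dim hit As Integer = content.IndexOf(searchText, StringComparison.Ordinal)
        If hit < 0 Then Return -1

        ' Only the text before the match matters for the line count.
        Dim prefix As String = content.Substring(0, hit)
        ' Each CR+LF pair shrinks by one character, so the length difference counts the breaks.
        Dim collapsed As String = prefix.Replace(vbCrLf, vbLf)
        Return (prefix.Length - collapsed.Length) + 1
    End Function

End Module
```

Both Substring and Replace allocate new strings, so on a 100 MB file this costs extra memory compared with counting the breaks in place.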

On Feb 27, 10:53 pm, "Chris K." <ckoeber[Do Not
Spam]@googlesemailservice.figureitout> wrote:
> How big are these files?

Usually anywhere from 75 to 100 MB.
Feb 29 '08 #8

On Feb 28, 6:58 am, "(O)enone" <oen...@nowhere.com> wrote:
> I did originally write the same thing in my message but then chose to remove
> it before I posted it. I think the ReadAllText approach may be quicker
> because you can check whether the string exists at all without having to
> loop... You could then possibly determine the line by calling Replace() on
> the part of the string before the search result position, replacing the
> two-character line break with a one-character string, and then seeing how
> much smaller the string has got; the number of characters it shrinks by
> will be the number of lines before the match.
>
> Maybe it needs someone to try it to see which is more efficient.

Thanks everyone, I appreciate the responses. I tried several methods
(ReadAllText, IO.FileStream, ReadAllLines) and all seem about the same.
It became apparent that I am also fighting a slow server connection,
which increases the time to open the files.
Mar 1 '08 #9

Clinto,

Use the Visual Basic Find, as that is optimized for strings; any other method
will be slower, simply because those are optimized for characters.

Cor
Mar 1 '08 #10
