Fastest Way to search for a string in a large text file (75 to 100mb)

Hi,
I am trying to find the fastest way to search a text file for a
particular string and return the line that contains the string. So far
I have just used the most basic method: I initialize a variable as an
IO.StreamReader, read each line, and use an If-Then to check whether
var.Contains(mystring) is True or False. If True, I have my string; if
False, it reads the next line. This takes forever. Is there anything I
can do to speed this up?
Thanks.
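
For reference, the baseline described above might look roughly like the following. This is only a sketch: the module and function names are made up for illustration, and it returns the first matching line only.

```vbnet
Imports System.IO

Module LineByLineSearch
    ' Hypothetical helper: returns the first line containing searchText,
    ' or Nothing if no line matches.
    Function FindFirstLine(ByVal path As String, ByVal searchText As String) As String
        Using reader As New StreamReader(path)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                ' String.Contains performs an ordinal (character-by-character) comparison.
                If line.Contains(searchText) Then
                    Return line
                End If
                line = reader.ReadLine()
            End While
        End Using
        Return Nothing
    End Function
End Module
```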
Feb 28 '08 #1
How big are these files?
Feb 28 '08 #2


If the file is only 100 MB... then, seriously, I would just read
the entire file at once and process it in memory. If you read line by
line, then you are going to hit the disk a lot, and that will really
slow you down...

If you don't want to do it that way, then you might want to read the
file in chunks as binary data, then convert your bytes to strings and
do your compares... Of course, that is going to make it a little
tricky because a chunk might end in the middle of a line...
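
One way the chunked idea could be sketched is shown below. Note the substitution: instead of raw bytes, it reads large blocks of characters through a StreamReader, so that a multi-byte character is never split across two chunks. The incomplete line at the end of each block is carried over to the next one. The names, the 1 MB block size, and the UTF-8 default are all assumptions made for illustration, and the search text is assumed to be non-empty and to contain no line break.

```vbnet
Imports System.IO
Imports System.Text

Module ChunkedSearch
    ' Hypothetical helper: reads the file in large character blocks and returns
    ' the first line containing searchText, or Nothing if no line matches.
    Function FindFirstLineChunked(ByVal path As String, ByVal searchText As String) As String
        Const BlockSize As Integer = 1048576 ' characters per read (arbitrary choice)
        Dim buffer(BlockSize - 1) As Char
        Dim carry As String = "" ' incomplete line left over from the previous block

        ' Assumes UTF-8 unless a byte order mark says otherwise.
        Using reader As New StreamReader(path, Encoding.UTF8, True, 65536)
            Dim count As Integer = reader.Read(buffer, 0, buffer.Length)
            While count > 0
                Dim block As String = carry & New String(buffer, 0, count)
                Dim lastBreak As Integer = block.LastIndexOf(ControlChars.Lf)
                If lastBreak < 0 Then
                    carry = block ' no complete line in this block yet
                Else
                    ' Search only the complete lines; keep the tail for the next round.
                    Dim complete As String = block.Substring(0, lastBreak)
                    carry = block.Substring(lastBreak + 1)
                    Dim hit As String = FindLineIn(complete, searchText)
                    If hit IsNot Nothing Then Return hit
                End If
                count = reader.Read(buffer, 0, buffer.Length)
            End While
        End Using

        ' The last line of the file may have no trailing line break.
        Return FindLineIn(carry, searchText)
    End Function

    ' Returns the line within text that contains searchText, or Nothing.
    Private Function FindLineIn(ByVal text As String, ByVal searchText As String) As String
        Dim pos As Integer = text.IndexOf(searchText, StringComparison.Ordinal)
        If pos < 0 Then Return Nothing
        Dim lineStart As Integer = text.LastIndexOf(ControlChars.Lf, pos) + 1
        Dim lineEnd As Integer = text.IndexOf(ControlChars.Lf, pos)
        If lineEnd < 0 Then lineEnd = text.Length
        Return text.Substring(lineStart, lineEnd - lineStart).TrimEnd(ControlChars.Cr)
    End Function
End Module
```

Because only complete lines are searched and the partial line is carried forward, a match can never be missed by straddling two blocks.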

--
Tom Shelton
Feb 28 '08 #3
I agree, Tom. About 100 MB is a huge size for a text file; reading it as
raw data and then converting the bytes to strings is a good idea, but the key
point is how to do it programmatically :-)
Feb 28 '08 #4
kimiraikkonen wrote:
> I agree Tom, about 100mb is a huge size for a text file, reading it as
> raw then converting each byte to string is a good idea, but the key
> point is how to do it programmatically :-)
The most efficient way would presumably be to read the entire file into a
single string using IO.File.ReadAllText and check whether your search string
is contained within the file at all (which you can do with a single
call to .Contains). If it's not there, then there's no point trying to work
out which line it's on, and you can stop looking straight away.

If you do find the search string, you can count the line breaks that appear
before it to work out which line it's on.
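
A rough sketch of this idea follows. It uses IndexOf rather than Contains, since the position of the match is needed to count line breaks, and it passes StringComparison.Ordinal so the search is a plain character comparison. The names are hypothetical, and the search text is assumed to be non-empty and to contain no line break.

```vbnet
Imports System.IO

Module WholeFileSearch
    ' Hypothetical helper: returns the first line containing searchText and sets
    ' lineNumber to its 1-based line number. Returns Nothing (lineNumber = 0)
    ' if the text does not occur in the file.
    Function FindLine(ByVal path As String, ByVal searchText As String, _
                      ByRef lineNumber As Integer) As String
        lineNumber = 0
        Dim text As String = File.ReadAllText(path)

        ' One search over the whole file; stop immediately if there is no match.
        Dim pos As Integer = text.IndexOf(searchText, StringComparison.Ordinal)
        If pos < 0 Then Return Nothing

        ' Count the line feeds that appear before the match.
        lineNumber = 1
        For i As Integer = 0 To pos - 1
            If text(i) = ControlChars.Lf Then lineNumber += 1
        Next

        ' Cut out the line that surrounds the match.
        Dim lineStart As Integer = text.LastIndexOf(ControlChars.Lf, pos) + 1
        Dim lineEnd As Integer = text.IndexOf(ControlChars.Lf, pos)
        If lineEnd < 0 Then lineEnd = text.Length
        Return text.Substring(lineStart, lineEnd - lineStart).TrimEnd(ControlChars.Cr)
    End Function
End Module
```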

HTH,

--

(O)enone
Feb 28 '08 #5

I would use System.IO.File.ReadAllLines(Filename), because this returns the
lines already split out for you. You just loop through the array of
individual lines.
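
As a minimal sketch of that suggestion (the module and function names are invented for illustration):

```vbnet
Imports System.IO

Module AllLinesSearch
    ' Hypothetical helper: loads every line into an array, then returns the
    ' first line containing searchText, or Nothing if no line matches.
    Function FindFirstLineInArray(ByVal path As String, ByVal searchText As String) As String
        Dim lines() As String = File.ReadAllLines(path)
        For Each line As String In lines
            If line.Contains(searchText) Then Return line
        Next
        Return Nothing
    End Function
End Module
```

Note that this holds the whole file in memory as one string per line.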
Feb 28 '08 #6
Family Tree Mike wrote:
> I would use System.IO.File.ReadAllLines(Filename), because this
> returns the lines split out for you. You just loop through the array
> of individual lines in the array.
I did originally write the same thing in my message but then chose to remove
it before I posted. I think the ReadAllText approach may be quicker
because you can check whether the string exists at all without having to
loop... You could then possibly determine the line by calling
Replace() on the part of the string before the search result position,
changing the two-character line break to a one-character replacement string,
and then seeing how much smaller the string has got; the number of characters
it shrinks by is the number of line breaks before the match, which gives the
line.

It probably needs someone to try it to see which is more efficient.
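
The Replace() trick could be sketched like this, assuming every line break in the file is the two-character CR+LF pair. The function name is hypothetical, and 1 is added so the result is a 1-based line number rather than a count of preceding line breaks.

```vbnet
Module ReplaceLineCount
    ' Hypothetical helper: returns the 1-based line number of the character at
    ' position pos, assuming all line breaks in text are CR+LF.
    Function LineNumberAt(ByVal text As String, ByVal pos As Integer) As Integer
        Dim prefix As String = text.Substring(0, pos)
        ' Each CR+LF shrinks by one character when replaced with a lone LF.
        Dim shortened As String = prefix.Replace(vbCrLf, vbLf)
        Return (prefix.Length - shortened.Length) + 1
    End Function
End Module
```

Both Substring and Replace allocate new strings, so on a 100 MB file this may well cost more than a simple counting loop; as the post says, it would need measuring.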

--

(O)enone
Feb 28 '08 #7
On Feb 27, 10:53 pm, "Chris K." wrote:
> How big are these files?

Usually anywhere from 75 to 100 MB.
Feb 29 '08 #8
Thanks everyone, I appreciate the responses. I tried several methods
(ReadAllText, IO.FileStream, ReadAllLines), and all seem about the same.
It became apparent that I am also fighting a slow server connection,
which increases the time to open the files.
Mar 1 '08 #9
Clinto,

Use the Visual Basic Find, as that is optimized for strings; any other method
will be slower, because those are optimized for characters.

Cor
Mar 1 '08 #10
