Fastest way to search through txt file?

Hi,
I haven't posted a problem in quite a while, but I have reached the
point where I really need to ask for help.

I need to create an application that searches through a .txt log
file and finds every line in which a hotmail e-mail address occurs.

All of these e-mails need to be shown in a list box on the form.

The problem with the code below is that the search takes a long time.
On a 10 MB file it takes almost 2 minutes, and I need to process
1-2 GB files. Because this is only an intermediate step and the
program has much more functionality, I can't afford to wait this
long.

Right now I load each line into a hidden RichTextBox and use its
Find method to check for any occurrence of hotmail.com; if there is
one, I display the line in the list box.

This process is extremely slow.

How to extract the e-mail address from the line is another story.
Does anyone know of a good e-mail parser for VB.NET?

Can someone look at the code and tell me what I am doing wrong? Why is
it so slow?
Thanks a lot.

vjay
Dim oRead As System.IO.StreamReader
Dim linein As String
Dim Result As Integer
Dim count As Integer

' The file name must be a quoted string; File.OpenText is a shared method.
oRead = System.IO.File.OpenText("log.txt")

While oRead.Peek <> -1
    count = count + 1
    linein = oRead.ReadLine()
    ' Each line is copied into the hidden RichTextBox before searching.
    RichTextBox1.Text = linein
    'StatusBar1.Text = count.ToString

    ' Find returns -1 when the text is not found.
    Result = RichTextBox1.Find("hotmail.com", _
        RichTextBoxFinds.MatchCase)
    If Result <> -1 Then
        ListBox2.Items.Add(linein)
    End If
End While
oRead.Close()

Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------
http://www.usenet.com
Nov 20 '05 #1
12 Replies


Cor
Hi Vjay,

About half a year ago I ran a test in this newsgroup to find the
fastest method of searching for text in a text file (the code was
contributed by several people).

This code was contributed by someone whose first name is Jon.

(It counts how many times a word occurs in a text. You need the positions
instead; the position of each hit is iResult, that is iStart minus one, on
each pass through the loop.)

\\\
Public Function Test2(ByVal strInput As String, ByVal strDelimiter _
        As String) As Int32 'Jon (string)
    Dim iStart As Int32, iCount As Int32, iResult As Int32
    iStart = 1
    iCount = 0
    Do
        iResult = InStr(iStart, strInput, strDelimiter)
        If iResult = 0 Then Exit Do
        iCount += 1
        iStart = iResult + 1
    Loop
    Return iCount
End Function
///
This was by far the fastest method when the delimiter is a string (rather
than a single char).

I hope you can use it. If you still need help implementing it in your
solution, just ask again.

Cor
Nov 20 '05 #2

Cor,
thank you, but I don't need to count. I need to find a word and display
the line in which it occurred in my list box.
Please let me know how I could amend your fast routine so that it does
what I need.
Thanks a lot for your help.
Nov 20 '05 #3

You should try loading the file in parts (if you are concerned about memory)
and processing it with regular expressions.
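
A minimal sketch of what the regular-expression approach could look like, reading one line at a time so the whole file never sits in memory. It also pulls the address itself out of each line. The module name, the pattern, and the file name log.txt are assumptions made for this illustration; the pattern is deliberately simple and is not a complete e-mail validator. It assumes VB 2005 or later (for Using).

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Deliberately simple pattern: a run of common local-part characters
    ' followed by "@hotmail.com". An illustration, not a full validator.
    Private ReadOnly HotmailPattern As New Regex( _
        "[A-Za-z0-9._%+-]+@hotmail\.com", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    Sub Main()
        ' "log.txt" is a placeholder for the real log file path.
        Using reader As New StreamReader("log.txt")
            Dim line As String = reader.ReadLine()
            Do Until line Is Nothing
                ' Print every hotmail address found on this line.
                For Each m As Match In HotmailPattern.Matches(line)
                    Console.WriteLine(m.Value)
                Next
                line = reader.ReadLine()
            Loop
        End Using
    End Sub
End Module
```

In a form, the Console.WriteLine call would be replaced by adding m.Value (or the whole line) to the list box.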

Using a FileStream object may be better for memory, but I'm not really sure...

I think you will find Regex is exactly what you're looking for, and it's VERY
fast.

Nov 20 '05 #4

Vjay77,
I don't have a specific .NET example handy.

Have you considered using the Indexing Service instead of coding the search
yourself?

http://msdn.microsoft.com/library/de...intro_297o.asp

You may need to use ADODB instead of ADO.NET, but you should be able to
set up a search of the Indexing Service that returns the list of files with
log entries...

If you have a single log file rather than multiple log files, then you may
need to keep using the loop you have. In that case, consider moving the
search itself to a second thread...

Hope this helps
Jay
Nov 20 '05 #5

Cor
Hi CJ,
> I think you will find RegEx is exactly what you're looking for, and it's VERY
> fast.

Yes, by your standards, not by mine.

:-)

As far as I remember, it was easily 100 times slower than the routine Jon wrote.

Cor

Nov 20 '05 #6

Cor
Hi Vjay77,

I think this is what you were looking for. Test it, because I never use
InStr myself.

But it should be the fastest.

I hope it works.

Cor
\\\
Dim sr As New IO.StreamReader("log.txt")
Dim linein As String
Dim Result As Integer
linein = sr.ReadLine()
Do Until linein Is Nothing
    ' InStr is 1-based and returns 0 (not -1) when the text is not found.
    Result = InStr(linein, "hotmail.com")
    If Result > 0 Then
        ListBox2.Items.Add(linein)
    End If
    linein = sr.ReadLine()
Loop
sr.Close()
///
Nov 20 '05 #7

> Hi CJ,
>> I think you will find RegEx is exactly what you're looking for, and it's VERY fast.
> Yes, by your standards, not by mine.


Yes, but I try to forget about Microsoft.VisualBasic when much better and
more widely supported methods are available...

Maybe you should look at String.IndexOf if you want to use something other
than regex...
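
For comparison, a rough sketch of the same line filter written with String.IndexOf. Unlike InStr, IndexOf is 0-based and returns -1 when nothing is found. The module and function names and the file name log.txt are made up for this example, the case-insensitive ordinal comparison is one possible choice rather than a requirement, and the code assumes VB 2005 or later (for Using and List(Of String)).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module IndexOfSketch
    ' Returns every line of the file that contains the search text.
    Function FindLines(ByVal path As String, ByVal needle As String) As List(Of String)
        Dim matches As New List(Of String)
        Using reader As New StreamReader(path)
            Dim line As String = reader.ReadLine()
            Do Until line Is Nothing
                ' IndexOf returns -1 when needle is absent, so test for >= 0.
                If line.IndexOf(needle, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    matches.Add(line)
                End If
                line = reader.ReadLine()
            Loop
        End Using
        Return matches
    End Function

    Sub Main()
        ' "log.txt" is a placeholder for the real log file path.
        For Each hit As String In FindLines("log.txt", "hotmail.com")
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```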

Just had to start something this morning didn't ya? =)

But for log analysis? come on...

> :-)
>
> As far as I remember, it was easily 100 times slower than the routine Jon wrote.
> Cor

Nov 20 '05 #8

Cor
Hi CJ,

See the subject: "fastest".

If you look at the code I wrote for him, you can tell that I never use InStr.
(You said you do internet work, so in my opinion you should think only in
terms of IndexOf anyway.)

But InStr really is the fastest. I never use it because I always forget that
its index is 1-based (0 + 1).

IndexOf is about 2 times slower.
(We are talking about picoseconds or less.)

:-)

Cor
Nov 20 '05 #9

Alright, just ran a test, and yes, instr is faster than anything else.

My apologies. =)

> Hi CJ,
>
> See the subject: "fastest".
>
> If you look at the code I wrote for him, you can tell that I never use InStr.
> (You said you do internet work, so in my opinion you should think only in
> terms of IndexOf anyway.)
>
> But InStr really is the fastest. I never use it because I always forget that
> its index is 1-based (0 + 1).

I forget that every time since my VB6 days... With InStr you can check for 0 (and
use it as a false flag too!), but not with IndexOf... That always bothered me. I
just got used to it in Java, so I think that's why I like to use it in .NET.

> IndexOf is about 2 times slower.
> (We are talking about picoseconds or less.)
>
> :-)
>
> Cor

Nov 20 '05 #10

Thanks a lot everyone.
Instr worked just fine. Very very fast.
Once again, thanks a lot.
vjay
Nov 20 '05 #11

What about a Boyer-Moore search? My tests show it is faster
than any built-in string search in .Net.
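
As a rough illustration of the idea, here is a minimal sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm). The names HorspoolSketch and HorspoolIndexOf are invented for this example, the comparison is ordinal and case-sensitive, and the code assumes VB 2005 or later (for Dictionary(Of Char, Integer)). It is not the code behind the tests mentioned above, and it makes no speed claim of its own.

```vbnet
Imports System
Imports System.Collections.Generic

Module HorspoolSketch
    ' Returns the 0-based index of the first occurrence of needle in
    ' haystack, or -1 if there is none (ordinal, case-sensitive).
    Function HorspoolIndexOf(ByVal haystack As String, ByVal needle As String) As Integer
        Dim m As Integer = needle.Length
        Dim n As Integer = haystack.Length
        If m = 0 Then Return 0
        If m > n Then Return -1

        ' Bad-character table: for each character of the needle except the
        ' last one, how far the window may slide when that character sits
        ' under the window's final position.
        Dim shifts As New Dictionary(Of Char, Integer)
        For i As Integer = 0 To m - 2
            shifts(needle(i)) = m - 1 - i
        Next

        Dim pos As Integer = 0
        While pos <= n - m
            ' Compare the current window right to left.
            Dim j As Integer = m - 1
            While j >= 0 AndAlso haystack(pos + j) = needle(j)
                j -= 1
            End While
            If j < 0 Then Return pos

            ' Characters that do not occur in the table allow a full shift of m.
            Dim skip As Integer
            If Not shifts.TryGetValue(haystack(pos + m - 1), skip) Then skip = m
            pos += skip
        End While
        Return -1
    End Function

    Sub Main()
        ' Prints the index at which "hotmail.com" starts in the sample line.
        Console.WriteLine(HorspoolIndexOf("mail from user@hotmail.com accepted", "hotmail.com"))
    End Sub
End Module
```

For the log search, a line would be added to the list box whenever HorspoolIndexOf returns a value of 0 or more.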
Nov 20 '05 #12

* jerry <je***@nospam.com> scripsit:
> What about a Boyer-Moore search? My tests show it is faster
> than any built-in string search in .Net.


Or the Knuth-Morris-Pratt string search algorithm...

You can have a look at the implementation in the SSCLI, starting point:

<http://sharedsourcecli.sscli.net/source/browse/sharedsourcecli/clr/src/bcl/system/string.cs>

--
Herfried K. Wagner [MVP]
<http://dotnet.mvps.org/>
Nov 20 '05 #13
