
Fastest way to search through txt file?

Hi,
I haven't posted a problem in quite a while, but I've reached the
point where I really need to ask for help.

I need to create an application that searches a .txt log file and
finds all lines in which a hotmail email address occurs.

All these addresses need to be shown in a list box on the form.

The problem with the code you'll see below is that the search takes a
long time. On just a 10 MB file it takes almost 2 minutes, and I need
to process 1-2 GB files. Because this is only an intermediate step and
the program has much more functionality, I can't afford to wait this
long.
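As a rough check, assuming the search time grows linearly with file size (an assumption; the post only gives the 10 MB figure), the current approach would need several hours per file:

$$
t_{1\,\mathrm{GB}} \approx \frac{1024\ \mathrm{MB}}{10\ \mathrm{MB}} \times 2\ \mathrm{min} \approx 205\ \mathrm{min} \approx 3.4\ \mathrm{h},
\qquad
t_{2\,\mathrm{GB}} \approx 2 \times 205\ \mathrm{min} \approx 410\ \mathrm{min} \approx 6.8\ \mathrm{h}
$$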

Right now I do the search by loading each line into a hidden
RichTextBox and using its Find command to check for any occurrence of
hotmail.com; if there is one, I display the line in the list box.

This process is extremely slow.

How to extract the email address from the line is another story. Does
anyone know of a good email parser for VB.NET?

Can someone look at the code and tell me what I am doing wrong? Why is
it so slow?
Thanks a lot.

vjay
Dim oRead As System.IO.StreamReader
Dim linein As String
Dim Result As Integer
Dim count As Integer

' Open the log file for reading (File.OpenText is a Shared method).
oRead = System.IO.File.OpenText("log.txt")

While oRead.Peek <> -1
    count = count + 1
    linein = oRead.ReadLine()
    ' Copy the line into the hidden RichTextBox and search it there.
    RichTextBox1.Text = linein
    'StatusBar1.Text = count.ToString

    Result = RichTextBox1.Find("hotmail.com", RichTextBoxFinds.MatchCase)
    If Result <> -1 Then
        ListBox2.Items.Add(linein)
    End If
End While
oRead.Close()

Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------
http://www.usenet.com
Nov 20 '05 #1
Cor
Hi Vjay,

About half a year ago I ran a test in this newsgroup to see which method
was the fastest for finding text in a text file (with code contributed by
several people).

This code was contributed by someone whose first name is Jon.

(It counts how many times a word occurs in a text. If you need the
positions, they are the successive values of iResult in this routine;
iStart is then set one past each match.)

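\\\
Public Function Test2(ByVal strInput As String, ByVal strDelimiter _
        As String) As Int32 'Jon (string)
    ' Counts the occurrences of strDelimiter in strInput using InStr,
    ' which returns a 1-based position, or 0 when there is no match.
    Dim iStart As Int32, iCount As Int32, iResult As Int32
    iStart = 1
    iCount = 0
    Do
        iResult = InStr(iStart, strInput, strDelimiter)
        If iResult = 0 Then Exit Do
        iCount += 1
        ' Continue searching one character after the start of this match.
        iStart = iResult + 1
    Loop
    Return iCount
End Function
///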
This was by far the fastest when the delimiter is a String (rather than a Char).
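If you need the positions rather than the count, Jon's loop can collect them instead. A minimal sketch, assuming 1-based positions as returned by InStr; `InStrPositions` and `FindPositions` are hypothetical names, not part of the original post:

```vbnet
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

' Hypothetical module and function names, for illustration only.
Public Module InStrPositions
    ' Variant of Jon's Test2: instead of counting matches, it collects
    ' the 1-based position of every match found by InStr.
    Public Function FindPositions(ByVal strInput As String, _
                                  ByVal strSearch As String) As List(Of Integer)
        Dim positions As New List(Of Integer)
        ' Guard added for the sketch: InStr treats an empty search
        ' string as matching at the start position.
        If String.IsNullOrEmpty(strSearch) Then Return positions

        Dim iStart As Integer = 1
        Do
            ' InStr returns the 1-based index of the next match, or 0 if none.
            Dim iResult As Integer = InStr(iStart, strInput, strSearch)
            If iResult = 0 Then Exit Do
            positions.Add(iResult)
            ' Resume one character after the start of this match, as Test2
            ' does, so overlapping matches are found too.
            iStart = iResult + 1
        Loop
        Return positions
    End Function
End Module
```

For example, searching "a@hotmail.com b@hotmail.com" for "hotmail.com" yields the positions 3 and 17.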

I hope you can use it. If you still need help fitting this into your
solution, just ask again.

Cor
Nov 20 '05 #2
Cor,
thank you, but I don't need to count; I need to find a word and display
the line in which it occurred in my list box.
Please let me know if you know how I could amend it so that your fast
script does what I need.
Thanks a lot for your help.
Nov 20 '05 #3
You should try loading the file in parts (if you're concerned about memory)
and processing it using regular expressions.

Using a FileStream object may be better for memory; I'm not really sure...

I think you will find RegEx is exactly what you're looking for, and it's VERY
fast.

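A minimal sketch of the regular-expression approach, which also touches the original question about extracting the address from the line. It assumes a deliberately simple pattern (an assumption, not a full email parser) and reads the log one line at a time instead of loading it whole. `HotmailRegex` and `ExtractHotmailAddresses` are hypothetical names:

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

' Hypothetical module and function names, for illustration only.
Public Module HotmailRegex
    ' Assumed, deliberately simple pattern: common local-part characters,
    ' "@hotmail.com", then a word boundary. NOT an RFC-complete parser.
    ' IgnoreCase differs from the MatchCase search in the original code.
    Private ReadOnly HotmailPattern As New Regex( _
        "[A-Za-z0-9._%+-]+@hotmail\.com\b", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Reads the log line by line (so a 1-2 GB file is never held in
    ' memory at once) and returns every hotmail address found.
    Public Function ExtractHotmailAddresses(ByVal reader As TextReader) As List(Of String)
        Dim found As New List(Of String)
        Dim line As String = reader.ReadLine()
        Do Until line Is Nothing
            For Each m As Match In HotmailPattern.Matches(line)
                found.Add(m.Value)
            Next
            line = reader.ReadLine()
        Loop
        Return found
    End Function
End Module
```

In a form this could be used as `ListBox2.Items.AddRange(ExtractHotmailAddresses(reader).ToArray())`, with `reader` a `StreamReader` opened on the log file.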
Nov 20 '05 #4
Vjay77,
I don't have a specific .NET example handy.

Have you considered using the Indexing Service instead of coding the search
yourself?

http://msdn.microsoft.com/library/de...intro_297o.asp

You may need to use ADODB instead of ADO.NET; however, you should be able to
set up a search of the Indexing Service that returns the list of files with
log entries...

If you have a single log file instead of multiple log files, then you may
need to continue using the loop you have; however, consider moving the
search itself to a second thread...
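A minimal sketch of moving the search to a second thread with `System.Threading.Thread`. `BackgroundSearch`, `ScanFile` and `StartSearch` are hypothetical names; the scan itself assumes a case-sensitive `String.IndexOf` with an ordinal comparison:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading

' Hypothetical module and function names, for illustration only.
Public Module BackgroundSearch
    ' Scans the file line by line and returns the lines containing needle
    ' (case-sensitive, ordinal comparison).
    Public Function ScanFile(ByVal filePath As String, ByVal needle As String) As List(Of String)
        Dim matches As New List(Of String)
        Using reader As New StreamReader(filePath)
            Dim line As String = reader.ReadLine()
            Do Until line Is Nothing
                If line.IndexOf(needle, StringComparison.Ordinal) >= 0 Then matches.Add(line)
                line = reader.ReadLine()
            Loop
        End Using
        Return matches
    End Function

    ' Runs ScanFile on a background thread and returns immediately, so the
    ' calling (UI) thread is not blocked. onDone is invoked ON THE WORKER
    ' THREAD with the results once the scan has finished.
    Public Function StartSearch(ByVal filePath As String, ByVal needle As String, _
                                ByVal onDone As Action(Of List(Of String))) As Thread
        Dim work As ThreadStart = Sub() onDone(ScanFile(filePath, needle))
        Dim worker As New Thread(work)
        ' A background thread does not keep the process alive on exit.
        worker.IsBackground = True
        worker.Start()
        Return worker
    End Function
End Module
```

Because `onDone` runs on the worker thread, in a form it must not touch `ListBox2` directly; marshal the results back to the UI thread first, for example with the form's `BeginInvoke` method.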

Hope this helps
Jay

Nov 20 '05 #5
Cor
Hi CJ,
> I think you will find RegEx is exactly what you're looking for, and it's VERY
> fast.

Yes, by your standards, but not by mine.

:-)

As far as I remember, it was easily 100 times slower than the routine Jon made.

Cor

Nov 20 '05 #6
Cor
Hi Vjay77,

I think this is what you were looking for. Test it, because I never use
InStr myself.

But it should be the fastest.

I hope it works.

Cor
\\\
Dim sr As New IO.StreamReader("log.txt")
Dim linein As String
Dim Result As Integer
linein = sr.ReadLine()
' ReadLine returns Nothing at the end of the file.
Do Until linein Is Nothing
    ' InStr returns the 1-based position of the match, or 0 if it is absent.
    Result = InStr(linein, "hotmail.com")
    If Result > 0 Then
        ListBox2.Items.Add(linein)
    End If
    linein = sr.ReadLine()
Loop
sr.Close()
///
Nov 20 '05 #7
> Hi CJ,
>> I think you will find RegEx is exactly what you're looking for, and it's VERY fast.
> Yes, by your standards, but not by mine.


Yes, but I try to forget about Microsoft.VisualBasic when much better and
more widely supported methods are available...

Maybe you should look at String.IndexOf if you want to use something other
than regex.
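`String.IndexOf(String)` without a `StringComparison` argument performs a culture-sensitive comparison; for log text, `StringComparison.Ordinal` gives a plain case-sensitive character match, similar in effect to the MatchCase flag in the original code. A sketch under those assumptions (`IndexOfFilter` and `MatchingLines` are hypothetical names):

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Hypothetical module and function names, for illustration only.
Public Module IndexOfFilter
    ' Returns every line of the reader that contains needle.
    ' String.IndexOf is 0-based and returns -1 when there is no match,
    ' so the test is ">= 0" (InStr signals "not found" with 0 instead).
    Public Function MatchingLines(ByVal reader As TextReader, _
                                  ByVal needle As String) As List(Of String)
        Dim result As New List(Of String)
        Dim line As String = reader.ReadLine()
        Do Until line Is Nothing
            ' Ordinal: case-sensitive, culture-independent comparison.
            If line.IndexOf(needle, StringComparison.Ordinal) >= 0 Then
                result.Add(line)
            End If
            line = reader.ReadLine()
        Loop
        Return result
    End Function
End Module
```

Collecting the matches first and then calling `ListBox2.BeginUpdate()`, `ListBox2.Items.AddRange(matches.ToArray())` and `ListBox2.EndUpdate()` should also avoid repainting the list box after every single `Add`.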

Just had to start something this morning, didn't ya? =)

But for log analysis? Come on...

> :-)
>
> As far as I remember, it was easily 100 times slower than the routine Jon made.
> Cor

Nov 20 '05 #8
Cor
Hi CJ,

Look at the subject: "fastest".

If you look at the code I wrote for him, you can see that I never use it.
(You said you do internet work; in my opinion you should then also think
only in terms of IndexOf.)

But InStr really is the fastest. I never use it, and I always forget that
its index starts at 1 instead of 0.

IndexOf is two times slower.
(We are talking about picoseconds or less here.)
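The thread gives no numbers beyond "two times slower", so the following is only a harness for measuring on your own data; it reports no results here. It assumes the default Option Compare Binary, under which InStr compares ordinally, and it also times `String.IndexOf(String)` with no `StringComparison` argument, which is culture-sensitive and may account for part of the gap. `SearchBenchmark` and `Compare` are hypothetical names:

```vbnet
Imports System
Imports System.Diagnostics
Imports Microsoft.VisualBasic

' Hypothetical module and function names, for illustration only.
Public Module SearchBenchmark
    ' Times one containment test over every line, repeated `iterations` times.
    Private Function TimeIt(ByVal lines() As String, ByVal iterations As Integer, _
                            ByVal contains As Func(Of String, Boolean)) As Long
        Dim hits As Integer = 0
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            For Each line As String In lines
                If contains(line) Then hits += 1
            Next
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    ' Returns elapsed milliseconds for {InStr, IndexOf (culture), IndexOf (ordinal)}.
    Public Function Compare(ByVal lines() As String, ByVal needle As String, _
                            ByVal iterations As Integer) As Long()
        ' InStr: 1-based, 0 means "not found".
        Dim tInStr As Long = TimeIt(lines, iterations, Function(s) InStr(s, needle) > 0)
        ' String.IndexOf(String) with no StringComparison: culture-sensitive.
        Dim tCulture As Long = TimeIt(lines, iterations, Function(s) s.IndexOf(needle) >= 0)
        ' String.IndexOf with StringComparison.Ordinal: plain character comparison.
        Dim tOrdinal As Long = TimeIt(lines, iterations, _
            Function(s) s.IndexOf(needle, StringComparison.Ordinal) >= 0)
        Return New Long() {tInStr, tCulture, tOrdinal}
    End Function
End Module
```

Call it once as a warm-up (the first run includes JIT compilation), then again with lines read from a representative log, for example via `IO.File.ReadAllLines`.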

:-)

Cor
Nov 20 '05 #9
Alright, I just ran a test, and yes, InStr is faster than anything else.

My apologies. =)

> Hi CJ,
>
> Look at the subject: "fastest".
>
> If you look at the code I wrote for him, you can see that I never use it.
> (You said you do internet work; in my opinion you should then also think
> only in terms of IndexOf.)
>
> But InStr really is the fastest. I never use it, and I always forget that
> its index starts at 1 instead of 0.

I forget that every time from my VB6 days... With InStr you can check for 0
(and use it as a False flag too!), but not with IndexOf... That always
bothered me. I just got used to it in Java, so I think that's why I like to
use it in .NET.

> IndexOf is two times slower.
> (We are talking about picoseconds or less here.)
>
> :-)
>
> Cor
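The convention difference discussed here can be shown with a tiny hypothetical helper (`InStrFromIndexOf` is not a framework method): for a non-Nothing text and the same ordinal search, the InStr result is the IndexOf result plus one, so "not found" maps from -1 to 0.

```vbnet
Imports System
Imports Microsoft.VisualBasic

' Hypothetical module and function names, for illustration only.
Public Module PositionConventions
    ' InStr: 1-based position, 0 when not found (so 0 doubles as a "false" flag).
    ' String.IndexOf: 0-based position, -1 when not found.
    ' Adding 1 to the ordinal IndexOf result gives the InStr-style value.
    Public Function InStrFromIndexOf(ByVal text As String, ByVal needle As String) As Integer
        Return text.IndexOf(needle, StringComparison.Ordinal) + 1
    End Function
End Module
```

This is also why a check like `If Result <> -1` is wrong for InStr: a missing match returns 0, which passes that test.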

Nov 20 '05 #10
Thanks a lot, everyone.
InStr worked just fine. Very, very fast.
Once again, thanks a lot.
vjay
Nov 20 '05 #11

What about a Boyer-Moore search? My tests show it is faster
than any built-in string search in .Net.
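No code or timings were posted for this, so none are reproduced here. As an illustration only, here is a sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only the bad-character shift (full Boyer-Moore also applies a good-suffix rule). `Horspool.Search` is a hypothetical name; it returns a 0-based index or -1, like `String.IndexOf`:

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical module and function names, for illustration only.
Public Module Horspool
    ' Boyer-Moore-Horspool search. Returns the 0-based index of the first
    ' match of pattern in text, or -1 if there is none.
    Public Function Search(ByVal text As String, ByVal pattern As String) As Integer
        Dim m As Integer = pattern.Length
        Dim n As Integer = text.Length
        If m = 0 Then Return 0
        If m > n Then Return -1

        ' Shift table: for each character in pattern(0 .. m-2), the distance
        ' from its last occurrence to the end of the pattern. Characters not
        ' in the table shift by the full pattern length.
        Dim shift As New Dictionary(Of Char, Integer)
        For i As Integer = 0 To m - 2
            shift(pattern(i)) = m - 1 - i
        Next

        Dim pos As Integer = 0
        While pos <= n - m
            ' Compare the window right to left.
            Dim j As Integer = m - 1
            While j >= 0 AndAlso text(pos + j) = pattern(j)
                j -= 1
            End While
            If j < 0 Then Return pos

            ' Shift by the entry for the text character under the last
            ' pattern position.
            Dim s As Integer
            If Not shift.TryGetValue(text(pos + m - 1), s) Then s = m
            pos += s
        End While
        Return -1
    End Function
End Module
```

The shift table depends only on the pattern, so for scanning a log you would normally build it once for "hotmail.com" and reuse it for every line; it is rebuilt on each call here to keep the sketch short.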
Nov 20 '05 #12
* jerry <je***@nospam.com> scripsit:
> What about a Boyer-Moore search? My tests show it is faster
> than any built-in string search in .Net.


Or the Knuth-Morris-Pratt string search algorithm...

You can have a look at the implementation in the SSCLI, starting point:

<http://sharedsourcecli.sscli.net/source/browse/sharedsourcecli/clr/src/bcl/system/string.cs>

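For comparison, a minimal sketch of Knuth-Morris-Pratt in the same shape. It is written independently, not taken from the SSCLI source linked above, and `Kmp.Search` is a hypothetical name returning a 0-based index or -1:

```vbnet
Imports System

' Hypothetical module and function names, for illustration only.
Public Module Kmp
    ' Knuth-Morris-Pratt search. Returns the 0-based index of the first
    ' match of pattern in text, or -1 if there is none.
    Public Function Search(ByVal text As String, ByVal pattern As String) As Integer
        Dim m As Integer = pattern.Length
        If m = 0 Then Return 0

        ' failure(i) = length of the longest proper prefix of pattern(0..i)
        ' that is also a suffix of it.
        Dim failure(m - 1) As Integer
        Dim k As Integer = 0
        For i As Integer = 1 To m - 1
            While k > 0 AndAlso pattern(i) <> pattern(k)
                k = failure(k - 1)
            End While
            If pattern(i) = pattern(k) Then k += 1
            failure(i) = k
        Next

        ' Scan the text once; on a mismatch, fall back using the failure
        ' table instead of re-examining earlier text characters.
        Dim q As Integer = 0
        For i As Integer = 0 To text.Length - 1
            While q > 0 AndAlso text(i) <> pattern(q)
                q = failure(q - 1)
            End While
            If text(i) = pattern(q) Then q += 1
            If q = m Then Return i - m + 1
        Next
        Return -1
    End Function
End Module
```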
--
Herfried K. Wagner [MVP]
<http://dotnet.mvps.org/>
Nov 20 '05 #13
