Fastest way to search through txt file?

Hi,
I haven't posted a problem in quite a while, but I've reached the
point where I really need to ask for help.

I need to create an application that searches through a .txt log
file and finds all lines where a Hotmail email address occurs.

All these emails need to be added to a list box on the form.

The problem with the code below is that it takes a long time to
search. On just a 10 MB file it takes almost 2 minutes, and I
need to process 1-2 GB files. Because this is only a middle step and
the program has much more functionality, I can't afford to wait this
long.

Right now I load each line into a hidden RichTextBox and use its
Find method to check for any occurrence of hotmail.com; if there is
one, I display the line in the list box.

This process is extremely slow.

How to extract the email address from the line is another
story. Does anyone know of a good email parser for VB.NET?

Can someone look at the code and tell me what I am doing wrong? Why is
it so slow?
Thanks a lot.

vjay
Dim oRead As System.IO.StreamReader
Dim linein As String
Dim Result As Integer
Dim count As Integer

oRead = System.IO.File.OpenText("log.txt")

While oRead.Peek <> -1
    count = count + 1
    linein = oRead.ReadLine()
    ' Load the line into the hidden RichTextBox so it can be searched
    RichTextBox1.Text = linein
    'StatusBar1.Text = count.ToString

    Result = RichTextBox1.Find("hotmail.com", RichTextBoxFinds.MatchCase)
    If Result <> -1 Then
        ListBox2.Items.Add(linein)
    End If
End While
oRead.Close()

Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------
http://www.usenet.com
Nov 20 '05 #1
Cor
Hi Vjay,

About half a year ago I ran a test in this newsgroup to find
the fastest way to do a find in a text file (with code contributed by several
people).

This code was contributed by someone whose first name is Jon.

(It counts how many times a word occurs in a text. You need the positions; that is
of course iResult on each pass through this routine.)

\\\
Public Function Test2(ByVal strInput As String, ByVal strDelimiter _
        As String) As Int32 'Jon (string)
    Dim iStart As Int32, iCount As Int32, iResult As Int32
    iStart = 1   ' InStr positions are 1-based
    iCount = 0
    Do
        ' Position of the next match at or after iStart, or 0 if none
        iResult = InStr(iStart, strInput, strDelimiter)
        If iResult = 0 Then Exit Do
        iCount += 1
        iStart = iResult + 1   ' resume just past the start of this match
    Loop
    Return iCount
End Function
///
This was absolutely the fastest when the delimiter is a string (rather than a Char).

I hope you can use it. If you still need help implementing it in
your solution, just ask again.

Cor
Nov 20 '05 #2
Cor,
thank you, but I don't need to count. I need to find a word and display
the line in which it occurred in my list box.
Please let me know how I could amend your fast routine so that it
does what I need.
Thanks a lot for your help.
Nov 20 '05 #3
You should try loading the file in parts (if you're concerned about memory) and
processing it with regular expressions.

Using a FileStream object may be better for memory; I'm not really sure...
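
As a rough illustration of the regular-expression suggestion, here is a minimal sketch that pulls Hotmail addresses out of a single log line. The module name, function name, and the pattern itself are assumptions made for illustration; the pattern is deliberately loose and is not a complete email parser.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HotmailRegexSketch
    ' Deliberately loose pattern (an assumption, not a full address parser):
    ' a run of common local-part characters followed by "@hotmail.com".
    Private ReadOnly HotmailPattern As New Regex(
        "[A-Za-z0-9._%+-]+@hotmail\.com\b",
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Returns every Hotmail address found in one log line.
    Public Function ExtractHotmailAddresses(ByVal line As String) As List(Of String)
        Dim found As New List(Of String)
        For Each m As Match In HotmailPattern.Matches(line)
            found.Add(m.Value)
        Next
        Return found
    End Function
End Module
```

Called once per line read from the log, this returns the addresses themselves rather than the whole line, which also covers the "extract the email address" part of the question. Whether it is fast enough for 1-2 GB files would have to be measured.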

I think you will find Regex is exactly what you're looking for, and it's VERY
fast.
Nov 20 '05 #4
Vjay77,
I don't have a specific .NET example handy.

Have you considered using the Indexing Service instead of coding the search
yourself?

http://msdn.microsoft.com/library/de...intro_297o.asp

You may need to use ADODB instead of ADO.NET; however, you should be able to
set up a search of the Indexing Service that returns the list of files with
log entries...

If you have a single log file instead of multiple log files, then you may
need to continue using the loop you have; however, consider moving the search
itself to a second thread...
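
One way the "second thread" idea might look is sketched below. It assumes a .NET version that provides File.ReadLines and Task.Run; the module and function names are hypothetical, and the search itself uses a plain String.IndexOf rather than anything from the original code.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading.Tasks

Module BackgroundSearchSketch
    ' Scans the file line by line and returns the lines containing the needle.
    ' File.ReadLines streams the file, so the whole log is never held in memory at once.
    Public Function FindMatchingLines(ByVal path As String, ByVal needle As String) As List(Of String)
        Dim matches As New List(Of String)
        For Each line As String In File.ReadLines(path)
            If line.IndexOf(needle, StringComparison.Ordinal) >= 0 Then
                matches.Add(line)
            End If
        Next
        Return matches
    End Function

    ' Runs the scan on a thread-pool thread so the form stays responsive.
    Public Function FindMatchingLinesAsync(ByVal path As String, ByVal needle As String) As Task(Of List(Of String))
        Return Task.Run(Function() FindMatchingLines(path, needle))
    End Function
End Module
```

On the form, an Async event handler could Await FindMatchingLinesAsync and then add the results to the list box between ListBox2.BeginUpdate() and ListBox2.EndUpdate(), so the control repaints once instead of once per line.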

Hope this helps
Jay
Nov 20 '05 #5
Cor
Hi CJ,
> I think you will find RegEx is exactly what your looking for, and its VERY
> fast.

By your standards, yes, but not by mine.

:-)

As far as I remember, it was easily 100 times slower than the routine Jon wrote.

Cor

Nov 20 '05 #6
Cor
Hi Vjay77,

I thought this was what you were looking for. Test it, because I never use
InStr myself.

But it should be the fastest.

I hope it works.

Cor
\\\
Dim sr As New IO.StreamReader("log.txt")
Dim linein As String
Dim Result As Integer
linein = sr.ReadLine()
Do Until linein Is Nothing
    ' InStr is 1-based and returns 0 (not -1) when there is no match
    Result = InStr(linein, "hotmail.com")
    If Result <> 0 Then
        ListBox2.Items.Add(linein)
    End If
    linein = sr.ReadLine()
Loop
sr.Close()
///
Nov 20 '05 #7
> Hi CJ,
> > I think you will find RegEx is exactly what your looking for, and its VERY fast.
> Yes for your standards not for mine.

Yes, but I try to forget about Microsoft.VisualBasic when much better and
more widely supported methods are available...

Maybe you should look at String.IndexOf if you want to use something other
than Regex.

Just had to start something this morning, didn't ya? =)

But for log analysis? Come on...

> :-)
>
> As far as I remember me surely 100 times slower than that routine Jon made.
> Cor

Nov 20 '05 #8
Cor
Hi CJ,

See the subject: "fastest".

If you look at the code I wrote for him, you'll see that I never use it myself.
(You said you do internet work, so in my opinion you should think only in
terms of IndexOf.)

But InStr really is the fastest. I never use it because I always forget that
its index is 1-based (0 + 1).

IndexOf is 2 times slower.
(These are picoseconds or less.)

:-)

Cor
Nov 20 '05 #9
Alright, just ran a test, and yes, instr is faster than anything else.

My apologies. =)
> Hi CJ,
>
> See the subject. "fastest".
>
> If you see the code I wrote for him you see that I never use it.
> (You told you do internet, than you should also only think in indexof in my opinion)
>
> But that instr is real the fastest, I never use it, I forget always that the index is 0 + 1
I forget that every time, from my VB6 days... With InStr you can check for 0 (and
use it as a false flag too!) but not with IndexOf... that always bothered me. I
just got used to it in Java, so I think that's why I like to use it in .NET.
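
The two "not found" conventions discussed here are easy to mix up, so a small side-by-side sketch may help. The module and function names are made up for illustration; InStr and String.IndexOf are the real calls being compared.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module IndexConventionSketch
    ' True when needle occurs in line, using the 1-based InStr convention.
    Public Function ContainsViaInStr(ByVal line As String, ByVal needle As String) As Boolean
        ' InStr returns the 1-based position of the match, or 0 when there is none.
        Return InStr(line, needle) > 0
    End Function

    ' True when needle occurs in line, using the 0-based IndexOf convention.
    Public Function ContainsViaIndexOf(ByVal line As String, ByVal needle As String) As Boolean
        ' IndexOf returns the 0-based position of the match, or -1 when there is none.
        Return line.IndexOf(needle, StringComparison.Ordinal) >= 0
    End Function
End Module
```

For a match at the very start of a string, InStr gives 1 and IndexOf gives 0, so comparing an InStr result against -1, or an IndexOf result against 0, silently produces the wrong answer.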

> The indexof is 2 times slower.
> (this are pico seconds or less)
>
> :-)
>
> Cor

Nov 20 '05 #10
Thanks a lot everyone.
Instr worked just fine. Very very fast.
Once again, thanks a lot.
vjay
Nov 20 '05 #11

What about a Boyer-Moore search? My tests show it is faster
than any built-in string search in .Net.
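
For readers who want to try this, below is a minimal sketch of the Horspool simplification of Boyer-Moore (bad-character shifts only, not the full algorithm with the good-suffix rule). It is not the poster's implementation; the names are hypothetical, and no claim is made here about how it compares with InStr or IndexOf without measuring.

```vbnet
Imports System
Imports System.Collections.Generic

Module HorspoolSketch
    ' Returns the 0-based index of the first occurrence of needle in haystack,
    ' or -1 if it does not occur. Comparison is ordinal and case-sensitive.
    Public Function HorspoolIndexOf(ByVal haystack As String, ByVal needle As String) As Integer
        Dim n As Integer = haystack.Length
        Dim m As Integer = needle.Length
        If m = 0 Then Return 0
        If m > n Then Return -1

        ' Shift table: for each needle character except the last, the distance
        ' from its rightmost occurrence to the end of the needle.
        Dim shift As New Dictionary(Of Char, Integer)
        For i As Integer = 0 To m - 2
            shift(needle(i)) = m - 1 - i
        Next

        Dim pos As Integer = 0
        While pos <= n - m
            ' Compare the current window against the needle from right to left.
            Dim j As Integer = m - 1
            While j >= 0 AndAlso haystack(pos + j) = needle(j)
                j -= 1
            End While
            If j < 0 Then Return pos

            ' Slide by the shift recorded for the window's last character,
            ' or by the full needle length if that character is not in the table.
            Dim skip As Integer
            If shift.TryGetValue(haystack(pos + m - 1), skip) Then
                pos += skip
            Else
                pos += m
            End If
        End While
        Return -1
    End Function
End Module
```

The potential gain comes from skipping up to the needle's length per step when the window's last character does not appear in the needle, which helps most with longer search strings such as "hotmail.com".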
Nov 20 '05 #12
* jerry <je***@nospam.com> scripsit:
> What about a Boyer-Moore search? My tests show it is faster
> than any built-in string search in .Net.


Or the Knuth-Morris-Pratt string search algorithm...

You can have a look at the implementation in the SSCLI, starting point:

<http://sharedsourcecli.sscli.net/source/browse/sharedsourcecli/clr/src/bcl/system/string.cs>

--
Herfried K. Wagner [MVP]
<http://dotnet.mvps.org/>
Nov 20 '05 #13
