Text file parsing

Hi,

I'm trying to parse a text file that is larger than 2 MB. I'm
using the following sample code:

Open "c:\sim1.txt" For Input As #1
Do While Not EOF(1)
Input #1, Data

If (InStr(Data, "Summary")) Then
str = str & Data
End If
Loop
Close #1

str is a String.
The text file contains more than 20,000 lines. I need to read the
values from each of these lines and apply some business rules.
Depending on which rules a line satisfies, I need to split sim1.txt
into 4 different files. The problem is that the program takes a long
time to run. The code shown above does not include the business rules,
and even so it takes a long time. The same program written in Java
takes very little time. Can anybody suggest ideas or an alternative
solution to this problem?

Thanking you,
Regards,
Ratnakar Pedagani.
Jul 17 '05 #1
6 Replies



"Ratnakar Pedagani" <ra***********@yahoo.co.in> wrote in message
news:5b**************************@posting.google.com...
| Hi,
|
| I'm trying to parse the text file, which is of size more than 2mb. I'm
| using the following sample code
|
| Open "c:\sim1.txt" For Input As #1
| Do While Not EOF(1)
| Input #1, Data
|
| If (InStr(Data, "Summary")) Then
| str = str & Data
| End If
| Loop
| Close #1
|

The line
str = str & Data
is building a very large string, which has to be copied into new memory
each time through the loop.

There are different ways to improve this, depending on the situation.

If possible, open the four output files before starting the loop. Then
read each line, decide if it goes into one of the output files, and
write it there if so, before continuing the loop. This would avoid the
large string altogether.
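
A minimal sketch of this first suggestion, assuming classic VB6. The output file names (c:\out1.txt to c:\out4.txt), the SplitFile routine, and the ClassifyLine rule are hypothetical placeholders, not part of the original program. It also uses Line Input # rather than Input #, because Input # splits a line at commas while Line Input # returns the whole line.

```vb
' Sketch only: ClassifyLine and the output file names are hypothetical placeholders.
Private Function ClassifyLine(ByVal sLine As String) As Long
    ' Placeholder business rule: returns 1..4, or 0 to skip the line.
    If InStr(sLine, "Summary") > 0 Then
        ClassifyLine = 1
    Else
        ClassifyLine = 0
    End If
End Function

Public Sub SplitFile()
    Dim nIn As Integer
    Dim nOut(1 To 4) As Integer
    Dim i As Long
    Dim sLine As String
    Dim nCat As Long

    nIn = FreeFile
    Open "c:\sim1.txt" For Input As #nIn

    ' Open all four output files before the read loop starts.
    For i = 1 To 4
        nOut(i) = FreeFile   ' a new number, because the previous one is already in use
        Open "c:\out" & i & ".txt" For Output As #nOut(i)
    Next i

    ' Each line is written straight to its file; no large string is ever built.
    Do While Not EOF(nIn)
        Line Input #nIn, sLine
        nCat = ClassifyLine(sLine)
        If nCat >= 1 And nCat <= 4 Then
            Print #nOut(nCat), sLine
        End If
    Loop

    For i = 1 To 4
        Close #nOut(i)
    Next i
    Close #nIn
End Sub
```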

If you need to gather all the information before making any decisions,
then you should try a different way of storing all the strings. Setting
up an array of strings, and using ReDim to increase its size as needed,
would be the simplest. You still have to allocate a lot of string space,
but at least you don't have to keep copying strings around. This is the
technique used by some string builder classes in other languages, and
probably in Java.
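
The array approach could look roughly like the sketch below. The function name CollectSummaryLines, the initial capacity of 1024, and the doubling policy are assumptions made for illustration. Unlike the original loop, it joins the collected lines with vbNewLine at the end, and it calls ReDim Preserve only when the array is full instead of once per line.

```vb
' Sketch only: the function name, initial capacity and growth policy are assumptions.
Public Function CollectSummaryLines(ByVal sPath As String) As String
    Dim sLines() As String
    Dim nCount As Long
    Dim nFile As Integer
    Dim sLine As String

    ReDim sLines(0 To 1023)          ' initial capacity (arbitrary)
    nFile = FreeFile
    Open sPath For Input As #nFile

    Do While Not EOF(nFile)
        Line Input #nFile, sLine
        If InStr(sLine, "Summary") > 0 Then
            If nCount > UBound(sLines) Then
                ' Double the capacity so ReDim Preserve runs only occasionally.
                ReDim Preserve sLines(0 To 2 * UBound(sLines) + 1)
            End If
            sLines(nCount) = sLine
            nCount = nCount + 1
        End If
    Loop
    Close #nFile

    If nCount = 0 Then
        CollectSummaryLines = ""
    Else
        ReDim Preserve sLines(0 To nCount - 1)   ' drop the unused slots
        CollectSummaryLines = Join(sLines, vbNewLine)
    End If
End Function
```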


Jul 17 '05 #2

It is a lot more efficient to create four long empty strings in advance
to hold the four categories of information and to use the Mid$ statement to
insert the matching text into the relevant one of the four.

Something like this:

Dim pString1 As String ' buffer string
Dim pMax1 As Long      ' length of the buffer string
Dim pCurr1 As Long     ' next free position in pString1
Dim pLen1 As Long      ' length of the input string
Dim pTxt1 As String    ' input string

' Set up an empty buffer for one output string;
' each output category must have its own.

pString1 = Space$(5000)
pMax1 = 5000
pCurr1 = 1

' Set up four loops, one for each category of information.
Do While....
    pTxt1 = "newstring1"
    pLen1 = Len(pTxt1)

    ' See if the buffer needs to be extended.
    If pCurr1 + pLen1 > pMax1 Then
        pString1 = pString1 & Space$(10 * pLen1)
        pMax1 = Len(pString1)
    End If

    ' Overwrite the padding in place; no new string is allocated here.
    Mid$(pString1, pCurr1) = pTxt1
    pCurr1 = pCurr1 + pLen1
Loop

' When done, use RTrim$ (or Left$(pString1, pCurr1 - 1)) to remove
' the excess spaces from pString1.
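
The buffer technique above can be wrapped in a small helper so that each of the four categories only needs its own buffer string and position counter. This is a sketch under assumptions: the names AppendText and DemoAppend, the 16-character starting size, and the "grow by at least the current length" policy are illustrative choices, not the poster's code.

```vb
' Sketch only: AppendText, DemoAppend and the growth policy are hypothetical.
' Appends sText to the preallocated buffer sBuf; nNext is the next free position (1-based).
Public Sub AppendText(ByRef sBuf As String, ByRef nNext As Long, ByVal sText As String)
    Dim nLen As Long
    Dim nGrow As Long

    nLen = Len(sText)
    If nLen = 0 Then Exit Sub

    If nNext + nLen - 1 > Len(sBuf) Then
        ' Grow by at least the current size so reallocations stay rare.
        nGrow = Len(sBuf)
        If nGrow < nLen Then nGrow = nLen
        sBuf = sBuf & Space$(nGrow)
    End If

    ' The Mid$ statement overwrites characters in place.
    Mid$(sBuf, nNext, nLen) = sText
    nNext = nNext + nLen
End Sub

Public Sub DemoAppend()
    Dim sBuf As String
    Dim nNext As Long

    sBuf = Space$(16)
    nNext = 1
    AppendText sBuf, nNext, "first line" & vbNewLine
    AppendText sBuf, nNext, "second line" & vbNewLine

    ' Left$ keeps exactly what was written, including trailing spaces in the data.
    Debug.Print Left$(sBuf, nNext - 1)
End Sub
```

Taking Left$(sBuf, nNext - 1) at the end, rather than RTrim$, avoids stripping spaces that legitimately end the last stored line.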

cheers, soeren

Jul 17 '05 #3

Hi,

I'm very impressed with the solution you gave me. The
program I had written takes 1 min 10 sec. The program
you suggested takes 7 sec. Is there any alternative
solution that takes even less time?

Thanking you,
Regards,
Ratnakar Pedagani


Jul 17 '05 #4

On 23 Aug 2004 08:43:44 -0700, ra***********@yahoo.co.in (Ratnakar
Pedagani) wrote:
Hi,

I'm very impressed with the solution you gave me. The
program I had written takes 1 min 10 sec. The program
you suggested takes 7 sec. Is there any alternative
solution that takes even less time?


One simple method is to look at the Len argument on the Open line.

A few more come to mind, but to some extent they have been covered,
i.e. buffer file reads and writes (up to about 100 KB)
and use Mid$() as much as possible.
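
A sketch of the Open suggestion, assuming classic VB6: for a file opened sequentially, the Len argument sets the number of characters buffered, with a maximum of 32767. The routine name ReadWithLargerBuffer is hypothetical, and whether the larger buffer helps for this particular file has to be measured.

```vb
' Sketch only: the routine name is hypothetical.
Public Sub ReadWithLargerBuffer()
    Dim nFile As Integer
    Dim sLine As String

    nFile = FreeFile
    ' For sequential files, Len is the buffer size in characters (maximum 32767).
    Open "c:\sim1.txt" For Input As #nFile Len = 32767

    Do While Not EOF(nFile)
        Line Input #nFile, sLine
        ' process sLine here
    Loop
    Close #nFile
End Sub
```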

Jul 17 '05 #5


"Ratnakar Pedagani" <ra***********@yahoo.co.in> wrote in message
news:5b**************************@posting.google.com...
| Hi,
|
| I'm very impressed with the solution you gave me. The
| program I had written takes 1 min 10 sec. The program
| you suggested takes 7 sec. Is there any alternative
| solution that takes even less time?
|
| Thanking you,
| Regards,
| Ratnakar Pedagani
|

An add-on to Jerry's post:

I would consider trying

Dim strInput As String
Dim strLines() As String
Dim n As Long
Dim nFile As Integer
Dim nLen As Long

nFile = FreeFile 'better than just using 1
'Get needs Binary (or Random) mode; a file opened For Input raises "Bad file mode"
Open "c:\sim1.txt" For Binary Access Read As #nFile
nLen = LOF(nFile)
strInput = Space$(nLen) 'Get fills exactly as many bytes as the string holds
Get #nFile, , strInput
Close #nFile

strLines = Split(strInput, vbNewLine)

For n = LBound(strLines) To UBound(strLines)
    'process each strLines(n) as before
Next n

This reads the whole file in at once, then breaks it into an array of
strings, one for each line. If the file is really big, this would use up
a lot of memory, but often it runs faster than reading in each line.


Jul 17 '05 #6

Ratnakar,

If you benchmark any of the improvements to the Mid$() insertion method, I
would be interested in the result. I use text parsing in several routines,
and any improvement in speed is obviously welcome.
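
One simple way to time the variants discussed in this thread is VB's Timer function, which returns the number of seconds since midnight as a Single. The sketch below is an assumption about how such a benchmark might be set up; the routine name TimeIt is hypothetical, and the call to the code under test is left as a comment.

```vb
' Sketch only: TimeIt is a hypothetical name.
Public Sub TimeIt()
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer
    ' Call the routine being measured here.
    sngElapsed = Timer - sngStart

    ' Timer restarts from zero at midnight, so correct a negative difference.
    If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400

    Debug.Print "Elapsed: " & Format$(sngElapsed, "0.00") & " s"
End Sub
```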

Soeren


Jul 17 '05 #7
