Bytes | Software Development & Data Engineering Community

Text file parsing

Hi,

I'm trying to parse a text file that is larger than 2 MB. I'm
using the following sample code:

Open "c:\sim1.txt" For Input As #1
Do While Not EOF(1)
Line Input #1, Data   ' Line Input reads a whole line; Input # stops at commas

If InStr(Data, "Summary") > 0 Then
str = str & Data
End If
Loop
Close #1

str is a String.
The text file has more than 20,000 lines. I need to read the values
from each of these lines and apply some business rules. Based on
which rules each line meets, I need to split sim1.txt into 4
different files. The problem I'm facing is that the program takes a
long time to run. The code shown above doesn't even include the
business rules, and it is already slow. The same program written in
Java runs much faster. Can anybody suggest ideas or an alternative
solution to this problem?

Thanking you,
Regards,
Ratnakar Pedagani.
Jul 17 '05 #1

"Ratnakar Pedagani" <ra***********@yahoo.co.in> wrote in message
news:5b**************************@posting.google.com...
| Hi,
|
| I'm trying to parse a text file that is larger than 2 MB. I'm
| using the following sample code:
|
| Open "c:\sim1.txt" For Input As #1
| Do While Not EOF(1)
| Line Input #1, Data
|
| If InStr(Data, "Summary") > 0 Then
| str = str & Data
| End If
| Loop
| Close #1
|

The line
str = str & Data
is building a very large string, which has to be copied into new memory
each time through the loop.
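A rough count shows why this gets slow. Suppose m lines of about L characters each end up in str. The k-th concatenation allocates a new string of about kL characters and copies the old contents plus the new line into it, so the total copying is roughly the sum below. With the figures from the question (about 2 MB over 20,000 lines, so L is about 100) and the worst case that every line matches, that is around 2 x 10^10 characters, compared with about 2 x 10^6 if each line is copied only once.

```latex
\sum_{k=1}^{m} kL \;=\; L\,\frac{m(m+1)}{2}
\;\approx\; 100 \cdot \frac{20000 \cdot 20001}{2} \;\approx\; 2.0 \times 10^{10}
```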

There are different ways to improve this, depending on the situation.

If possible, open the four output files before starting the loop. Then
read each line, decide if it goes into one of the output files, and
write it there if so, before continuing the loop. This would avoid the
large string altogether.
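A minimal VB6 sketch of that single-pass approach, assuming hypothetical output names (c:\sim1_out1.txt to c:\sim1_out4.txt) and a hypothetical ClassifyLine function whose placeholder rule stands in for the real business rules:

```vb
' Hypothetical rule: return 1 to 4 for the target file, or 0 to skip the line.
' Replace the body with the real business rules.
Private Function ClassifyLine(ByVal sLine As String) As Integer
    If InStr(sLine, "Summary") > 0 Then
        ClassifyLine = 1
    Else
        ClassifyLine = 0
    End If
End Function

Private Sub SplitIntoFourFiles()
    Dim fIn As Integer
    Dim fOut(1 To 4) As Integer
    Dim fNum As Integer
    Dim sLine As String
    Dim k As Integer
    Dim nCat As Integer

    fIn = FreeFile
    Open "c:\sim1.txt" For Input As #fIn

    ' open all four outputs up front; FreeFile is called right before
    ' each Open, so every file gets a distinct number
    For k = 1 To 4
        fNum = FreeFile
        Open "c:\sim1_out" & k & ".txt" For Output As #fNum
        fOut(k) = fNum
    Next k

    ' one pass: read a line, classify it, write it straight out
    Do While Not EOF(fIn)
        Line Input #fIn, sLine
        nCat = ClassifyLine(sLine)
        If nCat >= 1 And nCat <= 4 Then
            fNum = fOut(nCat)
            Print #fNum, sLine
        End If
    Loop

    Close #fIn
    For k = 1 To 4
        fNum = fOut(k)
        Close #fNum
    Next k
End Sub
```

Nothing accumulates in memory here, so the work per line stays constant no matter how far into the file the loop is.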

If you need to gather all the information before making any decisions,
then you should try a different way of storing all the strings. Setting
up an array of strings and using ReDim Preserve to increase its size as needed
would be the simplest. You still have to allocate a lot of string space,
but at least you don't have to keep copying strings around. This is the
technique used by some string builder classes in other languages, and
probably in Java.
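A sketch of the growing-array idea, assuming a hypothetical ReadAllLines function and an arbitrary starting capacity of 1,024 elements. Doubling the capacity each time it runs out keeps the number of resizes small: starting from 1,024, a 20,000-line file needs five ReDim Preserve calls (up to 32,768 elements).

```vb
' Reads every line of sPath into a String array and returns it.
' nCount receives the number of lines actually stored.
Private Function ReadAllLines(ByVal sPath As String, ByRef nCount As Long) As String()
    Dim fIn As Integer
    Dim sLines() As String
    Dim sLine As String

    ReDim sLines(0 To 1023)   ' starting capacity (arbitrary)
    nCount = 0

    fIn = FreeFile
    Open sPath For Input As #fIn
    Do While Not EOF(fIn)
        Line Input #fIn, sLine
        ' out of room: double the capacity; Preserve keeps the existing elements
        If nCount > UBound(sLines) Then
            ReDim Preserve sLines(0 To 2 * (UBound(sLines) + 1) - 1)
        End If
        sLines(nCount) = sLine
        nCount = nCount + 1
    Loop
    Close #fIn

    ReadAllLines = sLines
End Function

' Usage: gather everything first, then apply the rules line by line.
Private Sub ProcessAll()
    Dim sAll() As String
    Dim nLines As Long
    Dim i As Long

    sAll = ReadAllLines("c:\sim1.txt", nLines)
    For i = 0 To nLines - 1
        ' apply the business rules to sAll(i) here
    Next i
End Sub
```

Only indexes 0 to nCount - 1 hold data; the rest of the array is spare capacity.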


Jul 17 '05 #2
It is a lot more efficient to create four long, space-filled strings in advance
to hold the four categories of information, and to use the Mid$ statement to
insert the matching text into the relevant one of the four strings.

Something like this:

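Dim pString1 As String ' buffer string
Dim pMax1 As Long ' holds the length of the buffer string
Dim pCurr1 As Long ' holds the next free position in pString1
Dim pLen1 As Long ' holds the length of the input string
Dim pTxt1 As String ' input string
Dim nFile As Integer ' input file number

' set up an empty buffer string for one output category -
' each of the four categories needs its own set of these variables

pString1 = Space$(5000)
pMax1 = 5000
pCurr1 = 1

' the loop below fills buffer 1; the business rules would decide
' which of the four buffers each line goes to
nFile = FreeFile
Open "c:\sim1.txt" For Input As #nFile
Do While Not EOF(nFile)
    Line Input #nFile, pTxt1
    pTxt1 = pTxt1 & vbCrLf ' keep the line break
    pLen1 = Len(pTxt1)

    ' see if the buffer string needs to be extended
    If pCurr1 + pLen1 > pMax1 Then
        pString1 = pString1 & Space$(10 * pLen1)
        pMax1 = Len(pString1)
    End If

    Mid$(pString1, pCurr1) = pTxt1
    pCurr1 = pCurr1 + pLen1
Loop
Close #nFile

' when done, cut pString1 back to the text actually written
' (RTrim$ also works, but it would strip trailing spaces that belong to the data)
pString1 = Left$(pString1, pCurr1 - 1)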
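Since each of the four categories needs its own buffer variables, the append step can be factored into one routine. A sketch with hypothetical names (AppendToBuffer, BufferFourCategories) and a placeholder rule that sends "Summary" lines to category 1 and everything else to category 2. One deliberate difference: instead of growing by a fixed multiple of the line length, it grows the buffer by at least its current size, so a large buffer is recopied only a handful of times.

```vb
' Appends sText to a space-padded buffer using the Mid$ statement.
' sBuf is the buffer; nNext is the next free (1-based) position in it.
' Both are ByRef so each category can keep its own pair.
Private Sub AppendToBuffer(ByRef sBuf As String, ByRef nNext As Long, ByVal sText As String)
    Dim nLen As Long
    nLen = Len(sText)
    If nLen = 0 Then Exit Sub

    ' not enough room: add the current size plus the new text,
    ' so the buffer at least doubles and is copied only rarely
    If nNext + nLen - 1 > Len(sBuf) Then
        sBuf = sBuf & Space$(Len(sBuf) + nLen)
    End If

    Mid$(sBuf, nNext) = sText
    nNext = nNext + nLen
End Sub

Private Sub BufferFourCategories()
    Dim sBuf(1 To 4) As String
    Dim nNext(1 To 4) As Long
    Dim sLine As String
    Dim fIn As Integer
    Dim k As Integer
    Dim nCat As Integer

    For k = 1 To 4
        sBuf(k) = Space$(5000)
        nNext(k) = 1
    Next k

    fIn = FreeFile
    Open "c:\sim1.txt" For Input As #fIn
    Do While Not EOF(fIn)
        Line Input #fIn, sLine
        ' hypothetical rule: "Summary" lines go to category 1, all others to 2
        If InStr(sLine, "Summary") > 0 Then nCat = 1 Else nCat = 2
        AppendToBuffer sBuf(nCat), nNext(nCat), sLine & vbCrLf
    Loop
    Close #fIn

    ' cut each buffer back to the text actually written;
    ' sBuf(k) then holds the finished text for category k
    For k = 1 To 4
        sBuf(k) = Left$(sBuf(k), nNext(k) - 1)
    Next k
End Sub
```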

cheers, soeren
"Steve Gerrard" <my********@comcast.net> wrote in message
news:7b********************@comcast.com...


Jul 17 '05 #3
Hi,

I'm very impressed with the solution you gave me. The program I
wrote takes 1 min 10 sec; the program you suggested takes 7 sec.
Is there any alternative solution that is even faster than the one
you suggested?

Thanking you,
Regards,
Ratnakar Pedagani

"S.W. Rasmussen" <sw*@seqtools.dk> wrote in message news:<41*********************@dread16.news.tele.dk>...

Jul 17 '05 #4
On 23 Aug 2004 08:43:44 -0700, ra***********@yahoo.co.in (Ratnakar
Pedagani) wrote:
| Hi,
|
| I'm very impressed with the solution you gave me. The program I
| wrote takes 1 min 10 sec; the program you suggested takes 7 sec.
| Is there any alternative solution that is even faster than the one
| you suggested?


One simple method is to use the Len clause on the Open statement.

A few more come to mind, but to some extent they have already been
covered, i.e. buffer file reads and writes (up to about 100 KB),
and use Mid$() as much as possible.
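For files opened in a sequential mode (Input, Output, Append), the Len clause of Open sets the number of characters VB buffers, up to a maximum of 32,767. A minimal sketch that requests the largest buffer on both the input and the output file; the output name c:\sim1_summary.txt is a hypothetical placeholder, and how much this helps will depend on the machine and disk.

```vb
' Copies the "Summary" lines of c:\sim1.txt to a hypothetical output file,
' asking for the largest sequential-mode buffer (Len = 32767) on both files.
Private Sub CopyWithBigBuffers()
    Dim fIn As Integer
    Dim fOut As Integer
    Dim sLine As String

    fIn = FreeFile
    Open "c:\sim1.txt" For Input As #fIn Len = 32767
    fOut = FreeFile
    Open "c:\sim1_summary.txt" For Output As #fOut Len = 32767

    Do While Not EOF(fIn)
        Line Input #fIn, sLine
        If InStr(sLine, "Summary") > 0 Then Print #fOut, sLine
    Loop

    Close #fOut
    Close #fIn
End Sub
```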

Jul 17 '05 #5

"Ratnakar Pedagani" <ra***********@yahoo.co.in> wrote in message
news:5b**************************@posting.google.com...
| Hi,
|
| I'm very impressed with the solution you gave me. The program I
| wrote takes 1 min 10 sec; the program you suggested takes 7 sec.
| Is there any alternative solution that is even faster than the one
| you suggested?
|
| Thanking you,
| Regards,
| Ratnakar Pedagani
|

An add-on to Jerry's post:

I would consider trying this:

Dim strInput As String
Dim strLines() As String
Dim n As Long
Dim nFile As Integer
Dim nLen As Long

nFile = FreeFile 'better than just using 1
Open "c:\sim1.txt" For Binary Access Read As #nFile 'Get # needs Binary (or Random) mode
nLen = LOF(nFile)
strInput = Space$(nLen)
Get #nFile, , strInput 'reads Len(strInput) bytes, i.e. the whole file
Close #nFile

strLines = Split(strInput, vbNewLine)

For n = LBound(strLines) To UBound(strLines)
    'process each strLines(n) as before
Next n

This reads the whole file in at once, then breaks it into an array of
strings, one for each line. If the file is really big, this would use up
a lot of memory, but often it runs faster than reading in each line.
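Two details can trip up the Split approach: a file with bare LF line endings will not split on vbNewLine (CR LF), and a file that ends with a line break gives one extra empty element at the end. A sketch of a hypothetical FileToLines helper that handles both; it still reads the whole file with a single Get #, which requires the file to be opened For Binary.

```vb
' Reads a whole file with one Get # and splits it into lines.
' Accepts both CR LF and bare LF endings, and drops the empty
' element Split would otherwise return after a final line break.
Private Function FileToLines(ByVal sPath As String) As String()
    Dim nFile As Integer
    Dim strInput As String
    Dim sParts() As String

    nFile = FreeFile
    Open sPath For Binary Access Read As #nFile
    strInput = Space$(LOF(nFile))
    Get #nFile, , strInput
    Close #nFile

    ' normalise Windows endings to LF so one delimiter covers both styles
    strInput = Replace(strInput, vbCrLf, vbLf)

    ' remove one trailing LF so the last element is a real line
    If Right$(strInput, 1) = vbLf Then
        strInput = Left$(strInput, Len(strInput) - 1)
    End If

    sParts = Split(strInput, vbLf)
    FileToLines = sParts
End Function

' Usage: same loop as above, over the cleaned-up array.
Private Sub ProcessFile()
    Dim strLines() As String
    Dim n As Long

    strLines = FileToLines("c:\sim1.txt")
    For n = LBound(strLines) To UBound(strLines)
        ' process each strLines(n) here
    Next n
End Sub
```

For an empty file Split returns an array with UBound -1, so the loop simply does not run.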


Jul 17 '05 #6
Ratnakar,

If you benchmark any of the improvements to the Mid$() insertion method, I
would be interested in the results. I use text parsing in several routines,
and any improvement in speed is obviously welcome.
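For such a comparison, one simple harness is to read VB's Timer function before and after a complete run. A minimal sketch; the routine being measured is left as a placeholder comment. Timer returns seconds since midnight as a Single with limited resolution, so it suits whole runs better than single lines, and the sketch corrects for a run that crosses midnight.

```vb
' Times one complete run with Timer (seconds since midnight, as a Single).
Private Sub TimeOneRun()
    Dim tStart As Single
    Dim tElapsed As Single

    tStart = Timer
    ' call the parsing routine being measured here
    tElapsed = Timer - tStart

    ' Timer wraps to 0 at midnight; add one day's seconds if that happened
    If tElapsed < 0 Then tElapsed = tElapsed + 86400

    Debug.Print "Elapsed: " & Format$(tElapsed, "0.00") & " s"
End Sub
```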

Soeren

"Steve Gerrard" <my********@comcast.net> wrote in message
news:Lb********************@comcast.com...


Jul 17 '05 #7
