Suggestions to reduce memory use when splitting a string

Good Day,

I have written a utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
SQL statement and validating the data to keep the integrity of the
database. We are parsing roughly 81 files ranging in size from 1 KB to
65 MB (an average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003, and when I parse all of
the files I run out of memory. The following function seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder, _
        ByRef SplitString() As String) As Boolean
    '7-25-2005
    'BJK
    '
    '--removed the use of Char replaced with: CurrentLine.Chars(i)
    '
    '---------------------------------------
    Dim i As Integer
    Dim CountDelimiter As Boolean
    Dim Total As Integer
    Dim lbResult As Boolean
    Dim Section As New StringBuilder
    Dim liLen As Integer
    Dim liCommaPos As Integer
    Dim liDQuotePos As Integer

    Try
        'We want to count the delimiter unless it is within the text qualifier
        CountDelimiter = True
        Total = 0
        liLen = CurrentLine.Length - 1
        For i = 0 To liLen
            Select Case CurrentLine.Chars(i)
                Case gsDoubleQoute
                    If CountDelimiter Then
                        CountDelimiter = False
                    Else
                        CountDelimiter = True
                    End If
                Case gsComma
                    If CountDelimiter Then
                        ' Add current section to collection
                        SplitString(Total) = Section.ToString.Trim
                        Section = Nothing
                        Section = New StringBuilder
                        Total = Total + 1
                    Else
                        Section.Append(CurrentLine.Chars(i))
                    End If
                Case Else
                    Section.Append(CurrentLine.Chars(i))
            End Select
        Next
        ' Get the last field - as most files will not have an ending delimiter
        If CountDelimiter Then
            ' Add current section to collection
            SplitString(Total) = Section.ToString
        End If
        lbResult = True
    Catch ex As Exception
        ps_LastErrSource = ex.Source
        ps_LastErrDesc = ex.ToString
        lbResult = False
        Dim loSB As New StringBuilder(ps_LastErrDesc)
        UpdateLog(loSB)
    End Try
    Return lbResult
End Function

This function is stored in a class and is called from other functions
within this class.
Thanks
Brian

Apr 3 '06 #1
hi Brian,

I would do that using a buffer. Open a StreamWriter, and when the
processed text (which you can store in a StringBuilder, sb) reaches a
given size, for instance 2 MB (or whatever you like; an sb can hold far
more than that), flush it to disk and start again with an empty
buffer. To empty an sb, it is sufficient to set its Length to 0. When
the input ends, flush whatever remains in the sb to disk. Remember to
close the StreamWriter.
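
A minimal sketch of the buffering idea described above, under a few assumptions: the generated statements are appended one per line, and the threshold is counted in characters rather than bytes. The names BufferedSqlWriter, flushThreshold and the temporary output file are hypothetical and do not come from the original utility.

```vbnet
Imports System
Imports System.IO
Imports System.Text

' Hypothetical helper: collects text in one StringBuilder and writes it
' to disk whenever the buffer reaches the chosen size.
Public Class BufferedSqlWriter
    Private ReadOnly _writer As StreamWriter
    Private ReadOnly _buffer As New StringBuilder()
    Private ReadOnly _flushThreshold As Integer

    Public Sub New(ByVal fileName As String, ByVal flushThreshold As Integer)
        _writer = New StreamWriter(fileName)
        _flushThreshold = flushThreshold
    End Sub

    ' Appends one line and flushes once the buffer is large enough.
    Public Sub AppendLine(ByVal text As String)
        _buffer.Append(text).Append(Environment.NewLine)
        If _buffer.Length >= _flushThreshold Then FlushBuffer()
    End Sub

    Private Sub FlushBuffer()
        _writer.Write(_buffer.ToString())
        _buffer.Length = 0   ' empties the builder so it can be reused
    End Sub

    ' Writes whatever is left in the buffer and closes the file.
    Public Sub Close()
        FlushBuffer()
        _writer.Close()
    End Sub
End Class

Module BufferSketch
    Sub Main()
        ' Placeholder output location, chosen only for this sketch.
        Dim target As String = Path.Combine(Path.GetTempPath(), "buffer_sketch.txt")
        Dim output As New BufferedSqlWriter(target, 2 * 1024 * 1024)
        output.AppendLine("first processed line")
        output.AppendLine("second processed line")
        output.Close()
    End Sub
End Module
```

The point of the design is that only one StringBuilder lives for the whole run, so memory use stays near the threshold regardless of how large the input files are.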

-t

Apr 3 '06 #2
Have you thought about using regular expressions to parse the fields out
of each line? Going character by character seems really inefficient. At the
very least, use functions like IndexOf to find your delimiters, and parse
out the pieces that way.
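
One possible reading of the regular-expression suggestion, as a sketch rather than a tested replacement for the original function. It assumes quotes are always balanced within a line and that a doubled quote is never used as an escape. The names RegexSplitSketch and SplitQuotedLine are made up for this example.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSplitSketch
    ' Matches a comma only when an even number of double quotes follows it,
    ' which means the comma sits outside any quoted field.
    Private ReadOnly FieldSeparator As New Regex(",(?=(?:[^""]*""[^""]*"")*[^""]*$)")

    Public Function SplitQuotedLine(ByVal line As String) As String()
        Dim parts As String() = FieldSeparator.Split(line)
        For i As Integer = 0 To parts.Length - 1
            ' Drop surrounding whitespace, then the text qualifier itself
            parts(i) = parts(i).Trim().Trim(""""c)
        Next
        Return parts
    End Function

    Sub Main()
        Dim sample As String = "1,2,""This is some, sample text"",19.90,"
        For Each part As String In SplitQuotedLine(sample)
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

The lookahead rescans the rest of the line for every comma, so this version is compact rather than fast. On long lines an IndexOf-based scan or the character loop may well be quicker.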
Apr 3 '06 #3
There are a few tweaks you can apply here to reduce memory utilization (in
particular, you can perhaps reuse the StringBuilders). However, there's
nothing obvious and terrible here. In short, this function should not cause
an out-of-memory error.
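
A sketch of what reusing the StringBuilder could look like: the builder is cleared with Length = 0 at each delimiter instead of being replaced by a new instance. SplitReusingBuilder is a hypothetical name, and unlike the original it returns the fields rather than filling a caller-supplied array and it trims the last field too.

```vbnet
Imports System
Imports System.Collections
Imports System.Text

Module ReuseBuilderSketch
    Public Function SplitReusingBuilder(ByVal line As String) As String()
        Dim fields As New ArrayList()
        Dim section As New StringBuilder()   ' allocated once per line
        Dim insideQuotes As Boolean = False

        For i As Integer = 0 To line.Length - 1
            Dim c As Char = line.Chars(i)
            If c = """"c Then
                ' Toggle the text-qualifier state; the quote is not kept
                insideQuotes = Not insideQuotes
            ElseIf c = ","c AndAlso Not insideQuotes Then
                fields.Add(section.ToString().Trim())
                section.Length = 0           ' reuse instead of New StringBuilder
            Else
                section.Append(c)
            End If
        Next

        ' Last field, since most lines have no ending delimiter
        If Not insideQuotes Then fields.Add(section.ToString().Trim())
        Return CType(fields.ToArray(GetType(String)), String())
    End Function

    Sub Main()
        Dim parts As String() = SplitReusingBuilder("1,2,""This is some, sample text"",19.90,")
        For Each part As String In parts
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

This only trims allocations per field. It would not by itself explain or cure an out-of-memory error, which fits the point made above that the cause probably lies elsewhere.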

David
Apr 3 '06 #4
It looks to me like your field separator is always a comma.
So why not just use SplitString = Split(CurrentLine, ","), then use
Replace on each string in the resulting array if you want to get rid of
the quotes? I think this will be much faster than going char by char.
The parameter SplitString will have to be declared as ByRef SplitString
As Array, rather than SplitString() As String.

Tom
Apr 3 '06 #5
Actually, arrays are reference types, so modifying the contents of the
array will work just fine as far as filling it goes. It doesn't need to be
ByRef.
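
A small sketch of that point, with hypothetical names throughout. Writing to elements through a ByVal array parameter is visible to the caller. Only assigning a whole new array to the parameter, as SplitString = Split(...) would, needs ByRef to reach the caller.

```vbnet
Imports System

Module ByValArraySketch
    ' The array reference is passed by value, but it still points at the
    ' caller's array object, so element writes are visible to the caller.
    Public Sub FillFirst(ByVal target() As String)
        target(0) = "filled"
    End Sub

    ' Re-pointing a ByVal parameter changes only the local copy of the
    ' reference; the caller's array is left untouched.
    Public Sub ReassignByVal(ByVal target() As String)
        target = New String() {"replaced"}
    End Sub

    Sub Main()
        Dim fields(2) As String          ' three elements, indexes 0 to 2
        FillFirst(fields)
        Console.WriteLine(fields(0))     ' prints "filled"
        ReassignByVal(fields)
        Console.WriteLine(fields(0))     ' still prints "filled"
    End Sub
End Module
```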
Apr 3 '06 #6
Tom,

Some of the fields are comment fields that contain commas.

Ex. 1,2,"This is some, sample text",19.90,

Using Split will not handle this.
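
To make the problem concrete, here is a short sketch that runs a plain String.Split on the sample line. NaiveSplit is a hypothetical wrapper used only for this illustration.

```vbnet
Imports System

Module NaiveSplitSketch
    ' Splits on every comma, with no knowledge of the text qualifier.
    Public Function NaiveSplit(ByVal line As String) As String()
        Return line.Split(","c)
    End Function

    Sub Main()
        Dim parts As String() = NaiveSplit("1,2,""This is some, sample text"",19.90,")
        Console.WriteLine(parts.Length)   ' prints 6: the comment field is cut in two
        For Each part As String In parts
            Console.WriteLine("[" & part & "]")
        Next
    End Sub
End Module
```

The quoted comment comes back as two pieces, each still carrying one of its quotes, and the trailing comma produces an extra empty element.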

Apr 4 '06 #7
Try this:

Public Function SplitDelimitedLine(ByVal CurrentLine As String) As String()

    Try
        Dim _wl As String = String.Empty ' work string
        ' Create a local copy of CurrentLine.
        ' We don't really need to, but I have for the sake of clarity.
        ' The 3 Trims peel off any extraneous whitespace, then any leading
        ' and/or trailing commas, and then any whitespace that was sitting
        ' inside those commas.
        Dim _cl As String = CurrentLine.Trim.Trim(","c).Trim
        ' Find the first " character
        Dim _pos As Integer = _cl.IndexOf(""""c)
        ' If _pos = -1 then there weren't any
        ' Loop until there are no more " characters
        While _pos > -1
            ' Append everything before the first " character to the work string
            _wl &= _cl.Substring(0, _pos)

            ' Remove everything before the first " character from the local copy
            _cl = _cl.Remove(0, _pos)
            ' Find the next " character
            ' Note that we start the find from the 2nd position because we know
            ' that there is a " character in position 1 (index 0)
            _pos = _cl.IndexOf(""""c, 1)
            If _pos > -1 Then
                ' If we find one, append everything from the first " to the
                ' 2nd " inclusive to the work string, replacing any commas in
                ' the substring with a tilde, and remove the same number of
                ' characters from the local copy
                _wl &= _cl.Substring(0, _pos + 1).Replace(","c, "~"c)
                _cl = _cl.Remove(0, _pos + 1)

                ' Find the next " character
                ' Note that we are now back to finding from the beginning of
                ' what is left of the local copy
                _pos = _cl.IndexOf(""""c)
            End If
        End While
        ' There are no more " characters, so append the remainder of the
        ' local copy to the work string
        _wl &= _cl
        ' Split the work string using the comma as the delimiter
        Dim _ss As String() = _wl.Replace("""", String.Empty).Split(","c)
        ' Work through the elements of the array
        For _i As Integer = 0 To _ss.Length - 1
            ' If the element contains any tilde characters then replace them
            ' with commas
            If _ss(_i).IndexOf("~"c) > -1 Then _ss(_i) = _ss(_i).Replace("~"c, ","c)
            ' Trim any whitespace from the element
            _ss(_i) = _ss(_i).Trim
        Next
        ' Return the string array
        Return _ss
    Catch
        ' We hit a problem of some description so return Nothing (null)
        Return Nothing
    End Try

End Function

Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click

    ' Measure the time for 1 million calls to the function

    Dim _start As DateTime = DateTime.Now

    For _i As Integer = 1 To 1000000
        Dim _ss() As String = SplitDelimitedLine("1,2,""This is some, sample text"",19.90,")
    Next

    Console.WriteLine(DateTime.Now.Subtract(_start).TotalSeconds)

End Sub

On my machine the 1 million calls take 3.281166 seconds, and I do not see
any significant impact on memory use.

Apr 4 '06 #8
klineb wrote:
<snip>
I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder, _
ByRef SplitString() As String) As Boolean

<snip, snip, snip>

I don't know if this is of any help, but you really don't need a
StringBuilder to "capture" the String slices. Whenever you find the
delimiter, you just need to know where the slice begins.

Disclaimer: I *didn't* test the code below

<AirCode>
Function SplitDelimitedLine( _
        CurrentLine As StringBuilder, _
        SplitString() As String _
        ) As Boolean

    Dim Text As String = CurrentLine.ToString
    Dim Max As Integer = Text.Length - 1
    Dim SliceStart As Integer
    Dim Total As Integer
    ' Declared outside the loop so it clearly keeps its state between characters
    Dim IgnoreComma As Boolean

    For Index As Integer = 0 To Max

        Select Case Text(Index)
            Case gsDoubleQuote
                IgnoreComma = Not IgnoreComma
            Case gsComma
                If Not IgnoreComma Then
                    ' Note: unlike the original, the slice keeps its surrounding quotes
                    Dim Count As Integer = Index - SliceStart
                    SplitString(Total) = Text.Substring(SliceStart, Count).Trim
                    SliceStart = Index + 1
                    Total += 1
                End If
        End Select
    Next

    ' Last field, if the line does not end with a delimiter
    If SliceStart <= Max Then
        Dim Count As Integer = Max - SliceStart + 1
        SplitString(Total) = Text.Substring(SliceStart, Count).Trim
    End If

    Return True
End Function
</AirCode>

Regards,

Branco

Apr 4 '06 #9
Brian,

I cannot imagine that this routine takes much memory. And even if it does,
so what?

It is a one-time operation. In my opinion, the amount of memory you use is
the last thing you should worry about. If you do not have enough, add more.
Assuming you live in a North Atlantic country, one hour spent thinking
about the problem will probably cost more than 1 GB of memory.

So you will only run into trouble once you go over about 800 MB.

Just my thought,

Cor
Apr 4 '06 #10
