473,387 Members | 1,899 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,387 software developers and data experts.

Suggestions to reduce memory use when splitting a string

Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean
'7-25-2005
'BJK
'
'--removed the use of Char replaced with: CurrentLine.Chars(i)
'
'---------------------------------------
Dim i As Integer
Dim CountDelimiter As Boolean
Dim Total As Integer
Dim lbResult As Boolean
Dim Section As New StringBuilder
Dim liLen As Integer
Dim liCommaPos As Integer
Dim liDQuotePos As Integer

Try
'We want to count the delimiter unless it is within the
text qualifier
CountDelimiter = True
Total = 0
liLen = CurrentLine.Length - 1
For i = 0 To liLen
Select Case CurrentLine.Chars(i)
Case gsDoubleQoute
If CountDelimiter Then
CountDelimiter = False
Else
CountDelimiter = True
End If
Case gsComma
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString.Trim
Section = Nothing
Section = New StringBuilder
Total = Total + 1
Else
Section.Append(CurrentLine.Chars(i))
End If
Case Else
Section.Append(CurrentLine.Chars(i))
End Select
Next
' Get the last field - as most files will not have an
ending delimiter
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString
End If
lbResult = True
Catch ex As Exception
ps_LastErrSource = ex.Source
ps_LastErrDesc = ex.ToString
lbResult = False
Dim loSB As New StringBuilder(ps_LastErrDesc)
UpdateLog(loSB)
End Try
Return lbResult
End Function

This function is stored in a class and is called from other function
within this class.
Thanks
Brian

Apr 3 '06 #1
10 2676
hi Brian,

I would do that using a buffer. You open a StreamWriter and when the
processed text (which you can store in a StringBuilder (sb) ) reaches a
given size, for instance 2MB (or whatever you like, the sb holds more
than 2GB), you can flush it to disk an start again with an empty
buffer. To empty a sb it's sufficient to set the length to 0. When
input ends, you flush to disk what remains in the sb. Remember to close
the StreamWriter.

-t

klineb ha scritto:
Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean
'7-25-2005
'BJK
'
'--removed the use of Char replaced with: CurrentLine.Chars(i)
'
'---------------------------------------
Dim i As Integer
Dim CountDelimiter As Boolean
Dim Total As Integer
Dim lbResult As Boolean
Dim Section As New StringBuilder
Dim liLen As Integer
Dim liCommaPos As Integer
Dim liDQuotePos As Integer

Try
'We want to count the delimiter unless it is within the
text qualifier
CountDelimiter = True
Total = 0
liLen = CurrentLine.Length - 1
For i = 0 To liLen
Select Case CurrentLine.Chars(i)
Case gsDoubleQoute
If CountDelimiter Then
CountDelimiter = False
Else
CountDelimiter = True
End If
Case gsComma
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString.Trim
Section = Nothing
Section = New StringBuilder
Total = Total + 1
Else
Section.Append(CurrentLine.Chars(i))
End If
Case Else
Section.Append(CurrentLine.Chars(i))
End Select
Next
' Get the last field - as most files will not have an
ending delimiter
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString
End If
lbResult = True
Catch ex As Exception
ps_LastErrSource = ex.Source
ps_LastErrDesc = ex.ToString
lbResult = False
Dim loSB As New StringBuilder(ps_LastErrDesc)
UpdateLog(loSB)
End Try
Return lbResult
End Function

This function is stored in a class and is called from other function
within this class.
Thanks
Brian


Apr 3 '06 #2
Have you thought about regular expressions to parse the fields out of each
line? Going character by character seems really inefficient. At the very
last, use functions like IndexOf, to find your delimiters, and parse out the
pieces that way.

"klineb" <br********@hotmail.com> wrote in message
news:11**********************@g10g2000cwb.googlegr oups.com...
Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean
'7-25-2005
'BJK
'
'--removed the use of Char replaced with: CurrentLine.Chars(i)
'
'---------------------------------------
Dim i As Integer
Dim CountDelimiter As Boolean
Dim Total As Integer
Dim lbResult As Boolean
Dim Section As New StringBuilder
Dim liLen As Integer
Dim liCommaPos As Integer
Dim liDQuotePos As Integer

Try
'We want to count the delimiter unless it is within the
text qualifier
CountDelimiter = True
Total = 0
liLen = CurrentLine.Length - 1
For i = 0 To liLen
Select Case CurrentLine.Chars(i)
Case gsDoubleQoute
If CountDelimiter Then
CountDelimiter = False
Else
CountDelimiter = True
End If
Case gsComma
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString.Trim
Section = Nothing
Section = New StringBuilder
Total = Total + 1
Else
Section.Append(CurrentLine.Chars(i))
End If
Case Else
Section.Append(CurrentLine.Chars(i))
End Select
Next
' Get the last field - as most files will not have an
ending delimiter
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString
End If
lbResult = True
Catch ex As Exception
ps_LastErrSource = ex.Source
ps_LastErrDesc = ex.ToString
lbResult = False
Dim loSB As New StringBuilder(ps_LastErrDesc)
UpdateLog(loSB)
End Try
Return lbResult
End Function

This function is stored in a class and is called from other function
within this class.
Thanks
Brian

Apr 3 '06 #3

"klineb" <br********@hotmail.com> wrote in message
news:11**********************@g10g2000cwb.googlegr oups.com...
Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.


There are a few tweaks you can apply here to reduce memory utilization (in
particular you can perhaps reuse the StringBuilders). However there's
nothing obvious and terrible here. In short this function should not cause
an out of memory error.

David
Apr 3 '06 #4
klineb wrote:
Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean
'7-25-2005
'BJK
'
'--removed the use of Char replaced with: CurrentLine.Chars(i)
'
'---------------------------------------
Dim i As Integer
Dim CountDelimiter As Boolean
Dim Total As Integer
Dim lbResult As Boolean
Dim Section As New StringBuilder
Dim liLen As Integer
Dim liCommaPos As Integer
Dim liDQuotePos As Integer

Try
'We want to count the delimiter unless it is within the
text qualifier
CountDelimiter = True
Total = 0
liLen = CurrentLine.Length - 1
For i = 0 To liLen
Select Case CurrentLine.Chars(i)
Case gsDoubleQoute
If CountDelimiter Then
CountDelimiter = False
Else
CountDelimiter = True
End If
Case gsComma
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString.Trim
Section = Nothing
Section = New StringBuilder
Total = Total + 1
Else
Section.Append(CurrentLine.Chars(i))
End If
Case Else
Section.Append(CurrentLine.Chars(i))
End Select
Next
' Get the last field - as most files will not have an
ending delimiter
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString
End If
lbResult = True
Catch ex As Exception
ps_LastErrSource = ex.Source
ps_LastErrDesc = ex.ToString
lbResult = False
Dim loSB As New StringBuilder(ps_LastErrDesc)
UpdateLog(loSB)
End Try
Return lbResult
End Function

This function is stored in a class and is called from other function
within this class.
Thanks
Brian

It looks to me like your field separator is always a comma.
So why not just use SplitString = split(CurrentLine,","), then use
Replace on each string in the resultant array if you want to get rid of
the quotes. I think this will be much faster than going char by char.
The parameter SplitString will have to be declared as byRef SplitString
as Array, rather than SplitString() as String.

Tom
Apr 3 '06 #5
Actually, arrays are reference types. So modifying the contents of the
array, will work just fine as far as filling it. It doesn't need to be
ByRef.

"tomb" <to**@technetcenter.com> wrote in message
news:8h******************@bignews6.bellsouth.net.. .
klineb wrote:
Good Day,

I have written and utility to convert our DOS COBOL data files to a SQL
Server database. Part of the process requires parsing each line into a
sql statement and validting the data to keep the integrity of the
database. We are parsing roughl 81 files and range in size 1 kb to 65
MB files (Average of 400,000 lines in the larger files).

I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean
'7-25-2005
'BJK
'
'--removed the use of Char replaced with: CurrentLine.Chars(i)
'
'---------------------------------------
Dim i As Integer
Dim CountDelimiter As Boolean
Dim Total As Integer
Dim lbResult As Boolean
Dim Section As New StringBuilder
Dim liLen As Integer
Dim liCommaPos As Integer
Dim liDQuotePos As Integer

Try
'We want to count the delimiter unless it is within the
text qualifier
CountDelimiter = True
Total = 0
liLen = CurrentLine.Length - 1
For i = 0 To liLen
Select Case CurrentLine.Chars(i)
Case gsDoubleQoute
If CountDelimiter Then
CountDelimiter = False
Else
CountDelimiter = True
End If
Case gsComma
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString.Trim
Section = Nothing
Section = New StringBuilder
Total = Total + 1
Else
Section.Append(CurrentLine.Chars(i))
End If
Case Else
Section.Append(CurrentLine.Chars(i))
End Select
Next
' Get the last field - as most files will not have an
ending delimiter
If CountDelimiter Then
' Add current section to collection
SplitString(Total) = Section.ToString
End If
lbResult = True
Catch ex As Exception
ps_LastErrSource = ex.Source
ps_LastErrDesc = ex.ToString
lbResult = False
Dim loSB As New StringBuilder(ps_LastErrDesc)
UpdateLog(loSB)
End Try
Return lbResult
End Function

This function is stored in a class and is called from other function
within this class.
Thanks
Brian

It looks to me like your field separator is always a comma. So why not
just use SplitString = split(CurrentLine,","), then use Replace on each
string in the resultant array if you want to get rid of the quotes. I
think this will be much faster than going char by char.
The parameter SplitString will have to be declared as byRef SplitString
as Array, rather than SplitString() as String.

Tom

Apr 3 '06 #6
Tom,

Some of the fields are comment field that contain comma.

Ex. 1,2,"This is some, sample text",19.90,

Using SplitSting will not handle this.

Apr 4 '06 #7
Try this:

Public Function SplitDelimitedLine(ByVal CurrentLine As String) As
String()

Try
Dim _wl As String = String.Empty ' work string
' Create a local copy of CurrentLine
' We don't really need to but I have for the sake of clarity
' The 3 Trim's peel of any extraneous whitespace then any leading
and/or trailing commas and then any extraneous whitespace thet might have
been any leading and/or trailing commas that might have been present
Dim _cl As String = CurrentLine.Trim.Trim(","c).Trim
' Find the first " character
Dim _pos As Integer = _cl.IndexOf(""""c)
' If _pos = -1 then there weren't any
' Loop until there are no more " characters
While _pos > -1
' Append every thing before the first " character to the work string
_wl &= _cl.Substring(0, _pos)

' Remove every thing before the first " character from the local
copy
_cl = _cl.Remove(0, _pos)
' Find the next " character
' Note that we start the find from the 2nd position because we know
that there is a " character in position 1 (index 0)
_pos = _cl.IndexOf(""""c, 1)
If _pos > -1 Then
' If we find one then we append every thing from the first " to
the 2nd " inclusive to the work string, replacing any commas in the
substring with a tilde and remove the same number of characters from the
local copy
_wl &= _cl.Substring(0, _pos + 1).Replace(","c, "~"c)
_cl = _cl.Remove(0, _pos + 1)

' Find the next " character
' Note that we are now back to finding from the beginning of what
is left of the local copy
_pos = _cl.IndexOf(""""c)
End If
End While
' There are no moe " characters so append the remainder of the local
copy to the work string
_wl &= _cl
' Split the work string using the comma as the delimiter
Dim _ss As String() = _wl.Replace("""", String.Empty).Split(","c)
' Work through the elements of the array
For _i As Integer = 0 To _ss.Length - 1
If the element contains any tilde characters then replace them with
commas
If _ss(_i).IndexOf("~"c) > -1 Then _ss(_i) = _ss(_i).Replace("~"c,
","c)
'Trim any whitespaces from the element
_ss(_i) = _ss(_i).Trim
Next
' Return the string array
Return _ss
Catch
' We hit a problem of some description so return Nothing (null)
Return Nothing
End Try

End Function

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As
System.EventArgs) Handles Button1.Click

' Measure the time for 1 million calls to the function

Dim _start As DateTime = DateTime.Now

For _i As Integer = 1 To 1000000
Dim _ss() As String = SplitDelimitedLine("1,2,""This is some, sample
text"",19.90,")
Next

Console.WriteLine(DateTime.Now.Subtract(_start).To talSeconds)

End Sub

On my machine it takes 3.281166 seconds and I do not see any significant
impact on the memory resources.
"klineb" <br********@hotmail.com> wrote in message
news:11*********************@i39g2000cwa.googlegro ups.com...
Tom,

Some of the fields are comment field that contain comma.

Ex. 1,2,"This is some, sample text",19.90,

Using SplitSting will not handle this.

Apr 4 '06 #8
klineb wrote:
<snip>
I have written this utility with VB.NET 2003 and when I parse all of
the files I run out of memory. The following functions seems to be the
main source of my leak. Any help optimizing this code is appreciated.

Public Function SplitDelimitedLine(ByVal CurrentLine As StringBuilder,
_
ByRef SplitString() As String) As Boolean

<snip, snip, snip>

I don't know if this is of any help, but you really don't need a
StringBuilder to "capture" the String slices. Whenever you find the
delimiter, you just need to know where the slice begins.

Disclaimer: I *didn't* test the code bellow

<AirCode>
Function SplitDelimitedLine( _
CurrentLine As StringBuilder, _
SplitString() As String _
) As Boolean

Dim Text As String = CurrentLine.ToString
Dim Max As Integer = Text.Length - 1
Dim SliceStart As Integer
DIm Total As Integer

For Index As Integer = 0 To Max

Dim IgnoreComma As Boolean

Select Case Text(Index)
Case gsDoubleQuote
IgnoreComma = Not IgnoreComma
Case gsComma
If Not IgnoreComma Then
Dim Count As Integer = Index - SliceStart
SplitString(Total) = Text.Substring(SliceStart, Count).Trim
SliceStart = Index + 1
Total += 1
End If
End Select
Next

If SliceStart <= Max Then
Dim Count As Integer = Max - SliceStart + 1
SplitString(Total) = Text.Substring(SliceStart, Count).Trim
End If

Return True
</AirCode>

Regards,

Branco

Apr 4 '06 #9
Brian,

I cannot imagen that this routine takes much memory. And if it would do,
what so ever.

It is a one time operation. The last thing you would think about is in my
opinion the amount of memory you use. Even if you have not enough than add
it. As I assume that you live in an North Atlantic Country, than one hour
thinking of the problem will cost probably more than 1Gb.

So you can only become in problem as you go over the 800Mb.

Just my thought,

Cor
Apr 4 '06 #10
Thank you everybody for your suggestions.

This really is not a one time issue. We still have 60 DOS clients to
convert. All of the clients are running different hardware so buying
memory is not really an option. I will try all of you suggestions and
post my results back.

Does anyone know if running some of the class in a seperate process
would help any?

Thanks again for all of the feedback.

Brian

Apr 13 '06 #11

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

226
by: Stephen C. Waterbury | last post by:
This seems like it ought to work, according to the description of reduce(), but it doesn't. Is this a bug, or am I missing something? Python 2.3.2 (#1, Oct 20 2003, 01:04:35) on linux2 Type...
193
by: Michael B. | last post by:
I was just thinking about this, specifically wondering if there's any features that the C specification currently lacks, and which may be included in some future standardization. Of course, I...
5
by: fatted | last post by:
I'm trying to write a function which splits a string (possibly multiple times) on a particular character and returns the strings which has been split. What I have below is kind of (oh dear!)...
10
by: Segfahlt | last post by:
I have a fairly simple C# program that just needs to open up a fixed width file, convert each record to tab delimited and append a field to the end of it. The input files are between 300M and...
3
by: Ricardo Q.G. | last post by:
in production environment we started to have a memory consumption problem at "asp.net wp". after a hard work we have discovered that Session.Abandon() does not reduce references to objects added to...
37
by: Ajai Jose | last post by:
Hi , I work on an ARM processor based embedded system. Code is mainly in C language. The project has a huge source base. We are supposed to optimise it. Most datastructures are declared as static...
7
by: william | last post by:
My question is: Specific memory block where my pointer pointing to changed strangely, seemingly that no statement changed it. Here are two examples I got: ***********1***************** I was...
3
Banfa
by: Banfa | last post by:
The project I work on has a bespoke hardware platform which is designed to go into a variety of different situations. However to keep things simple we really want the software for the platform to...
0
by: aa123db | last post by:
Variable and constants Use var or let for variables and const fror constants. Var foo ='bar'; Let foo ='bar';const baz ='bar'; Functions function $name$ ($parameters$) { } ...
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.