Bytes IT Community

File Stream - Performance is getting slower and slower - why?

Hello. I am trying to split a file with 334,386 lines into separate files of
50,000 lines each.

This is the code I am running:

Dim intFragmentRawIndex As Integer
Dim swRawDataFile As StreamWriter
Dim intNewRawIndex As Integer
Dim strNewRawDataFileName As String
Dim intFragmentCallCount As Integer = 0
Dim strHeaderLine As String
Dim blnFileClosed As Boolean = False

strHeaderLine = colRawDataFile(1)

CreateRawDataFragment(intParentRawIndex, intNewRawIndex, strNewRawDataFileName)

Dim myFileStream As New System.IO.FileStream(strTempDirectoryPath & strNewRawDataFileName, _
    FileMode.OpenOrCreate, FileAccess.Write, FileShare.None)

swRawDataFile = New StreamWriter(myFileStream)
swRawDataFile.WriteLine(strHeaderLine)

For i As Integer = 2 To colRawDataFile.Count

    If intFragmentCallCount = 50000 Then

        'Flush the StreamWriter buffer and close the file
        swRawDataFile.Flush()
        swRawDataFile.Close()

        'Set call count against raw data file
        SetFragmentCallCount(intNewRawIndex, intFragmentCallCount)

        'Reset call count
        intFragmentCallCount = 0

        'If not on final line of raw data file...
        If i <> colRawDataFile.Count Then

            CreateRawDataFragment(intParentRawIndex, intNewRawIndex, strNewRawDataFileName)

            myFileStream = New System.IO.FileStream(strTempDirectoryPath & strNewRawDataFileName, _
                FileMode.OpenOrCreate, FileAccess.Write, FileShare.None)

            swRawDataFile = New StreamWriter(myFileStream)
            swRawDataFile.WriteLine(strHeaderLine)

        Else
            blnFileClosed = True
        End If

    End If

    swRawDataFile.WriteLine(colRawDataFile(i))
    intFragmentCallCount += 1

Next

If Not blnFileClosed Then
    'Close last fragment and set its call count
    swRawDataFile.Close()
    SetFragmentCallCount(intNewRawIndex, intFragmentCallCount)
End If
The first file creates in 3 mins.
The second file creates in 11 minutes.
The third file creates in 18 minutes.
I am still waiting for the fourth file to create.

I am writing the same number of records to each file, so why does each file
of the same size take longer to write than the last?

I thought that calling the Flush method of the stream would maintain
performance, but this does not seem to be the case! What am I doing wrong?
--
welcome to the mooon !
Apr 21 '06 #1
10 Replies


What are these routines:

- SetFragmentCallCount
- CreateRawDataFragment

What is this variable:

- colRawDataFile

??

Apr 21 '06 #2

> > SetFragmentCallCount
> > CreateRawDataFragment

Don't worry about these - basic database operations.

> > colRawDataFile

This is the key point in this routine - it's basically the entire file,
loaded into a collection - therefore, this collection has 334,386 entries in
it.

--
welcome to the mooon !
"olrt" wrote:
> <snip>

Apr 21 '06 #3

OK, well, I thought I'd try a different approach, so what I'm now trying is
appending 50,000 lines from the collection to a StringBuilder, and then
writing that entire StringBuilder to a file.

However, look at this log:

21/04/2006 14:09:06: Building String Start
21/04/2006 14:09:14: appended 10,000 lines to the stringbuilder
21/04/2006 14:09:39: appended 10,000 lines to the stringbuilder
21/04/2006 14:10:20: appended 10,000 lines to the stringbuilder
21/04/2006 14:11:20: appended 10,000 lines to the stringbuilder
21/04/2006 14:12:36: appended 10,000 lines to the stringbuilder
21/04/2006 14:12:36: append of 50,000 lines to file from stringbuilder
complete
21/04/2006 14:12:36: Building String Start
21/04/2006 14:14:05: appended 10,000 lines to the stringbuilder
21/04/2006 14:16:00: appended 10,000 lines to the stringbuilder
21/04/2006 14:18:36: appended 10,000 lines to the stringbuilder
21/04/2006 14:21:18: appended 10,000 lines to the stringbuilder
21/04/2006 14:23:58: appended 10,000 lines to the stringbuilder
21/04/2006 14:23:59: append of 50,000 lines to file from stringbuilder
complete
21/04/2006 14:23:59: Building String Start

I clear the StringBuilder between appends to the file using this code:
sbFileContent = New StringBuilder

However, there's still obviously a big slowdown - why is this?
--
welcome to the mooon !
"m00nm0nkey" wrote:
> <snip - original post quoted in full>

Apr 21 '06 #4

Not sure why you are using a StringBuilder to split a file into 3...

I would just read directly from the source file and write to the
"current" file, switching to a new file when appropriate (perhaps
playing with buffered streams to improve performance).
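In (untested) pseudo-Python, just to show the shape of that loop - the part-file naming and the 50,000 chunk size here are placeholders, and in VB.NET a StreamReader/StreamWriter pair plays the same roles:

```python
def split_file(src_path, lines_per_file=50000):
    """Stream the source line by line, rotating to a new output file
    every `lines_per_file` lines; the header line is repeated in each part."""
    part = 0
    out = None
    with open(src_path, "r") as src:
        header = src.readline()          # first line is the header
        for count, line in enumerate(src):
            if count % lines_per_file == 0:
                if out is not None:
                    out.close()          # close the finished fragment
                part += 1
                out = open(f"{src_path}.part{part}", "w")
                out.write(header)        # re-emit the header per fragment
            out.write(line)
    if out is not None:
        out.close()
```

The point is that nothing is ever held in memory beyond the current line, so the cost per line stays constant no matter how many fragments have already been written.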

--
Patrice

"m00nm0nkey" <m0********@discussions.microsoft.com> wrote in message
news:91**********************************@microsoft.com...
> <snip - previous post quoted in full>
Apr 21 '06 #5

> > > colRawDataFile
> This is the key point in this routine - it's basically the entire
> file, loaded into a collection - therefore, this collection has
> 334,386 entries in it.


And why do you load the entire file into a collection??? This doesn't
make sense to me (but maybe I don't have a complete understanding of
your solution)... I would try an approach like this (pseudo code):

FileReader input = ...;
FileWriter output = ...;
int i = 0;
while (input is not EOF)
{
if (i == 50000)
{
// close old output
// create new output
i = 0;
}

output.write(input.ReadLine());
i++;
}

no need to fetch all lines into memory (a collection).

hth
Markus
Apr 21 '06 #6

m00nm0nkey <m0********@discussions.microsoft.com> wrote:
> > SetFragmentCallCount
> > CreateRawDataFragment
> Don't worry about these - basic database operations
>
> > colRawDataFile
> This is the key point in this routine - it's basically the entire file,
> loaded into a collection - therefore, this collection has 334,386
> entries in it.


What kind of collection? If it's some kind of linked list, it would get
horribly slow.
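To illustrate what I mean (a quick sketch, Python standing in for whatever the collection does internally): if positional lookup walks from the front of a linked list, then indexing every element in a loop does 0 + 1 + ... + (n-1) hops in total - quadratic - while enumerating the list once is linear.

```python
class Node:
    """One cell of a singly linked list."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def walk_to(head, index):
    """Positional lookup: walk from the front, counting hops taken."""
    hops = 0
    node = head
    for _ in range(index):
        node = node.next
        hops += 1
    return node.value, hops

# Build a linked list of n items: 0, 1, ..., n-1.
n = 1000
head = None
for v in range(n - 1, -1, -1):
    head = Node(v, head)

# Reading every element by index walks 0 + 1 + ... + (n-1) links: quadratic.
total_hops = sum(walk_to(head, i)[1] for i in range(n))
assert total_hops == n * (n - 1) // 2

# A straight enumeration touches each link once: linear.
linear_hops = 0
node = head
while node is not None:
    node = node.next
    linear_hops += 1
assert linear_hops == n
```

That's why accessing colRawDataFile(i) inside the loop would get slower and slower as i grows, even though each file contains the same number of records.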

Have you tried removing pieces of the routine (such as the database
operations) and seeing whether that makes a difference?

If this doesn't help, could you post a short but complete program which
demonstrates the problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that. (Ignore the fact that it talks about C# - the same
can be done in VB.NET easily.)

If the database calls aren't the problem, then stripping those out to
produce a short but complete program shouldn't be an issue, and you can
generate random strings to put into the collection.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 21 '06 #7



m00nm0nkey wrote:
> Hello. I am trying to split a file with 334,386 lines into separate
> files of 50,000 each.


How about something along the lines of: read lines, one by one, write
them to current output. Every N lines open a new output.

void SplitIntoFiles(TextReader r, ulong limit, string nameFormat) {
ulong count = 0;
TextWriter w = null;
try {
for ( string l = r.ReadLine(); l != null; l = r.ReadLine() ) {
if ( count % limit == 0 ) {
if ( w != null )
w.Dispose();
w = new StreamWriter(string.Format(nameFormat, count));
}
++count;
w.WriteLine(l);
}
} finally {
if ( w != null )
w.Dispose();
}
}
SplitIntoFiles(new StreamReader(input_path), 50000, input_path + ".{0}");

I haven't compiled the code, but you should be able to get the idea.

Note that the code above will add a newline to the end of the last
output-file, even if none was present in input_path. Also, the code will
not work as expected on files longer than ulong.Max lines.

--
Helge Jensen
mailto:he**********@slog.dk
sip:he**********@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Apr 21 '06 #8

Helge Jensen <he**********@slog.dk> wrote:
> Also, the code will not work as expected on files longer than
> ulong.Max lines.


When you find a disk capable of storing a file with
18,446,744,073,709,551,615 lines, let me know :) At one byte per line,
that's still 16 exabytes.
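The arithmetic, for anyone checking along (a quick sanity check, not from the original post):

```python
# ulong.MaxValue is 2**64 - 1.
lines = 2**64 - 1
assert lines == 18_446_744_073_709_551_615

# At one byte per line, that is also the file size in bytes.
# One exabyte (binary, i.e. exbibyte) is 2**60 bytes.
exabytes = lines / 2**60
assert round(exabytes) == 16
```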

http://en.wikipedia.org/wiki/Exabyte has some interesting stats on
exabytes.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 21 '06 #9



Jon Skeet [C# MVP] wrote:
> Helge Jensen <he**********@slog.dk> wrote:
> > Also, the code will not work as expected on files longer than
> > ulong.Max lines.
>
> When you find a disk capable of storing a file with
> 18,446,744,073,709,551,615 lines, let me know :) At one byte per line,
> that's still 16 exabytes.


It's not that I'm concerned about it, it just happens to be so :)

Since some streams are infinite (or at least supposedly infinite) and line
oriented, it makes sense to just note the fact that there is a limit on
the expected behaviour.

The code could be rewritten to work on arbitrary-length input (provided
the FS allows arbitrary-length paths) but I don't think it's worth the
effort, and it's nice to have the line-offset in the file-name so...

--
Helge Jensen
mailto:he**********@slog.dk
sip:he**********@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Apr 22 '06 #10

I am struggling with the same sort of problem. The program I'm building
needs to split a file containing up to 100,000 XML records. I need to
split those into 2 or 3 separate files. I'm using different
XmlTextWriters to create those files, then clean them up and release the
resources they use.

Somehow, the writing of the files slows down as the run proceeds, and I
cannot find a solution for this problem...

Code and pseudocode on request ;)

*** Sent via Developersdex http://www.developersdex.com ***
May 4 '06 #11

This discussion thread is closed. Replies have been disabled for this discussion.