Bytes | Software Development & Data Engineering Community

File Stream - Performance is getting slower and slower - why?

Hello. I am trying to split a file with 334,386 lines into separate files of
50,000 lines each.

This is the code I am running:

Dim intFragmentRawIndex As Integer
Dim swRawDataFile As StreamWriter
Dim intNewRawIndex As Integer
Dim strNewRawDataFileName As String
Dim intFragmentCallCount As Integer = 0
Dim strHeaderLine As String
Dim blnFileClosed As Boolean = False

strHeaderLine = colRawDataFile(1)

CreateRawDataFragment(intParentRawIndex, intNewRawIndex, strNewRawDataFileName)

Dim myFileStream As New System.IO.FileStream(strTempDirectoryPath & strNewRawDataFileName, _
    FileMode.OpenOrCreate, FileAccess.Write, FileShare.None)

swRawDataFile = New StreamWriter(myFileStream)

swRawDataFile.WriteLine(strHeaderLine)

For i As Integer = 2 To colRawDataFile.Count

    If intFragmentCallCount = 50000 Then

        'Clear Stream Writer Buffer
        swRawDataFile.Flush()

        'Close file
        swRawDataFile.Close()

        'Set Call Count against raw data file
        SetFragmentCallCount(intNewRawIndex, intFragmentCallCount)

        'Reset call count
        intFragmentCallCount = 0

        'If not on final line of raw data file....
        If i <> colRawDataFile.Count Then

            CreateRawDataFragment(intParentRawIndex, intNewRawIndex, strNewRawDataFileName)

            myFileStream = New System.IO.FileStream(strTempDirectoryPath & strNewRawDataFileName, _
                FileMode.OpenOrCreate, FileAccess.Write, FileShare.None)

            swRawDataFile = New StreamWriter(myFileStream)

            swRawDataFile.WriteLine(strHeaderLine)

        Else

            blnFileClosed = True

        End If

    End If

    swRawDataFile.WriteLine(colRawDataFile(i))

    intFragmentCallCount += 1

Next

If Not blnFileClosed Then

    'Close last fragment
    swRawDataFile.Close()

    'Set call count against last fragment
    SetFragmentCallCount(intNewRawIndex, intFragmentCallCount)

End If
The first file creates in 3 mins.
The second file creates in 11 minutes.
The third file creates in 18 minutes.
I am still waiting for the fourth file to create.

I am writing the same number of records to each file, so why does writing a
file of the same size take longer each time?

I thought that calling the Flush method of the stream would maintain
performance, but this does not seem to be the case! What am I doing wrong?
--
welcome to the mooon !
Apr 21 '06 #1
What are these routines :

- SetFragmentCallCount
- CreateRawDataFragment
What is this variable :
colRawDataFile ...

??

Apr 21 '06 #2
>> SetFragmentCallCount
>> CreateRawDataFragment
Don't worry about these - basic database operations.
>> colRawDataFile
This is the key point in this routine - it's basically the entire file,
loaded into a collection - therefore, this collection has 334,386 entries in
it.
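For reference, the collection is populated along these lines (an illustrative
sketch only, not the actual load code - strRawDataFilePath is a made-up name):

```vbnet
' Illustrative only - roughly how the raw data file ends up in a
' Microsoft.VisualBasic.Collection, one entry per line.
Dim colRawDataFile As New Microsoft.VisualBasic.Collection
Using reader As New System.IO.StreamReader(strRawDataFilePath)
    Dim line As String = reader.ReadLine()
    While line IsNot Nothing
        colRawDataFile.Add(line)
        line = reader.ReadLine()
    End While
End Using
```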

--
welcome to the mooon !
"olrt" wrote:
What are these routines :

- SetFragmentCallCount
- CreateRawDataFragment
What is this variable :
colRawDataFile ...

??

Apr 21 '06 #3
Ok, well I thought I'd try a different approach. What I'm now trying is
appending 50,000 lines from the collection to a StringBuilder, and then
writing that entire StringBuilder to a file.

However, look at this log:

21/04/2006 14:09:06: Building String Start
21/04/2006 14:09:14: appended 10,000 lines to the stringbuilder
21/04/2006 14:09:39: appended 10,000 lines to the stringbuilder
21/04/2006 14:10:20: appended 10,000 lines to the stringbuilder
21/04/2006 14:11:20: appended 10,000 lines to the stringbuilder
21/04/2006 14:12:36: appended 10,000 lines to the stringbuilder
21/04/2006 14:12:36: append of 50,000 lines to file from stringbuilder
complete
21/04/2006 14:12:36: Building String Start
21/04/2006 14:14:05: appended 10,000 lines to the stringbuilder
21/04/2006 14:16:00: appended 10,000 lines to the stringbuilder
21/04/2006 14:18:36: appended 10,000 lines to the stringbuilder
21/04/2006 14:21:18: appended 10,000 lines to the stringbuilder
21/04/2006 14:23:58: appended 10,000 lines to the stringbuilder
21/04/2006 14:23:59: append of 50,000 lines to file from stringbuilder
complete
21/04/2006 14:23:59: Building String Start

I clear the StringBuilder between appends to the file using this code:
sbFileContent = New StringBuilder

However, there's still obviously a big slowdown - why is this?
--
welcome to the mooon !
"m00nm0nkey " wrote:
[original post #1 quoted - snipped]

Apr 21 '06 #4
Not sure why you are using a StringBuilder to split a file into pieces...

I would just read directly from the source file and write to the "current"
file, switching to a new file when appropriate (perhaps playing with
buffered streams to improve performance).
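Something along these lines (an untested VB.NET sketch - the routine name and
the ".1", ".2", ... fragment-naming scheme here are made up):

```vbnet
' Untested sketch: read the source line by line and start a new fragment
' every 50,000 lines, so nothing is ever held in memory. The header line
' is repeated at the top of each fragment, as in the original code.
Imports System.IO

Module FileSplitter
    Sub SplitFile(ByVal sourcePath As String, ByVal linesPerFile As Integer)
        Dim fragmentIndex As Integer = 0
        Dim lineCount As Integer = 0
        Dim writer As StreamWriter = Nothing

        Using reader As New StreamReader(sourcePath)
            Dim headerLine As String = reader.ReadLine()
            Dim line As String = reader.ReadLine()

            While line IsNot Nothing
                If lineCount Mod linesPerFile = 0 Then
                    If writer IsNot Nothing Then writer.Close()
                    fragmentIndex += 1
                    writer = New StreamWriter(sourcePath & "." & fragmentIndex)
                    writer.WriteLine(headerLine)
                End If
                writer.WriteLine(line)
                lineCount += 1
                line = reader.ReadLine()
            End While
        End Using

        If writer IsNot Nothing Then writer.Close()
    End Sub
End Module
```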

--
Patrice

"m00nm0nkey " <m0********@dis cussions.micros oft.com> a écrit dans le message
de news: 91************* *************** **...icrosof t.com...
[post #4 quoted above - snipped]

Apr 21 '06 #5
>>> colRawDataFile
This is the key point in this routine - it's basically the entire
file, loaded into a collection - therefore, this collection has
334,386 entries in it.


And why do you load the entire file into a collection??? This doesn't
make sense to me (but maybe I don't have a complete understanding of
your solution)... I would try an approach like this (pseudo code):

FileReader input = ...;
FileWriter output = ...;
int i = 0;
while (input is not EOF)
{
    if (i == 50000)
    {
        // close old output
        // create new output
        i = 0;
    }

    output.write(input.ReadLine());
    i++;
}

no need to fetch all lines into memory (a collection).

hth
Markus
Apr 21 '06 #6
m00nm0nkey <m0********@dis cussions.micros oft.com> wrote:
SetFragmentCallCount
CreateRawDataFragment
Don't worry about these - basic database operations
colRawDataFile

This is the key point in this routine - it's basically the entire file,
loaded into a collection - therefore, this collection has 334,386 entries in
it.


What kind of collection? If it's some kind of linked list, it would get
horribly slow.

Have you tried removing pieces of the routine (such as the database
operations) and seeing whether that makes a difference?

If this doesn't help, could you post a short but complete program which
demonstrates the problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that. (Ignore the fact that it talks about C# - the same
can be done in VB.NET easily.)

If the database calls aren't the problem, then stripping those out to
produce a short but complete program shouldn't be an issue, and you can
generate random strings to put into the collection.
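For example, something like this would show whether indexed access into the
collection is the culprit (a sketch, assuming colRawDataFile is a
Microsoft.VisualBasic.Collection - if integer indexing has to walk the
collection's internal list, the second loop will be far slower):

```vbnet
' Sketch: time indexed access near the front and near the back of a
' large Microsoft.VisualBasic.Collection. A big difference between the
' two timings points at O(n) indexing, not at the file I/O.
Dim col As New Microsoft.VisualBasic.Collection
For i As Integer = 1 To 300000
    col.Add("line " & i.ToString())
Next

Dim sw As New System.Diagnostics.Stopwatch
sw.Start()
For i As Integer = 1 To 1000
    Dim s As String = CStr(col(i))            ' lookups near the front
Next
Console.WriteLine("Front: " & sw.ElapsedMilliseconds & " ms")

sw.Reset() : sw.Start()
For i As Integer = 299001 To 300000
    Dim s As String = CStr(col(i))            ' lookups near the back
Next
Console.WriteLine("Back: " & sw.ElapsedMilliseconds & " ms")
```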

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 21 '06 #7


m00nm0nkey wrote:
Hello. I am trying to split a file with 334,386 lines into separate files of
50,000 each.


How about something along the lines of: read lines, one by one, write
them to current output. Every N lines open a new output.

void SplitIntoFiles(TextReader r, ulong limit, string nameFormat) {
    ulong count = 0;
    TextWriter w = null;
    try {
        // TextReader/TextWriter are abstract, so construct the concrete
        // Stream* types; {0} in nameFormat becomes the line offset.
        for ( string l = r.ReadLine(); l != null; l = r.ReadLine() ) {
            if ( count % limit == 0 ) {
                if ( w != null )
                    w.Dispose();
                w = new StreamWriter(string.Format(nameFormat, count));
            }
            ++count;
            w.WriteLine(l);
        }
    } finally {
        if ( w != null )
            w.Dispose();
    }
}
SplitIntoFiles(new StreamReader(input_path), 50000, input_path + ".{0}");

I haven't compiled the code, but you should be able to get the idea.

Note that the code above will add a newline to the end of the last
output file, even if none was present in input_path. Also, the code will
not work as expected on files longer than ulong.MaxValue lines.

--
Helge Jensen
mailto:he****** ****@slog.dk
sip:he********* *@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Apr 21 '06 #8
Helge Jensen <he**********@s log.dk> wrote:
Also, the code will not work as expected on files longer than
ulong.Max lines.


When you find a disk capable of storing a file with
18,446,744,073,709,551,615 lines, let me know :) At one byte per line,
that's still 16 exabytes.

http://en.wikipedia.org/wiki/Exabyte has some interesting stats on
exabytes.

--
Jon Skeet - <sk***@pobox.co m>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Apr 21 '06 #9


Jon Skeet [C# MVP] wrote:
Helge Jensen <he**********@s log.dk> wrote:

Also, the code will not work as expected on files longer than
ulong.Max lines.


When you find a disk capable of storing a file with
18,446,744,073,709,551,615 lines, let me know :) At one byte per line,
that's still 16 exabytes.


It's not that I'm concerned about it, it just happens to be so :)

Since some streams are infinite (or at least supposedly infinite) and line
oriented, it makes sense to just note the fact that there is a limit on
the expected behaviour.

The code could be rewritten to work on arbitrary-length input (provided
the FS allows arbitrary-length paths) but I don't think it's worth the
effort, and it's nice to have the line-offset in the file-name so...

--
Helge Jensen
mailto:he****** ****@slog.dk
sip:he********* *@slog.dk
-=> Sebastian cover-music: http://ungdomshus.nu <=-
Apr 22 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.
