Problem with reading a large text file

I have a Windows Service that is trying to parse a large (> 1 Gig) text
file. I keep getting an OutOfMemoryException. Here is the code that's
having the problem:

using (StreamReader streamReader = new StreamReader(stream, Encoding.ASCII))
{
    string line = "";
    DateTime currentDate = DateTime.Now.Date;
    while (streamReader.Peek() > -1)
    {
        line = streamReader.ReadLine();
    }
}

I read the documentation and realized that the ReadLine() method is not
very efficient. Is there another way that I can do this?

Thanks,
Hai

Nov 17 '05 #1
According to the StreamReader.ReadLine
documentation (http://msdn.microsoft.com/library/de...LineTopic.asp),
an OutOfMemoryException is thrown when "there is insufficient memory to
allocate a buffer for the returned string."

From the sounds of it, you are trying to have it read a line that is simply
too big to read all at once.

One way around this would be to explicitly read in smaller blocks at a time.
Given that you are using a StreamReader, take a look at one of the two
versions of Read().

The first reads a single character at a time, while the second (the one that
takes arguments) reads a block from the stream of a specified size.

Whichever way you go, both are far safer than ReadLine() (or, worse yet,
ReadToEnd()) when it comes to reading extremely large sets of data all
at once.
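For example, here is a minimal sketch of the block-based approach, assuming the same stream variable as in the original post (the 4096-character buffer size is an arbitrary choice, and the processing step is left as a comment):

using (StreamReader streamReader = new StreamReader(stream, Encoding.ASCII))
{
    char[] buffer = new char[4096];   // read a block of up to 4096 chars at a time
    int charsRead;
    while ((charsRead = streamReader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Process buffer[0..charsRead) here. A block may end in the middle
        // of a record, so carry any partial record over to the next read.
    }
}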

Brendan
"ha*******@gmail.com" wrote:
<snip>

Nov 17 '05 #2
Every time you use "line = " in the loop, .NET allocates a new string
object in memory. At the same time, the previous contents become eligible
for GC, but you never know when that will happen. With heavy processing
like parsing, it might not happen until the end of the loop, so you run
out of memory. To avoid this, simply use a StringBuilder. That way you
allocate memory only once, before the loop begins.
"ha*******@gmail.com" wrote:
<snip>

Nov 17 '05 #3

"RayProg" <Ra*****@discussions.microsoft.com> wrote in message
news:AF**********************************@microsoft.com...
<snip>


No, this is not the reason for the OOM; Brendan is right to the point.
Also, your description does not reflect how the GC works. Whenever the
gen0 heap reaches its threshold (sizes vary between 256KB and a few MB),
the CLR will hijack the current thread and force a GC collection; nothing
can stop this from happening. Don't forget that an application has to
enter the CLR to instantiate a new object; at that point the CLR inspects
the GC heap statistics and starts a collection when a "trigger point" is met.
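If you want to convince yourself of that, here is a small stand-alone sketch (my own illustration, not from this thread) that counts how many gen0 collections the CLR performs on its own while garbage strings are allocated in a loop:

using System;

class Gen0Demo
{
    static void Main()
    {
        int gen0Before = GC.CollectionCount(0);
        for (int i = 0; i < 1000000; i++)
        {
            string s = new string('x', 100); // becomes garbage on the next iteration
        }
        // The delta is well above zero: the CLR collected gen0 many times
        // without any explicit request from the application.
        Console.WriteLine("Gen0 collections: " + (GC.CollectionCount(0) - gen0Before));
    }
}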

Willy.
Nov 17 '05 #4
I followed your instructions but the process is so slow now.

using (Stream stream = System.IO.File.OpenRead(fileName))
{
    using (StreamReader streamReader = new StreamReader(stream, System.Text.Encoding.ASCII))
    {
        char[] buffer = new char[202];
        int read = 0;
        while (streamReader.Peek() > -1)
        {
            read = streamReader.Read(buffer, 0, 202);
        }
    }
}

Nov 17 '05 #5
If at all possible, I would read more than 202 characters at a time.

I'm going to guess that each record you want to read from your file is
202 characters long. If we assume a 1 gigabyte file of those records
(1,073,741,824 characters/bytes), you have roughly 5,315,553 such
records. Reading a single record at a time requires 5.3 million separate
accesses to the disk.

On the other hand, if you increase the size of each read and then parse
that data out yourself, you save a huge amount of work. For example, if
you read 10 records at a time, you bring the required number of separate
disk accesses down to roughly 530 thousand, much better than 5.3 million.

Increase the read size by another factor of 10 and you drop your disk
accesses down to roughly 53 thousand.

Depending on the amount of memory available, feel free to play around
with the amount of data you read each time. Granted, while a larger read
size reduces the number of times you have to hit the disk, it also
increases the memory requirements of your application and the possibility
of other slowdowns. Keep testing and tweaking until you get it right, or
at least as fast as is acceptable.

Just remember, disk access is one of the slowest forms of I/O you can do on
a computer.
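As a rough sketch of that idea, assuming fixed 202-character records and the same fileName as before (the 1000-records-per-read figure is just a starting point to tune):

const int recordSize = 202;
const int recordsPerRead = 1000;   // tune this: bigger means fewer disk hits
char[] buffer = new char[recordSize * recordsPerRead];

using (StreamReader streamReader = new StreamReader(
    System.IO.File.OpenRead(fileName), System.Text.Encoding.ASCII))
{
    int charsRead;
    while ((charsRead = streamReader.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Walk the chunk one record at a time.
        // NOTE: Read() may return a partial chunk, so a robust version must
        // carry a partial trailing record over into the next read.
        for (int offset = 0; offset + recordSize <= charsRead; offset += recordSize)
        {
            string record = new string(buffer, offset, recordSize);
            // ... parse the record here ...
        }
    }
}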

Brendan
"ha*******@gmail.com" wrote:
<snip>

Nov 17 '05 #6
<ha*******@gmail.com> wrote in message news:11**********************@g49g2000cwa.googlegroups.com...
Question:
Does the HUGE file have any carriage returns? (sounds like it doesn't)

I followed your instructions but the process is so slow now.

I am confused.
Prior to this it sounded like it didn't work at all. Is this true?
If so... how can it be slower NOW when it didn't work before?

What exactly do you mean by "the process is so slow now"?
Parsing a file over a Gig in size will never be fast.
<snip>


You may be able to improve your performance by making the file reads more "chunky":
read 202,000 bytes in one read instead of 1,000 reads of 202 bytes.
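Another knob worth trying (my suggestion, not something tested in this thread) is the StreamReader constructor overload that takes an internal buffer size, so the reader itself pulls large chunks from the disk even when your loop asks for small ones:

// The last two arguments are detectEncodingFromByteOrderMarks and
// bufferSize; here the reader keeps a 1 MB internal buffer.
using (StreamReader streamReader = new StreamReader(
    System.IO.File.OpenRead(fileName), System.Text.Encoding.ASCII, false, 1024 * 1024))
{
    // ... same read loop as before ...
}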

Good luck,
Large file processing is fun
Bill

Nov 17 '05 #7
Bill Butler wrote:
Good luck,
Large file processing is fun
Bill


Exactly. For arbitrarily large files, many systems have a background thread
that reads ahead into a buffer, while consumption of that buffer for parsing
is synchronized in a different thread. You can have as simple or as complex
a solution as you can imagine, depending on what exactly you're trying to
optimize (RAM usage, raw speed, etc.).
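On current versions of .NET (much newer than this thread), such a read-ahead pipeline can be sketched with BlockingCollection; the bounded capacity of 10000 lines is an arbitrary cap on how far the reader may run ahead:

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class ReadAheadDemo
{
    static void Main(string[] args)
    {
        string fileName = args[0];

        // Bounded queue: Add() blocks when the reader gets too far ahead,
        // which caps memory usage.
        BlockingCollection<string> lines = new BlockingCollection<string>(10000);

        // Producer: background thread reads ahead and fills the queue.
        Task reader = Task.Run(() =>
        {
            using (StreamReader sr = new StreamReader(fileName))
            {
                string line;
                while ((line = sr.ReadLine()) != null)
                    lines.Add(line);
            }
            lines.CompleteAdding();
        });

        // Consumer: parse on this thread while the reader stays ahead.
        foreach (string line in lines.GetConsumingEnumerable())
        {
            // ... parse line here ...
        }

        reader.Wait();
    }
}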

--
Gordon Smith (eMVP)
-- Avnet Applied Computing Solutions
Nov 17 '05 #8
<ha*******@gmail.com> wrote:
<snip>


A few points/questions:

1) Rather than calling Peek, the usual way of writing the above is:

while ((line = streamReader.ReadLine()) != null)
{
    // Do something with line
}

2) Given the above, you don't need to initialise line to "" to start
with.

3) What are you actually doing with the lines? If you're keeping them
in memory in an ArrayList or something, then yes, you'll run out of
memory. If you're just reading them and discarding them (as in your
code sample) you shouldn't have any problems.

4) What's the longest line in your text file? If it's enormous, that
could be the problem.

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #9
Brendan Grant <gr****@NOSPAMdahat.com> wrote:

<snip>
Just remember, disk access is one of the slowest forms of I/O you can do on
a computer.


Also note, however, that modern OSes do buffering. I've just written a
1GB file to disk, and then read it using the previously posted code but
with various different buffer sizes.

I would *expect* that as the file is as big as my physical memory, OS
file caching itself won't come into play here - only OS buffering.

Here are the results:

Size:   Time taken
100:    00:00:42.1406250
200:    00:00:41.8906250
500:    00:00:41.6406250
5000:   00:00:42
50000:  00:00:41.7500000

(Note that this is on a laptop, so the disk is pretty slow.)

In other words, changing the buffer size really doesn't help here.
(I've tried a few other things, and they don't help much either...)
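For reference, the timings above came from a loop shaped roughly like this (my reconstruction of the harness, not Jon's actual code):

foreach (int size in new int[] { 100, 200, 500, 5000, 50000 })
{
    char[] buffer = new char[size];
    DateTime start = DateTime.Now;
    using (StreamReader sr = new StreamReader(
        File.OpenRead(fileName), Encoding.ASCII))
    {
        while (sr.Read(buffer, 0, buffer.Length) > 0)
        {
            // Read and discard; only the elapsed time matters.
        }
    }
    Console.WriteLine(size + ": " + (DateTime.Now - start));
}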

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #10
