Reading a BIG text file (200Mb)

ACC
Hi!

I'm developing an application to analyze some information that is
inside a text file. The size of the text file ranges from 50 MB to
220 MB...

The text file is like a table, with "rows" and "columns", and several
"cells" together represent an object in my application.

What I need to do is read the text file line by line, parse the
information, create my objects and save them somewhere (an ArrayList) so that
I can process them afterwards.

And the question is, how do I do it in an efficient way?

Thank you in advance for your help,

ACC

Feb 16 '06 #1
In .NET 2.0, the System.IO.File class has a ReadAllLines("file.txt")
method that accepts a file path and returns a string array of all the
lines in that file. Lines are split at each carriage return and/or line
feed. You can then break each line apart further by calling Split
with some delimiter. The .NET Framework will be doing all of the file
reading for you, so maybe it does it most efficiently.

I hope this helps.

Feb 16 '06 #2
ACC,

Is there any reason why you need all of the items in your array list at
one time? Can you process it in batches?

What is the format of your file? If it is a delimited text file, you
might want to consider loading it into a table in SQL Server through Bulk
Copy Services, and then perform your operations on the table. Of course,
this is only applicable if you are performing some sort of set operations.
If you are not (there is some super complex calc you are doing), then this
isn't appropriate.

Ultimately, processing each object as it comes in is the best way, and
only keeping around what you need (instead of all of them).

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Feb 16 '06 #3
Using ReadAllLines here would be a VERY bad idea. If the file is 50-200MB,
reading all of the lines in the file into a string (or array of strings) is
going to be a performance nightmare.
--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Feb 16 '06 #4
But of course you can do all the reading on a different thread.

Feb 16 '06 #5
It's still going to be a nightmare. You are going to read 200MB into
your app? Doing this on another thread isn't going to make a difference;
holding an array of over 400 MB in memory is most likely NOT a good idea.

And I say 400 MB because chances are the file is ASCII, and when it is
read into Unicode (UTF-16) strings, it essentially doubles in size.

Processing this line-by-line is the only way to go.
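
As a rough sketch of what line-by-line processing can look like (the LineByLine class, the CountFields helper and the tab delimiter are made up for illustration; the thread never states the actual file format):

```csharp
using System;
using System.IO;

static class LineByLine
{
    // Reads one line at a time, so only the current line is held in memory.
    // Hypothetical helper: counts the delimited fields in the whole input.
    public static long CountFields(TextReader reader, char delimiter)
    {
        long fields = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0) continue;          // skip blank lines
            fields += line.Split(delimiter).Length;  // "process" the row, then drop it
        }
        return fields;
    }
}
```

For a real file you would pass a StreamReader, for example `using (var reader = new StreamReader(path)) { long n = LineByLine.CountFields(reader, '\t'); }`, where `path` is whatever file you are analyzing.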

--
- Nicholas Paldino [.NET/C# MVP]
- mv*@spam.guard.caspershouse.com
Feb 16 '06 #6
I can see your point.

Feb 16 '06 #7
Hi,

Do you need all the data at once? You could read one record at a time and
update the variables where you keep the running results.

If you need ALL the data at once, you would be better off using MSDE. It's free,
it is built on the SQL Server engine, and you can use T-SQL to handle the data.
You can create a DTS package to import the data into MSDE and then use ADO.NET to
query the data. You can execute the DTS package from inside your code, so you could
change the source of the data.

--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation
Feb 16 '06 #8
Hi,

Not only would reading everything at once be slow, the result is almost
useless because the data is still raw; you still need to split each line
into a record.
--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation
Feb 16 '06 #9
ACC
Thank you all for replying!!

Unfortunately, yes, I need all the data...

I'll try to explain a little bit more.
The text file represents some processes that ran on a DSP, and each
line represents something that happened. The users will use my
application to check whether the DSP is working as predicted.

The problem is that each user wants to check something different. So I
never know what they would like to check when I load the file. For
instance, one may want to check whether a specific process is occurring every
10 ms, and if not, then there is a problem in the DSP. Others might just
want to know the average of some values.

So, as you see, I need to work with all the data.

About SQL Server, I don't have it...

Right now, I can read a 203 MB text file, process it and display it in
a DataGrid in 2 minutes. I'm using a StreamReader to read line by line,
process each line and move on to the next.
Thank you for your help!

Feb 17 '06 #10
I can see what you are doing, but can I challenge the assumption that you
need to hold all the data in memory?

In particular, (line-by-line) readers are *incredibly* well suited to computing
averages etc - you simply read one row at a time, find the value you want,
and then adjust a tally somewhere - either using a total/count pair (but
the total can get big) or by adjusting a running average on the fly (less
accurate, but less risk of overflow).
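
A minimal sketch of the running-average variant, assuming a delimited file with a numeric column (RunningStats, MeanOfColumn and the column layout are hypothetical names chosen for the example). Each row nudges the mean by (value - mean) / count, so no large total is ever stored:

```csharp
using System;
using System.Globalization;
using System.IO;

static class RunningStats
{
    // Computes the mean of one column without keeping any rows in memory.
    // Rows that are too short or not numeric in that column are skipped.
    public static double MeanOfColumn(TextReader reader, int column, char delimiter)
    {
        double mean = 0;
        long count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] cells = line.Split(delimiter);
            if (column >= cells.Length) continue;

            double value;
            if (!double.TryParse(cells[column], NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out value)) continue;

            count++;
            mean += (value - mean) / count;   // incremental update of the average
        }
        return mean;                           // 0 if no usable rows were found
    }
}
```

The total/count alternative would simply accumulate `total += value` and divide once at the end; which one fits better depends on how large the values and row counts get.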

Equally, checking whether something happens every 10ms just involves keeping
track of the last time you found a match; if the matching row you are reading is
more than 10ms past it (assuming rows are in increasing time order) then you
have a problem.
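
Something along these lines would do the interval check in a single pass. This is only a sketch: the PeriodCheck class, the tab delimiter and the column positions (address in column 1, time in milliseconds in column 3) are assumptions, not the real file layout:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class PeriodCheck
{
    // Returns the timestamps of matching rows that arrived more than maxGapMs
    // after the previous matching row. Only the last match time is remembered.
    public static List<double> FindLateOccurrences(TextReader reader, string address, double maxGapMs)
    {
        var late = new List<double>();
        double? last = null;                       // time of the previous match, if any
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] cells = line.Split('\t');
            if (cells.Length < 4 || cells[1] != address) continue;

            double time;
            if (!double.TryParse(cells[3], NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out time)) continue;

            if (last.HasValue && time - last.Value > maxGapMs) late.Add(time);
            last = time;
        }
        return late;
    }
}
```

In practice you would probably allow a small tolerance on top of the 10ms period, since timestamps rarely line up exactly.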


If you can isolate the common "what you need to do" bits, I'm pretty sure
you can do most of this without ever loading a *single* line into an object,
let alone all of them. You could also do some tricks with delegates and/or
custom iterators to mean that you only write the stream code once, and then
just write stubs that work on the formatted data; in this scenario, perhaps
a custom iterator that returns a single instance per call would be in order
(so you only ever have one row in memory) - quite easy to do, and pretty
quick.
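
To give an idea of the custom iterator, here is a sketch using a C# iterator block (yield return). The Row type, its four fields and the RowReader class are invented for the example, and the parsing rules are assumptions; the point is only that the stream code lives in one place and callers just foreach over the rows:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical row layout: state, address, data, time in milliseconds.
struct Row
{
    public int State;
    public string Address;
    public string Data;
    public double TimeMs;
}

static class RowReader
{
    // Lazily yields one parsed row per line; nothing is read until the
    // caller asks for the next item, and no list of rows is ever built.
    public static IEnumerable<Row> ReadRows(TextReader reader, char delimiter)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] c = line.Split(delimiter);
            if (c.Length < 4) continue;            // skip malformed lines

            int state;
            double time;
            if (!int.TryParse(c[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out state)) continue;
            if (!double.TryParse(c[3], NumberStyles.Float, CultureInfo.InvariantCulture, out time)) continue;

            Row row = new Row();
            row.State = state;
            row.Address = c[1];
            row.Data = c[2];
            row.TimeMs = time;
            yield return row;
        }
    }
}
```

Each check (average, interval, search) then becomes a small loop such as `foreach (Row r in RowReader.ReadRows(reader, '\t')) { ... }` without touching the file-reading code again.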

Otherwise, you are just going to cripple the client unless you have some
serious memory; and even then you'd have to switch most UI elements to
virtual mode, as they are *not* going to like having that many items in a UI
collection.

Just my thoughts,

Marc
Feb 17 '06 #11
Thinking of virtual mode - is there any way you could attach the UI to the
file using virtual mode? i.e. only read lines as requested by the UI?

You might have to parse the file once first and build an index of the offset
of every <x> lines so that you can jump directly to the right place in the
stream... and perhaps a file-system watcher so that you get notified if the
file changes (so you can re-index and update the UI), but it might be worth
a few minutes' consideration.
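
A rough sketch of the offset index, assuming an ASCII file with '\n' line endings (possibly preceded by '\r'); LineIndex, Build, ReadLines and the `every` parameter are names made up here. The index is built by scanning raw bytes, because a StreamReader buffers ahead and does not tell you the byte position of each line:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class LineIndex
{
    // Records the byte offset where line 0, line `every`, line 2*`every`, ... start.
    public static List<long> Build(Stream stream, int every)
    {
        var offsets = new List<long>();
        offsets.Add(0);
        byte[] buffer = new byte[64 * 1024];
        long position = 0, lines = 0;
        int read;
        stream.Seek(0, SeekOrigin.Begin);
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != (byte)'\n') continue;
                lines++;
                if (lines % every == 0) offsets.Add(position + i + 1);
            }
            position += read;
        }
        return offsets;
    }

    // Reads up to `count` lines starting at line number `first` by seeking to
    // the nearest indexed offset and skipping the few lines in between.
    public static List<string> ReadLines(Stream stream, List<long> offsets, int every, long first, int count)
    {
        var result = new List<string>();
        int slot = (int)(first / every);
        if (slot >= offsets.Count) return result;   // requested line is past the end

        stream.Seek(offsets[slot], SeekOrigin.Begin);
        // The reader is deliberately not disposed: that would close the stream.
        var reader = new StreamReader(stream, Encoding.ASCII);
        for (long skip = first - (long)slot * every; skip > 0; skip--)
        {
            if (reader.ReadLine() == null) return result;
        }

        string line;
        while (result.Count < count && (line = reader.ReadLine()) != null) result.Add(line);
        return result;
    }
}
```

A grid in virtual mode could then ask for just the rows it is about to display, for example `LineIndex.ReadLines(stream, offsets, 1000, firstVisibleRow, visibleRowCount)`, with the index rebuilt whenever the file changes.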

Just a thought,

Marc
Feb 17 '06 #12
ACC
Marc,

It's not that simple...

A line of the text file has at least the following:

state number
the address of the memory that is being accessed
the data in the memory
the instant in time when it occurred

When the user just wants to check whether objectx (with address = x) occurs
every 10ms, that is OK...

But he might want to find all object1 (object2 with address X followed by
object3 with address Y and data greater than W, followed by object4...)

The search is more complex... but might be possible!! :)
But I still need to load the text file into a datagrid :(

Thanks
ACC
Feb 17 '06 #13
Hi,

ACC wrote:
so, as you see, I need to work with all the data.

You need all the data in memory only if you are going to make more than one
run over it and the results are not cumulative.

ACC wrote:
About sql server, I dont have it...

You do not need SQL Server; MSDE is like a "SQL Server lite" and IT'S FREE.

Now, if the file changes on each run (which it probably does), then you need to
evaluate whether it is worth inserting the data into the SQL engine first. If
you only do one run per file, you do not need to do this, and your current
solution is an acceptable compromise.
--
Ignacio Machin,
ignacio.machin AT dot.state.fl.us
Florida Department Of Transportation
Feb 17 '06 #14
