Reading in data from very large flat files

I have to read data from a flat file with millions of records, and I want
to find the most efficient way of doing this. I was going to use a
StreamReader and break up each input line using Substring: there are no
delimiters, but I have a spec for the format of the file. Is Substring
the only way to do this, or is there a more efficient way?
string line;
while ((line = sr.ReadLine()) != null)
{
    // Field offsets and widths come from the file's format spec.
    string param1 = line.Substring(0, 5);
    string param2 = line.Substring(5, 2);
    // etc.
}

regards,

Joe

Nov 17 '05 #1

That's a pretty efficient way of reading it. Are you then storing the
data in memory, or just processing each line in turn? If you're storing
them and there are lots of little fields, you might consider storing
just the whole line, and breaking it into bits when it's used. Each
string has a certain overhead, and if you have lots of strings with
just a few characters, that overhead could become significant.
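
A minimal sketch of that idea, assuming the two field positions from the question. The class name FixedWidthRecord and the property names are made up for illustration; a real record would have one property per field in the spec:

```csharp
using System;

// Hypothetical wrapper: stores only the raw line and slices fields on demand,
// so a stored record costs one string instead of one string per field.
public sealed class FixedWidthRecord
{
    private readonly string _line;

    public FixedWidthRecord(string line)
    {
        if (line == null) throw new ArgumentNullException(nameof(line));
        // 7 = offset 5 + width 2, the end of the last field used below.
        if (line.Length < 7)
            throw new ArgumentException("Line is too short for the expected fields.", nameof(line));
        _line = line;
    }

    // Each access creates a short-lived string; nothing per-field is kept.
    public string Param1 { get { return _line.Substring(0, 5); } }
    public string Param2 { get { return _line.Substring(5, 2); } }
}
```

The trade-off is that every property access does a new Substring, so this only pays off if the records are held in memory and most fields are read rarely.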

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Nov 17 '05 #2
It depends on what you're doing with the data once you've read it. If you
only need to read the data sequentially, then that's a good method. If you
need to jump frequently from one record to another at random, you can use
the fact that every line has a fixed width to seek forwards or backwards in
the file and read only the records you need. But like Jon said, if you are
reading the file sequentially, that's probably the best way (there are other
ways, but the gain wouldn't be worth the coding, trust me).
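
For the random-access case, a rough sketch could look like the following. It assumes every record has the same length in bytes, a single-byte encoding such as ASCII, and a fixed-size line terminator. FixedWidthFile and ReadRecord are illustrative names, not an existing API:

```csharp
using System;
using System.IO;
using System.Text;

public static class FixedWidthFile
{
    // Reads the record at zero-based position `index` without scanning the
    // records before it. recordLength is the number of data bytes per record;
    // terminatorLength is the line terminator size (2 for "\r\n", 1 for "\n").
    public static string ReadRecord(Stream stream, long index, int recordLength, int terminatorLength)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));

        // Fixed width means the byte offset is a simple multiplication.
        long offset = index * (recordLength + terminatorLength);
        stream.Seek(offset, SeekOrigin.Begin);

        byte[] buffer = new byte[recordLength];
        int read = 0;
        while (read < recordLength)
        {
            // Read may return fewer bytes than requested, so loop until full.
            int n = stream.Read(buffer, read, recordLength - read);
            if (n == 0)
                throw new EndOfStreamException("Record " + index + " is past the end of the file.");
            read += n;
        }
        return Encoding.ASCII.GetString(buffer);
    }
}
```

With a FileStream opened on the data file, ReadRecord(stream, 1000000, 7, 2) would jump straight to the 1,000,001st record. If the file uses a multi-byte encoding, the offset arithmetic no longer holds and this approach would need rethinking.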

Nov 17 '05 #3

Thanks for the replies. There will be some validation checks on the
values of the resulting variable assignments, but those will be on a
line-by-line basis (so, for example, I won't have to jump from one record
and check something against the last 10 records). The next step is to load
the data into SQL Server after the validation checks. I was going to batch
insert by creating an XML document and feeding it to a stored procedure
that uses OPENXML. I am also going to performance test that method against
a DTS package load, although I am not sure to what degree I can perform
effective validation checks using DTS.
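
As a rough sketch of the validate-then-batch step, something like this could build the XML for one batch. The element and attribute names (rows, row, param1, param2), the BatchXml class and the validation rule are all placeholders; the real ones depend on the file spec and on what the stored procedure expects:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class BatchXml
{
    // Validates each line and writes the valid ones as <row> elements with
    // one attribute per field, a shape OPENXML can read with attribute-centric
    // mapping. Lines that fail validation are added to `rejected`.
    public static string Build(IEnumerable<string> lines, List<string> rejected)
    {
        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement("rows");
            foreach (string line in lines)
            {
                if (!IsValid(line)) { rejected.Add(line); continue; }
                writer.WriteStartElement("row");
                // XmlWriter escapes special characters in attribute values.
                writer.WriteAttributeString("param1", line.Substring(0, 5));
                writer.WriteAttributeString("param2", line.Substring(5, 2));
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
        }
        return sw.ToString();
    }

    // Placeholder rule: the line covers both fields and param2 is two digits.
    private static bool IsValid(string line)
    {
        if (line == null || line.Length < 7) return false;
        return char.IsDigit(line[5]) && char.IsDigit(line[6]);
    }
}
```

The returned string would then be passed as a parameter to the stored procedure, one call per batch of a few thousand lines rather than one document for the whole file, so that memory use stays bounded.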

Joe

Nov 17 '05 #4
