473,695 Members | 1,929 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

Dealing with large files (random access)


This post is the 'sequel' ;) of the "Data Oriented vs Object Oriented
Design" post, but it can be read and treated apart from that one. I
will just quote the beginning of my previous message to expose the

New to .NET, I'm working on an Winforms client application using VS
2005 beta2. My needs considering data storage are the followings:

(1) Small files (0 < length < 10 mb), containing lots of small
'objects' that need to be loaded in memory at runtime, in order to
garantee small access time to each 'object'. Each of these 'objects'
collection should be able to easily bind to a DataGridView, AND to
provide *filtering* and *sorting* capability.

(2) 'Mid-large' files (0 < length < 100 mb), containing lots of
'mid-large objects'. Such files shouldn't be fully loaded in memory,
but the class that deals with those files must expose random
accessing/adding/removing/modifying 'object' functionalities .

In facts, a file of type (1) contains an index of the large 'objects'
contained in a corresponding file of type (2). Each index will be used
to fill a DataGridView, and each entry (row) will provide a
'reference' to the 'big object' stored in the corresponding type (2)
file (in addition to some other object-specific details that will be
displayed on the DGView of course).

Since this is a Windows client application, using a database with a
data provider is out of the question.
This post deals with type (2) files.

My question will go straight to the point: what is the most efficient
approach to deal with such files?

I haven't much thought about it yet. Again, using a serialized List<T>
(or any object Collection) would be fine, but randomly accessing an
object without loading the whole list into memory is easier said than

I'm sure you experienced people will have some great starting ideas ;)

Nov 17 '05 #1
1 2383
interesting assumption:
Since this is a Windows client application, using a database with a
data provider is out of the question.

Why in the world would you say that? Access databases and MSDE databases
are both completely free to distribute with your windows client application.
Download WebMatrix from the Microsoft site and you get MSDE.

For a good writeup of MSDE and comparisons with other tools, See

To see how this plays with the next version of MS software, See
Honestly, the idea of writing something new to do all this is difficult for
me to rationalize.

Your notion of packing all your small files into a big one is not better.
Your small files cannot grow without rewriting the large one. Why not use
the file system... that's what it's for!

--- Nick Malik [Microsoft]
MCSD, CFPS, Certified Scrummaster

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
I do not answer questions on behalf of my employer. I'm just a
programmer helping programmers.
Nov 17 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

by: Laphan | last post by:
Hi All I know this is my 2nd (and final) cross-post, but which NG should I use for the below. I want to create a game that queries and updates text and numeric stats on a regular basis, so I'm looking for a data file method that allows me to query and update the file from a VB app. I use MS Access/ASP on a daily basis and was hoping to use this knowledge to
by: Brad Tilley | last post by:
I have some large files (between 2 & 4 GB) that I want to do a few things with. Here's how I've been using the md5 module in Python: original = file(path + f, 'rb') data = original.read(4096) original.close() verify = md5.new(data) print verify.hexdigest(), f Is reading the first 4096 bytes of the files and calculating the md5 sum
by: Joerg Schuster | last post by:
Hello, I am looking for a method to "shuffle" the lines of a large file. I have a corpus of sorted and "uniqed" English sentences that has been produced with (1): (1) sort corpus | uniq > corpus.uniq corpus.uniq is 80G large. The fact that every sentence appears only
by: shailesh kumar | last post by:
Hi, I need to design data interfaces for accessing files of very large sizes efficiently. The data will be accessed in chunks of fixed size ... My data interface should be able to do a random seek in the file, as well as sequential access block by block.... One aspect of the usage of this interface is that there is quite good chance of accessing same blocks again and again by the application..
by: jdev8080 | last post by:
We are looking at creating large XML files containing binary data (encoded as base64) and passing them to transformers that will parse and transform the data into different formats. Basically, we have images that have associated metadata and we are trying to develop a unified delivery mechanism. Our XML documents may be as large as 1GB and contain up to 100,000 images. My question is, has anyone done anything like this before?
by: Chris Foote | last post by:
Hi all. I have the need to store a large (10M) number of keys in a hash table, based on a tuple of (long_integer, integer). The standard python dictionary works well for small numbers of keys, but starts to perform badly for me inserting roughly 5M keys: # keys dictionary metakit (both using psyco) ------ ---------- ------- 1M 8.8s 22.2s
by: kaminekutte | last post by:
Hi everybody, I have been trying to parse a 100MB log file(tab separated). Basic aim is to read the file randomly, do some procesing and then display the contents of the file line by line. Working with randon access file class makes reading very slow. Is there any alternative to random access files class in java whereby faster file processing is possible??
by: sebastian.harko | last post by:
Helllo, What's the general accepted strategy for dealing with very large binary files in C# ? I have to do a program that reads some "multi frame bitmap " files which can reach up to one hundred megs so I need to know how to optimize reading a file.. Best regards, Seb
by: tekctrl | last post by:
Anyone: I have a simple MSAccess DB which was created from an old ASCII flatfile. It works fine except for something that just started happening. I'll enter info in a record, save the record, and try to move to another record and get an Access error "Record is too large". The record is only half filled, with many empty fields. If I remove the added data or delete some older data, then it saves ok and works fine again. Whenever I'm...
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven tapestry of website design and digital marketing. It's not merely about having a website; it's about crafting an immersive digital experience that captivates audiences and drives business growth. The Art of Business Website Design Your website is...
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each protocol has its own unique characteristics and advantages, but as a user who is planning to build a smart home system, I am a bit confused by the choice of these technologies. I'm particularly interested in Zigbee because I've heard it does some...
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing, and deployment—without human intervention. Imagine an AI that can take a project description, break it down, write the code, debug it, and then launch it, all on its own.... Now, this would greatly impact the work of software developers. The idea...
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and then checking html paragraph one by one. At the time of converting from word file to html my equations which are in the word document file was convert into image. Globals.ThisAddIn.Application.ActiveDocument.Select();...
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.