Program Design for Large volume file processing

Hi,

I am developing a C program for reading over a million files of 1
kilobyte each and sending the contents to another program using some
middleware. I need some help designing the program to process such
a large number of files in less than 8 hours.

TIA
Soren
Nov 14 '05 #1
On 15 Dec 2004 19:50:00 -0800, sa*********@yahoo.com.sg (soren juhu)
wrote in comp.lang.c:

> Hi,
>
> I am developing a C Program for reading over a million files of size 1
> kilobytes each and sending the contents to another program using some
> middle ware. I need some help on designing the program to process such
> a large number of files in less than 8 hours.
>
> TIA
> Soren

From the information you have provided in your post, the only advice
anyone could possibly give you would be to buy a faster computer with
faster hard disk drives to run your program on.

Even if you posted detailed information about the "processing" that
you had to do on the files, you don't have a C language question, you
have one about choosing the most efficient algorithm. For that you
need to post to an algorithm group such as news:comp.programming, and
be very explicit about the processing you need to do.

Once you have selected an algorithm, possibly with the help of an
appropriate group, if you have difficulties writing standard C code
that compiles and executes correctly, then post the problem code here,
explain your problems with it, and ask for C language advice.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #2
If we make the liberal assumption that you can locate and open each
file in 25 milliseconds, the opens alone take 25,000 of your 28,800
seconds. That leaves you about 3800 seconds to process 1e9 bytes, so
you will require a throughput on the order of 250k bytes per second.
Have fun.
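
The arithmetic above can be checked with a few lines of C. This is only a sketch: the names `required_bytes_per_sec` and `print_budget` are made up for illustration, the 25 ms open time is the assumption stated above rather than a measurement, and the files are taken as 1000 bytes each so that the total matches the 1e9 bytes used in the post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: bytes per second needed once the time spent
 * locating and opening files has been subtracted from the deadline.
 * Returns a negative value if the opens alone use up the deadline. */
double required_bytes_per_sec(double n_files, double bytes_per_file,
                              double open_secs_per_file,
                              double deadline_secs)
{
    assert(n_files > 0.0 && deadline_secs > 0.0);
    double remaining = deadline_secs - n_files * open_secs_per_file;
    if (remaining <= 0.0)
        return -1.0;
    return n_files * bytes_per_file / remaining;
}

/* Prints the figure for the case in the post: 1e6 files, 1e9 bytes
 * in total, 25 ms per open, 8 hours overall. */
void print_budget(void)
{
    double rate = required_bytes_per_sec(1e6, 1000.0, 0.025, 8 * 3600.0);
    printf("required throughput: %.0f bytes/s\n", rate);
}
```

Calling `print_budget()` from `main` prints the rate for these inputs. Changing the open time shows how sensitive the budget is: at 30 ms per open the function returns a negative value, because the opens alone would overrun the 8 hours.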

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

Nov 14 '05 #3
Your main bottleneck is likely to be the file access. Accessing lots of
small files on a hard disk could end up being very slow; disks are much
more efficient at reading large chunks of sequential data. You may want to
consider how your files are organised in the first place. If, for example, you
had the data written in 1K blocks in a single file (perhaps even in addition
to the separate files), the problem reduces to transferring a gigabyte of
data, which can be done in seconds or minutes at normal LAN speeds.
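
As a rough illustration of the single-file layout, the sketch below reads a file of back-to-back 1K blocks sequentially, many blocks per call. The names `for_each_record`, `record_fn` and `sum_first_byte`, and the batch size of 256, are assumptions made for this example; nothing in the thread specifies a record format.

```c
#include <assert.h>
#include <stdio.h>

#define RECORD_SIZE 1024   /* 1K blocks, as suggested above */
#define BATCH       256    /* records per fread call (arbitrary choice) */

/* Hypothetical callback type: receives one 1K record at a time. */
typedef void (*record_fn)(const unsigned char *rec, void *ctx);

/* Reads fp sequentially in large batches and hands each complete
 * record to fn. Returns the number of records delivered; a trailing
 * partial record is ignored. */
size_t for_each_record(FILE *fp, record_fn fn, void *ctx)
{
    static unsigned char buf[RECORD_SIZE * BATCH];
    size_t total = 0;
    size_t got;

    assert(fp != NULL && fn != NULL);
    while ((got = fread(buf, RECORD_SIZE, BATCH, fp)) > 0) {
        for (size_t i = 0; i < got; i++)
            fn(buf + i * RECORD_SIZE, ctx);
        total += got;
    }
    return total;
}

/* Example callback: adds up the first byte of every record. */
void sum_first_byte(const unsigned char *rec, void *ctx)
{
    *(unsigned long *)ctx += rec[0];
}
```

The point of the batch is that one `fread` call covers 256 records, so the per-file open and seek cost of the million-file layout disappears entirely.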

This isn't a question about C but about the design of a system of
file management. You need to sit down and specify your real requirements,
e.g. why there are over a million 1K files in the first place and whether
a better approach is possible. There may be things you can do to aid this
transfer process when those million files are being generated (such as
appending them to a single file, or perhaps even putting them in a database).
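
If the generating side can be changed, appending might look something like the sketch below. It assumes fixed 1K records padded with zero bytes, and `append_record` is a hypothetical name; a real format would probably also store each item's true length, which plain padding loses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 1024

/* Appends data as one fixed-size record, zero-padded to RECORD_SIZE.
 * Returns 0 on success, -1 if len is too large or the write fails.
 * The archive is assumed to be positioned at its end, for example
 * because it was opened in append mode. */
int append_record(FILE *archive, const void *data, size_t len)
{
    unsigned char rec[RECORD_SIZE] = {0};

    assert(archive != NULL && data != NULL);
    if (len > RECORD_SIZE)
        return -1;
    memcpy(rec, data, len);
    return fwrite(rec, RECORD_SIZE, 1, archive) == 1 ? 0 : -1;
}
```

The generator would call this once per item, on an archive opened with `fopen(path, "ab")`, instead of creating a new 1K file each time.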

There is a lot you need to consider before worrying about C related issues.

Lawrence


Nov 14 '05 #4
Do the arithmetic.

1000000 * 1024 * 8 (assuming an 8-bit-byte platform for the moment)
comes to 8192000000 bits. If you have, say, 7 hours to transfer this
amount of data, you will need to throw bits down the wire at a
rate of at least 325 kbps. This should easily be within the reach
of modern network cards. I don't think you'll have a problem.
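
For anyone who wants to vary the numbers, the same calculation as a small C function. The name `required_bits_per_sec` is invented for this sketch, and it assumes 8-bit bytes and ignores protocol overhead, as the figures above do.

```c
#include <assert.h>

/* Hypothetical helper: average bit rate, in bits per second, needed to
 * move n_files files of bytes_per_file bytes each within `hours` hours,
 * assuming 8-bit bytes and no protocol overhead. */
double required_bits_per_sec(double n_files, double bytes_per_file,
                             double hours)
{
    assert(hours > 0.0);
    return n_files * bytes_per_file * 8.0 / (hours * 3600.0);
}
```

With 1000000 files of 1024 bytes and 7 hours, this gives the roughly 325 kbps quoted above.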

I suggest you write an "obvious" program, and then test it to
see if it's quick enough. If so, fabulous. If not, post it here
and maybe we can help you speed it up.
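
One possible shape for such an "obvious" program, in standard C only, is sketched below. The middleware call is not described anywhere in this thread, so it is represented by a hypothetical callback type `send_fn`; `process_files`, `count_bytes` and the 1024-byte limit are likewise assumptions for the example.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_FILE_SIZE 1024   /* assumed upper bound on each file */

/* Hypothetical stand-in for the middleware call, which the thread
 * does not describe. Returns 0 on success. */
typedef int (*send_fn)(const unsigned char *data, size_t len, void *ctx);

/* Reads each named file in turn and passes its contents to send.
 * Returns the number of files sent successfully. */
size_t process_files(const char *const *paths, size_t n_paths,
                     send_fn send, void *ctx)
{
    unsigned char buf[MAX_FILE_SIZE];
    size_t sent = 0;

    assert(paths != NULL && send != NULL);
    for (size_t i = 0; i < n_paths; i++) {
        FILE *fp = fopen(paths[i], "rb");
        if (fp == NULL)
            continue;               /* skip unreadable files */
        size_t len = fread(buf, 1, sizeof buf, fp);
        int read_failed = ferror(fp);
        fclose(fp);
        if (!read_failed && send(buf, len, ctx) == 0)
            sent++;
    }
    return sent;
}

/* Example sink for timing runs: just counts the bytes it is given. */
int count_bytes(const unsigned char *data, size_t len, void *ctx)
{
    (void)data;
    *(unsigned long *)ctx += (unsigned long)len;
    return 0;
}
```

Running `process_files` over a few thousand real files with `count_bytes` as the sink, and timing it with `time()` before and after, should give a first estimate of whether file access alone fits the 8-hour budget before any middleware is involved.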

It's worth remembering that this newsgroup can't - or rather,
won't - help you on the networking aspects of such a program.
But they are likely to come up with some good ideas on the
rest of it, given the catalyst of some source code to inspect.

Best of luck.
Nov 14 '05 #5
Hi all,

Sorry for my late reply, I am posting my message using Google Groups.

Thanks a lot for your valuable input on the problem. It definitely
helped me see where to start in solving it. I will be sure to keep
you informed about this development effort.

Thanks,
Soren

Nov 14 '05 #6
In article <01************ *************** *****@4ax.com>,
Jack Klein <ja*******@spamcop.net> wrote:
> On 15 Dec 2004 19:50:00 -0800, sa*********@yahoo.com.sg (soren juhu)
> wrote in comp.lang.c:
>> Hi,
>>
>> I am developing a C Program for reading over a million files of size 1
>> kilobytes each and sending the contents to another program using some
>> middle ware. I need some help on designing the program to process such
>> a large number of files in less than 8 hours.
>>
>> TIA
>> Soren
>
> From the information you have provided in your post, the only advice
> anyone could possibly give you would be to buy a faster computer with
> faster hard disk drives to run your program on.
>
> Even if you posted detailed information about the "processing" that
> you had to do on the files, you don't have a C language question, you
> have one about choosing the most efficient algorithm. For that you
> need to post to an algorithm group such as news:comp.programming, and
> be very explicit about the processing you need to do.


I interpret the question this way: which standard C functions
are appropriate, and how should I use them?

My answer proves you wrong.

1. Assuming you can guarantee a maximum size for each file,
read each one in one go into a static buffer of that size.
2. Go for the lowest-level calls (read/write) and
handle the rest yourself.
3. Your total throughput seems to be within reach of modern
disks. C is low overhead and shouldn't get in your
way for a reasonable amount of processing.
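
Points 1 and 2 might be sketched as follows. Note that open/read/close are POSIX calls rather than standard C, so this assumes a POSIX system; the names `slurp_file` and `file_buf` and the 1024-byte maximum are assumptions for the example, and retrying after an interrupted read is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_FILE_SIZE 1024   /* assumed guaranteed maximum file size */

static unsigned char file_buf[MAX_FILE_SIZE];

/* Reads up to MAX_FILE_SIZE bytes of path into file_buf using the
 * POSIX open/read/close calls. Returns the byte count, or -1 on error. */
ssize_t slurp_file(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t used = 0;
    while (used < sizeof file_buf) {
        ssize_t n = read(fd, file_buf + used, sizeof file_buf - used);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;                  /* end of file */
        used += (size_t)n;
    }
    close(fd);
    return (ssize_t)used;
}
```

For a 1K file the loop will normally finish after a single `read`; it is there because `read` is allowed to return fewer bytes than requested.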

There is no way the OP asked about "processing to be done".

> Once you have selected an algorithm, possibly with the help of an
> appropriate group, if you have difficulties writing standard C code

There is no way you could mention an algorithm. You have not the
slightest clue, if you wanted to. The OP was well aware that
would be off topic.

> that compiles and executes correctly, then post the problem code here,
> explain your problems with it, and ask for C language advice.

Aren't we going overboard? Such that only homework questions
are appropriate?

> --
> Jack Klein


Groetjes Albert.

--
--
Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
One man-hour to invent,
One man-week to implement,
One lawyer-year to patent.
Nov 14 '05 #7
