Bytes | Developer Community
dealing with huge data

Ok, so I have written a program in C where I am dealing with huge
data (millions of items and lots of iterations involved), and for some
reason the screen tends to freeze and I get no output every time I
execute it. However, when I tried reducing the amount of data, the
program ran fine.
What could possibly be done to resolve this?
Jun 27 '08 #1
I forgot to mention this happened while I was trying to print data.

I have seen that it doesn't work for extremely huge data.

Jun 27 '08 #2
pereges wrote:
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
There's a bug on line 42.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     | <std_disclaimer.h>    |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Jun 27 '08 #3
pereges wrote:

<program "freezing" on "huge data" and millions of iterations>
I forgot to mention this happened while I was trying to print data.
Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function[s]? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen it can't work for extremely huge data.
Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.

Jun 27 '08 #4
pereges wrote:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.

What could possibly be done to resolve this ?
On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **
Jun 27 '08 #5
CBFalconer said:
pereges wrote:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.

What could possibly be done to resolve this ?

On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.
In a similar vein, it was reported a few years ago that a computer program,
on being told that 90% of accidents in the home involved either the top
stair or the bottom stair and being asked what to do to reduce accidents,
suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and many
of them do so with admirable efficiency. The large amount of data, then,
is *not* the cause of the problem. Rather, it is when large amounts of
data are being processed that the problem manifests itself. Therefore,
reducing the amount of data will not only *not* fix the problem, but will
actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the problem.
The way to do /that/ is to reduce, not the amount of *data*, but the
amount of *code* - until the OP has the smallest compilable program that
reproduces the problem. It is often the case that, in preparing such a
program, the author of the code will find the problem. But if not, at
least he or she now has a minimal program that can be presented for
analysis by C experts, such as those who regularly haunt the corridors of
comp.lang.c. I commend this strategy to the OP.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jun 27 '08 #6
On Apr 23, 5:00 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
CBFalconer said:
pereges wrote:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.
What could possibly be done to resolve this ?
On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.

In a similar vein, it was reported a few years ago that a computer program,
on being told that 90% of accidents in the home involved either the top
stair or the bottom stair and being asked what to do to reduce accidents,
suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and many
of them do so with admirable efficiency. The large amount of data, then,
is *not* the cause of the problem. Rather, it is when large amounts of
data are being processed that the problem manifests itself. Therefore,
reducing the amount of data will not only *not* fix the problem, but will
actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the problem.
The way to do /that/ is to reduce, not the amount of *data*, but the
amount of *code* - until the OP has the smallest compilable program that
reproduces the problem. It is often the case that, in preparing such a
program, the author of the code will find the problem. But if not, at
least he or she now has a minimal program that can be presented for
analysis by C experts, such as those who regularly haunt the corridors of
comp.lang.c. I commend this strategy to the OP.
I don't think we can give good advice until the OP actually states
what his exact problem is.
This:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.
Does not really tell us anything.

Millions of records? In what format? What operations are performed
against the data? What is the actual underlying problem that is being
solved?

Probably, there is a good, inexpensive and compact solution and likely
there are prebuilt tools that will already accomplish the job (or get
most of the way there).

"Big data" that "seems to freeze" doesn't mean anything.
Jun 27 '08 #7
On Apr 23, 10:25 pm, santosh <santosh....@gmail.com> wrote:
pereges wrote:

<program "freezing" on "huge data" and millions of iterations>
I forgot to mention this happened while I was trying to print data.

Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function[s]? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen it can't work for extremely huge data.

Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.

There are ~500 lines in the code. If you don't mind reading it, I will
definitely post it.
I didn't post it for a reason.
Jun 27 '08 #8
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Are you SURE that the screen freezes, and it's not just taking
a long time? (When in doubt, let it run over a weekend.)

You don't give a very good idea of what your program is doing, but
some hints that might apply:

Your program almost certainly has at least one bug.

Make sure that every call to malloc() is checked, and that you
report any calls that run out of memory. Also check if the behavior
changes if you change limits on the amount of memory the process
can allocate (e.g. 'ulimit').

Use any tools (like 'ps') you might have to see how large the process
is and whether it is swapping so heavily that little CPU time gets
used and most of the time goes to paging.

If it's a multi-process program, you might be deadlocking on
allocation of swap/page space.

Make sure that you do not use more memory than you allocated (often
called "buffer overflow", although this problem is a bit more general
than a buffer overflow). This can be difficult to find. If you
corrupt the data malloc() uses to keep track of free memory,
subsequent calls to malloc() or free() might infinite loop.

Add some output statements to the program so you can see how far
it gets. Include something at the start of the program, and, say,
after you have read all the input but before you begin processing it.
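To make the malloc-checking and progress-output suggestions concrete, here is a minimal sketch. The names `xmalloc` and `checkpoint` are purely illustrative, not from the OP's code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Checked allocation: report the failure and abort instead of
   silently returning NULL and misbehaving later. */
void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL) {
        fprintf(stderr, "out of memory allocating %zu bytes\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Progress marker on stderr, which is unbuffered by default, so the
   message appears even if the program later hangs. */
void checkpoint(const char *stage)
{
    fprintf(stderr, "checkpoint: %s\n", stage);
}
```

Sprinkle `checkpoint("read input")`, `checkpoint("processing")`, and so on at the stage boundaries; the last marker printed tells you where the program got stuck.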

Jun 27 '08 #9
On Thu, 24 Apr 2008 00:00:04 +0000, Richard Heathfield wrote:

In a similar vein, it was reported a few years ago that a computer
program, on being told that 90% of accidents in the home involved either
the top stair or the bottom stair and being asked what to do to reduce
accidents, suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and
many of them do so with admirable efficiency. The large amount of data,
then, is *not* the cause of the problem. Rather, it is when large
amounts of data are being processed that the problem manifests itself.
Therefore, reducing the amount of data will not only *not* fix the
problem, but will actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the
problem. The way to do /that/ is to reduce, not the amount of *data*,
but the amount of *code* - until the OP has the smallest compilable
program that reproduces the problem. It is often the case that, in
preparing such a program, the author of the code will find the problem.
But if not, at least he or she now has a minimal program that can be
presented for analysis by C experts, such as those who regularly haunt
the corridors of comp.lang.c. I commend this strategy to the OP.

OMG, I am sure this is some of the best advice on
software construction I have seen.
--
http://lispmachine.wordpress.com/
my email ID is at the above address

Jun 27 '08 #10
On Wed, 23 Apr 2008 20:16:25 -0700, pereges wrote:

There are ~ 500 lines in the code. If you don't mind reading it I will
definitely post it.
I didn't post it for a reason.

I know that. As Richard Heathfield said find and post the smallest
compilable unit.

--
http://lispmachine.wordpress.com/
my email ID is at the above address

Jun 27 '08 #11
pereges wrote:
santosh <santosh....@gmail.com> wrote:
.... snip ...
Unless you show us your current code and where exactly its
performance is not meeting your expectations, there's absolutely
nothing that can be said other than the generic advice to buy
faster storage devices and faster, more powerful hardware.

There are ~500 lines in the code. If you don't mind reading it, I
will definitely post it. I didn't post it for a reason.
Then you have some work to do. Cut it down to a compilable and
runnable program of 100 to 200 lines that has the same fault.
After that, if you haven't found the problem in the process,
publish the result together with the input data and fault.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #12

"pereges" <Br*****@gmail.com> wrote in message
news:9f**********************************@h1g2000prh.googlegroups.com...
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Do you expect the execution time to increase in proportion to the amount of
data?

What are the timings for N=10 (where N is some measure of the amount of
data)? N=100, 1000, 10K, 1M, etc.? What do you mean by huge, anyway; how
much data are we talking about?

At what level of N does it stop working? What did you expect the execution
time to be? Does the machine make noises like lots of disk activity
(assuming you are not dealing with disk i/o anyway)? Sometimes when you
exceed machine memory everything gets a lot slower.

Can you measure what resources are being used at each point, like memory?

Your code is only 500 lines. Can you put print statements in to show what's
happening? Not for every iteration, but maybe only when N>X, some limit
above which you know it fails. Or after 100ms have passed since the last
output, etc.

(You mentioned you are printing to the screen anyway; so maybe you can tell
from the output, what point in the execution it has reached and can put in
extra debug output.)

It sounds like above a certain level of data, some limit or resource is
being exceeded, causing it to hang, or perhaps it is entering an endless
loop (those are a little different, I think).
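As a sketch of the kind of measurement suggested above, one can time the workload for growing N and watch how the cost scales; `work()` here is a hypothetical stand-in for the OP's actual processing:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the real workload; replace with the
   OP's actual processing loop. */
double work(long n)
{
    double s = 0.0;
    for (long i = 0; i < n; i++)
        s += (double)i * 0.5;
    return s;
}

/* CPU seconds for one run of the workload at a given N. */
double time_run(long n)
{
    clock_t t0 = clock();
    work(n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Print timings for N = 10, 100, 1000, ... so the growth rate
   (linear, quadratic, worse) is visible at a glance. */
void timing_table(long max_n)
{
    for (long n = 10; n <= max_n; n *= 10)
        printf("N=%8ld  %.4f s\n", n, time_run(n));
}
```

If each tenfold increase in N multiplies the time by far more than ten, the algorithm's complexity, not raw data volume, is the likely culprit.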

--
Bartc

Jun 27 '08 #13
On 24 Apr, 06:11, gordonb.ec...@burditt.org (Gordon Burditt) wrote:
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Are you SURE that the screen freezes, and it's not just taking
a long time? (When in doubt, let it run over a weekend.)
sounds like it's just very slow

You don't give a very good idea of what your program is doing, but
some hints that might apply:

Your program almost certainly has at least one bug.
just on the principle that all programs have at least one bug?

Make sure that every call to malloc() is checked, and that you
report any calls that run out of memory. Also check if the behavior
changes if you change limits on the amount of memory the process
can allocate (e.g. 'ulimit').

Use any tools (like 'ps') you might have to see how large the process
is and whether it is swapping so heavily that little CPU time gets
used and most of the time goes to paging.

If it's a multi-process program, you might be deadlocking on
allocation of swap/page space.

Make sure that you do not use more memory than you allocated (often
called "buffer overflow", although this problem is a bit more general
than a buffer overflow). This can be difficult to find. If you
corrupt the data malloc() uses to keep track of free memory,
subsequent calls to malloc() or free() might infinite loop.

Add some output statements to the program so you can see how far
it gets. Include something at the start of the program, and, say,
after you have read all the input but before you begin processing it.
maybe even consider a profiler
--
Nick Keighley

I'd rather write programs to write programs than write programs
Jun 27 '08 #14
Freeing (using free()) the memory allocated (using malloc()) has
certainly improved the performance of my program, and it now gives
output for even larger data. But there are still issues. I will post
a minimal version of my code later.
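A minimal sketch of the leak pattern that would explain this: a buffer allocated inside a long-running loop and never released makes the process grow until the OS starts swapping. The counts and sizes below are made up for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate, use, and free a buffer on each iteration. Without the
   free(), a run of a million iterations would leak gigabytes and push
   the machine into swap, matching the apparent "freeze". */
int run_iterations(int count, size_t elems)
{
    for (int iter = 0; iter < count; iter++) {
        double *buf = malloc(elems * sizeof *buf);
        if (buf == NULL) {
            fprintf(stderr, "malloc failed at iteration %d\n", iter);
            return -1;
        }
        for (size_t i = 0; i < elems; i++)   /* stand-in processing */
            buf[i] = (double)iter;
        free(buf);   /* the fix: release each buffer before the next */
    }
    return 0;
}
```

When the buffer size is the same every iteration, allocating it once before the loop and freeing it once afterwards is cheaper still.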
Jun 27 '08 #15
pereges wrote:
freeing (using free) the memory allocated(using malloc()) has
certainly improved the performance of my program and now gives output
for even larger data. but still there are issues. i will post a
minimal version of my code later.
This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.
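For what it's worth, a minimal POSIX sketch of the mmap approach (again, mmap is not standard C; `count_lines_mmap` is a hypothetical example, not the OP's workload):

```c
/* POSIX-only sketch: map the data file read-only and let the OS page
   it in on demand, instead of copying it all into malloc'd memory. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Count newlines in a file via mmap; returns -1 on error. */
long count_lines_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    const char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping remains valid after close */
    if (data == MAP_FAILED)
        return -1;

    long count = 0;
    for (off_t i = 0; i < st.st_size; i++)
        if (data[i] == '\n')
            count++;

    munmap((void *)data, (size_t)st.st_size);
    return count;
}
```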

Jun 27 '08 #16
On Apr 24, 7:30 am, santosh <santosh....@gmail.com> wrote:
pereges wrote:
freeing (using free) the memory allocated(using malloc()) has
certainly improved the performance of my program and now gives output
for even larger data. but still there are issues. i will post a
minimal version of my code later.

This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.
I think it is a mistake to offer advice before clearly understanding
the problem.

There may be a triply nested loop that makes the problem O(N^3) in
which case it is scale of calculation that is the problem and almost
certainly the solution will be to modify the algorithm.

Besides, mmap() will not make any real difference if the file is
already completely loaded into memory. It will only be a convenience
if we need to page portions of it. If we are just reading a file
serially, the operating system buffers (assuming buffered I/O) will
have the same effect as paging through a memory map with less fuss.
If random access is needed in blocky chunks, then mmap() is ideal, but
we don't know that yet.

IMO-YMMV.
Jun 27 '08 #17

This discussion thread is closed. Replies have been disabled for this discussion.