Bytes | Software Development & Data Engineering Community

dealing with huge data

ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this?
Jun 27 '08 #1
I forgot to mention this happened while I was trying to print data.

I have seen it can't work for extremely huge data.

Jun 27 '08 #2
pereges wrote:
>
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
There's a bug on line 42.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody        | www.hvcomputer.com | #include              |
| kenbrody/at\spamcop.net | www.fptech.com     | <std_disclaimer.h>    |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:Th*************@gmail.com>

Jun 27 '08 #3
pereges wrote:

<program "freezing" on "huge data" and millions of iterations>
I forgot to mention this happened while I was trying to print data.
Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function[s]? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen it can't work for extremely huge data.
Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.

Jun 27 '08 #4
pereges wrote:
>
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.

What could possibly be done to resolve this ?
On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **
Jun 27 '08 #5
CBFalconer said:
pereges wrote:
>>
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.

What could possibly be done to resolve this ?

On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.
In a similar vein, it was reported a few years ago that a computer program,
on being told that 90% of accidents in the home involved either the top
stair or the bottom stair and being asked what to do to reduce accidents,
suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and many
of them do so with admirable efficiency. The large amount of data, then,
is *not* the cause of the problem. Rather, it is when large amounts of
data are being processed that the problem manifests itself. Therefore,
reducing the amount of data will not only *not* fix the problem, but will
actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the problem.
The way to do /that/ is to reduce, not the amount of *data*, but the
amount of *code* - until the OP has the smallest compilable program that
reproduces the problem. It is often the case that, in preparing such a
program, the author of the code will find the problem. But if not, at
least he or she now has a minimal program that can be presented for
analysis by C experts, such as those who regularly haunt the corridors of
comp.lang.c. I commend this strategy to the OP.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
Jun 27 '08 #6
On Apr 23, 5:00 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
CBFalconer said:
pereges wrote:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.
What could possibly be done to resolve this ?
On the information supplied, I suspect that simply reducing the
amount of data will fix the problem. I am unable to estimate how
much it should be reduced.

In a similar vein, it was reported a few years ago that a computer program,
on being told that 90% of accidents in the home involved either the top
stair or the bottom stair and being asked what to do to reduce accidents,
suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and many
of them do so with admirable efficiency. The large amount of data, then,
is *not* the cause of the problem. Rather, it is when large amounts of
data are being processed that the problem manifests itself. Therefore,
reducing the amount of data will not only *not* fix the problem, but will
actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the problem.
The way to do /that/ is to reduce, not the amount of *data*, but the
amount of *code* - until the OP has the smallest compilable program that
reproduces the problem. It is often the case that, in preparing such a
program, the author of the code will find the problem. But if not, at
least he or she now has a minimal program that can be presented for
analysis by C experts, such as those who regularly haunt the corridors of
comp.lang.c. I commend this strategy to the OP.
I don't think we can give good advice until the OP actually states
what his exact problem is.
This:
ok so i have written a program in C where I am dealing with huge
data (millions and lots of iterations involved) and for some
reason the screen tends to freeze and I get no output every time
I execute it. However, I have tried to reduce the amount of data
and the program runs fine.
Does not really tell us anything.

Millions of records? In what format? What operations are performed
against the data? What is the actual underlying problem that is being
solved?

Probably, there is a good, inexpensive and compact solution and likely
there are prebuilt tools that will already accomplish the job (or get
most of the way there).

"Big data" that "seems to freeze" doesn't mean anything.
Jun 27 '08 #7
On Apr 23, 10:25 pm, santosh <santosh....@gmail.com> wrote:
pereges wrote:

<program "freezing" on "huge data" and millions of iterations>
I forgot to mention this happened while I was trying to print data.

Print where? To a disk file? To a flash drive? To a screen? Some other
device? To memory? What's the code for the print function[s]? What are
the data structures involved? Did you try compiler optimisations? Did
you try implementation specific I/O routines (which are sometimes
faster than standard C ones)? Did you profile the program?
I have seen it can't work for extremely huge data.

Can't work or works too slowly for your taste?

Unless you show us your current code and where exactly its performance
is not meeting your expectations, there's absolutely nothing that can
be said other than the generic advice to buy faster storage devices and
faster, more powerful hardware.

There are ~ 500 lines in the code. If you don't mind reading it I will
definitely post it.
I didn't post it for a reason.
Jun 27 '08 #8
>ok so i have written a program in C where I am dealing with huge
>data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Are you SURE that the screen freezes, and it's not just taking
a long time? (When in doubt, let it run over a weekend.)

You don't give a very good idea of what your program is doing, but
some hints that might apply:

Your program almost certainly has at least one bug.

Make sure that every call to malloc() is checked, and that you
report any calls that run out of memory. Also check if the behavior
changes if you change limits on the amount of memory the process
can allocate (e.g. 'ulimit').

Use any tools (like 'ps') you might have to see how large the program
is and whether it's swapping so heavily that little CPU time is used
and most of the time goes to paging.

If it's a multi-process program, you might be deadlocking on
allocation of swap/page space.

Make sure that you do not use more memory than you allocated (often
called "buffer overflow", although this problem is a bit more general
than a buffer overflow). This can be difficult to find. If you
corrupt the data malloc() uses to keep track of free memory,
subsequent calls to malloc() or free() might infinite loop.

Add some output statements to the program so you can see how far
it gets. Include something at the start of the program, and, say,
after you have read all the input but before you begin processing it.

Jun 27 '08 #9
On Thu, 24 Apr 2008 00:00:04 +0000, Richard Heathfield wrote:

In a similar vein, it was reported a few years ago that a computer
program, on being told that 90% of accidents in the home involved either
the top stair or the bottom stair and being asked what to do to reduce
accidents, suggested removing the top and bottom stairs.

C programs regularly have to deal with very large amounts of data, and
many of them do so with admirable efficiency. The large amount of data,
then, is *not* the cause of the problem. Rather, it is when large
amounts of data are being processed that the problem manifests itself.
Therefore, reducing the amount of data will not only *not* fix the
problem, but will actually hide it, making it *harder* to fix.

The proper solution is to find and fix the bug that is causing the
problem. The way to do /that/ is to reduce, not the amount of *data*,
but the amount of *code* - until the OP has the smallest compilable
program that reproduces the problem. It is often the case that, in
preparing such a program, the author of the code will find the problem.
But if not, at least he or she now has a minimal program that can be
presented for analysis by C experts, such as those who regularly haunt
the corridors of comp.lang.c. I commend this strategy to the OP.

OMG, I am sure this is one of the best pieces of advice on
software construction.
--
http://lispmachine.wordpress.com/
my email ID is at the above address

Jun 27 '08 #10
On Wed, 23 Apr 2008 20:16:25 -0700, pereges wrote:

There are ~ 500 lines in the code. If you don't mind reading it I will
definitely post it.
I didn't post it for a reason.

I know that. As Richard Heathfield said find and post the smallest
compilable unit.

--
http://lispmachine.wordpress.com/
my email ID is at the above address

Jun 27 '08 #11
pereges wrote:
santosh <santosh....@gmail.com> wrote:
.... snip ...
>
>Unless you show us your current code and where exactly its
performance is not meeting your expectations, there's absolutely
nothing that can be said other than the generic advice to buy
faster storage devices and faster, more powerful hardware.

There are ~ 500 lines in the code. If you don't mind reading it I
will definitely post it. I didn't post it for a reason.
Then you have some work to do. Cut it down to a compilable and
runnable program of 100 to 200 lines that has the same fault.
After that, if you haven't found the problem in the process,
publish the result together with the input data and fault.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
** Posted from http://www.teranews.com **
Jun 27 '08 #12

"pereges" <Br*****@gmail.com> wrote in message
news:9f**********************************@h1g2000prh.googlegroups.com...
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Do you expect the execution time to increase in proportion to the amount of
data?

What are the timings for N=10 (where N is some measure of the amount of
data)? N=100, 1000, 10K, 1M, etc.? What do you mean by huge, anyway? How
much data are we talking about?

At what level of N does it stop working? What did you expect the execution
time to be? Does the machine make noises like lots of disk activity
(assuming you are not dealing with disk i/o anyway)? Sometimes when you
exceed machine memory everything gets a lot slower.

Can you measure what resources are being used at each point, like memory?

Your code is only 500 lines. Can you put print statements in to show what's
happening? Not for every iteration, but maybe only when N>X, some limit
above which you know it fails. Or after 100ms have passed since the last
output, etc.

(You mentioned you are printing to the screen anyway; so maybe you can tell
from the output, what point in the execution it has reached and can put in
extra debug output.)

It sounds like above a certain level of data, some limit or resource is
being exceeded, causing it to hang, or perhaps entering an endless loop
(those are a little different, I think).

--
Bartc

Jun 27 '08 #13
On 24 Apr, 06:11, gordonb.ec...@burditt.org (Gordon Burditt) wrote:
ok so i have written a program in C where I am dealing with huge
data(millions and lots of iterations involved) and for some reason the
screen tends to freeze and I get no output every time I execute it.
However, I have tried to reduce the amount of data and the program
runs fine.
What could possibly be done to resolve this ?
Are you SURE that the screen freezes, and it's not just taking
a long time? (When in doubt, let it run over a weekend.)
sounds like it's just very slow

You don't give a very good idea of what your program is doing, but
some hints that might apply:

Your program almost certainly has at least one bug.
just on the principle that all programs have at least one bug?

Make sure that every call to malloc() is checked, and that you
report any calls that run out of memory. Also check if the behavior
changes if you change limits on the amount of memory the process
can allocate (e.g. 'ulimit').

Use any tools (like 'ps') you might have to see how large the program
is and whether it's swapping so heavily that little CPU time is used
and most of the time goes to paging.

If it's a multi-process program, you might be deadlocking on
allocation of swap/page space.

Make sure that you do not use more memory than you allocated (often
called "buffer overflow", although this problem is a bit more general
than a buffer overflow). This can be difficult to find. If you
corrupt the data malloc() uses to keep track of free memory,
subsequent calls to malloc() or free() might infinite loop.

Add some output statements to the program so you can see how far
it gets. Include something at the start of the program, and, say,
after you have read all the input but before you begin processing it.
maybe even consider a profiler
--
Nick Keighley

I'd rather write programs to write programs than write programs
Jun 27 '08 #14
Freeing (using free()) the memory allocated (using malloc()) has
certainly improved the performance of my program, and it now gives
output for even larger data. But there are still issues. I will post
a minimal version of my code later.
Jun 27 '08 #15
pereges wrote:
freeing (using free) the memory allocated(using malloc()) has
certainly improved the performance of my program and now gives output
for even larger data. but still there are issues. i will post a
minimal version of my code later.
This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.

Jun 27 '08 #16
On Apr 24, 7:30 am, santosh <santosh....@gmail.com> wrote:
pereges wrote:
freeing (using free) the memory allocated(using malloc()) has
certainly improved the performance of my program and now gives output
for even larger data. but still there are issues. i will post a
minimal version of my code later.

This suggests that the slowdown was due to insufficient free memory and
the consequent "thrashing" that most OSes suffer under such conditions.
It may be that you could improve overall efficiency by using mmap
instead of malloc for your data file. Note that mmap is not part of
standard C (though it's functionally implemented under most of the
major mainstream OSes). For help with it please ask in a system
specific group like comp.unix.programmer.
I think it is a mistake to offer advice before clearly understanding
the problem.

There may be a triply nested loop that makes the problem O(N^3) in
which case it is scale of calculation that is the problem and almost
certainly the solution will be to modify the algorithm.

Besides, mmap() will not make any real difference if the file is
already completely loaded into memory. It will only be a convenience
if we need to page portions of it. If we are just reading a file
serially, the operating system buffers (assuming buffered I/O) will
have the same effect as paging through a memory map with less fuss.
If random access is needed in blocky chunks, then mmap() is ideal, but
we don't know that yet.

IMO-YMMV.
Jun 27 '08 #17

This thread has been closed and replies have been disabled. Please start a new discussion.
