Large data array (operations) via disk files

Hi,

I need to manipulate a large matrix, say A(N,N) of reals, about 8 GB in
size, which can't fit in physical memory (2 GB). But the nature of the
computation requires operating on only a portion of the data, e.g.
500 MB (0.5 GB), at a time.

The procedure is as follows:

1. Generate data and store the data in array A(N,N), N is HUGE.

2. Do computation on A in loops, e.g.

   for i = 1, N
      for j = 1, N
         compute something using A (a portion)
      end
   end

How can I implement the procedure to accommodate the large memory
needs?

Thanks,
Zin

Nov 13 '06 #1
ge******@gmail.com wrote:
> How can I implement the procedure to accommodate the large memory
> needs?

Two solutions:

If performance is an issue, use a box with enough memory.

Otherwise, use the memory-mapped file support provided by your operating
environment and map the required portion of the matrix into memory. How
you do this will be OS-specific and is best asked on an OS group.
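
For illustration only, here is roughly what mapping one band of rows could look like on a POSIX system. The platform is an assumption (none has been named at this point in the thread), and so are the names `row_window`, `map_rows` and `unmap_rows` and the choice of a row-major file of doubles.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>      /* open(), for callers that need to obtain fd */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A mapped band of rows; base and len are what munmap() needs. */
struct row_window {
    void   *base;   /* page-aligned start of the mapping */
    size_t  len;    /* length of the mapping in bytes */
    double *rows;   /* first element of the first requested row */
};

/* Map rows [first, first + count) of an n-column, row-major matrix of
 * doubles stored in fd. Returns 0 on success, -1 on failure. */
static int map_rows(int fd, size_t n, size_t first, size_t count,
                    struct row_window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    /* mmap() offsets must be multiples of the page size, so round the
     * byte offset down and remember how far we rounded. */
    off_t start = (off_t)first * (off_t)n * (off_t)sizeof(double);
    off_t aligned = start - (start % page);
    size_t slack = (size_t)(start - aligned);

    w->len = slack + count * n * sizeof(double);
    w->base = mmap(NULL, w->len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, aligned);
    if (w->base == MAP_FAILED)
        return -1;
    w->rows = (double *)((char *)w->base + slack);
    return 0;
}

/* Release a window obtained from map_rows(). */
static int unmap_rows(struct row_window *w)
{
    return munmap(w->base, w->len);
}
```

After a successful call, `w.rows[r * n + j]` addresses column j of the r-th row in the band. The requested rows must lie inside the file, since touching mapped pages beyond the end of the file raises SIGBUS.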

--
Ian Collins.
Nov 13 '06 #2

ge******@gmail.com wrote:
> How can I implement the procedure to accommodate the large memory
> needs?
>
> Thanks,
> Zin

Two possibilities:

1) If the data has a lot of missing elements, or elements with an
inferred constant value (like zero), then you could use sparse matrix
processing, where each stored element carries a marked link to the next
one. Marking the element means you identify the value with either
   the previous and last element coordinates
or just the element's row and column number.
This needs the value of the cell and the coordinates of the cell (three
values) per cell.
Sometimes you can get by with just the column number and process by row,
so you only need to note when a column index "1" appears for the next
counted row.
This problem is often attacked with generalised linked-list processing
routines.

These methods will use less memory if the needed elements occupy
less than one third of the theoretical maximum (N*N) (or one half in
the linear case), but they will only be really useful if the proportion
is far lower, like one fifth or less.
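
A minimal sketch of the "value plus coordinates" layout, assuming 32-bit indices, entries kept sorted by row and then column, and a sorted array in place of the linked list mentioned above. The names `coo_entry`, `coo_matrix`, `coo_get`, `dense_bytes` and `coo_bytes` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* One stored cell: the value plus its coordinates (three fields). */
struct coo_entry {
    uint32_t row;
    uint32_t col;
    double   value;
};

/* Sparse n x n matrix as an array of entries sorted by (row, col). */
struct coo_matrix {
    size_t n;                   /* matrix dimension */
    size_t count;               /* number of stored entries */
    const struct coo_entry *e;  /* entries, sorted by row, then column */
};

/* Return A(row, col); cells that are not stored are implicitly zero.
 * Uses a binary search over the sorted entries. */
static double coo_get(const struct coo_matrix *m, uint32_t row, uint32_t col)
{
    size_t lo = 0, hi = m->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct coo_entry *p = &m->e[mid];
        if (p->row < row || (p->row == row && p->col < col))
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < m->count && m->e[lo].row == row && m->e[lo].col == col)
        return m->e[lo].value;
    return 0.0;
}

/* Bytes needed for dense storage versus coordinate storage. */
static size_t dense_bytes(size_t n) { return n * n * sizeof(double); }
static size_t coo_bytes(size_t count) { return count * sizeof(struct coo_entry); }
```

Comparing `coo_bytes(count)` with `dense_bytes(n)` gives the break-even point for a particular layout. The one-third figure above assumes three equally sized values per cell; with this struct, which is typically 16 bytes against 8 for a bare double, the break-even sits nearer one half.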

2) Rework the algorithm you wish to use so that it needs fewer elements
in memory at one time than the available memory can hold, for the
operation to proceed.

If that doesn't work, then use virtual memory by treating the disk as a
random-access file by row, with all of a row in each "record", and try
to use an algorithm that processes on a column-within-row basis.
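
The row-per-record idea might be sketched like this, assuming a POSIX system with pread()/pwrite() and a file holding the matrix row by row as raw doubles. The helper names `open_matrix`, `read_row` and `write_row` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (or create) the backing file for the matrix. */
static int open_matrix(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/* Read row i of an n-column matrix into buf. Each row is one fixed-size
 * record, so its byte offset is simply i * n * sizeof(double).
 * Returns 0 on success, -1 on error or short read. */
static int read_row(int fd, size_t n, size_t i, double *buf)
{
    size_t bytes = n * sizeof(double);
    off_t offset = (off_t)i * (off_t)bytes;
    ssize_t got = pread(fd, buf, bytes, offset);
    return got == (ssize_t)bytes ? 0 : -1;
}

/* Write row i from buf back to its record. Returns 0 or -1. */
static int write_row(int fd, size_t n, size_t i, const double *buf)
{
    size_t bytes = n * sizeof(double);
    off_t offset = (off_t)i * (off_t)bytes;
    ssize_t put = pwrite(fd, buf, bytes, offset);
    return put == (ssize_t)bytes ? 0 : -1;
}
```

Only one row buffer of n doubles has to be in memory at a time, so the algorithm would load row i, sweep across its columns, and write the row back if it changed.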

Nov 13 '06 #3
> 2. Do computation on A in loops, e.g.
>
>    for i = 1, N
>       for j = 1, N
>          compute something using A (a portion)
>       end
>    end
>
> How can I implement the procedure to accommodate the large memory
> needs?

I suspect you need to look up the keyword "blocking", perhaps together with
the words "array" or "matrix". That does require breaking up the naive
sequence of operations of your double loop above and, depending on the nature
of your "compute something", might require some careful thinking to get this
reordered sequence to do the correct thing.
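
As a rough illustration of blocking, the sketch below reorders a double loop so that it walks the matrix tile by tile. The "compute something" chosen here, the sum of A(i,j)*A(j,i), is a stand-in and not the original poster's computation, and `blocked_sym_dot` is a made-up name. The array lives in memory for simplicity, but the same loop order is what keeps the working set small when tiles come from disk.

```c
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Sum of a[i][j] * a[j][i] over all i, j of an n x n row-major matrix,
 * visited in tiles of b x b. Each step of the two outer loops touches
 * only tile (ii, jj) and its mirror (jj, ii), so about 2 * b * b
 * elements are in use at once instead of whole rows and columns. */
static double blocked_sym_dot(const double *a, size_t n, size_t b)
{
    double sum = 0.0;
    if (b == 0)
        b = 1;  /* guard against a zero tile size */
    for (size_t ii = 0; ii < n; ii += b) {
        for (size_t jj = 0; jj < n; jj += b) {
            size_t i_end = min_size(ii + b, n);
            size_t j_end = min_size(jj + b, n);
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    sum += a[i * n + j] * a[j * n + i];
        }
    }
    return sum;
}
```

The result does not depend on the tile size here because addition order is the only thing that changes. For computations where order matters, the reordering needs the careful thought mentioned above.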

Jan
Nov 14 '06 #4
>    for i = 1, N
>       for j = 1, N
>          compute something using A (a portion)
>       end
>    end

You need to say more about how exactly
you want to "manipulate" your matrix:
the simplest manipulations (e.g. scaling) require exactly one
matrix element at a time.

If your algorithm really requires more than one matrix block
at a time, split your matrix into manageable blocks that can
be addressed separately (swapped in or out). These blocks don't need
to reside within a single file, of course.
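
One way to picture blocks living in separate files is sketched below with plain C stdio. The file naming scheme (`prefix_bi_bj.blk`) and the helper names `block_path`, `save_block` and `load_block` are invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Build the file name for block (bi, bj), e.g. "A_2_5.blk". */
static int block_path(char *out, size_t cap, const char *prefix,
                      size_t bi, size_t bj)
{
    int len = snprintf(out, cap, "%s_%zu_%zu.blk", prefix, bi, bj);
    return (len > 0 && (size_t)len < cap) ? 0 : -1;
}

/* Swap a b x b block of doubles out to its own file. Returns 0 or -1. */
static int save_block(const char *prefix, size_t bi, size_t bj,
                      const double *blk, size_t b)
{
    char path[256];
    if (block_path(path, sizeof path, prefix, bi, bj) != 0)
        return -1;
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t wrote = fwrite(blk, sizeof(double), b * b, f);
    int closed = fclose(f);
    return (wrote == b * b && closed == 0) ? 0 : -1;
}

/* Swap a b x b block back in from its file. Returns 0 or -1. */
static int load_block(const char *prefix, size_t bi, size_t bj,
                      double *blk, size_t b)
{
    char path[256];
    if (block_path(path, sizeof path, prefix, bi, bj) != 0)
        return -1;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(blk, sizeof(double), b * b, f);
    fclose(f);
    return got == b * b ? 0 : -1;
}
```

With 8 GB of data and roughly 0.5 GB in use at a time, the block size would be chosen so that the few blocks needed together fit comfortably in RAM.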

Alexei

Nov 14 '06 #5

<ge******@gmail.com> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
> How can I implement the procedure to accommodate the large memory
> needs?

Use Windows 98's or XP's virtual memory/pagefile of 10 GB or so ;)

Windows will do the rest.

Bye,
Skybuck.
Nov 14 '06 #6
Skybuck Flying wrote:
> Use Windows 98's or XP's virtual memory/pagefile of 10 GB or so ;)
>
> Windows will do the rest.
>
> Bye,
> Skybuck.

That is a spectacularly useless suggestion if the program is not
intended to run on a Windows machine... The OP doesn't say either way,
so you have a 50/50 chance at best.

Jim
Nov 14 '06 #7
J. F. Cornwall wrote:
> That is a spectacularly useless suggestion if the program is not
> intended to run on a Windows machine... The OP doesn't say either way,
> so you have a 50/50 chance at best.

Besides, AFAIK, 32-bit versions of Windows don't support a 10 GB paging
file. Also, only Windows has access to it.

Nov 14 '06 #8

"santosh" <sa*********@gmail.com> wrote in message
news:11*********************@b28g2000cwb.googlegroups.com...
> J. F. Cornwall wrote:
>> That is a spectacularly useless suggestion if the program is not
>> intended to run on a Windows machine... The OP doesn't say either way,
>> so you have a 50/50 chance at best.

Nah, more like a 99.9 percent chance ;)

Windows dominates >D =D

> Besides, AFAIK, 32-bit versions of Windows don't support a 10 GB paging
> file. Also only Windows has access to it.

Get a new PC.

I am using Windows XP 64-bit <- it totally sucks but hey it's the future ;)

Strangely enough, I have two pagefiles, each 8 GB.

One on each hard disk. (I have two hard disks.)

I think Windows XP moved the pagefile to one of the disks... or maybe it was
me lol.

So I allocated two pagefiles, one on each disk, just in case ;)

If I can allocate 8 GB I can probably allocate 10 GB as well, or even more
;)

Surely other operating systems have the same virtual memory feature? <- if
not, geez, get a decent OS lol.

Bye,
Skybuck.
Nov 15 '06 #9
Skybuck Flying wrote
(in article <ej**********@news1.zwoll1.ov.home.nl>):
> Get a new PC.

A Mac Pro would be a nice choice.

> I am using Windows XP 64 bit <- it totally sucks but hey it's the future ;)

One of the rarest of all Skybuck statements, a correct one.
There are very few of these in the wild. You are correct, the
future for Windows users really does suck.
--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Nov 15 '06 #10

"Randy Howard" <ra*********@FOOverizonBAR.net> wrote in message
news:00*****************************@news.verizon.net...
> Skybuck Flying wrote
> (in article <ej**********@news1.zwoll1.ov.home.nl>):
>> Get a new PC.
>
> A Mac Pro would be a nice choice.
>
>> I am using Windows XP 64 bit <- it totally sucks but hey it's the future
>> ;)
>
> One of the rarest of all Skybuck statements, a correct one.
> There are very few of these in the wild. You are correct, the
> future for Windows users really does suck.

It sucks now, it will improve in the future :D

Such is the way of Microsoft, software and hardware manufacturers, and their
new, still buggy, Windows drivers and components ;)

Bye,
Skybuck.
Nov 15 '06 #11
Specifically, the problem needs to be addressed on Linux, 64-bit. It
looks to me that one needs to deal with virtual memory directly.

Ideally, the solution will look like the following:

1. Map A to an addressable space.

2. Generate entries of A(i,j), i = 1..N, j = 1..N.

3. Compute something in the loops with reference to A(i,j), i = 1..N, j
= 1..N.

The fact that only a portion of A is used at a time during the
computation makes a memory-mapped file a sound solution. But
one seems to have to keep track of the offset in the file in the
subsequent calls to mmap() when accessing a different portion of A,
which doesn't sound convenient, no?

Is there any better solution, such that one can do something as simple
as

A = vm_create(...)

in 1 above without worrying about the memory limit that applies to a
call to malloc(), so that the rest of the code can be implemented
without extra programming effort in memory manipulation?
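
For what it's worth, on 64-bit Linux something shaped like `vm_create` can be sketched by mapping the whole backing file once, which removes the offset bookkeeping. Only the name `vm_create` comes from the question above; its signature, the companion `vm_destroy` and the file-backed approach are assumptions made for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reuse) a file of n * n doubles and map all of it. A 64-bit
 * address space is large enough for the whole matrix; the kernel pages
 * data in and out on demand. Returns NULL on failure. */
static double *vm_create(const char *path, size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(double) / n)
        return NULL;                    /* empty, or size would overflow */
    size_t bytes = n * n * sizeof(double);

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping outlives the fd */
    return p == MAP_FAILED ? NULL : (double *)p;
}

/* Unmap a matrix obtained from vm_create(). */
static int vm_destroy(double *a, size_t n)
{
    return munmap(a, n * n * sizeof(double));
}
```

The rest of the code can then index `a[i * n + j]` as if the matrix were an ordinary array. The trade-off is that the file needs as much disk space as the matrix, and access patterns with poor locality will still page heavily.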

Thanks,
Zin

Nov 15 '06 #12
ge******@gmail.com wrote:
> Specifically, the problem needs to be addressed on Linux, 64-bit. It
> looks to me that one needs to deal with virtual memory directly.

What is the value of SIZE_MAX for your C implementation?

> Ideally, the solution will look like the following:
>
> 1. Map A to an addressable space.
>
> 2. Generate entries of A(i,j), i = 1..N, j = 1..N.
>
> 3. Compute something in the loops with reference to A(i,j), i = 1..N, j
> = 1..N.
>
> The fact that only a portion of A is used at a time during the
> computation makes a memory-mapped file a sound solution. But
> one seems to have to keep track of the offset in the file in the
> subsequent calls to mmap() when accessing a different portion of A,
> which doesn't sound convenient, no?
>
> Is there any better solution, such that one can do something as simple
> as
>
> A = vm_create(...)
>
> in 1 above without worrying about the memory limit that applies to a
> call to malloc(), so that the rest of the code can be implemented
> without extra programming effort in memory manipulation?

On a properly implemented C compiler for a 64-bit environment, SIZE_MAX
should be sufficient for your needs. Have you tried using plain
malloc()?
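
A quick way to check the SIZE_MAX point before calling malloc() might look like the following sketch. The helper names `matrix_bytes` and `alloc_matrix` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Store n * n * elem_size in *bytes and return 1 if the product fits in
 * size_t; return 0 if it would overflow. */
static int matrix_bytes(size_t n, size_t elem_size, size_t *bytes)
{
    if (n != 0 && n > SIZE_MAX / n)
        return 0;                       /* n * n overflows */
    size_t cells = n * n;
    if (elem_size != 0 && cells > SIZE_MAX / elem_size)
        return 0;                       /* cells * elem_size overflows */
    *bytes = cells * elem_size;
    return 1;
}

/* Allocate an n x n matrix of doubles with plain malloc(), refusing
 * sizes that size_t cannot represent. Returns NULL on failure. */
static double *alloc_matrix(size_t n)
{
    size_t bytes;
    if (!matrix_bytes(n, sizeof(double), &bytes))
        return NULL;
    return malloc(bytes);
}
```

For scale, N = 32768 with 8-byte doubles comes to 2^33 bytes, i.e. 8 GiB. That fits in a 64-bit size_t but not in a 32-bit one.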

Generally, if your working set will fit within physical memory, then
just malloc() the whole amount and let the OS do the grunt work. Often
under Linux, memory won't actually be allocated until you write to it.
Also, having a fast HDD for swap space will improve matters somewhat.

But if you don't mind somewhat more involvement, mmap() would probably
be a better-suited solution, as with it you can give the OS more
information about your actual memory needs and let it optimise itself
accordingly.
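
As one example of passing such information to the OS, a mapping can be annotated with posix_madvise(). The sketch below assumes a POSIX system and an existing data file; `map_for_sweep` is a made-up helper name, and a front-to-back sweep is just one possible access pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an existing file read-only and tell the kernel it will be read
 * front to back. Returns NULL on failure; *len receives the file size. */
static const double *map_for_sweep(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    *len = (size_t)st.st_size;

    void *p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return NULL;

    /* The hint is advisory: if it fails, the mapping is still usable. */
    (void)posix_madvise(p, *len, POSIX_MADV_SEQUENTIAL);
    return p;
}
```

Other advice values such as POSIX_MADV_RANDOM or POSIX_MADV_WILLNEED exist for different access patterns. How much any of them helps depends on the kernel and the workload.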

Nov 15 '06 #13
