Bytes | Software Development & Data Engineering Community

Out of memory with 2-dimensional float/double matrix

Hello,

I'm using a very large 2-dimensional double matrix (45,000 x 45,000)
in my program.

At the initialization of the matrix:

double[,] matrix = new double[45000, 45000];

I am getting an out-of-memory error.

The program is running on a Windows 2003 Server with 8 GB RAM
and 20 GB of virtual memory.

So I thought this must be enough.
As far as I know, a double value uses 8 bytes,
so the matrix must take 45000 * 45000 * 8 bytes = 16.2 GB (about 15.09 GiB).

Is this right?

My first question is now... must a matrix in C# fit in the main memory
(8 GB), or are parts of the matrix automatically paged out to virtual
memory if the matrix is too large?

In other words... can I handle a matrix which takes more than 8 GB Ram?

I also tried using float instead of double, but I am getting the same error.

Is there something smaller than a float which I could use?
I need 4 digits after the decimal point.
I am logged in as a normal user on the Windows 2003 Server.
Do I maybe have some virtual resource restrictions?
It is a "standard installation".

I would be happy if somebody has an idea how to go on tracking down my
problem.
Regards,

Martin
Dec 16 '06 #1
Martin,

you didn't state whether your C# application is 32-bit or 64-bit, but if it
is 32-bit (which is the default, one could say), then you are trying to do
something that is not possible in 32-bit applications.

That is, each 32-bit application can only address 2^32 = 4 gigabytes of
virtual address space, of which at least one gigabyte (1 GB, by default 2
GB) is reserved for the operating system. It doesn't matter how much memory
in total your server has; you can't allocate more than 2 or 3 GB of memory
in a single application. See here:

http://msdn2.microsoft.com/en-gb/library/aa366912.aspx

If your application is a 64-bit one, then your memory allocation should
succeed given that the server has enough free memory and you've specified
your application as being "large address aware". If you are in the 32-bit
world, then there's no other option than to divide your array into smaller
parts, or use for instance a file/database to store the data.
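If it helps, one quick way to check the bitness from inside the application itself (a minimal sketch; IntPtr.Size reflects the pointer width of the running process):

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        Console.WriteLine("Running as a {0}-bit process.", IntPtr.Size * 8);
    }
}
```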

Hope this helps.

--
Regards,

Mr. Jani Järvinen
C# MVP
Helsinki, Finland
ja***@removethis.dystopia.fi
http://www.saunalahti.fi/janij/
Dec 16 '06 #2
Hello Jani,

thank you very much for your help.
> you didn't state whether your C# application is 32-bit or 64-bit, but if it
> is 32-bit (which is the default, one could say), then you are trying to do
> something that is not possible in 32-bit applications.
Can you tell me how to check this?

The server has 4 x Intel Xeon 3.4 GHz CPUs and the operating system is
Windows 2003 Enterprise with SP1.

I've read that the Xeon is a 32-bit CPU, but there are versions with the
EM64T extension.

How can I check if my CPU has the EM64T extension or not?

And what about the Win2003 Enterprise OS?
How can I check if it is a 64-bit version of Win2003 or not?
Regards,
Martin
Dec 16 '06 #3
"Jani Järvinen [MVP]" <ja***@removethis.dystopia.fiwrote in message
news:Om**************@TK2MSFTNGP06.phx.gbl...
Martin,

you didn't state whether your C# application is 32-bit or 64-bit, but if it is 32-bit
(which is the default, one could say), then you are trying to do something that is not
possible in 32-bit applications.

That is, each 32-bit application can only address 2^32 = 4 gigabytes of virtual address
space, of which at least one gigabyte (1 GB, by default 2 GB) is reserved to the operating
system. It doesn't matter how much memory in total your server has, you can't allocate
more than 2 or 3 GB of memory in a single application. See here:

http://msdn2.microsoft.com/en-gb/library/aa366912.aspx

If your application is a 64-bit one, then your memory allocation should succeed given that
the server has enough free memory and you've specified your application as being "large
address aware". If you are in the 32-bit world, then there's no other option than to
divide your array into smaller parts, or use for instance a file/database to store the
data.

Nope, even on 64-bit this application will fail: the maximum size of a single object is
limited to 2 GB in all current versions of the CLR. Sure, you can allocate 8 arrays of 2 GB
each on 64-bit, but not a single array of 16 GB.

Willy.

Dec 16 '06 #4
On Sat, 16 Dec 2006 12:13:11 +0100, Martin Pöpping
<ma******@despammed.com> wrote:
>I'm using a very large 2-dimensional double matrix (45,000 x 45,000)
>in my program. [...]
How many of the positions in your array are non-zero? You might be
able to save memory by using a sparse matrix representation.

Do you need random access to the array or can you process it serially?
Serial processing might allow you to keep it in a file rather than in
memory and just retrieve it one row at a time.

You could use a database to store the numbers, 16GB is not large for a
database. Just retrieve the numbers you need from the database as
required.
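For the sparse idea, a minimal sketch (the SparseMatrix class here is made up for illustration; absent cells read as 0.0, so only non-zero comparisons cost memory):

```csharp
using System;
using System.Collections.Generic;

// Stores only the non-zero similarity scores; absent entries read as 0.0.
class SparseMatrix
{
    private readonly Dictionary<long, double> cells = new Dictionary<long, double>();
    private readonly int n;

    public SparseMatrix(int n) { this.n = n; }

    public double this[int row, int col]
    {
        get
        {
            double value;
            return cells.TryGetValue((long)row * n + col, out value) ? value : 0.0;
        }
        set
        {
            if (value != 0.0)
                cells[(long)row * n + col] = value;   // only non-zeros consume memory
        }
    }
}

class Demo
{
    static void Main()
    {
        SparseMatrix m = new SparseMatrix(45000);
        m[10, 20000] = 0.7531;
        Console.WriteLine(m[10, 20000]);   // 0.7531
        Console.WriteLine(m[0, 1]);        // 0 (never stored)
    }
}
```

With a Dictionary the per-entry overhead is considerable, so this only pays off if the matrix really is sparse.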

rossum

Dec 16 '06 #5
Martin,

I'm just curious. Why do you need a matrix that big? Maybe I or
someone else can offer other options.

Brian

Dec 16 '06 #6
Brian Gideon schrieb:
I'm just curious. Why do you need a matrix that big? Maybe I or
someone else can offer other options.
Hi Brian,

I'm doing a document "pre-clustering" with a graph algorithm
(connected components).

That's why I need a comparison of each document with every other one.

If I have 45,000 documents, for example, I need a 45,000 x 45,000 document matrix.

I know that I can compress the size of the matrix by using an
upper triangular matrix, but by my calculations this would still not
fit into my memory.

So I really have to divide my matrix into several smaller matrices.
Regards,

Martin
Dec 17 '06 #7

Martin Pöpping wrote:
If I have 45.000 documents f.e. I need a 45.000x45.000 document matrix.
Or you need some other structure that stores, for each document, its
relationship to every other document. Is there a reason that you need
the entire thing in memory all at once? Could you use some form of
indexed file to store the relationships? What kinds of operations are
you doing with the matrix elements, and how many operations do you need
to do? Is there some sort of performance requirement?

If all you need to do is store the results of comparing every document
with every other document, then I would think that a file would be a
better structure, given the amount of data you're talking about. If, on
the other hand, you need to perform complex traversals over the data
over and over again, you need things to be in memory.

Dec 17 '06 #8
Hi Martin,
there are also other ways you can go about figuring out which labels are
equivalent in your connected-components algorithm. One would be to have a
single array of 45,000 items, where each array element is a linked list
containing all of the other labels that are equivalent. After the first pass
over your data you can then choose one label for each item in your list and do
a second pass through your data to rename the labels. I have done this a few
times when I have used this algorithm in image processing, and it works very
well and is very fast. Just some other ideas instead of trying to build a
matrix of all possible combinations.
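A rough sketch of that idea (the names are mine, not Mark's actual code; each label gets a list of the labels seen to be equivalent to it during the first pass):

```csharp
using System;
using System.Collections.Generic;

class LabelEquivalence
{
    // One list per label, holding the labels found to be equivalent to it.
    static List<int>[] equivalent = new List<int>[45000];

    static void Record(int a, int b)
    {
        if (equivalent[a] == null) equivalent[a] = new List<int>();
        if (equivalent[b] == null) equivalent[b] = new List<int>();
        equivalent[a].Add(b);
        equivalent[b].Add(a);
    }

    static void Main()
    {
        // First pass: suppose labels 3 and 7 touch, and 7 and 12 touch.
        Record(3, 7);
        Record(7, 12);

        // Label 7 is now known to be equivalent to two others (3 and 12).
        Console.WriteLine(equivalent[7].Count);   // 2
    }
}
```

A second pass would then pick one representative per equivalence set and relabel, as Mark describes.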

Mark.
--
http://www.markdawson.org
"Martin Pöpping" wrote:
Brian Gideon schrieb:
I'm just curious. Why do you need a matrix that big? Maybe I or
someone else can offer other options.

Hi Brian,

I´m doing a document "pre-clustering" with a graphical algorithm
(connected components).

That´s why I need a comparison of each document to each other.

If I have 45.000 documents f.e. I need a 45.000x45.000 document matrix.

I know that I can compress the size of the matrix by using an
upper triangle matrix, but after my calculations this would also do not
fit to my memory space.

So I really have to devide my matrix into several smaller matrixes.
Regards,

Martin
Dec 17 '06 #9
Martin Pöpping <ma******@despammed.com> wrote:
>I´m doing a document "pre-clustering" with a graphical algorithm
(connected components).
That´s why I need a comparison of each document to each other.
If I have 45.000 documents f.e. I need a 45.000x45.000 document matrix.
(Sorry, I've only been to a couple of lectures on the topic and don't
really know the field), but this seems like the time to start
reading up on algorithms and data structures for solving the problem,
not the time to look up language support for features.

What algorithm are you using? Your matrix seems O(n^2), but a random
glance through Google gives me the impression that people are using
O(n log n) or O(n) for their specialized clustering problems. One page
I saw reckoned they could cluster 100,000 documents of 10 variables in
under 50 minutes. And they say that an n^2 matrix would have taken
40 GB, but they did it in only 40 MB.
http://www.clustan.com/clustering_large_datasets.html

--
Lucian
Dec 17 '06 #10
rossum schrieb:
How many of the positions in your array are non-zero? You might be
able to save memory by using a sparse matrix representation.
Yes, that's what I was planning to do:
using an upper triangular matrix [1].
You could use a database to store the numbers, 16GB is not large for a
database. Just retrieve the numbers you need from the database as
required.
Yes, that is not a bad idea, thank you.
Maybe I will try that as a last option.
Regards,
Martin
[1] http://en.wikipedia.org/wiki/Triangular_matrix
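For the record, the packed upper-triangle layout stores only n*(n+1)/2 of the n^2 cells; a sketch of the index arithmetic (row <= col assumed):

```csharp
using System;

class PackedUpperTriangle
{
    // Index of element (row, col), with row <= col, in a row-major packed upper triangle.
    static long Index(long row, long col, long n)
    {
        return row * n - row * (row - 1) / 2 + (col - row);
    }

    static void Main()
    {
        long n = 45000;
        long packedLength = n * (n + 1) / 2;
        Console.WriteLine(packedLength);            // 1012522500 elements
        Console.WriteLine(Index(0, 0, n));          // 0 (first element)
        Console.WriteLine(Index(n - 1, n - 1, n));  // 1012522499 (last element)
    }
}
```

At 8 bytes per double that is still about 8.1 GB, which matches Martin's finding that even the triangle alone does not fit.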
Dec 18 '06 #11
Bruce Wood schrieb:
>I´m doing a document "pre-clustering" with a graphical algorithm
(connected components).

That´s why I need a comparison of each document to each other.

If I have 45.000 documents f.e. I need a 45.000x45.000 document matrix.
What kinds of operations are you doing with the matrix elements, and how many operations do you need
to do? Is there some sort of performance requirement?
Using a threshold value I am converting the matrix to a binary matrix (0:
documents are not equal, 1: documents are (partially) equal).

After this I am using a graph algorithm to calculate the connected
components. So I think it would be better if there were a way to keep
the values in memory.
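Worth noting: the binary matrix itself is far smaller than the double matrix. 45,000 * 45,000 bits is about 253 MB, which fits in memory and stays under the CLR's 2 GB per-object limit, so the thresholded result could live in a single BitArray (a sketch; it assumes the double scores are thresholded one chunk at a time rather than all held at once):

```csharp
using System;
using System.Collections;

class BinaryMatrixDemo
{
    static void Main()
    {
        int n = 45000;
        // 45000 * 45000 bits = 2,025,000,000 bits, roughly 253 MB of storage,
        // and still below the int.MaxValue length limit of BitArray.
        BitArray adjacency = new BitArray(n * n);

        // Mark documents 5 and 17 as (partially) equal.
        adjacency[5 * n + 17] = true;

        Console.WriteLine(adjacency[5 * n + 17]);   // True
        Console.WriteLine(adjacency[0]);            // False
    }
}
```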
Regards,
Martin

Dec 18 '06 #12
Hello!
How can I check if my CPU has a EM64T Extension or not?
You would need a utility that is able to display this information, or you
would need to call the "CPUID" instruction in native code to get the
information. I once wrote a Delphi Win32 DLL that does this call, and then a
C# managed wrapper around that DLL, but there are freeware utilities as
well.

For example CPU-Z (www.cpuid.com) should be able to tell.

--
Regards,

Mr. Jani Järvinen
C# MVP
Helsinki, Finland
ja***@removethis.dystopia.fi
http://www.saunalahti.fi/janij/
Dec 18 '06 #13
Hello!
Nope, even on 64 bit this application will fail, the maximum size of an
object is limited to 2GB in all current versions of the CLR.
Thanks for pointing this out, Willy; I was thinking in terms of what the
platform can do. So if you wanted a single allocation that large in C#, you'd
need to resort to P/Invoke rather than managed code.

--
Regards,

Mr. Jani Järvinen
C# MVP
Helsinki, Finland
ja***@removethis.dystopia.fi
http://www.saunalahti.fi/janij/
Dec 18 '06 #14
I do agree with Lucian: such a large two-dimensional array is quite unlikely to
work. You should look at which collection classes can help you; if you feel
none of them fit, then I would suggest you look at unmanaged code to build the
algorithm and gain the performance.

chanmm

"Lucian Wischik" <lu***@wischik.comwrote in message
news:jl********************************@4ax.com...
Martin Pöpping <ma******@despammed.comwrote:
>>I´m doing a document "pre-clustering" with a graphical algorithm
(connected components).
That´s why I need a comparison of each document to each other.
If I have 45.000 documents f.e. I need a 45.000x45.000 document matrix.

(sorry I've only been to a couple of lectures on the topic and don't
really know the field) but this seems like the time where you start
reading up on algorithms and datastructures for solving the problem,
not the time to look up on language support for features.

What algorithm are you using? Your matrix seems O(n^2) but a random
glance through google gives me the impression that people are using
O(n.log n) or O(n) for their specialized clustering problems. One page
I saw reckoned they could cluster 100.000 documents of 10 variables in
under 50 minutes. And they say that an n^2 matrix would have taken
40gb but they did it in only 40mb.
http://www.clustan.com/clustering_large_datasets.html

--
Lucian
Dec 18 '06 #15
"Jani Järvinen [MVP]" <ja***@removethis.dystopia.fiwrote in message
news:ut**************@TK2MSFTNGP03.phx.gbl...
Hello!
>Nope, even on 64 bit this application will fail, the maximum size of an object is limited
to 2GB in all current versions of the CLR.

Thanks for pointing this out Willy, I was thinking in terms of what the platform can do.
So if you wanted to do this in C#, you'd need to resort to using P/Invoke and not managed
code.
Or C++/CLI in mixed mode.

Willy.

Dec 18 '06 #16
chanmm schrieb:
I do agree with Lucian. Large 2 dimension array is quite unlikely. You
should look at what Collection classes can help you if you feel you do
not have one then I will suggest you to look at unmanage code to build
the algorithm to gain the performance.
Unmanaged code means I have to write a C++ DLL?

I tried it now with a single array and a linked list,
but I'm still getting the same out-of-memory error.

Regards,
Martin
Dec 18 '06 #17
"Martin Pöpping" <ma******@despammed.comwrote in message
news:em**********@newsreader3.netcologne.de...
chanmm schrieb:
>I do agree with Lucian. Large 2 dimension array is quite unlikely. You should look at
what Collection classes can help you if you feel you do not have one then I will suggest
you to look at unmanage code to build the algorithm to gain the performance.

Unmanaged Code means I have to write a c++ DLL?

I tried it now with a single array and a linked list.
But I´m still having the same problem with out of memory.

Regards,
Martin
As has been said before in this thread, you can't allocate such huge arrays on 32-bit
Windows, neither from managed code nor from unmanaged code. A 32-bit process simply does
not have the space available in the process heap to allocate more than a total of 2 GB of
data; moreover, it has even less than this available as contiguous space. The amount of free
contiguous space varies from application to application and, even more importantly, varies
at run time due to heap fragmentation. Free *contiguous* heap space in general varies
between ~1.2 and ~1.8 GB, but on a 32-bit OS you should never assume this to be guaranteed.

Willy.

Dec 18 '06 #18

chanmm wrote:
I do agree with Lucian. Large 2 dimension array is quite unlikely. You
should look at what Collection classes can help you if you feel you do not
have one then I will suggest you to look at unmanage code to build the
algorithm to gain the performance.

chanmm
I really hate to see someone move away from managed code because there
isn't a suitable collection class. And I just don't think the marginal
performance difference between managed and unmanaged code is going to
matter that much anyway.

Brian

Dec 18 '06 #19
Do you mind putting your code here with some explanation, or emailing it to me?
No promises, but I can try to look at it.

chanmm

"Martin Pöpping" <ma******@despammed.comwrote in message
news:em**********@newsreader3.netcologne.de...
chanmm schrieb:
>I do agree with Lucian. Large 2 dimension array is quite unlikely. You
should look at what Collection classes can help you if you feel you do
not have one then I will suggest you to look at unmanage code to build
the algorithm to gain the performance.

Unmanaged Code means I have to write a c++ DLL?

I tried it now with a single array and a linked list.
But I´m still having the same problem with out of memory.

Regards,
Martin
Dec 19 '06 #20
Jani Järvinen [MVP] <ja***@removethis.dystopia.fi> wrote:
> > Nope, even on 64 bit this application will fail, the maximum size of an
> > object is limited to 2GB in all current versions of the CLR.
> Thanks for pointing this out Willy, I was thinking in terms of what the
> platform can do. So if you wanted to do this in C#, you'd need to resort to
> using P/Invoke and not managed code.
You could use a jagged array instead of a truly multi-dimensional
array. That way you'd have 45,000 arrays of doubles, each of which would
be 8*45K = 360 KB, and one array of arrays, which would be 4*45K = 180 KB
(with 32-bit references; 360 KB with 64-bit ones). No single object comes
anywhere near the 2 GB limit, although the ~16 GB total still requires a
64-bit process.
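A sketch of the jagged-array layout Jon describes (each row is a separate small object, so no single allocation approaches the 2 GB limit; filling all 45,000 rows still needs a 64-bit process with ~16 GB free, so the demo uses a smaller n):

```csharp
using System;

class JaggedDemo
{
    static void Main()
    {
        int n = 1000;   // use 45000 only on a 64-bit machine with enough RAM

        // One small allocation per row: n arrays of n * 8 bytes each.
        double[][] matrix = new double[n][];
        for (int i = 0; i < n; i++)
            matrix[i] = new double[n];

        matrix[3][4] = 0.1234;
        Console.WriteLine(matrix[3][4]);   // 0.1234
    }
}
```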

--
Jon Skeet - <sk***@pobox.com>
http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too
Dec 19 '06 #22
