Bytes | Developer Community
Creating and Accessing Very Big Arrays in C

Hi,

I'm using GCC. Please could you tell me what is the maximum number of
array elements that I can create in C, i.e.

char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.

// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

Your help and advice would be sincerely appreciated.

Thanking you in anticipation,

Daniel

Aug 8 '06 #1
dj*****@gmail.com wrote:
Hi,

I'm using GCC. Please could you tell me, what is the maximum number of
array elements that I can create in C, i.e.

char* anArray = (char*) calloc( ??MAX?? , sizeof(char) ) ;
Depends on your system.
I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.
Should that be 'int'?
// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
Really?
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

--
Ian Collins.
Aug 8 '06 #2
Ian Collins wrote:
dj*****@gmail.com wrote:
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.
Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.

Aug 8 '06 #3
Old Wolf wrote:
Ian Collins wrote:
>>dj*****@gmail.com wrote:
>>>// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.


Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.
In my case, the dim and distant past!

Cheers,

--
Ian Collins.
Aug 8 '06 #4
dj*****@gmail.com wrote:
I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.

// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
I doubt you are allocating what you think you are (hint: truncation).
Also why are you casting the return value?

How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

Your help and advice would be sincerely appreciated.
This usually is a sign you are trying to address objects larger than
your platform can natively address. E.g. on a 32-bit platform the
object has more than 4 billion entries. On a 64-bit platform it means
you have an object with 2^64 entries which you most certainly don't
have enough disk or memory for.

Why are you trying to use such large arrays?

There are ways of emulating sparsely populated arrays. Perhaps that's
what you want?

Tom

Aug 9 '06 #5
On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <ol*****@inspire.net.nz>
wrote:
>Ian Collins wrote:
>dj*****@gmail.com wrote:
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.
Which begs the question, why would one ever use a suffixed constant?

--
jay
Aug 9 '06 #6
jaysome wrote:
On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <ol*****@inspire.net.nz>
wrote:

>>Ian Collins wrote:
>>>dj*****@gmail.com wrote:

// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;
If you can allocate more than INT_MAX elements, you should probably
write 1234567891234584LL to tell the compiler the constant is a long
long, otherwise it will default to int.

Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.


Which begs the question, why would one ever use a suffixed constant?
long double a = 1e500;

This is an error because 1e500 is a double constant, and DBL_MAX is
only around 1e308. To avoid getting garbage you HAVE to put the 'L',
like

long double a = 1e500L;
Aug 9 '06 #7
jaysome wrote:
On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <ol*****@inspire.net.nz>
wrote:
>>Constants always take on a type that can represent them
(unless there is no such type). In this case, the constant
will be a long long without you having to specify it --
assuming that the number exceeds LONG_MAX, and that
the compiler is in C99 mode.

I don't know where this myth came from that there is
something wrong with un-suffixed constants.

Which begs the question, why would one ever use a suffixed constant?
long long int i = 1LL << 40;
Aug 9 '06 #8
jacob navia <ja***@jacob.remcomp.fr> writes:
jaysome wrote:
>On 8 Aug 2006 16:22:52 -0700, "Old Wolf" <ol*****@inspire.net.nz>
wrote:
[...]
>>>I don't know where this myth came from that there is
something wrong with un-suffixed constants.
Which begs the question, why would one ever use a suffixed constant?

long double a = 1e500;

This is an error because 1e500 is a double constant and its maximum
value can be only around 1e300. To avoid
getting garbage you HAVE to put the 'L' like

long double a = 1e500L;
Interesting point. Integer literals have a type that depends on the
value of the literal and the range of the type; floating-point
literals don't. I suppose that since floating-point has both range
and precision, it would be more difficult to construct a consistent
set of rules.

(If DBL_MAX >= 1e500, the above declaration is ok, but you do need the
suffix to be safe.)

So the previous question can be changed to: why would one ever use a
suffixed *integer* constant?

One example is an argument to a variadic function, but that's probably
fairly rare. In most contexts, numeric values will be implicitly
converted to the required type.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 9 '06 #9
Tom St Denis wrote:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?
Aug 9 '06 #10
tgur <no@mail.pl> wrote:
Tom St Denis wrote:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?
Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function. Another is inside the comparison
function for qsort().
The case Tom complains about, which is on receiving a void pointer from
malloc() and friends, is _not_ one of these occasions. Often, the
perceived need to cast malloc() is a result of one or more of three
faults: not #including <stdlib.h>, using a C++ compiler instead of a C
one, and a lack of clue about void pointers in C.

Richard
Aug 9 '06 #11
The limit depends on the particular compiler, memory model, hardware,
and operating system.

Many compilers nowadays generate 32-bit code and 32-bit addresses, which
limits their addressing ability to an absolute maximum of about 4
billion bytes of data plus 4 billion bytes of code. Sometimes enhanced
by another 4GB of heap data.

In addition to that the operating system often grabs a part of the
potential address space. For instance, various versions of Windows
reserve 1/4 to 1/2 the address space for their own use.

In addition, some older motherboards don't implement all 32 address
lines, so you may be limited to 1/2, 1/4, or 1/8th of 4GB.

Then in addition you're limited by the amount of memory the OS can fake
using virtual memory. If your system vm file is only 1GB in size,
that's the upper limit.

Then, in addition to all the above, if you want fast access to the
array, you're mostly limited by the amount of actual, real RAM the
computer has, and in addition the other programs and OS tasks running
will subtract from that.

So in a typical 32-bit PC (and I KNOW there are other architectures),
you'll have, best case, about 1GB of fast RAM, that's 250 million
32-bit floats, twice that if you fall back on virtual memory.

If you upgrade to a 64-bit PC, then you can typically address 2^48
bytes of memory. But get real: 2^48 bytes of actual RAM would cost
about 2^42 dollars, roughly 4 trillion dollars.
Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.

Aug 9 '06 #12
Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.
I think this is very wise. Thanks for everyone's comments so far.

I'm effectively creating massive 7 dimensional hypermatricies in C.
These matrices are of type Char and only contain boolean values (1 or
0). They are incredibly sparse (I know that there are sparse matrix
techniques, but I haven't found any good implementations in C yet -
can you recommend any?).

I'll write in pseudo code, but I hope you get the gist.

Lets say I want to create a large 7D hypermatrix:

n = 100 ;
m = new boolean_matrix[n][n][n][n][n][n][n] ;

The problem is, this takes up way too much memory.

To save memory, alternatively, I could create a 1D array that stores
n^7 elements:

m = new Boolean_matrix[n*n*n*n*n*n*n] ;

But n^7 (where n = 100) is bigger than the largest int (2^32).

In C, it seems that I can create an array with a size contained within
DOUBLE number of elements, but I can't access the array once created,
hence my original problem:

-------------------------------------------------------

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.
// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

----------------------------------------------------------

There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

If anyone has any suggestions, I'd be sincerely grateful.

Daniel

Aug 9 '06 #13
Okay, 7 dimensions is a lot. If you want 100 elements every which way,
that's 10^14 elements. Since 10^3 is about 2^10, that's about 2^42
bits, or 2^39 bytes, so that's easily addressable on a system with a
x64 CPU.

Nowadays memory is selling for about $60 a gigabyte, and you need about
550 gigabytes, so you had better dig deep and find $32,767 just for RAM.
And you'll need a rather special motherboard that can accept that many
memory DIMMs.

I think you may have to scale back the number of dimensions or the
extent along a dimension. Just dropping ONE dimension, down to 6,
brings it within the realm of feasibility, without getting a big
federal grant.

Or scaling back the extent is even better, as the total memory will go down
as the seventh power.

----

Exactly how sparse is this matrix? If it's less than 10% filled, it
might be possible to use some sparse-matrix techniques.

Aug 9 '06 #14
On Wed, 09 Aug 2006 10:29:08 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
>tgur <no@mail.pl> wrote:
>Tom St Denis wrote:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?

Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function.
That one I get.
Another is inside the comparison
function for qsort().
This one I don't. Why would you need a cast there?

This seems OK (at least the compiler doesn't complain):

int compare(const void *arg1, const void *arg2)
{
const int *warg1 = arg1;
const int *warg2 = arg2;

return *warg1 - *warg2;
}

Actually, the compiler doesn't complain if I leave out all the
"const", either. (With include for stdlib.h and a call to qsort.)
>The case Tom complains about, which is on receiving a void pointer from
malloc() and friends, is _not_ one of these occasions. Often, the
perceived need to cast malloc() is a result of one or more of three
faults: not #including <stdlib.h>, using a C++ compiler instead of a C
one, and a lack of clue about void pointers in C.

Richard
--
Al Balmer
Sun City, AZ
Aug 9 '06 #15
"Ancient_Hacker" <gr**@comcast.netwrites:
The limit depends on the particular compiler, memory model, hardware,
and operating system.
[...]

What limit? Please quote context.

Aug 9 '06 #16
dj*****@gmail.com writes:
>Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.

I think this is very wise. Thanks for everyone's comments so far.

I'm effectively creating massive 7 dimensional hypermatricies in C.
These matrices are of type Char and only contain boolean values (1 or
0). They are incredibly sparse (I know that there are sparse matrix
techniques, but I haven't found any good implementations in C yet -
can you recommend any?).

I'll write in pseudo code, but I hope you get the gist.

Lets say I want to create a large 7D hypermatrix:

n = 100 ;
m = new boolean_matrix[n][n][n][n][n][n][n] ;
Your pseudo code looks a whole lot like C++.
The problem is, this takes up way too much memory.

To save memory, alternatively, I could create a 1D array that stores
n^7 elements:

m = new Boolean_matrix[n*n*n*n*n*n*n] ;
Why do you think that would save memory? You're still storing the
same number of elements.
But n^7 (where n = 100) is bigger than the largest int (2^32).
It's "int", not "INT".
In C, it seems that I can create an array with a size contained within
DOUBLE number of elements, but I can't access the array once created,
hence my original problem:
What do you mean by "DOUBLE"? C has a type called "double", but it's
a floating-point type, and it can't be used as an array index.

The fact that your array size exceeds the largest value of type int
isn't the problem. The problem is just that you're trying to allocate
a really really huge amount of memory.
-------------------------------------------------------

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.
// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
The first argument to calloc() is of type size_t, which is an unsigned
integer type. Your argument 999999999999999999 will be implicitly
converted to size_t. If your size_t is 64 bits, then
999999999999999999 is a valid value of that type (and the calloc()
call is likely to fail because you almost certainly don't have that
much memory available). If your size_t is smaller than 60 bits, your
argument will be converted by, in effect, dropping the high-order bits.
For example, if size_t is 32 bits, then the call
calloc(999999999999999999, sizeof(char))
is equivalent to
calloc(2808348671, sizeof(char))
You're still not likely to have that much available memory -- but
unless you check the result of calloc(), your program will never know
that.

(On some systems, particularly Linux, malloc or calloc will pretend to
allocate however much memory you ask for, and problems won't show up
until you try to use that memory.)
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

----------------------------------------------------------
What is an "INT"? Remember that C is case-sensitive. There is, as
you know, a built-in type called "int".

This declaration:

int anArray[1234567891234584];

is legal *unless* it exceeds an implementation-defined limit (which it
most likely does).
There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

If anyone has any suggestions, I'd be sincerely grateful.
On any modern system, you just aren't going to be able to allocate and
use the amount of memory you want. You'll need to use some kind of
sparse representation. (I don't have any pointers for you.)

Aug 9 '06 #17
On 9 Aug 2006 08:35:30 -0700, dj*****@gmail.com wrote:
>
There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.
A list or array containing the coordinates of the non-zero elements
seems intuitive enough to me. Whether it's efficient or not depends on
what manipulation you need to do with the matrices.

Aug 9 '06 #18

Keith Thompson wrote:
What limit? Please quote context.

Good point. I think we were talking about the limit on the amount of
memory allocatable for this guy's array.

He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
fuzzy on what this entails.

Aug 9 '06 #19
On Wed, 9 Aug 2006 10:13:01 UTC, tgur <no@mail.pl> wrote:
Tom St Denis wrote:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?
Please read the postings of the last 5 days; you'll find the answer
multiple times, and you'll learn something more by that.

Or use Google and search this group for malloc. You'll get the answer
you need thousands of times.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2 Deutsch ist da!
Aug 9 '06 #20
On Wed, 9 Aug 2006 15:35:30 UTC, dj*****@gmail.com wrote:
Perhaps if you explained what you want to do, we can suggest ways of
doing it without requiring ridiculous amounts of memory.

I think this is very wise. Thanks for everyone's comments so far.

I'm effectively creating massive 7 dimensional hypermatricies in C.
These matrices are of type Char and only contain boolean values (1 or
0). They are incredibly sparse (I know that there are sparse matrix
techniques, but I haven't found any good implementations in C yet -
can you recommend any?).

I'll write in pseudo code, but I hope you get the gist.

Lets say I want to create a large 7D hypermatrix:

n = 100 ;
m = new boolean_matrix[n][n][n][n][n][n][n] ;

The problem is, this takes up way too much memory.
Hint: since you use only 1 bit of each variable, you may simply use an
unsigned int and pack 16, 32 or 64 bits into it. Or, to make something
easier to port, use an array of char and use each of the 8 bits
guaranteed by the standard, to reduce the number of variables needed.
To save memory, alternatively, I could create a 1D array that stores
n^7 elements:
Why do you think an n x n x n x n x n x n x n array costs more memory
than an array of size n*n*n*n*n*n*n?
m = new Boolean_matrix[n*n*n*n*n*n*n] ;
If you need C++, go to the C++ group; new is not a C operator.
But n^7 (where n = 100) is bigger than the largest int (2^32).

In C, it seems that I can create an array with a size contained within
DOUBLE number of elements, but I can't access the array once created,
hence my original problem:
Look at the definition of malloc in your compiler's documentation. I
don't think that malloc is defined as
void *malloc(long|double n); but as void *malloc(size_t n);
-------------------------------------------------------

I've managed to create arrays using DOUBLE data types, but when I try
to access the array, the compiler complains that the number is not an
INT, i.e.
// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;
No. You cast the result of malloc, and that may lead you into the land
of undefined behavior.

Are people here really unable to read what gets written here multiple
times a week and thousands of times a year?
>
// this does not compile because the number 1234567891234584 is not an
INT
anArray[1234567891234584] = 1 ;

----------------------------------------------------------

There is certainly mileage in using other data storage classes to
represent my sparse matrices, i.e. hashtable, trees, etc - but I
haven't found any that are intuitive and/or efficient.

If anyone has any suggestions, I'd be sincerely grateful.

Daniel
Ask the documentation of your system how many bytes a single
application can occupy. malloc will never give you even that amount,
but something less.

Aug 9 '06 #21
Ancient_Hacker wrote:
Nowadays memory is selling for about $60 a gigabyte
Memory that fits the machines that can address that kind of space
doesn't cost $60...

Maybe something in the grid world.

Maybe this problem can be addressed like a distributed.net project.

Aug 9 '06 #22
"Ancient_Hacker" <gr**@comcast.netwrote in message
news:11**********************@n13g2000cwa.googlegr oups.com...
>
Keith Thompson wrote:
>What limit? Please quote context.


Good point. I think we were talking about the limit on the amount of
memory allocatable for this guy's array.

He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
fuzzy on what this entails.
100,000,000,000,000 bits = 12,500,000,000,000 bytes

It's not impossible. He will have to use virtual memory on a 64 bit system.

Here is one terabyte for $575:
http://www.compuplus.com/i-LaCie-Big-Disk-Extreme-10-Terabyte-with-Triple-Interface-USB-20-FireWire-400-FireWire-800-Hard-Drive-300797U-1005838~.html?sid=05gf807gg4xi7rw

So the disk would cost him about $8000.

If the array is sparse, he might not need anything so extreme.

Aug 9 '06 #23
"Herbert Rosenau" <os****@pc-rosenau.dewrites:
On Wed, 9 Aug 2006 15:35:30 UTC, dj*****@gmail.com wrote:
[...]
>// this succeeds
char* anArray = (char*) calloc( 999999999999999999 , sizeof(char) ) ;

No. You cast the result of malloc, and that may lead you into the land
of undefined behavior.
No, the cast itself doesn't cause undefined behavior. If the code is
valid without the cast, it's valid with the cast. The cast *can* mask
certain errors, such as failing to #include <stdlib.h> or using a C++
compiler, and it's therefore a bad idea, but we need to be clear about
*why* it's a bad idea.

The 999999999999999999 argument is a problem, as I said in another
followup. sizeof(char) is 1 by definition. calloc() sets the
allocated memory to all-bits-zero, which is likely to be a waste of
time.

Aug 9 '06 #24
jmcgill <jm*****@email.arizona.edu> writes:
Ancient_Hacker wrote:
>Nowadays memory is selling for about $60 a gigabyte

Memory that fits the machines that can address that kind of space
doesn't cost $60...

Maybe something in the grid world.

Maybe this problem can be addressed like a distributed.net project.
Maybe, but the OP has made it clear that the array is sparse. Trying
to figure out how to allocate the whole thing is a waste of time; he
needs to figure out a compact representation for his sparse array.

(If the full array were a reasonable size, representing the whole
thing, holes and all, might be a good approach; the resulting code
could be easier to write and quicker to execute than something that
has to traverse a more complex data structure. The meaning of
"reasonable size" varies over time, as memories get larger, faster,
and cheaper. But we're not yet to the point where an array of 10**14
elements is reasonable.)

Aug 9 '06 #25

Dann Corbit wrote:
"Ancient_Hacker" <gr**@comcast.netwrote in message
news:11**********************@n13g2000cwa.googlegr oups.com...

Keith Thompson wrote:
What limit? Please quote context.

Good point. I think we were talking about the limit on the amount of
memory allocatable for this guy's array.

He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
fuzzy on what this entails.

100,000,000,000,000 bits = 12,500,000,000,000 bytes

It's not impossible. He will have to use virtual memory on a 64 bit system.
Virtual memory is a particularly poor performer when used with sparse
arrays.
Poor as in easily a million times slower than real RAM. If we assume a
page size of 2MB (x86 large), 1GB of real RAM, and a 1% sparse matrix,
every bit randomly accessed is likely to cause a page fault (there's
only one chance in 550 it's in RAM). Seeking over 550 GB and reading
2MB just to access one bit is a huge performance hit.
He's either gotta use sparse matrix techniques, remove a dimension or
two, or shrink the extent of each dimension, as noted in my previous
post.

http://www.compuplus.com/i-LaCie-Big-Disk-Extreme-10-Terabyte-with-Triple-Interface-USB-20-FireWire-400-FireWire-800-Hard-Drive-300797U-1005838~.html?sid=05gf807gg4xi7rw

So the disk would cost him about $8000.

If the array is sparse, he might not need anything so extreme.
Aug 9 '06 #26
"Ancient_Hacker" <gr**@comcast.netwrote in message
news:11**********************@b28g2000cwb.googlegr oups.com...
>
Dann Corbit wrote:
>"Ancient_Hacker" <gr**@comcast.netwrote in message
news:11**********************@n13g2000cwa.googleg roups.com...
>
Keith Thompson wrote:

What limit? Please quote context.
Good point. I think we were talking about the limit on the amount of
memory allocatable for this guy's array.

He wants a 100x100x100x100x100x100x100 array of boolean and is a bit
fuzzy on what this entails.

100,000,000,000,000 bits = 12,500,000,000,000 bytes

It's not impossible. He will have to use virtual memory on a 64 bit
system.

Virtual memory is a particularly poor performer when used with sparse
arrays.
Poor as in easily a million times slower than real RAM. If we assume a
page size of 2MB (x86 large), 1GB of real RAM, and a 1% sparse matrix,
every bit randomly accessed is likely to cause a page fault (there's
only one chance in 550 it's in RAM). Seeking over 550 GB and reading
2MB just to access one bit is a huge performance hit.
He's either gotta use sparse matrix techniques, remove a dimension or
two, or shrink the extent of each dimension, as noted in my previous
post.
Right. Hence my comment about sparse arrays below. I had not read the
original post and only responded to the content that you see.
>http://www.compuplus.com/i-LaCie-Big-Disk-Extreme-10-Terabyte-with-Triple-Interface-USB-20-FireWire-400-FireWire-800-Hard-Drive-300797U-1005838~.html?sid=05gf807gg4xi7rw

So the disk would cost him about $8000.

If the array is sparse, he might not need anything so extreme.

Aug 9 '06 #27
Keith Thompson wrote:
So the previous question can be changed to: why would one ever use a
suffixed *integer* constant?

One example is an argument to a variadic function, but that's probably
fairly rare. In most contexts, numeric values will be implicitly
converted to the required type.
When it's used in an expression that needs wider
precision, or needs to be unsigned, eg:

long x = 1000L * 1000;

unsigned int x = a * 256u + b;

Aug 9 '06 #28
Keith Thompson wrote:
>Maybe this problem can be addressed like a distributed.net project.

Maybe, but the OP has made it clear that the array is sparse. Trying
to figure out how to allocate the whole thing is a waste of time; he
needs to figure out a compact representation for his sparse array.
Now I'm following the thread in hopes of learning something of the
techniques for dealing with sparse matrices. I even located the box
that might contain a couple of linear algebra books just in case.
Aug 9 '06 #29
jmcgill <jm*****@email.arizona.edu> writes:
Keith Thompson wrote:
>>Maybe this problem can be addressed like a distributed.net project.
Maybe, but the OP has made it clear that the array is sparse. Trying
to figure out how to allocate the whole thing is a waste of time; he
needs to figure out a compact representation for his sparse array.

Now I'm following the thread in hopes of learning something of the
techniques for dealing with sparse matrices. I even located the box
that might contain a couple of linear algebra books just in case.
To review: you were originally trying to create a 7-dimensional array,
with the length of each array being 100. That's 1e14 elements, which
is way beyond what's feasible -- but only a few of those elements will
actually be used.

The most obvious solution that springs to mind is a hash table, a data
structure that maps a key to a value. Construct the key from the 7
indices. Our own CBFalconer provides a freeware hash table package;
see <http://cbfalconer.home.att.net/download/index.htm>. (I haven't
used it myself, but Chuck's reputation leads me to assume it's good
code.)

A simpler solution is a simple linear array of structures, where each
structure contains the 7 indices and the data element. But this
requires you to search the array (either linearly or using a binary
search), and is likely to be less efficient than a hash table. Or you
can use an AVL tree, or whatever data structure strikes your fancy.

If there's some consistent pattern to the set of elements that you're
using, you might be able to take advantage of that. For example, if
the 2nd, 3rd, 4th, 5th, and 6th indices are always zero, you can use a
2-dimensional array; that's an absurd example, but you get the idea.

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 9 '06 #30
Keith Thompson wrote:
To review: you were originally trying to create a 7-dimensional array,
with the length of each array being 100. That's 1e14 elements, which
is way beyond what's feasible -- but only a few of those elements will
actually be used.

The most obvious solution that springs to mind is a hash table, a data
structure that maps a key to a value.
It's the only solution that I came up with, and it sprang to mind
immediately. Need to learn to trust my instincts I guess.
Aug 10 '06 #31
Al Balmer <al******@att.net> wrote:
On Wed, 09 Aug 2006 10:29:08 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
tgur <no@mail.pl> wrote:
Could you explain, why casting void pointers is bad?
Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function.

That one I get.
Another is inside the comparison
function for qsort().

This one I don't. Why would you need a cast there?

This seems OK (at least the compiler doesn't complain):

int compare(const void *arg1, const void *arg2)
{
const int *warg1 = arg1;
const int *warg2 = arg2;

return *warg1 - *warg2;
}
You don't need one if you write it like this. You do need one (well,
two, actually) if you write it like this:

int compare(const void *arg1, const void *arg2)
{
return *(const int *)arg1 - *(const int *)arg2;
}

Some people prefer one way; some the other. Both are valid, a good C
programmer is familiar with both, and the latter needs the casts.

Richard
Aug 10 '06 #32
rl*@hoekstra-uitgeverij.nl (Richard Bos) writes:
Al Balmer <al******@att.net> wrote:
>On Wed, 09 Aug 2006 10:29:08 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
>tgur <no@mail.pl> wrote:
Could you explain, why casting void pointers is bad?

Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function.

That one I get.
Another is inside the comparison
function for qsort().

This one I don't. Why would you need a cast there?

This seems OK (at least the compiler doesn't complain):

int compare(const void *arg1, const void *arg2)
{
const int *warg1 = arg1;
const int *warg2 = arg2;

return *warg1 - *warg2;
}

You don't need one if you write it like this. You do need one (well,
two, actually) if you write it like this:

int compare(const void *arg1, const void *arg2)
{
return *(const int *)arg1 - *(const int *)arg2;
}

Some people prefer one way; some the other. Both are valid, a good C
programmer is familiar with both, and the latter needs the casts.
Note that this comparison function fails if the arguments are, for
example, INT_MIN and INT_MAX. Better:

int compare(const void *arg1, const void *arg2)
{
    if (*(int *)arg1 < *(int *)arg2) {
        return -1;
    }
    else if (*(int *)arg1 > *(int *)arg2) {
        return 1;
    }
    else {
        return 0;
    }
}

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 10 '06 #33
Are here only idiots who are unable to read what gets multiple times
in a week and multiple thousand times/year written here?
Derogatory remarks aren't useful. I can assure you that I'm not an
idiot, and I'd appreciate it if you'd talk to me (and anyone else
for that matter) as if you're talking to someone who is relatively
competent, in my case a Computer Scientist.
What is an "INT"? Remember that C is case-sensitive. There is, as
you know, a built-in type called "int".
It's "int", not "INT".
I know. For clarity I was trying to distinguish between discussing what
could be considered typos (e.g., int) and datatypes (e.g., INT). Sorry
for the confusion.
m = new Boolean_matrix[n*n*n*n*n*n*n] ;

When you need C++, go to the C++ group. new is not a C operator.
That's why I said that I was writing in pseudocode - I don't
want to get bogged down in the specific code - my question was very
general.
The most obvious solution that springs to mind is a hash table, a data
structure that maps a key to a value. Construct the key from the 7
indices. Our own CBFalconer provides a freeware hash table package;
see <http://cbfalconer.home.att.net/download/index.htm>. (I haven't
used it myself, but Chuck's reputation leads me to assume it's good
code.)
Once again, thanks for everyone's comments.

I've implemented a hashtable, but it is very slow. In fact, my
'n' isn't always 100, in some cases it may be 1000 or more. I
definitely think that sparse matrix representations are the way
forward, but haven't found any good (multidimensional) ones as yet.
Why do you think that would save memory? You're still storing the
same number of elements.
As for memory differences between storing multidimensional arrays
versus a one dimensional array with the same number of elements, I
don't know the specifics, but it has something to do with how much
memory a pointer takes up.

For instance (forgive me if I don't get the calculations exact, this
is just for illustration), if I wanted to store a 100x100x100 3D array
of 'chars', a 1D array would allocate 1,000,000 bytes. If I created
a 3D array, I would need to have 100 pointers, with each pointer
pointing to 100 pointers, which each point to 100 bytes:

assuming a pointer is 32 bits = 4 bytes

100 x (4x100) x (4x100) = 16,000,000 bytes
16 times more memory than a 1D matrix with the same number of elements.

If anyone knows of any good (simple and optimised) sparse matrix
libraries in C, or anyone has any better ideas to solve my problem,
I'd be very grateful to hear them.

Thanks again,

Daniel

PS I have 3 GB of memory, and probably the maximum 7D hypermatrix of
chars I will need to create is when my 'n' is about 2000. My
matrices are probably 75% sparse (75% zeros to 25% ones).

Aug 10 '06 #34
Keith Thompson said:

<snip>
Note that this comparison function fails
[...because it uses subtraction...]
if the arguments are, for
example,
[...pointers to objects with the values...]
INT_MIN and INT_MAX. Better:

int compare(const void *arg1, const void *arg2)
{
    if (*(int *)arg1 < *(int *)arg2) {
        return -1;
    }
    else if (*(int *)arg1 > *(int *)arg2) {
        return 1;
    }
    else {
        return 0;
    }
}
You're quite right - but I would prefer this:

int compare(const void *vi1, const void *vi2)
{
const int *i1 = vi1;
const int *i2 = vi2;
return (*i1 > *i2) - (*i1 < *i2);
}

partly because it's shorter, partly because it's simpler, partly because
it's easier to read (admittedly only if you are accustomed to the idiom it
uses!), and partly because it doesn't cast away constness.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)
Aug 10 '06 #35
<dj*****@gmail.com> wrote in message
news:11**********************@h48g2000cwc.googlegroups.com...
What is an "INT"? Remember that C is case-sensitive. There is, as
you know, a built-in type called "int".
It's "int", not "INT".

I know. For clarity I was trying to distinguish between discussing what
could be considered typos (e.g, int) and datatypes (e.g, INT). Sorry
for the confusion.
No worries. Just remember that in the C world it's *less* confusing to use
lowercase builtin types, since that's the only form the compiler is
guaranteed to accept, and hence the form everyone is used to.
Why do you think that would save memory? You're still storing the
same number of elements.

As for memory differences between storing multidimensional arrays
versus a one dimensional array with the same number of elements, I
don't know the specifics, but it has something to do with how much
memory a pointer takes up.

For instance (forgive me if I don't get the calculations exact, this
is just for illustration), if I wanted to store a 100x100x100 3D array
of 'chars', a 1D array would allocate 1,000,000 bytes. If I created
a 3D array, I would need to have 100 pointers, with each pointer
pointing to 100 pointers, which each point to 100 bytes:

assuming a pointer is 32 bits = 4 bytes

100 x (4x100) x (4x100) = 16,000,000 bytes
16 times more memory than a 1D matrix with the same number of elements.
You're a bit confused here.

char c1d[1000000];
This allocates 10^6 bytes (well, chars).

char c3d[100][100][100];
This *also* allocates 10^6 bytes.
There are no pointers here. This is an array-of-arrays-of-arrays-of-char.
Please look at the comp.lang.c FAQ, question 6.2.

If you were doing this using pointers:

char *c1d = malloc(1000000);

This allocates sizeof(char *) + 1000000 * sizeof(char).

int i,j;
char ***c3d = malloc(100 * sizeof(*c3d));
for(i=0; i<100; i++)
c3d[i] = malloc(100 * sizeof(*c3d[i]));
for(i=0; i<100; i++)
for(j=0; j<100; j++)
c3d[i][j] = malloc(100 * sizeof(*c3d[i][j]));

This allocates:
1 * sizeof(char ***) for c3d itself
100 * sizeof(char **) for c3d[] (first malloc)
100 * 100 * sizeof(char *) for c3d[][] (second set of mallocs)
100 * 100 * 100 * sizeof(char) for c3d[][][] (third set of mallocs)

assuming sizeof(any pointer) == 4 * sizeof(char), the total is:
4 + 400 + 40000 + 1000000
= 1040404 bytes. Not much bigger than that 1d array, really.
PS I have 3gig of memory and probably the maximum 7D hypermatrix of
chars I will need to create is when my 'n' is about 2000. My
matrices are probably 75% sparse (75% zeros too 25% ones) .
That's not particularly sparse. Storing all the bits would be more efficient
than storing the positions of all of the set bits, which is what a sparse
representation would do.

I think you could get much better space saving from packing 8 bits into 1
byte. However, you're still a bit stymied when you have to store 2000^7
bits. Is there any information in the data you could take advantage of? Do
the bits tend to come in groups, and could you take advantage of this?

Philip

PS You may remember me as "the guy with the long hair" from lectures. :)

Aug 10 '06 #36
Richard Heathfield <in*****@invalid.invalid> writes:
Keith Thompson said:

<snip>
>Note that this comparison function fails

[...because it uses subtraction...]
>if the arguments are, for
example,

[...pointers to objects with the values...]
>INT_MIN and INT_MAX. Better:

int compare(const void *arg1, const void *arg2)
{
    if (*(int *)arg1 < *(int *)arg2) {
        return -1;
    }
    else if (*(int *)arg1 > *(int *)arg2) {
        return 1;
    }
    else {
        return 0;
    }
}

You're quite right - but I would prefer this:

int compare(const void *vi1, const void *vi2)
{
const int *i1 = vi1;
const int *i2 = vi2;
return (*i1 > *i2) - (*i1 < *i2);
}

partly because it's shorter, partly because it's simpler, partly because
it's easier to read (admittedly only if you are accustomed to the idiom it
uses!), and partly because it doesn't cast away constness.
Agreed, and the use of the result of the "<" and ">" operators is a
nice touch -- clever, but not quite *too* clever. I suppose I didn't
see the need to leave the "const" in the pointer casts, since the
result was just being immediately dereferenced.

I might prefer this:

int compare(const void *vi1, const void *vi2)
{
const int i1 = *(int*)vi1;
const int i2 = *(int*)vi2;
return (i1 > i2) - (i1 < i2);
}

It does have casts, which yours doesn't (a point in your favor), but
it uses them to extract the int values, which are what we're really
interested in.

(This is an alternative; it's not a clearly superior one.)

--
Keith Thompson (The_Other_Keith) ks***@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <* <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
Aug 10 '06 #37

tgur wrote:
Tom St Denis wrote:
How many times can people say "don't cast void pointers" before newbs
actually read the advice and do it?
Could you explain, why casting void pointers is bad?
Casting in general is bad. Writing a cast where it
isn't necessary, because conversion would take care
of it, is bad. The reasons are casting can introduce
errors or disguise other errors. There are some cases
where casting is really necessary, but most new C
programmers cast more than they need to (and unfortunately
sometimes not when they should). Casting a void pointer
is almost never necessary, and in most cases where
a function won't compile without a cast the function
can be rewritten so an assignment takes care of the
conversion without needing to write the cast.

Aug 10 '06 #38
On Thu, 10 Aug 2006 06:32:16 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
>Al Balmer <al******@att.net> wrote:
>On Wed, 09 Aug 2006 10:29:08 GMT, rl*@hoekstra-uitgeverij.nl (Richard
Bos) wrote:
>tgur <no@mail.pl> wrote:

Could you explain, why casting void pointers is bad?

Because it is rarely if ever necessary.

There are a couple of occasions when an ordinary programmer should cast
a void pointer or should cast any other pointer to void *. One is when
passing them to a variadic function.

That one I get.
Another is inside the comparison
function for qsort().

This one I don't. Why would you need a cast there?

This seems OK (at least the compiler doesn't complain):

int compare(const void *arg1, const void *arg2)
{
const int *warg1 = arg1;
const int *warg2 = arg2;

return *warg1 - *warg2;
}

You don't need one if you write it like this. You do need one (well,
two, actually) if you write it like this:

int compare(const void *arg1, const void *arg2)
{
return *(const int *)arg1 - *(const int *)arg2;
}

Some people prefer one way; some the other. Both are valid, a good C
programmer is familiar with both, and the latter needs the casts.
I'll tell the secret <g>. The reason I often do it my way is that I do
a good deal of upgrading legacy code to ISO-compatible. When I need to
introduce void * parameters, the method I outlined lets me change the
function definition line, add a line for each parameter, and not touch
the rest of the function.

--
Al Balmer
Sun City, AZ
Aug 10 '06 #39
