472,096 Members | 1,312 Online

# Storing 15 million items in an array

Hi there,

I'm trying to create a data structure to store at least 13-15 million items that I need constant time access to.

Memory is a real issue, so I tried implementing the structure as a simple array (int a[10000000]), but I've found out, since arrays use the stack, that I overflow it around 100k items.

I don't know how to use the heap and still allow constant time access to my data members, and implement it in assembly to cut memory usage.

Do any of you guys have any ideas that could help?

Thank you.

Best regards,
Thomas J.
Jun 18 '09 #1
8 13312
donbock
2,425 Expert 2GB
Find the size of each item.
Multiply that size by the number of items to compute the amount of storage needed.
Does that amount seem reasonable for the amount of RAM in your computer?

If int is 16-bits on your machine, then you need 20MB of RAM for the array.
If int is 32-bits on your machine, then you need 40MB of RAM for the array.

Try declaring a static array:
Expand|Select|Wrap|Line Numbers
1. static int a[10000000];
The 'static' keyword keeps the array off the stack.

If you don't have enough RAM for the array, then you need to come up with a new scheme -- you can't keep the entire array in memory.
Jun 18 '09 #2
Thanks so much Don! Highly appreciate it!

Best regards,
Tom
Jun 18 '09 #3
JosAH
11,448 Expert 8TB
@donbock
You can index 32K-1 elements in an array then which makes a bit less than 32K*2 bytes == 64K bytes.

kind regards,

Jos
Jun 18 '09 #4
weaknessforcats
9,208 Expert Mod 8TB
Using the heap just means you have to create your array at run-time.

You don't say whether you are using C++ or not.

If you are using C++, then use a vector. A vector is required to implement an array. The vector manages its array on the heap. You can create the vector of the correct size. The vector itself can be on the stack.

Expand|Select|Wrap|Line Numbers
1. vector<int> a(10000000);
If you are using C, then allocate your array on the heap:

Expand|Select|Wrap|Line Numbers
1. int* a = malloc(10000000 * sizeof(int));
In both cases a[10] is the 11th element.

At the end of the program the vector will clean itself up whereas the array allocated by malloc will need to be freed:

Expand|Select|Wrap|Line Numbers
1. free(a);
Jun 18 '09 #5
donbock
2,425 Expert 2GB
@JosAH
Jos makes a good point. You should use a 'long' (rather than an 'int') variable for the array index.
Jun 18 '09 #6
JosAH
11,448 Expert 8TB
@donbock
Your implementation may not allow you to do that; the type of an index has to fit in a ptrdiff_t and it is defined as an ordinary int in most implementations.

kind regars,

Jos
Jun 18 '09 #7
donbock
2,425 Expert 2GB
@JosAH
My habit when being careful (that is, when it isn't obvious that the array is small), is to use type size_t for array index variables. These two types are related as follows:
Expand|Select|Wrap|Line Numbers
1. typedef ptrdiff_t signed T;
2. typedef size_t unsigned T;
where T is required [I think] to be the same integer type for both.

This is a difference that doesn't make much difference. If ptrdiff_t is too small to count up to one million then so is size_t. On the other hand, all kinds of things would go wrong if a size_t variable couldn't hold sizeof the array. For instance, you probably couldn't malloc the array.
Jun 18 '09 #8
Banfa
9,065 Expert Mod 8TB
@thomasjbake
Just because you want constant time access doesn't mean that you need to use a single array to hold your data.

For instance imagine 1000 arrays each holding 1000 items. No array will break the size limits even on a 16 bit machine. I imagine this implemented as an array of 1000 pointers each malloc'd an array of 1000 items (ints).

My access function takes a long ix and if my array of 1000 pointers is called aDataArray returns

That is a constant time look-up. It is however a slower look-up than occurs if you use a single array, it's got all that computation to start with.

My point is that by saying "constant time access" you don't really say very much. You certainly say nothing of how fast the access needs to be. An array of 1000000 items where it takes 10 seconds to access each item is still constant time access.

So when you said "constant time access" did you also intend to include some sort of statement of speed of access too?
Jun 18 '09 #9