
# optimal access/insertion associative array

Hi

I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care

Any help is welcome.
If this post is not adapted to this group, please let me know.
Thanks.

Tom

Aug 1 '05 #1
Tom wrote:
I am looking for an optimal data-structure in order to replace a
map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking about?
- the maximum number of elements (upper bound) is known and fixed
And how does it relate to the size of the population? Is it known at
compile-time or at run-time?
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care

Have you looked at a hash_map?

V
Aug 1 '05 #2
Thanks for the quick answer Victor
1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking about?
Let's say the upper bound is 10^9, but it is usually between 10^5 and 10^6.
And how does it relate to the size of the population?
it is the same.
Is it known at compile-time or at run-time?
the upper bound is known at run-time.
Have you looked at a hash_map?

Not yet because it seems <hash_map> is not available with Visual C++.

Aug 1 '05 #3
Tom wrote:
Thanks for the quick answer Victor

1.1 * 10^6 is bigger, and so is 10^100. How much bigger are we talking about?

Let's say the upper bound is 10^9, but it is usually between 10^5 and 10^6.

Does a single collection ever grow? Do you know how many elements are
going to be in a particular collection before you start inserting?
And how does it relate to the size of the population?

it is the same.

I don't understand then. You said the array is sparse. In that case you
either don't want to keep the extra values (which would make the actual
population much smaller than the upper limit of the "index" or "key") or
you need to somehow indicate that a certain value is "missing". So, which
is it? Say, you have the upper value 100, and the population is no larger
than 10. You could then use an array of 100 values out of which only up
to 10 values would be valid, and the validity you could verify through
another (small) array of "valid indices". I don't think this is a good
idea because if I understood correctly, you would need an array of 10^9 FP
values, which is rather big, when only using about 10^5 to 10^6 of them.

Storing only as many values as you have in the "population" requires that
the "array" is sorted for quick retrieval. That's what 'map' gives you,
kind of. BTW, what's wrong with 'map'? Is it not fast enough?

Another requirement you left unspecified: does your collection ever reach
the point of "no need to change any longer", after which you only retrieve
values and don't insert new ones? Do you ever need to remove any values
from the collection?
Is it known at compile-time or at run-time?

the upper bound is known at run-time.

What's your definition of "upper bound"? I thought that when you say 10^9
that means that 1000000000 _is_ the upper bound.
Have you looked at a hash_map?

Not yet because it seems <hash_map> is not available with Visual C++.

Right, it isn't. You need to get the SGI's version of STL or find
a hash_map implementation in some other library (Boost probably has it).

V
Aug 1 '05 #4
> Does a single collection ever grow? Do you know how many elements are going to be in a particular collection before you start inserting?

The collection is empty when I start inserting/accessing, and it always
grows to 10^5-10^6 elements.
And how does it relate to the size of the population?
Correction: the upper bound (the key range) is around 10^9, but I will
only visit about 10^6 keys, which is why I want a sparse
representation.
does your collection ever reach the point of "no need to change any longer", after which you only retrieve values and don't insert new ones?
Yes, definitely. After that I only need to access a small fraction of
the elements that have been inserted (about 1% of them), and using the
map for that is efficient enough.
Do you ever need to remove any values from the collection?
Never.
BTW, what's wrong with 'map'? Is it not fast enough?

It's fast, but I wish I could make it even faster: inserting points
into the map costs ~30% of my program's time.
If I can reduce that, it will let me work with bigger datasets and
use several of these data-structures at the same time.

Thanks,

Tom

Aug 1 '05 #5
Tom wrote:
[...]
BTW, what's wrong with 'map'? Is it not fast enough?

It's fast, but I wish I could make it even faster. Inserting points in
the map cost me ~30% of my program's time.
If I can reduce it, it will allow me to work with bigger datasets and
to use several of these data-structures at the same time.

I don't know what else to recommend. I've recently had to change from
storing some [relatively small] objects in a 'std::set' and looking them
up in it, to a different approach altogether. I managed to speed up some
part of my program by more than 100-fold. The main change was to remove
a lookup and instead do a sequential processing of data, which were stored
elsewhere anyway. I am not saying that it should necessarily help in your
case, but try to look at the problem from a different point of view. What
if instead of storing all those values you try to recalculate them, or is
it definitely impossible? What if instead of trying to speed up storing
and retrieval you try to speed up recalculation? Maybe cache some values
to do so... Anyway, think outside the box.

V
Aug 1 '05 #6
Tom wrote:
I am looking for an optimal data-structure
in order to replace a map<int,float>.
I use it as some kind of sparse array representation.

The facts:
- the population in the data-structures can be bigger than 10^6
- the maximum number of elements (upper bound) is known and fixed
- the key is an integer, the element is a float value
- I need to do lots of access/insertion operations, but I don't care

Any help is welcome.

How is this "data-structure" different from a map?
If it is a map, then std::map<int, float>
is probably as close to optimal as you can get.
Aug 1 '05 #7
I use this data-structure to compute a wave propagation in part of a
multidimensional domain, so recalculating a value would involve the
neighboring values. That would definitely be too expensive.

Aug 1 '05 #8
This is what I am currently using.
I wanted to know if the constraints (integer key, upper bound on the
key, etc.) could lead to a faster implementation.
Thanks.

Aug 1 '05 #9
Tom wrote:
This is what I am currently using.
I wanted to know if the constraints
(integer key, upper bound on the key, etc...)
could lead to a faster implementation.

It sounds like a Quality of Implementation (QoI) issue.
Tell us which [compiler] implementation you are using
and maybe we can redirect you to an appropriate forum.
Aug 2 '05 #10
"Tom" <th**************@gmail.com> wrote in message
Have you looked at a hash_map?

Not yet because it seems <hash_map> is not available with Visual C++.

Visual C++ 7 (.NET) has <hash_map> with hash_map declared in namespace
stdext.
Aug 2 '05 #11
