469,950 Members | 1,349 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,950 developers. It's quick & easy.

How to find out memory occupied by vector, set, or map?

I'm trying to develop several interesting components
of a simple search engine follow some text books.

A book introduces some ways to compress dictionary,
allow it to stay on main memory.
In the dictionary, is a list of millions of words
with related data of several bytes.

Word / Term Freq(int) / Pointer to disk file
Hello 100

One method is:
concatenate all the words into one long contiguous string
and an array of 4-byte character pointers is used to access.

But I think STL map is a suitable data structure for this problem.
I don't know the detailed internal implementation of map.
I think it's implemented as a binary search tree.
Every node occupy just enough memory. (I'm not sure)
I also need the search feature of BST.

So I'd like to know if it is a need to build the data structure
myself.
How can I know the whole memory occupied by a map,
as well as other data structures such as vector, set.
(Or memory occupied by a map element).

BTW, should I use std::string or char[] to store the millions of
words?
How much is the string overhead?
Jun 27 '08 #1
1 2023
On May 20, 4:58 am, Lambda <stephenh...@gmail.comwrote:
I'm trying to develop several interesting components
of a simple search engine follow some text books.
A book introduces some ways to compress dictionary,
allow it to stay on main memory.
In the dictionary, is a list of millions of words
with related data of several bytes.
Word / Term Freq(int) / Pointer to disk file
Hello 100
One method is:
concatenate all the words into one long contiguous string
and an array of 4-byte character pointers is used to access.
A classical string pool. It's not a solution to your problem,
just a very special way to represent strings.
But I think STL map is a suitable data structure for this
problem. I don't know the detailed internal implementation of
map. I think it's implemented as a binary search tree. Every
node occupy just enough memory. (I'm not sure) I also need the
search feature of BST.
Every node is dynamically allocated, and contains several
pointers in addition to the raw data, so it can be fairly
expensive in terms of memory use. On the other hand, for
something small, like a couple of million elements, why not.

Note that access time is O(lg n), and not constant. For more
than a couple of thousand elements, a hash table will be faster
(provided you use a good hash function).
So I'd like to know if it is a need to build the data
structure myself. How can I know the whole memory occupied by
a map, as well as other data structures such as vector, set.
(Or memory occupied by a map element).
You can't really, since there are so many hidden overheads. I'd
just go ahead and implement using std::map, but carefully
encapsulating it so that I could replace it if it did cause a
problem later.
BTW, should I use std::string or char[] to store the millions
of words?
std::string, definitely.
How much is the string overhead?
Again, it depends on the implementation. A lot of
implementations today use a small string optimization which
avoids dynamic allocation for strings shorter than some cut-off
size; if your words are English, there's a good chance that most
of them will fall under this limit.

All sorts of optimizations are possible, once you have the
application working, and can see if they're necessary.

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #2

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

9 posts views Thread by Birgit Rahm | last post: by
2 posts views Thread by anand | last post: by
1 post views Thread by trialproduct2004 | last post: by
18 posts views Thread by Jan | last post: by
18 posts views Thread by ma740988 | last post: by
17 posts views Thread by christophe.chazeau | last post: by
39 posts views Thread by Ravi | last post: by
4 posts views Thread by alien.0101 | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.