Connecting Tech Pros Worldwide Help | Site Map

How to find out memory occupied by vector, set, or map?

 
LinkBack Thread Tools Search this Thread
  #1  
Old June 27th, 2008, 04:43 PM
Lambda
Guest
 
Posts: n/a
Default How to find out memory occupied by vector, set, or map?

I'm trying to develop several interesting components
of a simple search engine follow some text books.

A book introduces some ways to compress dictionary,
allow it to stay on main memory.
In the dictionary, is a list of millions of words
with related data of several bytes.

Word / Term Freq(int) / Pointer to disk file
Hello 100

One method is:
concatenate all the words into one long contiguous string
and an array of 4-byte character pointers is used to access.

But I think STL map is a suitable data structure for this problem.
I don't know the detailed internal implementation of map.
I think it's implemented as a binary search tree.
Every node occupy just enough memory. (I'm not sure)
I also need the search feature of BST.

So I'd like to know if it is a need to build the data structure
myself.
How can I know the whole memory occupied by a map,
as well as other data structures such as vector, set.
(Or memory occupied by a map element).

BTW, should I use std::string or char[] to store the millions of
words?
How much is the string overhead?

  #2  
Old June 27th, 2008, 04:43 PM
James Kanze
Guest
 
Posts: n/a
Default Re: How to find out memory occupied by vector, set, or map?

On May 20, 4:58 am, Lambda <stephenh...@gmail.comwrote:
Quote:
I'm trying to develop several interesting components
of a simple search engine follow some text books.
Quote:
A book introduces some ways to compress dictionary,
allow it to stay on main memory.
In the dictionary, is a list of millions of words
with related data of several bytes.
Quote:
Word / Term Freq(int) / Pointer to disk file
Hello 100
Quote:
One method is:
concatenate all the words into one long contiguous string
and an array of 4-byte character pointers is used to access.
A classical string pool. It's not a solution to your problem,
just a very special way to represent strings.
Quote:
But I think STL map is a suitable data structure for this
problem. I don't know the detailed internal implementation of
map. I think it's implemented as a binary search tree. Every
node occupy just enough memory. (I'm not sure) I also need the
search feature of BST.
Every node is dynamically allocated, and contains several
pointers in addition to the raw data, so it can be fairly
expensive in terms of memory use. On the other hand, for
something small, like a couple of million elements, why not.

Note that access time is O(lg n), and not constant. For more
than a couple of thousand elements, a hash table will be faster
(provided you use a good hash function).
Quote:
So I'd like to know if it is a need to build the data
structure myself. How can I know the whole memory occupied by
a map, as well as other data structures such as vector, set.
(Or memory occupied by a map element).
You can't really, since there are so many hidden overheads. I'd
just go ahead and implement using std::map, but carefully
encapsulating it so that I could replace it if it did cause a
problem later.
Quote:
BTW, should I use std::string or char[] to store the millions
of words?
std::string, definitely.
Quote:
How much is the string overhead?
Again, it depends on the implementation. A lot of
implementations today use a small string optimization which
avoids dynamic allocation for strings shorter than some cut-off
size; if your words are English, there's a good chance that most
of them will fall under this limit.

All sorts of optimizations are possible, once you have the
application working, and can see if they're necessary.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
 

Bookmarks

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Popular Articles

What is Bytes?

We are a network of experts and professionals in IT and software development that help one another with answers to tough questions and share insights. Get the best answers to your questions from over 220,662 network members.