STL Map uses hashing?

I am reading about hashing techniques. Does the map data structure
available in the C++ STL use hashing techniques?
Jun 27 '08 #1
Vinodh wrote:
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
std::map, std::multimap, std::set and std::multiset are typically implemented
using red-black trees.

In TR1 there are unordered_xxx types that use hash tables;
some vendors implemented these as hash_xxx extensions before TR1.

--
Best Regards
Barry
Jun 27 '08 #2
In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pv**********@gmail.com says...
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).

TR1 (and C++0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #3
In article <MP************************@news.sunsite.dk>,
Jerry Coffin <jc*****@taeus.com> wrote:
>In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pv**********@gmail.com says...
>I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?

No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).
For information, this is because of requirements:

An STL map is required to have a worst-case find() of O(log N).
Balanced trees give you that. Although hash maps are typically faster
than balanced trees on large amounts of random data, they do not
guarantee O(log N) in the worst case.

>TR1 (and C++ 0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.
I.e. TR1 did accept that hash maps are also useful and added them.
Now all we need is for developers to be able to make an informed
choice about which one to use for a particular application. :-)

Yan


Jun 27 '08 #4
On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
In article <MPG.22a85e2e38d3b341989...@news.sunsite.dk>,
Jerry Coffin <jcof...@taeus.com> wrote:
In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pvinodhku...@gmail.com says...
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
No -- it uses a balanced tree of some sort (e.g. AVL or
red-black tree).
For information, this is because of requirements:
A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #5
James Kanze wrote:
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
Question: Will the new C++ standard require libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
always be forced to provide one himself?

Creating a good-quality hashing function is not a trivial task at all
(and it is the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge of and experience with the subject.
Jun 27 '08 #6
On 2008-05-31 11:50:57 -0400, Juha Nieminen <no****@thanks.invalid> said:
James Kanze wrote:
>I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?
"High quality" depends on the distribution of the actual values that
you're using. A "high quality" integer hash function for one
application may be an utter failure in another application.
>
Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.
Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #7
On 2008-05-31 17:50, Juha Nieminen wrote:
James Kanze wrote:
>I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.
Currently the parametrised hash struct (which is used by the unordered
containers) is required to be instantiable for integers, float, double,
pointers, and string types. It says nothing about the quality of the
implementation.

--
Erik Wikström
Jun 27 '08 #8
Pete Becker wrote:
> Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.
What about std::pair<>, tuple<>, etc.?

How does one write a good hashing function for a combined type like

struct key
{
    int key1;
    std::string key2;
};

?

--
Thomas
Jun 27 '08 #9
On 2008-05-31 12:42:50 -0400, "Thomas J. Gritzan"
<ph*************@gmx.de> said:
Pete Becker wrote:
>> Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.

What about std::pair<>, tuple<>, etc.?
They're not on the list.
>
How to do a good hashing function for a combined type like

struct key
{
int key1;
std::string key2;
};

?
It depends strongly on how the values are distributed.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #10
In article <Bt***************@read4.inet.fi>, no****@thanks.invalid
says...

[ ... ]
Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?
Hash functions are required in the standard library. <functional> has
specializations of hash<> for the obvious arithmetic types, plus
pointers, string types (string, u16string, u32string, wstring),
error_code, and thread::id.

As usual, the quality is up to the individual implementation. [N2521,
unord.hash/2]: "The return value of operator() is unspecified, except
that equal arguments shall yield the same result."

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #11
On May 31, 6:02 pm, Pete Becker <p...@versatilecoding.com> wrote:
On 2008-05-31 11:50:57 -0400, Juha Nieminen <nos...@thanks.invalid> said:
[...]
But, to answer your question, no, there is no requirement for
"high quality" hashing functions. Implementations will be
required to provide hashing functions for the builtin types,
pointers, std::string, std::u16string, std::u32string,
std::wstring, std::error_code, and std::thread::id.
But not std::type_info?

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #12
On 2008-05-31 19:20:35 -0400, James Kanze <ja*********@gmail.com> said:
On May 31, 6:02 pm, Pete Becker <p...@versatilecoding.com> wrote:
>On 2008-05-31 11:50:57 -0400, Juha Nieminen <nos...@thanks.invalid> said:

[...]
>But, to answer your question, no, there is no requirement for
"high quality" hashing functions. Implementations will be
required to provide hashing functions for the builtin types,
pointers, std::string, std::u16string, std::u32string,
std::wstring, std::error_code, and std::thread::id.

But not std::type_info?
Not yet. <g>

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #13
In article <f3**********************************@l42g2000hsc.googlegroups.com>,
James Kanze <ja*********@gmail.com> wrote:
>On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
>A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).

I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
OK, sorry about my lack of definition for "typically":

Assuming a non-perfect hash function that returns a bucket
identifier (almost necessary, since a perfect hash function would be too
expensive to be worthwhile), the hash function makes
assumptions about the data it receives (e.g. that it is random). A hash
table making use of this hash function will be very fast when presented
with "typical" data (i.e. data that generally meets the assumptions used
when creating the hash function). Unfortunately, there may be some
worst-case scenario under which the performance of the hash table is
abysmal.
Yan
Jun 27 '08 #14
On Jun 2, 3:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
In article
<f323e752-ab41-4697-9e74-635e235be...@l42g2000hsc.googlegroups.com>,
James Kanze <james.ka...@gmail.com> wrote:
On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
Ok, sorry about my lack of definition for "typically":
Assuming a non-perfect hash function that return you a bucket
identifier (almost necessary since a perfect hash function
would be too expensive for it to be worthwhile), this hash
function makes assumptions on the data it receives (e.g.
random). The hash table making use of this hash function will
be very fast when presented with "typical" data (i.e. data
that generally meet the assumptions used for creating the hash
function). Unfortunately, there may be some worst-case
scenario under which the performance of the hash table will be
abysmal.
That's not my point. The performance of a hash table depends
very much on the quality of the hash function. While a perfect
hash function is only possible if the exact set of data are
known in advance, hash functions do differ enormously in their
quality, and I've seen more than a few cases where even some
very competent programmers have come up with relatively poor
hash functions; the most obvious is in Java's hash map, where
Java was forced to change the hash function as a result. I
suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Whereas in the case of a hash function, the
program will only be a little bit slower than it should be.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #15
"James Kanze" <ja*********@gmail.comwrote in message
news:17**********************************@79g2000h sk.googlegroups.com...
On Jun 2, 3:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
I suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Where as in the case of a hash function, the
program will only be a little bit slower than it should be.)
I believe that I implemented the first associative-array class in C++ (20
years ago -- how time flies! -- see
http://www.softwarepreservation.org/...nig-assoc.pdf),
and the architecture of the STL map class was partly based on my design; so
I'm pretty confident that I understand why it was designed as it was :-)

James Kanze's description is correct, as far as it goes; but it misses one
point that I considered socially (or, if you prefer, politically) important
at the time: If a user supplies a naive hash function that results in
terrible performance, the user is likely to blame the library (or the
language!) rather than the hash function. Therefore, when I designed the
associative-array class, I felt that it was more important to obtain
acceptable worst-case performance easily than it was to obtain excellent
average-case performance with substantial effort.
Jun 27 '08 #16
