STL Map uses hashing?

I am reading about hashing techniques. Does the map data structure
available in the C++ STL use hashing techniques?
Jun 27 '08 #1
Vinodh wrote:
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
std::map, std::multimap, std::set and std::multiset are typically implemented
using red-black trees.

In TR1 there are unordered_xxx types that use hash tables;
some vendors implemented these as hash_xxx extensions before TR1.

--
Best Regards
Barry
Jun 27 '08 #2
In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pv**********@gmail.com says...
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).

TR1 (and C++0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #3
In article <MP************************@news.sunsite.dk>,
Jerry Coffin <jc*****@taeus.com> wrote:
>In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pv**********@gmail.com says...
>I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?

No -- it uses a balanced tree of some sort (e.g. AVL or red-black tree).
For information, this is because of requirements:

An STL map is required to have a worst-case find() of O(log N).
Balanced trees give you that. Although hash maps are typically faster
than balanced trees on large amounts of random data, they do not
guarantee O(log N) in the worst case.

>TR1 (and C++ 0x) add unordered_[multi]set and unordered_[multi]map,
which are intended to be based on hashing.
I.e. TR1 did accept that hash maps are also useful and added them.
Now all we need is for developers to be able to make an informed
choice about which one to use for a particular application. :-)

Yan


Jun 27 '08 #4
On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
In article <MPG.22a85e2e38d3b341989...@news.sunsite.dk>,
Jerry Coffin <jcof...@taeus.com> wrote:
In article <4353ee37-4fa4-4bc0-924f-482523333e21
@u12g2000prd.googlegroups.com>, pvinodhku...@gmail.com says...
I am reading about hashing techniques. The map data structure
available in C++ STL uses hashing techniques?
No -- it uses a balanced tree of some sort (e.g. AVL or
red-black tree).
For information, this is because of requirements:
A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #5
James Kanze wrote:
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
Question: Will the new C++ standard require libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
always be forced to provide one himself?

Creating a good-quality hashing function is not a trivial task at all
(and it is the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge of and experience with the subject.
Jun 27 '08 #6
On 2008-05-31 11:50:57 -0400, Juha Nieminen <no****@thanks.invalid> said:
James Kanze wrote:
>I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?
"High quality" depends on the distribution of the actual values that
you're using. A "high quality" integer hash function for one
application may be an utter failure in another application.
>
Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.
Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #7
On 2008-05-31 17:50, Juha Nieminen wrote:
James Kanze wrote:
>I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).

Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?

Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.
Currently the parametrised hash struct (which is used by the unordered
containers) is required to be instantiable for integers, float, double,
pointers, and string types. It says nothing about the quality of the
implementation.

--
Erik Wikström
Jun 27 '08 #8
Pete Becker wrote:
> Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.
What about std::pair<>, tuple<>, etc.?

How does one write a good hashing function for a combined type like

struct key
{
    int key1;
    std::string key2;
};

?

--
Thomas
Jun 27 '08 #9
On 2008-05-31 12:42:50 -0400, "Thomas J. Gritzan"
<ph*************@gmx.de> said:
Pete Becker wrote:
>> Creating good-quality hashing functions are not a trivial task at all
(and the subject of continuing extensive research). One cannot expect
the average user to invent a good-quality function without extensive
knowledge and experience on the subject.

Indeed. That's why I've never thought that standardizing hashed
containers was a good idea. There's just too much flexibility, making
them hard to specify well, with the result that naive users can get
horrible performance without knowing why.

But, to answer your question, no, there is no requirement for "high
quality" hashing functions. Implementations will be required to provide
hashing functions for the builtin types, pointers, std::string,
std::u16string, std::u32string, std::wstring, std::error_code, and
std::thread::id.

What about std::pair<>, tuple<>, etc.?
They're not on the list.
>
How to do a good hashing function for a combined type like

struct key
{
int key1;
std::string key2;
};

?
It depends strongly on how the values are distributed.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #10
In article <Bt***************@read4.inet.fi>, no****@thanks.invalid
says...

[ ... ]
Question: Will the new C++ standard require for the libraries to
provide high-quality default hashing functions for internal types (and
perhaps some other standard types such as std::string), or will the user
be forced to always provide one himself?
Hash functions are required in the standard library. <functional> has
specializations of hash<> for the obvious arithmetic types, plus
pointers, string types (string, u16string, u32string, wstring),
error_code, and thread::id.

As usual, the quality is up to the individual implementation. [N2521,
unord.hash/2]: "The return value of operator() is unspecified, except
that equal arguments shall yield the same result."

--
Later,
Jerry.

The universe is a figment of its own imagination.
Jun 27 '08 #11
On May 31, 6:02 pm, Pete Becker <p...@versatilecoding.com> wrote:
On 2008-05-31 11:50:57 -0400, Juha Nieminen <nos...@thanks.invalid> said:
[...]
But, to answer your question, no, there is no requirement for
"high quality" hashing functions. Implementations will be
required to provide hashing functions for the builtin types,
pointers, std::string, std::u16string, std::u32string,
std::wstring, std::error_code, and std::thread::id.
But not std::type_info?

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #12
On 2008-05-31 19:20:35 -0400, James Kanze <ja*********@gmail.com> said:
On May 31, 6:02 pm, Pete Becker <p...@versatilecoding.com> wrote:
>On 2008-05-31 11:50:57 -0400, Juha Nieminen <nos...@thanks.invalid> said:

[...]
>But, to answer your question, no, there is no requirement for
"high quality" hashing functions. Implementations will be
required to provide hashing functions for the builtin types,
pointers, std::string, std::u16string, std::u32string,
std::wstring, std::error_code, and std::thread::id.

But not std::type_info?
Not yet. <g>

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jun 27 '08 #13
In article <f3**********************************@l42g2000hsc.googlegroups.com>,
James Kanze <ja*********@gmail.com> wrote:
>On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
>A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).

I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
OK, sorry about my lack of definition for "typically":

Assuming a non-perfect hash function that returns a bucket
identifier (almost necessary, since a perfect hash function would be too
expensive to be worthwhile), the hash function makes
assumptions about the data it receives (e.g. that it is random). A hash
table making use of this hash function will be very fast when presented
with "typical" data (i.e. data that generally meets the assumptions used
when creating the hash function). Unfortunately, there may be some
worst-case scenario under which the performance of the hash table is
abysmal.
Yan
Jun 27 '08 #14
On Jun 2, 3:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
In article
<f323e752-ab41-4697-9e74-635e235be...@l42g2000hsc.googlegroups.com>,
James Kanze <james.ka...@gmail.com> wrote:
On May 30, 4:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
A STL map is required to have a worse case find() of O log(N).
Balanced trees give you that. Although hash maps are
typically faster than balanced tree on large amount of random
data, they do not guarantee a worse case scenario of O log(N).
I don't know about the "typically faster". Their average access
time is faster IF the hash function is good. Typically, I find
that it's not always that good (although the situation is
improving).
Ok, sorry about my lack of definition for "typically":
Assuming a non-perfect hash function that return you a bucket
identifier (almost necessary since a perfect hash function
would be too expensive for it to be worthwhile), this hash
function makes assumptions on the data it receives (e.g.
random). The hash table making use of this hash function will
be very fast when presented with "typical" data (i.e. data
that generally meet the assumptions used for creating the hash
function). Unfortunately, there may be some worst-case
scenario under which the performance of the hash table will be
abysmal.
That's not my point. The performance of a hash table depends
very much on the quality of the hash function. While a perfect
hash function is only possible if the exact set of data are
known in advance, hash functions do differ enormously in their
quality, and I've seen more than a few cases where even some
very competent programmers have come up with relatively poor
hash functions; the most obvious is in Java's hash map, where
Java was forced to change the hash function as a result. I
suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Whereas in the case of a hash function, the
program will only be a little bit slower than it should be.)

--
James Kanze (GABI Software) email:ja*********@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
Jun 27 '08 #15
"James Kanze" <ja*********@gmail.comwrote in message
news:17**********************************@79g2000h sk.googlegroups.com...
On Jun 2, 3:28 pm, ytrem...@nyx.nyx.net (Yannick Tremblay) wrote:
I suspect that part of the reason why the original STL had a
balanced tree, but not a hash table, is that one can sort of
expect an average programmer to get the ordering relationship
right, but you can't expect him to define a good hash function.
(There's also the point that if he gets the ordering function
wrong, there's a good chance of the program not working at all,
so he'll notice. Where as in the case of a hash function, the
program will only be a little bit slower than it should be.)
I believe that I implemented the first associative-array class in C++ (20
years ago -- how time flies! -- see
http://www.softwarepreservation.org/...nig-assoc.pdf),
and the architecture of the STL map class was partly based on my design; so
I'm pretty confident that I understand why it was designed as it was :-)

James Kanze's description is correct, as far as it goes; but it misses one
point that I considered socially (or, if you prefer, politically) important
at the time: If a user supplies a naive hash function that results in
terrible performance, the user is likely to blame the library (or the
language!) rather than the hash function. Therefore, when I designed the
associative-array class, I felt that it was more important to obtain
acceptable worst-case performance easily than it was to obtain excellent
average-case performance with substantial effort.
Jun 27 '08 #16
