std::set versus tr1::unordered_set

Joe Gottman

What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set?

set advantages:
1) iterators remain valid after insert and erase (except for iterators
to the erased element).
2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets.
3) Can do searches based on ordering (for instance, given a set of
strings, it is easy to find all the elements that start with 'e').

set disadvantages:
1) Searches, insertions, and deletions are O(log n).
2) Depends on operator< or another ordering predicate.
unordered_set advantages:
1) Searches, insertions, and deletions are amortized O(1).
2) Iterators remain valid after erase (except for iterators to the erased
element).
unordered_set disadvantages:
1) Depends on a GOOD hash function, as well as operator== or another
equivalence predicate.
2) Iterators are invalidated by insertions (this can be avoided by
calling rehash() before loading the table.)
3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.

One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.

Joe Gottman

Jul 23 '05 #1

Subscribe Post Reply

5041

Howard Hinnant

In article <zs*********************@twister.southeast.rr.com> ,
"Joe Gottman" <jg******@carolina.rr.com> wrote:

What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set? 2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets. -- 3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.
I unsuccessfully tried to get an operator== into tr1::unordered_*.
Assuming a good hash function such an operation can have the intended
semantics and O(N) complexity.

// simplified pseudo code!
bool operator==(const C& x, const C& y)
{
if (x.size() != y.size()) // assumes O(1) size()
return false;
for each element in x, look it up in y // if look up is O(1)
if it doesn't exist in y return false // then loop is O(N)
return true;
}

The final decision was to let implementors provide it as an extension,
but not to require it.
One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.
Alignment requirements often bump that up to 4 words. The Metrowerks
implementation stuffs the bool into a low order bit of one of the
pointers for platforms where alignment > 1. That keeps the overhead
down to 3 words/element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.

This is going to vary with implementation. The Metrowerks hash ... um
unordered containers are based on the classic structure. Assuming a
good hash and loading of 1, you've got a one bucket per element.
There's a word for the bucket and a an additional word for the element
in a bucket in case of a collision (2 words overhead / element). This
design restricts the interface to forward iterators. If you go with
bidirectional iterators, you up the space requirements to 3 words
overhead / element.

As load factor rises above 1, elements start to share the 1 word/bucket
overhead and thus the overhead/element drops. However this is rarely a
good deal for the client (sacrificing performance). And as the load
factor drops below 1, you start to collect empty buckets which still
have an overhead and thus your per element overhead rises (much like a
vector with a small size and large capacity). Again, this is usually
not a good deal for the client.

-Howard

Jul 23 '05 #2

Similar topics

std::string vs. Unicode UTF-8

by: Wolfgang Draxinger | last post by:

I understand that it is perfectly possible to store UTF-8 strings in a std::string, however doing so can cause some implicaions. E.g. you can't count the amount of characters by length() | size()....

C / C++

Is there a hash algorithm with direct access to hash elements andreference count?

by: Bo Peng | last post by:

Dear list, I am looking for a way to store a large amount of unique sequences that will be accessed by objects. The most important operations are: 1. Direct access to the sequences (from...

C / C++

STL vector memory footprint

by: Varun Kacholia | last post by:

Apologies if this has been answered somewhere, but Google did not produce any concrete results. I would like to find out the memory footprint of a vector<T>. I tried to dig in the STL code and...

C / C++

Occurence problem: different ideas

by: utab | last post by:

Dear all, I tried to solve the occurence problem: to find the distinct occurences of a word in an input. I know that I could use map and other STD lib functions. I tried to do it the hard way. I...

C / C++

What is the hash function that is used in hash_map / hash_set

by: yuyang08 | last post by:

Hello, everyone, I am wondering what is the hash function that is used in hash_map/hash_set. Can I replace it with my own hash function? Any comments on this? Thanks! -Andy

C / C++

Requirements of TR1 associative containers

by: yonil | last post by:

I hope this is the correct group for this... I'm currently implementing the TR1 associative containers according to specification found in...

C / C++

Tutorial and example to SGI STL hash_set

by: Pierre Couderc | last post by:

Generally, is there somewhere a good tutorial and examplefor the use of SGI STL hash_set? I am lost in SGI documentation. More specifically, i am trying to use hat I need that a hash_set : ...

C / C++

Modifying elements of hash_set.

by: Amit Bhatia | last post by:

Hi, I was wondering if I have a hash_set, can I modify its elements using an iterator: given the fact that the changes I make will not change the position or the key of the object that the...

C / C++

Eror passing const references

by: wellingj | last post by:

A little back ground on what I'm trying to do: I'm making a generic weighted graph class (vertexes and edges althought I don't call them that) to implement some pathfinding algorithms like A* and D*....

C / C++

Merging data from multiple Excel files

by: ryjfgjl | last post by:

In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...

Data Management

Looking to do Android software development, any suggestions? Is flutter better?

by: nemocccc | last post by:

hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?

General

Is that possible of reading the .csv file in column wise and the column have different lengths ?

by: Sonnysonu | last post by:

This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...

C / C++

How to build RAID in BIOS?

by: Hystou | last post by:

There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...

Computer Hardware

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...

General

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...

Windows Server

Maximizing Business Potential: The Nexus of Website Design and Digital Marketing

by: jinu1996 | last post by:

In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...

Online Marketing

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...

Windows Server

Discussion: How does Zigbee compare with other wireless protocols in smart home applications?

by: tracyyun | last post by:

Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

General