473,396 Members | 1,998 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,396 software developers and data experts.

std::set versus tr1::unordered_set

What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set?

set advantages:
1) iterators remain valid after insert and erase (except for iterators
to the erased element).
2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets.
3) Can do searches based on ordering (for instance, given a set of
strings, it is easy to find all the elements that start with 'e').

set disadvantages:
1) Searches, insertions, and deletions are O(log n).
2) Depends on operator< or another ordering predicate.
unordered_set advantages:
1) Searches, insertions, and deletions are amortized O(1).
2) Iterators remain valid after erase (except for iterators to the erased
element).
unordered_set disadvantages:
1) Depends on a GOOD hash function, as well as operator== or another
equivalence predicate.
2) Iterators are invalidated by insertions (this can be avoided by
calling rehash() before loading the table.)
3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.

One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.

Joe Gottman
Jul 23 '05 #1
1 5041
In article <zs*********************@twister.southeast.rr.com> ,
"Joe Gottman" <jg******@carolina.rr.com> wrote:
What are the advantages and disadvantages of the tree-based std::set versus
the hash-based tr1::unordered_set? 2) Sets can be compared using operator== or operator<. The STL set
functions, (set_union, set_intersection, etc), easily work on sets. -- 3) It is more difficult to compare unordered_sets or to do union,
intersection, etc.
I unsuccessfully tried to get an operator== into tr1::unordered_*.
Assuming a good hash function such an operation can have the intended
semantics and O(N) complexity.

// simplified pseudo code!
bool operator==(const C& x, const C& y)
{
if (x.size() != y.size()) // assumes O(1) size()
return false;
for each element in x, look it up in y // if look up is O(1)
if it doesn't exist in y return false // then loop is O(N)
return true;
}

The final decision was to let implementors provide it as an extension,
but not to require it.
One thing I'm not sure about is the space issue. Sets require three
pointers and at least a bool for balance information for every element.
Alignment requirements often bump that up to 4 words. The Metrowerks
implementation stuffs the bool into a low order bit of one of the
pointers for platforms where alignment > 1. That keeps the overhead
down to 3 words/element.
Unordered_sets require at least O(bucket_count) space for the buckets, but I
don't know what the space overhead is for the buckets or how much extra
space is required for each element of a bucket.


This is going to vary with implementation. The Metrowerks hash ... um
unordered containers are based on the classic structure. Assuming a
good hash and loading of 1, you've got a one bucket per element.
There's a word for the bucket and a an additional word for the element
in a bucket in case of a collision (2 words overhead / element). This
design restricts the interface to forward iterators. If you go with
bidirectional iterators, you up the space requirements to 3 words
overhead / element.

As load factor rises above 1, elements start to share the 1 word/bucket
overhead and thus the overhead/element drops. However this is rarely a
good deal for the client (sacrificing performance). And as the load
factor drops below 1, you start to collect empty buckets which still
have an overhead and thus your per element overhead rises (much like a
vector with a small size and large capacity). Again, this is usually
not a good deal for the client.

-Howard
Jul 23 '05 #2

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

32
by: Wolfgang Draxinger | last post by:
I understand that it is perfectly possible to store UTF-8 strings in a std::string, however doing so can cause some implicaions. E.g. you can't count the amount of characters by length() | size()....
4
by: Bo Peng | last post by:
Dear list, I am looking for a way to store a large amount of unique sequences that will be accessed by objects. The most important operations are: 1. Direct access to the sequences (from...
12
by: Varun Kacholia | last post by:
Apologies if this has been answered somewhere, but Google did not produce any concrete results. I would like to find out the memory footprint of a vector<T>. I tried to dig in the STL code and...
3
by: utab | last post by:
Dear all, I tried to solve the occurence problem: to find the distinct occurences of a word in an input. I know that I could use map and other STD lib functions. I tried to do it the hard way. I...
4
by: yuyang08 | last post by:
Hello, everyone, I am wondering what is the hash function that is used in hash_map/hash_set. Can I replace it with my own hash function? Any comments on this? Thanks! -Andy
1
by: yonil | last post by:
I hope this is the correct group for this... I'm currently implementing the TR1 associative containers according to specification found in...
2
by: Pierre Couderc | last post by:
Generally, is there somewhere a good tutorial and examplefor the use of SGI STL hash_set? I am lost in SGI documentation. More specifically, i am trying to use hat I need that a hash_set : ...
5
by: Amit Bhatia | last post by:
Hi, I was wondering if I have a hash_set, can I modify its elements using an iterator: given the fact that the changes I make will not change the position or the key of the object that the...
0
by: wellingj | last post by:
A little back ground on what I'm trying to do: I'm making a generic weighted graph class (vertexes and edges althought I don't call them that) to implement some pathfinding algorithms like A* and D*....
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.