Bytes IT Community

Dual search data structure

Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one? It seems there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.

Any help is appreciated,
Thanks,
Divick

Nov 22 '05 #1
12 Replies


Divick wrote:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?


There is no such thing. Good hash functions are those for which the
fastest way to find a collision is to use brute force. For practical
purposes, any of the famous ones are fine.

http://en.wikipedia.org/wiki/Cryptog..._hash_function

Ryan

Nov 22 '05 #2

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.


I had a similar problem not too long ago... my solution was to wrap a
std::map<Key, Obj*> and a std::multiset<Obj*, ptr_less<Obj> > in my own
class. The deal was that I could get the Key from the object, so the
multiset worked as an index onto the map.
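A minimal sketch of that kind of wrapper follows. The `Obj` type, its fields, and the `ByValue` comparator are invented for illustration (KS's actual class used a `ptr_less<Obj>` comparator, assumed here to compare the pointed-to objects); the point is that the value-ordered multiset turns the linear value scan into an O(log n) lookup.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical record type: the key can be recovered from the object itself.
struct Obj {
    std::string key;
    int value;
};

// Order object pointers by the pointed-to value, so the multiset
// acts as a value index (stands in for ptr_less<Obj>).
struct ByValue {
    bool operator()(const Obj* a, const Obj* b) const { return a->value < b->value; }
};

class DualIndex {
    std::map<std::string, Obj*> byKey_;     // key -> object
    std::multiset<Obj*, ByValue> byValue_;  // value-ordered index onto the map
public:
    void insert(Obj* o) {
        byKey_[o->key] = o;
        byValue_.insert(o);
    }
    Obj* findByKey(const std::string& k) const {
        auto it = byKey_.find(k);
        return it == byKey_.end() ? nullptr : it->second;
    }
    Obj* findByValue(int v) const {         // O(log n), not a linear scan
        Obj probe{"", v};                   // comparator only looks at value
        auto it = byValue_.find(&probe);
        return it == byValue_.end() ? nullptr : *it;
    }
};
```

Because the key is recoverable from the object, the multiset never needs to duplicate it; both structures just hold pointers to the same records.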

hth
~KS

Nov 22 '05 #3

Divick wrote:
Hi all,
does anyone know of a data structure in which I can search by key and by value equally fast?
If you're talking standard C++, then 'std::set' is the only such structure, because in it the value _is_ the key. If you're not talking standard C++, then you're in the wrong newsgroup; general programming questions should probably be addressed to 'comp.programming'.
[...]


V
Nov 22 '05 #4

Divick wrote:
Hi all,
does anyone know of a data structure in which I can search by key and by value equally fast?


You can try using the bimap class in the following link:

http://www.codeproject.com/vcpp/stl/bimap.asp

Nov 22 '05 #5

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
It's possible to have one container be the reverse index of another container's contents. The simplest form is to store the items in two different containers, each keyed on a different field. Of course that approach is not very memory-efficient. To save memory, the items could instead be placed in a single std::set, with their iterators stored in one or more additional containers, each keyed on a particular field of the stored item. In essence, the containers serve as "indexes" into a group of records.

Besides the extra overhead incurred by multiple indexes, there are additional housekeeping tasks as well. Deleting an item requires deleting its iterator from every index; similarly, adding a new item means adding it to each index container. But as long as one class's API manages all interaction with the records, these requirements are not unmanageable.
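A rough sketch of that record-store idea. The `Record` type and field names are invented for illustration, and a std::multimap of iterators stands in for the extra keyed container; the insert and erase paths show the housekeeping Greg describes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical record: primary key is the name, secondary index on score.
struct Record {
    std::string name;
    int score;
    bool operator<(const Record& r) const { return name < r.name; }
};

class RecordStore {
    std::set<Record> records_;                                // primary storage, keyed on name
    std::multimap<int, std::set<Record>::iterator> byScore_;  // index of iterators, keyed on score
public:
    void insert(const Record& r) {
        auto [it, inserted] = records_.insert(r);
        if (inserted)
            byScore_.emplace(it->score, it);                  // housekeeping: update the index
    }
    bool eraseByName(const std::string& n) {
        auto it = records_.find(Record{n, 0});
        if (it == records_.end()) return false;
        // housekeeping: remove the matching index entry before erasing the record
        auto range = byScore_.equal_range(it->score);
        for (auto i = range.first; i != range.second; ++i)
            if (i->second == it) { byScore_.erase(i); break; }
        records_.erase(it);
        return true;
    }
    const Record* findByScore(int s) const {                  // O(log n) via the index
        auto i = byScore_.find(s);
        return i == byScore_.end() ? nullptr : &*i->second;
    }
};
```

Funneling all mutations through one class is what keeps the two structures consistent; std::set iterators stay valid across other insertions and erasures, which is why storing them in the index is safe.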
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
The identity hash function would have no collisions, since the hash is the value of the item itself. But hash values are usually smaller than the original values, and any time you have to squeeze the same amount of data into a smaller space, there will be collisions.
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one? It seems there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.


Generally, once a hash container starts filling its buckets, it will double its allocated size and reinsert the items into the new, roomier table. There's not much to be gained by more elaborate schemes for avoiding rehashing, since the easiest way to avoid it is to give a reasonable estimate of the container's expected size when it is first created.

Greg

Nov 22 '05 #6

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
This is done by an augmented data structure: a structure which contains
two or more structures, maintaining them in parallel.

Databases are one obvious example. You define a table, and a primary
key to find records. Additionally, you can define secondary keys. The
RDBMS software builds separate indexes for all of the keys, and
maintains those indexes when you insert and remove records.

A hash table is an index for finding records. For each key, you have to
build a separate index.
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
Such functions are called perfect hashing functions. To create a
perfect hashing function, you have to know all of the keys in advance.
Then, you have to search for a function which maps all of the keys to
some integer values without any collisions.

There is a GNU program called gperf which generates perfect hashing
functions for a set of strings. Its output is the C code for that
hashing function.

How might this be useful? It could be useful in creating a fast look-up
table for the reserved keywords of a programming language, to use
within a parser.
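As a toy illustration of the idea (this is not gperf's actual output): for the keyword set {"if", "for", "while", "return"} the string lengths happen to be distinct, so length alone is already a collision-free hash for this particular set. gperf automates the same trick, mixing in characters at chosen positions when lengths alone do not suffice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy perfect hash for one fixed keyword set: {"if", "for", "while", "return"}.
// Their lengths (2, 3, 5, 6) are distinct, so hash(s) = strlen(s) maps each
// keyword to its own slot with no collisions. One final strcmp confirms the
// candidate, since non-keywords can share a length with a keyword.
bool is_keyword(const char* s) {
    static const char* table[7] = {
        nullptr, nullptr, "if", "for", nullptr, "while", "return"
    };
    std::size_t len = std::strlen(s);
    return len < 7 && table[len] != nullptr && std::strcmp(s, table[len]) == 0;
}
```

This is why a parser's keyword lookup can run in O(length-of-token): one hash evaluation, one table slot, one string compare.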
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one?
This isn't necessary. Only some algorithms require it. In my old hash
table implementation (in the Kazlib library) this rehashing does not
happen. The hash table entries remember the hash values of the keys:
the raw ``unsigned long'' output values from the hashing function, not
reduced into the table size. The table sizes are powers of two, so
reducing a hashing value to the table size is just a masking operation:
taking the N lowest bits. When the table doubles in size, all it means is that one more bit of each hash value becomes significant. Entries which have a 1 in that bit position move to a chain in the upper half of the table; the ones with a 0 in the newly revealed position stay in the same chain. These sister chains are a fixed displacement apart: if the table goes from 16 to 32 chains, the affected entries move from chain 0 to chain 16, from 1 to 17, from 2 to 18 ... through 15 to 31. It's a simple loop that just checks one bit and flips some link pointers.
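The displacement rule can be sketched as a couple of mask operations (illustrative only, not Kazlib's actual code):

```cpp
#include <cassert>
#include <cstddef>

// With power-of-two table sizes, reducing a stored hash to a chain index is
// just a mask. When the table doubles, exactly one more bit of the hash
// becomes significant: the entry either stays in its chain, or moves up by
// exactly old_size. No hash function is re-run.
std::size_t chain_after_doubling(std::size_t hash, std::size_t old_size) {
    std::size_t old_chain = hash & (old_size - 1);   // index before growth
    bool new_bit = (hash & old_size) != 0;           // the newly significant bit
    return new_bit ? old_chain + old_size : old_chain;
}
```

Note that `chain_after_doubling(h, n)` always equals `h & (2*n - 1)`, i.e. the ordinary index computation in the doubled table; the split loop only has to test `new_bit` and relink.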

In my view there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.

The implied algorithm is not clear from your vague description.


There are lazy ways to grow a hash table. Instead of actually moving the elements, you can leave them where they are and modify the lookup algorithm to find them anyway: when a search misses, try the sister chain in the bottom half of the table, i.e. pretend the table is still the smaller one, and keep subdividing recursively. If you find the item, move it to its proper location in the full-size table. Newly inserted items go to their proper locations relative to the new size, of course.
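A simplified sketch of that lazy lookup (illustrative only: it omits the move-on-find relocation and the growth trigger that a real table would need, and the constructor's "placed size" simulates a table that has already grown without moving its entries):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Chains sized to the grown table, but entries still sit where a smaller
// mask put them. A miss in the full-size chain falls back to the sister
// chain under the next smaller power-of-two mask, recursively.
struct LazyTable {
    std::vector<std::list<std::pair<std::size_t, int>>> chains; // (stored hash, value)
    std::size_t live_mask;  // mask the existing entries were actually placed with

    LazyTable(std::size_t grown_size, std::size_t placed_size)
        : chains(grown_size), live_mask(placed_size - 1) {}

    void insert(std::size_t hash, int v) {
        chains[hash & live_mask].push_back({hash, v});
    }

    int* find(std::size_t hash) {
        // Try the full mask first, then each smaller sister chain in turn.
        for (std::size_t mask = chains.size() - 1; ; mask >>= 1) {
            for (auto& e : chains[hash & mask])
                if (e.first == hash) return &e.second;
            if (mask == 0) return nullptr;
        }
    }
};
```

Each halving of the mask is one "pretend the table is still smaller" step; a production version would relocate the entry to `hash & (chains.size() - 1)` once found, so later lookups hit on the first probe.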

Nov 22 '05 #7

Try the Boost Multi-Index container!

Nov 22 '05 #8

Is it thread-safe? As far as I know the STL is not thread-safe. I forgot to mention that I need a thread-safe implementation.

Divick

Nov 22 '05 #9

>>If you're talking standard C++, then 'std::set' is the only such structure,
because in it the value _is_ the key. If you're not talking standard C++,
then you're in the wrong newsgroup; general programming questions should
probably be addressed to 'comp.programming'.


It is definitely meant for the C++ newsgroup. In other languages, like Perl and Lisp, I suppose there are data structures that could serve the purpose, but I want the data structure in C++ only.

Divick

Nov 22 '05 #10

>>boost multi-index container
Is it thread-safe?

Divick

Nov 22 '05 #11

Greg,
thanks for your solution, but that was the first solution I thought of. I feel it is inelegant because of too much bookkeeping, and the implementation would be prone to bugs. Above all, it more than doubles the memory requirements.

Divick

Nov 22 '05 #12

Hi Kaz,
>>Databases are one obvious example. You define a table, and a primary key to find records ...
Well, I don't want to use databases in my application, as that would incur a lot of overhead.
>>This isn't necessary. Only some algorithms require it. In my old hash table implementation (in the Kazlib library) this rehashing does not ...
Sorry, I don't quite get your description. I would appreciate it if you could explain it a little more elaborately.
>>The implied algorithm is not clear from your vague description.

Well, it is indeed vague. It is just an idea; I have not actually implemented it. What I am talking about is a sort of multi-level hash data structure, where lookup/insertion/deletion would not be average-case O(1) but essentially n*T(operation), where n is the number of times the table has grown and T(operation) is the time taken to do the operation on a single hash table. Another way to describe it: a multi-level hash where, at the beginning or end of the hash array, there is a pointer to another hash table, used once the current one is almost full. That linked hash is searched if the entry is not found in the first one.
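For concreteness, here is one possible reading of that idea as a sketch (the class name, the fill threshold, and the growth policy are invented; each level is an ordinary hash map standing in for one hash array):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Multi-level hash sketch: when the current table is "almost full", link a
// fresh table onto it instead of rehashing. Nothing is ever re-inserted, but
// lookups must probe every level, so cost grows with the number of levels
// rather than staying a flat average O(1).
class MultiLevelHash {
    std::vector<std::unordered_map<int, int>> levels_;
    std::size_t capacity_per_level_;
public:
    explicit MultiLevelHash(std::size_t cap) : capacity_per_level_(cap) {
        levels_.emplace_back();
    }
    void insert(int key, int value) {
        if (levels_.back().size() >= capacity_per_level_)
            levels_.emplace_back();          // link a new table; no re-insertion
        levels_.back()[key] = value;
        // caveat: updating a key that already lives in an earlier level is
        // not handled in this sketch -- the old copy would shadow the new one.
    }
    std::optional<int> find(int key) const {
        for (const auto& lvl : levels_)      // probe each level: O(number of levels)
            if (auto it = lvl.find(key); it != lvl.end())
                return it->second;
        return std::nullopt;
    }
    std::size_t levels() const { return levels_.size(); }
};
```

The sketch makes the trade-off visible: growth is O(1) (just allocate a new level) and memory is never copied, but every miss costs one probe per level, which is the n*T(operation) term described above.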

If the idea is still too vague, I will write corresponding pseudo-code and post it. I would certainly like more feedback on this.

Thanks,
Divick.

Nov 22 '05 #13

This discussion thread is closed; replies have been disabled.