Hi all,
Does anyone know of a data structure in which I can search by key and
by value equally fast? I could use hashes, but then I can only search
in constant time by key, not by value. If I want to search by value,
I have to do a linear search.
Some related questions I have:
1. Does anyone know a hash function that does not lead to collisions,
and what is its time complexity?
2. Why is it that to grow a hash array you need to create a bigger
array and then rehash the keys from the old array? In my view there
could be better strategies to grow the array. Or maybe, instead of
growing the array, use an associated pointer to another hash array,
which could even map keys with a totally different hash function. This
way the hash could grow without much memory overhead, and re-insertion
would not be needed. I am not sure whether this is standard practice
or not. If somebody can point me to some links or similar literature,
it would help me a lot.
Any help is appreciated,
Thanks,
Divick
Divick wrote: 1. Does anyone know a hash function that does not lead to collisions, and what is its time complexity?
There is no such thing. Good hash functions are those for which the
fastest way to find a collision is to use brute force. For practical
purposes, any of the famous ones are fine.
http://en.wikipedia.org/wiki/Cryptog..._hash_function
Ryan
Divick wrote: Hi all, does anyone know of a data structure in which I can search by key and by value equally fast? I could use hashes, but then I can only search in constant time by key, not by value. If I want to search by value, I have to do a linear search.
I had a similar problem not too long ago... my solution was to wrap a
std::map<Key, Obj*> and a std::multiset<Obj*, ptr_less<Obj> > in my own
class. The deal was that I could get the Key from the object, so the
multiset worked as an index onto the map.
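Something along those lines, as a minimal sketch (the class and member
names here are made up, and a reverse multimap stands in for the
multiset-of-pointers index):

```cpp
#include <map>
#include <string>

// Hypothetical two-index wrapper: a forward map (key -> value) plus a
// reverse multimap (value -> key) serving as the second index.
class BiMap {
public:
    void insert(const std::string& key, int value) {
        // Sketch only: re-inserting an existing key would also need a
        // matching cleanup pass over reverse_.
        forward_[key] = value;
        reverse_.insert(std::make_pair(value, key));
    }
    int byKey(const std::string& key) const {
        return forward_.at(key);              // O(log n) on the key
    }
    std::string byValue(int value) const {
        return reverse_.find(value)->second;  // O(log n) on the value
    }
private:
    std::map<std::string, int> forward_;
    std::multimap<int, std::string> reverse_; // values may repeat
};
```

The price is that every mutation has to touch both containers, which is
exactly the housekeeping discussed further down the thread.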
hth
~KS
Divick wrote: Hi all, does anyone know of a data structure in which I can search by key and by value equally fast?
If you're talking standard C++, then 'std::set' is the only such structure,
because the value _is_ the key in it. If you're not talking standard C++,
then you're in the wrong newsgroup. General programming questions should
probably be addressed to 'comp.programming'.
[...]
V
Divick wrote: Hi all, does anyone know of a data structure in which I can search by key and by value equally fast? I could use hashes, but then I can only search in constant time by key, not by value. If I want to search by value, I have to do a linear search.
It's possible to have one container be the reverse index of the
contents of another container. The simplest form is to store the items
in two different containers, each keyed on a different field. Of course
that approach is not very memory-efficient. To save memory, the items
could be placed in a single std::set and its iterators stored in one or
more additional containers, each keyed on a particular field of the
stored item. In essence, the containers serve as "indexes" into a group
of records.
Besides the additional overhead incurred with multiple indexes, there
are additional housekeeping tasks required as well. Deleting an item
has to delete its iterator from all of the indexes. Similarly, adding a
new item has to add it to each index container as well. But as long as
there is one class whose API manages all interaction with the records,
these requirements are not unmanageable.
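A rough sketch of that arrangement (the struct and member names are
invented here): the records live in one std::set, and a second
container holds iterators into it, keyed on another field. std::set
iterators stay valid across unrelated insertions and erasures, which is
what makes storing them safe.

```cpp
#include <map>
#include <set>
#include <string>

struct Record { std::string name; int id; };

// Primary ordering: by name.
struct ByName {
    bool operator()(const Record& a, const Record& b) const {
        return a.name < b.name;
    }
};
using Store = std::set<Record, ByName>;

// One owning container plus one index of iterators, keyed on id.
class Records {
public:
    void insert(const Record& r) {
        std::pair<Store::iterator, bool> p = store_.insert(r);
        if (p.second) byId_[p.first->id] = p.first;  // keep index in sync
    }
    void eraseByName(const std::string& n) {
        Store::iterator it = store_.find(Record{n, 0});
        if (it != store_.end()) {
            byId_.erase(it->id);                     // housekeeping first
            store_.erase(it);
        }
    }
    const Record* findById(int id) const {           // O(log n), no scan
        std::map<int, Store::iterator>::const_iterator p = byId_.find(id);
        return p == byId_.end() ? nullptr : &*p->second;
    }
private:
    Store store_;
    std::map<int, Store::iterator> byId_;
};
```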
Some related questions I have: 1. Does anyone know a hash function that does not lead to collisions, and what is its time complexity?
The identity hash function would have no collisions, since the hash is
the value of the item itself. Of course, hash values tend to be smaller
than the original values, and any time you have to squeeze the same
amount of data into a smaller space, there will be collisions.
2. Why is it that to grow a hash array you need to create a bigger array and then rehash the keys from the old array? In my view there could be better strategies to grow the array. Or maybe, instead of growing the array, use an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash could grow without much memory overhead, and re-insertion would not be needed. I am not sure whether this is standard practice or not. If somebody can point me to some links or similar literature, it would help me a lot.
Generally once a hash container starts filling up its buckets it will
double its allocated size and reinsert the items into the new, roomier
hash. There's not much to be gained by more elaborate methods to avoid
rehashing, since the easiest way to avoid it is to specify a reasonable
estimate of the expected size of the hashed container when it is first
created.
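With a present-day standard container (std::unordered_map, which
postdates this thread but is assumed available here), that advice looks
like this: size the table up front, and no rehash occurs on the way to
the expected element count.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// reserve(n) guarantees no rehash occurs until the element count
// exceeds n, so pre-sizing skips incremental rehashes entirely.
std::size_t insertWithoutRehash(std::size_t n) {
    std::unordered_map<int, std::string> table;
    table.reserve(n);
    std::size_t buckets = table.bucket_count();
    for (std::size_t i = 0; i < n; ++i)
        table[static_cast<int>(i)] = "value";
    assert(table.bucket_count() == buckets);  // bucket array never grew
    return table.size();
}
```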
Greg
Divick wrote: Hi all, does anyone know of a data structure in which I can search by key and by value equally fast? I could use hashes, but then I can only search in constant time by key, not by value. If I want to search by value, I have to do a linear search.
This is done by an augmented data structure: a structure which contains
two or more structures, maintaining them in parallel.
Databases are one obvious example. You define a table, and a primary
key to find records. Additionally, you can define secondary keys. The
RDBMS software builds separate indexes for all of the keys, and
maintains those indexes when you insert and remove records.
A hash table is an index for finding records. For each key, you have to
build a separate index.
Some related questions I have: 1. Does anyone know a hash function that does not lead to collisions, and what is its time complexity?
Such functions are called perfect hashing functions. To create a
perfect hashing function, you have to know all of the keys in advance.
Then, you have to search for a function which maps all of the keys to
some integer values without any collisions.
There is a GNU program called gperf which generates perfect hashing
functions for a set of strings. Its output is the C code for that
hashing function.
How might this be useful? It could be useful in creating a fast look-up
table for the reserved keywords of a programming language, to use
within a parser.
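As a toy illustration (hand-made here, not actual gperf output): for
this particular keyword set, the string length alone happens to be a
perfect hash, so a lookup is one array index plus one string compare.

```cpp
#include <cstring>

// The lengths 2, 3, 5 and 6 are all distinct, so strlen() is a perfect
// hash for this set; gperf searches for such a function automatically,
// typically mixing the length with a few key characters.
const char* lookupKeyword(const char* s) {
    static const char* const slots[7] = {
        0, 0, "if", "for", 0, "while", "return"
    };
    std::size_t len = std::strlen(s);
    if (len < 7 && slots[len] && std::strcmp(s, slots[len]) == 0)
        return slots[len];
    return 0;
}
```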
2. Why is it that to grow a hash array you need to create a bigger array and then rehash the keys from the old array?
This isn't necessary. Only some algorithms require it. In my old hash
table implementation (in the Kazlib library) this rehashing does not
happen. The hash table entries remember the hash values of the keys:
the raw ``unsigned long'' output values from the hashing function, not
reduced to the table size. The table sizes are powers of two, so
reducing a hash value to the table size is just a masking operation:
taking the N lowest bits. When the table doubles in size, all it means
is that one more bit of each hash value becomes significant. The
entries which have a 1 in that newly revealed bit position have to move
to a chain in the upper half of the table; the ones which have a 0
there stay in the same chain. These sister chains are all a fixed
displacement apart. For instance, if the table goes from 16 to 32
chains, the affected entries move from chain 0 to chain 16, from 1 to
17, 2 to 18 ... through 15 to 31. It's a simple loop that just checks
one bit and flips some link pointers.
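A sketch of that doubling step (the container choices here are
illustrative C++, not Kazlib's actual C code): each entry keeps its
full hash, and growing only splits each chain between its old slot and
the sister slot in the new upper half.

```cpp
#include <forward_list>
#include <vector>

struct Entry { unsigned long hash; int value; };
using Table = std::vector<std::forward_list<Entry>>;

// Double a power-of-two table: entries whose newly significant hash bit
// is 1 move to the sister chain a fixed displacement (old size) away.
void grow(Table& t) {
    std::size_t old = t.size();            // assumed a power of two
    t.resize(old * 2);
    for (std::size_t i = 0; i < old; ++i) {
        std::forward_list<Entry> stay, move;
        for (const Entry& e : t[i])
            (e.hash & old ? move : stay).push_front(e);
        t[i] = std::move(stay);            // 0 in the new bit: same chain
        t[i + old] = std::move(move);      // 1 in the new bit: upper half
    }
}
```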
In my view there could be better strategies to grow the array. Or maybe, instead of growing the array, use an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash could grow without much memory overhead, and re-insertion would not be needed. I am not sure whether this is standard practice. If somebody can
The implied algorithm is not clear from your vague description.
throw me some links or pointers to similar literature then it would help me a lot.
There are lazy ways to grow hash tables. Instead of actually moving the
elements, you can leave them where they are, but modify the lookup
algorithm to find them anyway. When searching for an item, if you don't
find it, then try the sister chain in the bottom half of the table.
I.e. pretend that the table is still the small one and try that way.
Keep subdividing recursively. If you find the item, then move it to its
proper location in the full table. Newly inserted items go to their
proper locations relative to the new size, of course.
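A sketch of that fallback lookup (omitting the move-on-find migration
step for brevity; all names are made up): probe at the full size first,
then keep pretending the table is half as big.

```cpp
#include <forward_list>
#include <vector>

struct Entry { unsigned long hash; int value; };
using Table = std::vector<std::forward_list<Entry>>;

// Probe the chain the current size implies; on a miss, halve the
// assumed table size and try the sister chain, down to a single chain.
const Entry* lazyFind(const Table& t, unsigned long hash) {
    for (std::size_t size = t.size(); size >= 1; size /= 2)
        for (const Entry& e : t[hash & (size - 1)])
            if (e.hash == hash)
                return &e;
    return 0;
}
```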
Try the Boost multi_index container!
Is it thread safe? As far as I know the STL is not thread safe. I
forgot to mention that I need a thread-safe implementation.
Divick
>>If you're talking standard C++, then 'std::set' is the only such structure, because the value _is_ the key in it. If you're not talking standard C++, then you're in the wrong newsgroup. General programming questions should probably be addressed to 'comp.programming'.
It is definitely meant for the C++ newsgroup. In other languages, like
Perl and Lisp, I suppose there are data structures which could serve
the purpose, but I want the data structure in C++ only.
Divick
>>Try the Boost multi_index container!
Is it thread safe?
Divick
Greg,
thanks for your solution, but that was the first solution I thought of
too. I feel it is inelegant because of too much bookkeeping, and the
implementation would be prone to bugs. Above all, it more than doubles
the memory requirements.
Divick
Hi Kaz, Databases are one obvious example. You define a table, and a primary key to find records ....
Well, I don't want to use databases in my application, as that would
incur a lot of overhead.
This isn't necessary. Only some algorithms require it. In my old hash table implementation (in the Kazlib library) this rehashing does not ....
Sorry, I don't quite get your description. I would appreciate it if you
could explain it in a little more detail.
The implied algorithm is not clear from your vague description.
Well, it is indeed vague. It is just an idea; I have not actually
implemented it. What I am talking about is a sort of multi-level hash
data structure where lookup/insertion/deletion would not be
average-case O(1) but essentially n*T(operation), where n is the number
of size increments so far and T(operation) is the time taken to do the
operation on a single hash. Another way to describe this kind of data
structure is a multi-level hash, where at the beginning or end of the
hash array there would be a pointer to another hash, used once the
current hash is almost full. This linked hash would be searched if the
entry is not in the first hash.
If the idea is still too vague, I will write corresponding pseudo code
and post it. I would certainly like to have more feedback on this.
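To make the idea concrete, a rough sketch (every name here is
hypothetical, and a std::map stands in for a fixed-capacity hash
array): a full level chains lookups and insertions to a linked next
level.

```cpp
#include <map>
#include <memory>

// Multi-level sketch: each level has a fixed capacity; once full,
// operations spill into the next linked level instead of rehashing.
class MultiLevelHash {
public:
    explicit MultiLevelHash(std::size_t capacity) : capacity_(capacity) {}
    void insert(int key, int value) {
        std::map<int, int>::iterator it = level_.find(key);
        if (it != level_.end()) { it->second = value; return; }
        if (level_.size() < capacity_) {
            level_[key] = value;
        } else {
            if (!next_) next_.reset(new MultiLevelHash(capacity_ * 2));
            next_->insert(key, value);   // spill into the linked level
        }
    }
    const int* find(int key) const {
        std::map<int, int>::const_iterator it = level_.find(key);
        if (it != level_.end()) return &it->second;
        return next_ ? next_->find(key) : nullptr;
    }
private:
    std::size_t capacity_;
    std::map<int, int> level_;           // stand-in for a hash array
    std::unique_ptr<MultiLevelHash> next_;
};
```

As the replies above note, the cost is that a miss now probes every
level, so the cheap growth is paid for on every unsuccessful lookup.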
Thanks,
Divick.