Bytes IT Community

Dual search data structure

Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one? It seems there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.

Any help is appreciated,
Thanks,
Divick

Nov 22 '05 #1
12 Replies


Divick wrote:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?


There is no such thing. Good hash functions are those for which the
fastest way to find a collision is to use brute force. For practical
purposes, any of the famous ones are fine.

http://en.wikipedia.org/wiki/Cryptog..._hash_function

Ryan

Nov 22 '05 #2

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.


I had a similar problem not too long ago... my solution was to wrap a
std::map<Key, Obj*> and a std::multiset<Obj*, ptr_less<Obj> > in my own
class. The deal was that I could get the Key from the object, so the
multiset worked as an index onto the map.
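A minimal sketch of that kind of wrapper follows. The `Obj` type, its fields, and the `ByValue` comparator are invented for illustration (KS's actual class used a `ptr_less<Obj>` comparator, assumed here to compare the pointed-to objects); the point is that the value-ordered multiset turns the linear value scan into an O(log n) lookup.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical record type: the key can be recovered from the object itself.
struct Obj {
    std::string key;
    int value;
};

// Order object pointers by the pointed-to value, so the multiset
// acts as a value index (stands in for ptr_less<Obj>).
struct ByValue {
    bool operator()(const Obj* a, const Obj* b) const { return a->value < b->value; }
};

class DualIndex {
    std::map<std::string, Obj*> byKey_;     // key -> object
    std::multiset<Obj*, ByValue> byValue_;  // value-ordered index onto the map
public:
    void insert(Obj* o) {
        byKey_[o->key] = o;
        byValue_.insert(o);
    }
    Obj* findByKey(const std::string& k) const {
        auto it = byKey_.find(k);
        return it == byKey_.end() ? nullptr : it->second;
    }
    Obj* findByValue(int v) const {         // O(log n), not a linear scan
        Obj probe{"", v};                   // comparator only looks at value
        auto it = byValue_.find(&probe);
        return it == byValue_.end() ? nullptr : *it;
    }
};
```

Because the key is recoverable from the object, the multiset never needs to duplicate it; both structures just hold pointers to the same records.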

hth
~KS

Nov 22 '05 #3

Divick wrote:
Hi all,
does anyone know of a data structure in which I can search by key and by value equally fast?
If you're talking standard C++, then 'std::set' is the only such structure, because in it the value _is_ the key. If you're not talking standard C++, then you're in the wrong newsgroup; general programming questions should probably be addressed to 'comp.programming'.
[...]


V
Nov 22 '05 #4

Divick wrote:
Hi all,
does anyone know of a data structure in which I can search by key and by value equally fast?


You can try using the bimap class in the following link:

http://www.codeproject.com/vcpp/stl/bimap.asp

Nov 22 '05 #5

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
It's possible to have one container be the reverse index of another container's contents. The simplest form is to store the items in two different containers, each keyed on a different field. Of course that approach is not very memory-efficient. To save memory, the items could instead be placed in a single std::set, with their iterators stored in one or more additional containers, each keyed on a particular field of the stored item. In essence, the containers serve as "indexes" into a group of records.

Besides the extra overhead incurred by multiple indexes, there are additional housekeeping tasks as well. Deleting an item requires deleting its iterator from every index; similarly, adding a new item means adding it to each index container. But as long as one class's API manages all interaction with the records, these requirements are not unmanageable.
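A rough sketch of that record-store idea. The `Record` type and field names are invented for illustration, and a std::multimap of iterators stands in for the extra keyed container; the insert and erase paths show the housekeeping Greg describes.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical record: primary key is the name, secondary index on score.
struct Record {
    std::string name;
    int score;
    bool operator<(const Record& r) const { return name < r.name; }
};

class RecordStore {
    std::set<Record> records_;                                // primary storage, keyed on name
    std::multimap<int, std::set<Record>::iterator> byScore_;  // index of iterators, keyed on score
public:
    void insert(const Record& r) {
        auto [it, inserted] = records_.insert(r);
        if (inserted)
            byScore_.emplace(it->score, it);                  // housekeeping: update the index
    }
    bool eraseByName(const std::string& n) {
        auto it = records_.find(Record{n, 0});
        if (it == records_.end()) return false;
        // housekeeping: remove the matching index entry before erasing the record
        auto range = byScore_.equal_range(it->score);
        for (auto i = range.first; i != range.second; ++i)
            if (i->second == it) { byScore_.erase(i); break; }
        records_.erase(it);
        return true;
    }
    const Record* findByScore(int s) const {                  // O(log n) via the index
        auto i = byScore_.find(s);
        return i == byScore_.end() ? nullptr : &*i->second;
    }
};
```

Funneling all mutations through one class is what keeps the two structures consistent; std::set iterators stay valid across other insertions and erasures, which is why storing them in the index is safe.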
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
The identity hash function would have no collisions, since the hash is the value of the item itself. But hash values are usually smaller than the original values, and any time you have to squeeze the same amount of data into a smaller space, there will be collisions.
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one? It seems there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.


Generally, once a hash container starts filling its buckets, it will double its allocated size and reinsert the items into the new, roomier table. There's not much to be gained by more elaborate schemes for avoiding rehashing, since the easiest way to avoid it is to give a reasonable estimate of the container's expected size when it is first created.

Greg

Nov 22 '05 #6

Divick wrote:
Hi all,
Does anyone know of a data structure in which I can search by key and by value equally fast? I could use a hash, but then I can only search in constant time by key, not by value; searching by value requires a linear scan.
This is done by an augmented data structure: a structure which contains
two or more structures, maintaining them in parallel.

Databases are one obvious example. You define a table, and a primary
key to find records. Additionally, you can define secondary keys. The
RDBMS software builds separate indexes for all of the keys, and
maintains those indexes when you insert and remove records.

A hash table is an index for finding records. For each key, you have to
build a separate index.
Two related questions:
1. Does anyone know of a hash function that does not lead to collisions, and what is its time complexity?
Such functions are called perfect hashing functions. To create a
perfect hashing function, you have to know all of the keys in advance.
Then, you have to search for a function which maps all of the keys to
some integer values without any collisions.

There is a GNU program called gperf which generates perfect hashing
functions for a set of strings. Its output is the C code for that
hashing function.

How might this be useful? It could be useful in creating a fast look-up
table for the reserved keywords of a programming language, to use
within a parser.
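As a toy illustration of the idea (this is not gperf's actual output): for the keyword set {"if", "for", "while", "return"} the string lengths happen to be distinct, so length alone is already a collision-free hash for this particular set. gperf automates the same trick, mixing in characters at chosen positions when lengths alone do not suffice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy perfect hash for one fixed keyword set: {"if", "for", "while", "return"}.
// Their lengths (2, 3, 5, 6) are distinct, so hash(s) = strlen(s) maps each
// keyword to its own slot with no collisions. One final strcmp confirms the
// candidate, since non-keywords can share a length with a keyword.
bool is_keyword(const char* s) {
    static const char* table[7] = {
        nullptr, nullptr, "if", "for", nullptr, "while", "return"
    };
    std::size_t len = std::strlen(s);
    return len < 7 && table[len] != nullptr && std::strcmp(s, table[len]) == 0;
}
```

This is why a parser's keyword lookup can run in O(length-of-token): one hash evaluation, one table slot, one string compare.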
2. Why, to grow a hash array, must you create a bigger array and rehash the keys from the old one?
This isn't necessary. Only some algorithms require it. In my old hash
table implementation (in the Kazlib library) this rehashing does not
happen. The hash table entries remember the hash values of the keys:
the raw ``unsigned long'' output values from the hashing function, not
reduced into the table size. The table sizes are powers of two, so
reducing a hashing value to the table size is just a masking operation:
taking the N lowest bits. When the table doubles in size, all it means is that one more bit of each hash value becomes significant. Entries which have a 1 in that bit position move to a chain in the upper half of the table; the ones with a 0 in the newly revealed position stay in the same chain. These sister chains are a fixed displacement apart: if the table goes from 16 to 32 chains, the affected entries move from chain 0 to chain 16, from 1 to 17, from 2 to 18 ... through 15 to 31. It's a simple loop that just checks one bit and flips some link pointers.
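The displacement rule can be sketched as a couple of mask operations (illustrative only, not Kazlib's actual code):

```cpp
#include <cassert>
#include <cstddef>

// With power-of-two table sizes, reducing a stored hash to a chain index is
// just a mask. When the table doubles, exactly one more bit of the hash
// becomes significant: the entry either stays in its chain, or moves up by
// exactly old_size. No hash function is re-run.
std::size_t chain_after_doubling(std::size_t hash, std::size_t old_size) {
    std::size_t old_chain = hash & (old_size - 1);   // index before growth
    bool new_bit = (hash & old_size) != 0;           // the newly significant bit
    return new_bit ? old_chain + old_size : old_chain;
}
```

Note that `chain_after_doubling(h, n)` always equals `h & (2*n - 1)`, i.e. the ordinary index computation in the doubled table; the split loop only has to test `new_bit` and relink.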

In my view there could be better strategies: instead of growing the array, store an associated pointer to another hash array, which could even map keys with a totally different hash function. This way the hash can grow without much memory overhead, and re-insertion is not needed. I am not sure whether this is standard practice. If somebody could point me to links or similar literature, it would help me a lot.

The implied algorithm is not clear from your vague description.


There are lazy ways to grow a hash table. Instead of actually moving the elements, you can leave them where they are and modify the lookup algorithm to find them anyway: when a search misses, try the sister chain in the bottom half of the table, i.e. pretend the table is still the smaller one, and keep subdividing recursively. If you find the item, move it to its proper location in the full-size table. Newly inserted items go to their proper locations relative to the new size, of course.
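A simplified sketch of that lazy lookup (illustrative only: it omits the move-on-find relocation and the growth trigger that a real table would need, and the constructor's "placed size" simulates a table that has already grown without moving its entries):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Chains sized to the grown table, but entries still sit where a smaller
// mask put them. A miss in the full-size chain falls back to the sister
// chain under the next smaller power-of-two mask, recursively.
struct LazyTable {
    std::vector<std::list<std::pair<std::size_t, int>>> chains; // (stored hash, value)
    std::size_t live_mask;  // mask the existing entries were actually placed with

    LazyTable(std::size_t grown_size, std::size_t placed_size)
        : chains(grown_size), live_mask(placed_size - 1) {}

    void insert(std::size_t hash, int v) {
        chains[hash & live_mask].push_back({hash, v});
    }

    int* find(std::size_t hash) {
        // Try the full mask first, then each smaller sister chain in turn.
        for (std::size_t mask = chains.size() - 1; ; mask >>= 1) {
            for (auto& e : chains[hash & mask])
                if (e.first == hash) return &e.second;
            if (mask == 0) return nullptr;
        }
    }
};
```

Each halving of the mask is one "pretend the table is still smaller" step; a production version would relocate the entry to `hash & (chains.size() - 1)` once found, so later lookups hit on the first probe.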

Nov 22 '05 #7

Try the Boost Multi-Index container!

Nov 22 '05 #8

Is it thread-safe? As far as I know the STL is not thread-safe. I forgot to mention that I need a thread-safe implementation.

Divick

Nov 22 '05 #9

>>If you're talking standard C++, then 'std::set' is the only such structure,
because in it the value _is_ the key. If you're not talking standard C++,
then you're in the wrong newsgroup; general programming questions should
probably be addressed to 'comp.programming'.


It is definitely meant for the C++ newsgroup. In other languages, like Perl and Lisp, I suppose there are data structures that could serve the purpose, but I want the data structure in C++ only.

Divick

Nov 22 '05 #10

>>boost multi-index container
Is it thread-safe?

Divick

Nov 22 '05 #11

Greg,
thanks for your solution, but that was the first solution I thought of. I feel it is inelegant because of too much bookkeeping, and the implementation would be prone to bugs. Above all, it more than doubles the memory requirements.

Divick

Nov 22 '05 #12

Hi Kaz,
>>Databases are one obvious example. You define a table, and a primary key to find records ...
Well, I don't want to use databases in my application, as that would incur a lot of overhead.
>>This isn't necessary. Only some algorithms require it. In my old hash table implementation (in the Kazlib library) this rehashing does not ...
Sorry, I don't quite get your description. I would appreciate it if you could explain it a little more elaborately.
>>The implied algorithm is not clear from your vague description.

Well, it is indeed vague. It is just an idea; I have not actually implemented it. What I am talking about is a sort of multi-level hash data structure, where lookup/insertion/deletion would not be average-case O(1) but essentially n*T(operation), where n is the number of times the table has grown and T(operation) is the time taken to do the operation on a single hash table. Another way to describe it: a multi-level hash where, at the beginning or end of the hash array, there is a pointer to another hash table, used once the current one is almost full. That linked hash is searched if the entry is not found in the first one.
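For concreteness, here is one possible reading of that idea as a sketch (the class name, the fill threshold, and the growth policy are invented; each level is an ordinary hash map standing in for one hash array):

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Multi-level hash sketch: when the current table is "almost full", link a
// fresh table onto it instead of rehashing. Nothing is ever re-inserted, but
// lookups must probe every level, so cost grows with the number of levels
// rather than staying a flat average O(1).
class MultiLevelHash {
    std::vector<std::unordered_map<int, int>> levels_;
    std::size_t capacity_per_level_;
public:
    explicit MultiLevelHash(std::size_t cap) : capacity_per_level_(cap) {
        levels_.emplace_back();
    }
    void insert(int key, int value) {
        if (levels_.back().size() >= capacity_per_level_)
            levels_.emplace_back();          // link a new table; no re-insertion
        levels_.back()[key] = value;
        // caveat: updating a key that already lives in an earlier level is
        // not handled in this sketch -- the old copy would shadow the new one.
    }
    std::optional<int> find(int key) const {
        for (const auto& lvl : levels_)      // probe each level: O(number of levels)
            if (auto it = lvl.find(key); it != lvl.end())
                return it->second;
        return std::nullopt;
    }
    std::size_t levels() const { return levels_.size(); }
};
```

The sketch makes the trade-off visible: growth is O(1) (just allocate a new level) and memory is never copied, but every miss costs one probe per level, which is the n*T(operation) term described above.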

If the idea is still too vague, I will write corresponding pseudo-code and post it. I would certainly like more feedback on this.

Thanks,
Divick.

Nov 22 '05 #13

This discussion thread is closed; replies have been disabled.