Hash Map API Preference

Michael B Allen

I would like to know your preferences regarding a C API for a hash
map. Specifically, when inserting a data item into a hash map there are
two special cases that require consideration.

1) If two keys generate the same hash value should a comparison function
be used to compare keys or should data items be compared? If data items
are compared it is not necessary to store the keys with each entry
which has an impact on question 2.

2) If two keys generate the same hash value and the comparison function
finds the entries to be equal, which of the following resolutions is
preferred?

A) Return an error and otherwise do nothing. The user will be required
to first remove the old item.

B) Provide void **replaced parameters to accept the displaced data item
(and key if they are being stored) and then insert the new data item
in place of the displaced data item.

int hashmap_put(str uct hashmap *h, void *key, void *data, void **replaced);

OR

int hashmap_put(str uct hashmap *h,
void *key, void *data,
void **replaced_key, void **replaced_data );

C) Delete the data item (and key if they are being stored) using
function pointers provided when the hash map was created and then
insert the new data item in place of the deleted data item.

typedef int (*del_fn)(void *context, void *object);

int hashmap_create( struct hashmap *h,
unsigned int size,
hash_fn hash,
compare_fn cmp,
del_fn key_del,
del_fn data_del,
void *context);

This code will be part of libmba.

Mike

Nov 13 '05 #1

Subscribe Reply

5586

Eric Sosman

Michael B Allen wrote:

I would like to know your preferences regarding a C API for a hash
map. Specifically, when inserting a data item into a hash map there are
two special cases that require consideration.

1) If two keys generate the same hash value should a comparison function
be used to compare keys or should data items be compared? If data items
are compared it is not necessary to store the keys with each entry
which has an impact on question 2.
I'm not entirely sure I understand what you're asking -- the
distinction between comparing keys and comparing "data items" is
not clear to me. If "key" is understood in the usual sense of "a
unique identifier," then the keys and only the keys should be
compared -- furthermore, the hash value should be computed from
the key alone, without reference to the associated data. If "data
item" means "the key and all its associated data," comparison seems
either pointless or absurd depending on the viewpoint.
2) If two keys generate the same hash value and the comparison function
finds the entries to be equal, which of the following resolutions is
preferred?

A) Return an error and otherwise do nothing. The user will be required
to first remove the old item.

B) Provide void **replaced parameters to accept the displaced data item
(and key if they are being stored) and then insert the new data item
in place of the displaced data item.
[...]
C) Delete the data item (and key if they are being stored) using
function pointers provided when the hash map was created and then
insert the new data item in place of the deleted data item.
All these strategies (and others, too) are reasonable, and
one could dream up applications in which each would be the
"obvious" preferred behavior. For a general-purpose package
I'd choose (A): the interface is "small," and the user can
build the other behaviors atop it (and other operations like
deletion) if desired.
This code will be part of libmba.

Business schools teach programming?

--
Er*********@sun .com

Nov 13 '05 #2

CBFalconer

Michael B Allen wrote:

I would like to know your preferences regarding a C API for a
hash map. Specifically, when inserting a data item into a hash
map there are two special cases that require consideration.
.... snip ...
int hashmap_create( struct hashmap *h,
unsigned int size,
hash_fn hash,
compare_fn cmp,
del_fn key_del,
del_fn data_del,
void *context);

You might take a look at the interface for hashlib, which was
partially the result of a similar discussion here some time ago.
hashlib is available at:

<http://cbfalconer.home .att.net/download/>

--
Chuck F (cb********@yah oo.com) (cb********@wor ldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net> USE worldnet address!

Nov 13 '05 #3

Michael B Allen

On Tue, 18 Nov 2003 11:51:29 -0500, CBFalconer wrote:

Michael B Allen wrote:

I would like to know your preferences regarding a C API for a hash map.
Specifically, when inserting a data item into a hash map there are two
special cases that require consideration.

... snip ...

int hashmap_create( struct hashmap *h,
unsigned int size,
hash_fn hash,
compare_fn cmp,
del_fn key_del,
del_fn data_del,
void *context);

You might take a look at the interface for hashlib, which was partially
the result of a similar discussion here some time ago. hashlib is
available at:

<http://cbfalconer.home .att.net/download/>

What's interesting about this is it uses an entirely different method
from the one I was thinking of. Conceptually a map maps a key to a value
so I was thinking a separate key should be used with get and put like:

hashmap_insert( h, key, data)
hashmap_find(h, key)

But you don't use a key at all. Instead it looks like you just use an
opaque pointer and extract all information from it using a variety of
functions provided by the user. Presumably to lookup a data item it
would be necessary to use an additional structure to (e.g. if you were
just mapping names to objects you could not use the character string
directly with find).

And if two objects are equal it looks like you return a pointer to the
offending element without removing it? I suppose that is two votes for
case A.

Mike

Nov 13 '05 #4

Michael B Allen

On Tue, 18 Nov 2003 10:46:54 -0500, Eric Sosman wrote:

Michael B Allen wrote:

I would like to know your preferences regarding a C API for a hash map.
Specifically, when inserting a data item into a hash map there are two
special cases that require consideration.

1) If two keys generate the same hash value should a comparison
function
be used to compare keys or should data items be compared? If data
items are compared it is not necessary to store the keys with each
entry which has an impact on question 2.
I'm not entirely sure I understand what you're asking -- the
distinction between comparing keys and comparing "data items" is not
clear to me. If "key" is understood in the usual sense of "a unique
identifier," then the keys and only the keys should be compared --
furthermore, the hash value should be computed from the key alone,
without reference to the associated data. If "data item" means "the key
and all its associated data," comparison seems either pointless or
absurd depending on the viewpoint.

Consider the following where "foo" is the key and obj is the data item:

hashmap_put("fo o", obj)
...
obj = hashmap_get("fo o")

If the same hash value is generated for the key "foo" and "bar" (unlikely
but possible) then it is necessary to use a comparison function to
definitively determine wheather or not the keys are equal. My original
question considered comparing the data object instead. That was a foolish
mistake as it does not make any sense. Please disregard the question.

2) If two keys generate the same hash value and the comparison function
finds the entries to be equal, which of the following resolutions is
preferred?

A) Return an error and otherwise do nothing. The user will be
required to first remove the old item. <snip> All these strategies (and others, too) are reasonable, and
one could dream up applications in which each would be the "obvious"
preferred behavior. For a general-purpose package I'd choose (A): the
interface is "small," and the user can build the other behaviors atop it
(and other operations like deletion) if desired.

Mmm, how does one implement deletion atop of the case A behavior?
This code will be part of libmba.

Business schools teach programming?

Right. It's for creating useless spreadsheets and those stupid questions
asked in "business requirements" meetings.

Mike

Nov 13 '05 #5

Eric Sosman

Michael B Allen wrote:

On Tue, 18 Nov 2003 10:46:54 -0500, Eric Sosman wrote:
Michael B Allen wrote:
[...]
2) If two keys generate the same hash value and the comparison function
finds the entries to be equal, which of the following resolutions is
preferred?

A) Return an error and otherwise do nothing. The user will be
required to first remove the old item.

<snip>
All these strategies (and others, too) are reasonable, and
one could dream up applications in which each would be the "obvious"
preferred behavior. For a general-purpose package I'd choose (A): the
interface is "small," and the user can build the other behaviors atop it
(and other operations like deletion) if desired.

Mmm, how does one implement deletion atop of the case A behavior?

Now it's my turn to apologize for unclear wording.
I meant that behaviors B and C (snipped; they are variations
on "replace the existing stuff with the new stuff") can be
implemented atop A plus a deletion operation.

In my own hash package (I guess everybody writes one
eventually), the insertion function returns a pointer to
the item inserted or a pointer to the pre-existing item
with the same key (or NULL for other kinds of failure):

ptr = HTInsert(table_ handle, &key_and_dat a);
if (ptr == &key_and_dat a)
// insertion succeeded
else if (ptr != NULL)
// collision; ptr points to existing item
else
// other failure, probably realloc()

This is essentially your case A.

--
Er*********@sun .com

Nov 13 '05 #6

CBFalconer

Michael B Allen wrote:

On Tue, 18 Nov 2003 11:51:29 -0500, CBFalconer wrote:
Michael B Allen wrote:

I would like to know your preferences regarding a C API for a
hash map. Specifically, when inserting a data item into a hash
map there are two special cases that require consideration.

... snip ...

int hashmap_create( struct hashmap *h,
unsigned int size,
hash_fn hash,
compare_fn cmp,
del_fn key_del,
del_fn data_del,
void *context);

You might take a look at the interface for hashlib, which was
partially the result of a similar discussion here some time ago.
hashlib is available at:

<http://cbfalconer.home .att.net/download/>

What's interesting about this is it uses an entirely different
method from the one I was thinking of. Conceptually a map maps
a key to a value so I was thinking a separate key should be used
with get and put like:

hashmap_insert( h, key, data)
hashmap_find(h, key)

But you don't use a key at all. Instead it looks like you just
use an opaque pointer and extract all information from it using
a variety of functions provided by the user. Presumably to
lookup a data item it would be necessary to use an additional
structure to (e.g. if you were just mapping names to objects
you could not use the character string directly with find).

And if two objects are equal it looks like you return a pointer
to the offending element without removing it? I suppose that is
two votes for case A.

I don't know what your case A etc. is any more, but the
fundamental thing the system does is answer the question "is this
item in the data base". If so, it returns a pointer to the stored
data (which is not the same as the queried data). The user may
now do things with an auxiliary field, such as a count. However
the user does NOT own the data pointed at, that is under the
control of the hashlib data base. He MUST NOT alter the key
value.

Another basic operation is "install this item". The installation
process makes a copy, via provided routines, and this is the point
at which a count field would be initialized.

The only way to remove the storage from the control of the data
base (prior to closing it) is by the delete operation. This again
returns a pointer to data, but that data is no longer under the
control of the data base, and storage disposition etc. is the
users responsibility. It cannot be re-stored, since install makes
a copy.

The markov and wdfreq usage examples show some fairly complex
manipulations.

At any rate, the complete usage is intended to be understandable
via the hashlib.h header file alone.

--
Chuck F (cb********@yah oo.com) (cb********@wor ldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net> USE worldnet address!

Nov 13 '05 #7

Eric Sosman

CBFalconer wrote:

Michael B Allen wrote:

And if two objects are equal it looks like you return a pointer
to the offending element without removing it? I suppose that is
two votes for case A.

I don't know what your case A etc. is any more, but the
fundamental thing the system does is answer the question "is this
item in the data base". If so, it returns a pointer to the stored
data (which is not the same as the queried data). The user may
now do things with an auxiliary field, such as a count. However
the user does NOT own the data pointed at, that is under the
control of the hashlib data base. He MUST NOT alter the key
value.

It's worth pointing out that the question of who owns
the stored data is a design choice, not an immutable tenet
of hash tables. ("Don't change the key" *is* such a tenet,
though.) CBF chose to have the table own the stored data,
cloning the user-supplied datum (via user-supplied functions)
at need. In my own package I chose oppositely: the caller
always owns the data, and the table merely keeps track of
which items are currently "in."

CBF's design is perhaps more reliable than mine, since
the user is less likely to commit the accident of deallocating
an object while it's still in the table. On the other hand,
my design allows a single object to exist in an arbitrary
number of hash tables simultaneously: I can search the same
set of data on both telephone number and postal code. (CBF
can do this, too, but only by using an intermediate "reference"
object to point to the real data, and allowing the references
to be copied into multiple tables.)

There is no One Single Best Way to design an interface.
CBF's design beats mine in some respects; I feel mine beats
his in others; trade-offs are always present. In the end,
an interface design says as much about the designer's personal
preferences as it does about the problem to be solved.

--
Er*********@sun .com

Nov 13 '05 #8

CBFalconer

Eric Sosman wrote:

CBFalconer wrote:
.... snip ...
I don't know what your case A etc. is any more, but the
fundamental thing the system does is answer the question "is this
item in the data base". If so, it returns a pointer to the stored
data (which is not the same as the queried data). The user may
now do things with an auxiliary field, such as a count. However
the user does NOT own the data pointed at, that is under the
control of the hashlib data base. He MUST NOT alter the key
value.

It's worth pointing out that the question of who owns
the stored data is a design choice, not an immutable tenet
of hash tables. ("Don't change the key" *is* such a tenet,
though.) CBF chose to have the table own the stored data,
cloning the user-supplied datum (via user-supplied functions)
at need. In my own package I chose oppositely: the caller
always owns the data, and the table merely keeps track of
which items are currently "in."

CBF's design is perhaps more reliable than mine, since
the user is less likely to commit the accident of deallocating
an object while it's still in the table. On the other hand,
my design allows a single object to exist in an arbitrary
number of hash tables simultaneously: I can search the same
set of data on both telephone number and postal code. (CBF
can do this, too, but only by using an intermediate "reference"
object to point to the real data, and allowing the references
to be copied into multiple tables.)

There is no One Single Best Way to design an interface.
CBF's design beats mine in some respects; I feel mine beats
his in others; trade-offs are always present. In the end,
an interface design says as much about the designer's personal
preferences as it does about the problem to be solved.

A nice exposition of why choices are made :-) Mine were made
largely on the basis of 'this way will minimize my usage errors'.

--
Chuck F (cb********@yah oo.com) (cb********@wor ldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home .att.net> USE worldnet address!

Nov 13 '05 #9

Similar topics

4181

STL hash_map hash

by: Murali | last post by:

I have a requirement where I have to use two unsigned ints as a key in a STL hash map. A couple of ways to do this is 1. create a struct with two unsigned ints and use that as key (write my own HashFcn and EqualKey template args) or, 2. convert the two unsigned ints to char*s, concatenate them and use that as Key. For method 1, the difficulty I am having is in writing the HashFcn. HashFcn requires the following method

C / C++

12708

Flash calling window.location.hash causes IE6 crash

by: Simula | last post by:

I am developing an HTML javascript application and I want to preserve state in a way that can be book-marked. I chose HTML anchors as a means of preserving state. When the application changes state, the HTML page URL would change from default.html to default.html#stateA. example: http://pearstudios.com/javascript/locationHashAndFlash.html This has worked perfectly within HTML and javascript alone, however, when connecting Flash and...

Javascript

3798

Better crypto hash functions, long, with code

by: Bryan Olson | last post by:

The current Python standard library provides two cryptographic hash functions: MD5 and SHA-1 . The authors of MD5 originally stated: It is conjectured that it is computationally infeasible to produce two messages having the same message digest. That conjecture is false, as demonstrated by Wang, Feng, Lai and Yu in 2004 . Just recently, Wang, Yu, and Lin showed a short- cut solution for finding collisions in SHA-1 . Their result

Python

5248

Object Hash vs. object Array preference

by: VK | last post by:

Hello, In my object I have getDirectory() method which returns 2-dimentional array (or an imitation of 2-dimentional array using two JavaScript objects with auto-handled length property - please let's us do not go into an "each dot over i" clarification discussion now - however you want to call - you call it ;-) array contains records of all files in the current dir. array contains records of all subs in the current dir

Javascript

4320

don't need dictionary's keys - hash table?

by: kdotsky | last post by:

Hello, I am using some very large dictionaries with keys that are long strings (urls). For a large dictionary these keys start to take up a significant amount of memory. I do not need access to these keys -- I only need to be able to retrieve the value associated with a certain key, so I do not want to have the keys stored in memory. Could I just hash() the url strings first and use the resulting integer as the key? I think what I'm...

Python

7027

General Hash Functions In Python

by: Arash Partow | last post by:

Hi all, I've ported various hash functions to python if anyone is interested: def RSHash(key): a = 378551 b = 63689 hash = 0

Python

3234

My hash table is in need of peer review

by: Johan Tibell | last post by:

I would be grateful if someone had a minute or two to review my hash table implementation. It's not yet commented but hopefully it's short and idiomatic enough to be readable. Some of the code (i.e. the get_hash function) is borrowed from various snippets I found on the net. Thee free function could probably need some love. I have been thinking about having a second linked list of all entries so that the cost of freeing is in proportion to...

C / C++

3917

accessing char as int through union

by: Hallvard B Furuseth | last post by:

Is the code below valid? Generally a value must be accessed through the same type it was stored as, but there is an exception for data stored through a character type. I'm not sure if that applies in this case though: #include <limits.h> unsigned foo(void) { static const union { unsigned char str;

C / C++

1825

How to say $a=$b->{"A"} ||={} in Python?

by: beginner | last post by:

Hi All. I'd like to do the following in more succint code: if k in b: a=b else: a={} b=a

Python

9794

What is ONU?

by: marktang | last post by:

ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However, people are often confused as to whether an ONU can Work As a Router. In this blog post, we’ll explore What is ONU, What Is Router, ONU & Router’s main usage, and What is the difference between ONU and Router. Let’s take a closer look ! Part I. Meaning of...

General

9642

Changing the language in Windows 10

by: Hystou | last post by:

Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...

Windows Server

10780

Problem With Comparison Operator <=> in G++

by: Oralloy | last post by:

Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...

C / C++

10539

The easy way to turn off automatic updates for Windows 10/11

by: Hystou | last post by:

Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...

Windows Server

7753

Access Europe - Using VBA to create a class based on a table - Wed 1 May

by: isladogs | last post by:

The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupré who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...

Microsoft Access / VBA

5623

Trying to create a lan-to-lan vpn between two differents networks

by: TSSRALBI | last post by:

Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...

Networking - Hardware / Configuration

4420

transfer the data from one system to another through ip address

by: 6302768590 | last post by:

Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system

C# / C Sharp

3968

How to add payments to a PHP MySQL app.

by: muto222 | last post by:

How can i add a mobile payment intergratation into php mysql website.

PHP

3077

Comprehensive Guide to Website Development in Toronto: Expert Insights from BSMN Consultancy

by: bsmnconsultancy | last post by:

In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

General