473,398 Members | 2,389 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,398 software developers and data experts.

Hashing function (possibly OT)

Hi,

I apologise in advance if this is off topic, but would appreciate any
pointers you chaps might be able to provide.

I'm relatively novice in the art of C so am after any suggestions about
implementing a very simple hash (or quasi hash)

I need to be able to store a counter and a timestamp for requests from
particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
where x is a hex digit), ideally in a dynamic data structure. Additionally I
need to be able to access the data quickly, hence my thoughts to use a
hashing function.

This may be naive but I'd thought to code something along the line of a tree
structure, with each octet representing a node holding 256 pointers to the
next set of nodes (from most significant to least significant octect) with
the last octect holding pointers to a simple record holding the data.

This would mean that at most i would need to traverse 6 tree nodes before
hitting the data whatever the volume of addresses.(?)

Does this seem feasible?

If so (and this is where I get unstuck :o)), in C i'd have a structure
defined as an array of 255 pointers, then allocate space for one of these
structures as I parse each octect of the mac address. I'm having difficulty
in defining the structures correctly so that the last but one octet
references the actual data rather than another set of node pointers.

Can someone please advise on the best way to do this? Or if you think the
above is complete pap suggest any easier or more sensible ( please :P)
solutions. I'm open to advice/flaming/suggestions!

Cheers,
Matt
Nov 13 '05 #1
2 2063
Matt Bull wrote:

I need to be able to store a counter and a timestamp for requests from
particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
where x is a hex digit), ideally in a dynamic data structure. Additionally I
need to be able to access the data quickly, hence my thoughts to use a
hashing function.

This may be naive but I'd thought to code something along the line of a tree
structure, with each octet representing a node holding 256 pointers to the
next set of nodes (from most significant to least significant octect) with
the last octect holding pointers to a simple record holding the data.
<off-topic>

This approach would probably waste a lot of memory. For
example, suppose your entire data base contains just two six-byte
keys that happen to differ in the first byte. You'll allocate
one 256-entry node for the first byte and two for each of the
second through sixth bytes -- eleven 256-entry nodes in all, just
for twelve bytes of key data?

Incidentally, the scheme you describe is known as a "trie,"
and doesn't sound like a hashing method at all.

</off-topic>
If so (and this is where I get unstuck :o)), in C i'd have a structure
defined as an array of 255 pointers,
Don't you mean 256?
then allocate space for one of these
structures as I parse each octect of the mac address. I'm having difficulty
in defining the structures correctly so that the last but one octet
references the actual data rather than another set of node pointers.
The first step is to define your "payload," the data
structure you'll find at the eventual end of whatever search
path you'll take. A typical way to do this in C is with a
struct, something like

typedef struct {
TimeStampType when;
CounterType count;
OtherType other;
} Payload;

Next, you need something capable of pointing to 256
different Payloads. An array of pointers is just the trick:

typedef Payload *Level6[256];

That is, a `Level6' object is an array of 256 pointers to
`Payload' objects. At the next level you'll need a bunch of
pointers to Level6's, and so on up to the Level1 root:

typedef Level6 *Level5[256];
typedef Level5 *Level4[256];
typedef Level4 *Level3[256];
typedef Level3 *Level2[256];
typedef Level2 *Level1[256];

/* finally, here's the root of your trie: */
Level1 root;

.... and there you have it: The lone Level1 object points to 256
Level2's, each Level2 points to 256 Level3's, and so on until
the chosen Level6 object points to the desired Payload.

However, this is an incredibly verbose way to proceed! The
novice C programmer is usually comfortable with the idea of a
"pointer to something" but starts to get queasy when confronted
with a "pointer to pointer to something." The LevelN types have
no real function other than to assign names to various "somethings"
so the novice can visualize them more easily -- throw away all
those names and you get the exact same structure, expressed in
terms of multiple levels of pointers:

Payload *****root[256];

(Actually, there's a subtle difference between the two
formulations: the LevelN version carries the "256-ness" information
along while the latter version doesn't, and this will affect the way
you calculate how much memory to allocate for each node. Still,
the essential structure of a six-level trie is the same.)
Can someone please advise on the best way to do this? Or if you think the
above is complete pap suggest any easier or more sensible ( please :P)
solutions. I'm open to advice/flaming/suggestions!


<off-topic>

Choice of data structure and search/update method really hasn't
anything to do with the C language. Once a choice has been made,
the question of how best to express the solution in C would be
topical, but the selection itself is driven by criteria that would
be just as valid (or invalid) for Fortran, Java, BASIC, COBOL, ...

Also, you haven't revealed enough about your situation to allow
anything more than informed guesswork. For example, how many of
these six-byte keys do you need to store? You say you need to
access them "quickly," but you don't mention how many million queries
per second you need to sustain. Once entered, do the keys persist
forever or can they be deleted? And so on, and so forth -- and
NONE of these apparently vital considerations has anything at all
to do with C. Look elsewhere for advice, and return here if you're
having difficulty with coding the chosen solution in C.

</off-topic>

Good luck!

--
Er*********@sun.com
Nov 13 '05 #2

"Eric Sosman" <Er*********@sun.com> wrote in message
news:3F***************@sun.com...
Matt Bull wrote:

I need to be able to store a counter and a timestamp for requests from
particular device MAC addresses (6 bytes, presented as xx:xx:xx:xx:xx:xx
where x is a hex digit), ideally in a dynamic data structure. Additionally I need to be able to access the data quickly, hence my thoughts to use a
hashing function.

This may be naive but I'd thought to code something along the line of a tree structure, with each octet representing a node holding 256 pointers to the next set of nodes (from most significant to least significant octect) with the last octect holding pointers to a simple record holding the data.
<off-topic>

This approach would probably waste a lot of memory. For
example, suppose your entire data base contains just two six-byte
keys that happen to differ in the first byte. You'll allocate
one 256-entry node for the first byte and two for each of the
second through sixth bytes -- eleven 256-entry nodes in all, just
for twelve bytes of key data?

Incidentally, the scheme you describe is known as a "trie,"
and doesn't sound like a hashing method at all.


I've analysed most of the data we are processing at the moment and the keys
would be in quite a low number of most significant byte "chunks", where the
vast majority of difference lies in lower 3 octets. So it might be ok.

This method seemed to offer the best of both ease of programming and memory
use at the time, although I'm open to suggestions for another method that
wouldn't make my brain melt :o)
</off-topic>
If so (and this is where I get unstuck :o)), in C i'd have a structure
defined as an array of 255 pointers,
Don't you mean 256?


Yes I did! hehe
The first step is to define your "payload," the data
structure you'll find at the eventual end of whatever search
path you'll take. A typical way to do this in C is with a
struct, something like
<snipped good stuff>
(Actually, there's a subtle difference between the two
formulations: the LevelN version carries the "256-ness" information
along while the latter version doesn't, and this will affect the way
you calculate how much memory to allocate for each node. Still,
the essential structure of a six-level trie is the same.)

Ahhh thas excellent, just the info I was after!
Can someone please advise on the best way to do this? Or if you think the above is complete pap suggest any easier or more sensible ( please :P)
solutions. I'm open to advice/flaming/suggestions!


<off-topic>

Choice of data structure and search/update method really hasn't
anything to do with the C language. Once a choice has been made,
the question of how best to express the solution in C would be
topical, but the selection itself is driven by criteria that would
be just as valid (or invalid) for Fortran, Java, BASIC, COBOL, ...


Yes, apologies again for the off topic sway of the query. I'm somewhat stuck
with implementing in C as thats the only supported API the application this
code must interact with supports.
Also, you haven't revealed enough about your situation to allow
anything more than informed guesswork. For example, how many of
these six-byte keys do you need to store? You say you need to
access them "quickly," but you don't mention how many million queries
per second you need to sustain. Once entered, do the keys persist
forever or can they be deleted? And so on, and so forth -- and
NONE of these apparently vital considerations has anything at all
to do with C. Look elsewhere for advice, and return here if you're
having difficulty with coding the chosen solution in C.

Well, at the moment it needs to retain around 100,000 individual "payloads",
persisting in memory until the application is terminated, but generally only
seeing 100-500 new or duplicated keys per second.
</off-topic>

Good luck!


Thanks very much for the information Eric, you've been very helpful!

I'll let you know how I get on.

:o)

Cheers,
Matt
Nov 13 '05 #3

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

2
by: Pat | last post by:
I want to look for some one-to-one hashing function. In C++, any one-to-one hashing function?
11
by: Wm. Scott Miller | last post by:
Hello all! We are building applications here and have hashing algorithms to secure secrets (e.g passwords) by producing one way hashes. Now, I've read alot and I've followed most of the advice...
10
by: Dino M. Buljubasic | last post by:
Hi, I am using MD5 to hash my passwords and add them to database as hashed. I have noticed though that some passwords don't get recognized and I suppose that it happen because hashing might...
19
by: Ole Nielsby | last post by:
How does the GetHashCode() of an array object behave? Does it combine the GetHashCode() of its elements, or does it create a sync block for the object? I want to use readonly arrays as...
6
by: Jayender | last post by:
Hi, What is the difference between Hashing and Encryption ?
4
by: wkatz | last post by:
Hi, Gurus. What hashing algorithm outputs hash value as numbers only? For example, if you pass a “John Q. Public” it will output 23324. If there is no such hashing, how hard is it to hire somebody to...
11
by: January Weiner | last post by:
Hello, I need to use a hashing function for relatively short strings (roughly 16 characters or less). The data that needs to be accessed via hash is just a simple int. I just need to look up...
15
by: Vinodh | last post by:
I am reading about hashing techniques. The map data structure available in C++ STL uses hashing techniques?
24
by: Johnny Jörgensen | last post by:
I'm wondering (and hoping that somebody will be able to answer this): If I calculate the hash value of files (either MD5 or SHA1), can I then be sure that: 1) Two files with the same hash...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: emmanuelkatto | last post by:
Hi All, I am Emmanuel katto from Uganda. I want to ask what challenges you've faced while migrating a website to cloud. Please let me know. Thanks! Emmanuel
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.