By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
446,193 Members | 843 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 446,193 IT Pros & Developers. It's quick & easy.

Hash structure or Binary Search Tree...

P: n/a
I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.

I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.

The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...

Thank You,
Sig
Nov 14 '05 #1
Share this Question
Share on Google+
7 Replies


P: n/a
exonic wrote:
....snip ...
I need an efficient structure to store a counter and a series of
IP addresses. I would like to use something that's not going to
slow down once given large amounts of data. I'd like it to run
as close to O(n) as possible.

The main goal of this project is to create a daemon, which
receives data delimited by a new line, hashes the data, looks it
up and increments the associated counter. If the counter reaches a
configured threshold, send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...


You could use the hashlib.zip library. This reduces your problem
to optimizing the hash functions, and the status feedback from the
library will help you do this. It will run O(1), which is
considerably better than O(n). See:

<http://cbfalconer.home.att.net/download/>

A slight modification of the wdfreq demo program should do it.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #2

P: n/a
exonic wrote:
I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.
Well, you really have a data structure/algorithm question as opposed to
a question about the C language. The newsgroup news:comp.programming
would be the right place to ask. (And no, the fact that you're writing
this in C is not relevant.)
I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.
O(n)? You're joking, right? ;-) You'd achieve O(n) by using a linked
list, the most naive possible approach. A balanced binary tree would
give you O(log n) -- and a hash table O(1) (though, the devil is inthe
constants).
The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.
Daemons are also not topical here. Going to news:comp.unix.programmer
would be appropriate.
Any comments/questions would be much appreciated...

<OT>
In a situation like this, the likely access pattern of the IP addresses
can be quite significant. A web search on some of the following might be
fruitful:

Red-black trees
AVL trees
Splay trees
Skip lists
</OT>

In any event, *please* read the FAQ at:

http://www.eskimo.com/~scs/C-faq/top.html

HTH,
--ag

--
Artie Gold -- Austin, Texas

Nov 14 '05 #3

P: n/a
On Mon, 05 Jan 2004 11:33:09 -0500, exonic <ex****@triton.net> wrote
in comp.lang.c:
I'm reposting this, as everyone went off on the "did this say c++"
tangent, when someone thought he was reading a different group. Hope I get
better results this time.

I need an efficient structure to store a counter and a series of IP
addresses. I would like to use something that's not going to slow down
once given large amounts of data. I'd like it to run as close to O(n) as
possible.

The main goal of this project is to create a daemon, which receives
data delimited by a new line, hashes the data, looks it up and increments
the associated counter. If the counter reaches a configured threshold,
send back 1 to the client, otherwise send 0.

Any comments/questions would be much appreciated...

Thank You,
Sig


Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~a...FAQ-acllc.html
Nov 14 '05 #4

P: n/a
Jack Klein wrote:
exonic <ex****@triton.net> wrote

.... snip ...
I need an efficient structure to store a counter and a series of
IP addresses. I would like to use something that's not going to
slow down once given large amounts of data. I'd like it to run
as close to O(n) as possible.

The main goal of this project is to create a daemon, which
receives data delimited by a new line, hashes the data, looks it
up and increments the associated counter. If the counter reaches a
configured threshold, send back 1 to the client, otherwise send 0.


Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.


Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.

The hash table solution also automatically adapts to IPV6
addresses, and can easily handle input in either binary (32 or 48
bit words) or text strings.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #5

P: n/a
CBFalconer wrote:
Jack Klein wrote:
exonic <ex****@triton.net> wrote

... snip ...
> I need an efficient structure to store a counter and a series of
> IP addresses. I would like to use something that's not going to
> slow down once given large amounts of data. I'd like it to run
> as close to O(n) as possible.
>
> The main goal of this project is to create a daemon, which
> receives data delimited by a new line, hashes the data, looks it
> up and increments the associated counter. If the counter reaches a
> configured threshold, send back 1 to the client, otherwise send 0.


Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.


Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.


A trie offers O(1) access, plus it keeps the data ordered.

Jeremy.
Nov 14 '05 #6

P: n/a
Jeremy Yallop wrote:

CBFalconer wrote:
Jack Klein wrote:
exonic <ex****@triton.net> wrote

... snip ...
> I need an efficient structure to store a counter and a series of
> IP addresses. I would like to use something that's not going to
> slow down once given large amounts of data. I'd like it to run
> as close to O(n) as possible.
>
> The main goal of this project is to create a daemon, which
> receives data delimited by a new line, hashes the data, looks it
> up and increments the associated counter. If the counter reaches a
> configured threshold, send back 1 to the client, otherwise send 0.

Why would anyone bother to hash (IPV4) IP addresses? Each and every
distinct IP address is guaranteed to fit into an unsigned long.


Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.


A trie offers O(1) access, plus it keeps the data ordered.


Does it? I would have expected O(logN) offhand.

--
Chuck F (cb********@yahoo.com) (cb********@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Nov 14 '05 #7

P: n/a
CBFalconer wrote:

Jeremy Yallop wrote:

CBFalconer wrote:

Because he only wants a subset of all possible addresses,
dependant on what were used. Suitable structures include an
adaptive hash table (hashlib) and a linked list. Only the hash
table offers O(1) access times AFAICT.


A trie offers O(1) access, plus it keeps the data ordered.


Does it? I would have expected O(logN) offhand.


<off-topic>

It's a different "N." A trie takes O(log N) where
N is the length of a single key, not the number of keys
stored. If all keys have the same N, log N is a constant
and the retrieval time is thus O(1).

A few months back somebody on a different newsgroup
was proposing to store about 100000 IPv4 addresses in
four-level trie with 256-way branching at each node.
A few back-of-the-envelope computations convinced him
this would be an extravagantly wasteful data structure:
the top two levels would be full or nearly so, but the
bottom two levels (the bulk of the trie) would be almost
empty, averaging less than two keys apiece.

See also Knuth "The Art of Computer Programming,
Volume III: Sorting and Searching," Section 6.3.

</off-topic>

--
Er*********@sun.com
Nov 14 '05 #8

This discussion thread is closed

Replies have been disabled for this discussion.