By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,923 Members | 1,443 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,923 IT Pros & Developers. It's quick & easy.

Data structure recommendation?

P: n/a
Hi all-

I'm looking for a data structure that is a bit like a dictionary or a
hash map. In particular, I want a mapping of floats to objects.
However, I want to map a RANGE of floats to an object.

This will be used for timestamped storage / lookup, where the float
represents the timestamp.
get(x) should return the object with the "newest" (biggest) timestamp
y <= x, if it exists.
Example:

foo = Foo()

foo.get(1.5)
-None
foo.put(1.3, 'a')
foo.put(2.6, 'b')
foo.get(1.5)
-'a'
foo.get(7.8)
-'b'
foo.put(5.0, 'c')
foo.get(7.8)
-'c'

In otherwords, by the end here, for foo.get(x),
x < 1.3 maps to None,
1.3 <= x < 2.6 maps to 'a',
2.6 <= x < 5.0 maps to 'b',
5.0 <= x maps to 'c'.

I know that foo.get() will be called many times for each foo.put(). Is
there any way to achieve O(1) performance for foo.get(), maybe via
some kind of hash function? Or is the best thing to use some kind of
binary search?

Thanks for any advice.

-Steven
Apr 7 '08 #1
Share this Question
Share on Google+
6 Replies


P: n/a
I know that foo.get() will be called many times for each foo.put(). Is
there any way to achieve O(1) performance for foo.get(), maybe via
some kind of hash function? Or is the best thing to use some kind of
binary search?
If you know something about the density of the input values, O(1) is
possible. E.g if there is a guarantee that there will be between 1
and 10 values per unit of input, then truncate the "event time" to
the next-lower int, and use that as an index k into a list; the list
item will be a list of events between k and k+1. As you know that
there is an upper bound to the number of such events (10), you
know that searching the list will take bounded (i.e. constant) time.
Likewise, as you know that there will be atleast one event per
(k,k+1) interval, you know that you have to scan only one list.

If you know no such thing, you'll have to use a binary search.

Regards,
Martin
Apr 7 '08 #2

P: n/a
I know that foo.get() will be called many times for each foo.put(). Is
there any way to achieve O(1) performance for foo.get(), maybe via
some kind of hash function? Or is the best thing to use some kind of
binary search?
If you know something about the density of the input values, O(1) is
possible. E.g if there is a guarantee that there will be between 1
and 10 values per unit of input, then truncate the "event time" to
the next-lower int, and use that as an index k into a list; the list
item will be a list of events between k and k+1. As you know that
there is an upper bound to the number of such events (10), you
know that searching the list will take bounded (i.e. constant) time.
Likewise, as you know that there will be atleast one event per
(k,k+1) interval, you know that you have to scan only one list.

If you know no such thing, you'll have to use a binary search.

Regards,
Martin
Apr 7 '08 #3

P: n/a
Few more notes on the code:

You may use the @property in such situations (or you may just use
attributes, dropping the property). Note that Python doesn't inline
functions calls like Java HotSpot does quite often.
def __children(self):
raise NotImplementedError()
children = property(__children)
==(not tested!)
@property
def __children(self):
raise NotImplementedError()
If the profiling shows this is s slow, and it's used on lot of items,
there are faster O(1) algorithms for this, like this one:
http://aspn.activestate.com/ASPN/Coo.../Recipe/466330
@staticmethod
def determine_median(numbers):
return sorted(numbers)[ len(numbers) / 2 ]

I have seen you use lot of double underscores (I think one may often
suffice, but usually two aren't a problem):
def __children(self):

Stripped of comments, docstrings, and empty lines the code is just 280
lines long (it's 1498 with them!), so a translation doesn't seem too
much work (expecially if the original version was Java). The code seem
really well commented, organized, etc. And I like it, it looks better
than 90+% of the commercial Python code I see :-)

Bye,
bearophile
Apr 8 '08 #4

P: n/a
More bits from your code:

neighbours = list()
==>
neighbours = []

If you have a recent enough version of Python you can use:
candidate_is_neighbour = any(distance < n[1] for n in neighbours)
Instead of:
candidate_is_neighbour = bool([1 for n in neighbours if distance <
n[1]])

It's shorter & simpler, and it stops as soon as it finds a true
condition.

Bye,
bearophile
Apr 8 '08 #5

P: n/a
* be************@lycos.com:
>
Please plug such good things. It seems the Python community is an
endless source of interesting modules I didn't know about. Your
(single) module looks very nice. I'll take a better look later.
Could you please send me an email with an existing From: address? I
tried to reply to you but apparently your From: is forged.

J.
--
I can tell a Whopper[tm] from a BigMac[tm] and Coke[tm] from Pepsi[tm].
[Agree] [Disagree]
<http://www.slowlydownward.com/NODATA/data_enter2.html>
Apr 10 '08 #6

P: n/a
Jochen Schulz:
Could you please send me an email with an existing From: address? I
tried to reply to you but apparently your From: is forged.
Sorry for the delay, I'll send you an email.
In the meantime I have created a fast BK-tree implementation for
Psyco, on my PC it's about 26+ times faster than mspace (that uses
psyco still), but it's less complete. You can find it here:
http://aspn.activestate.com/ASPN/Coo.../Recipe/572156

(I have created a D version too, that's about 4.5 times faster than my
Python version, so it's about 126 times faster than the mspace with
Psyco).

Enjoy,
bearophile
Jun 27 '08 #7

This discussion thread is closed

Replies have been disabled for this discussion.