443,923 Members | 1,443 Online
Need help? Post your question and get tips & solutions from a community of 443,923 IT Pros & Developers. It's quick & easy.

# Data structure recommendation?

 P: n/a Hi all- I'm looking for a data structure that is a bit like a dictionary or a hash map. In particular, I want a mapping of floats to objects. However, I want to map a RANGE of floats to an object. This will be used for timestamped storage / lookup, where the float represents the timestamp. get(x) should return the object with the "newest" (biggest) timestamp y <= x, if it exists. Example: foo = Foo() foo.get(1.5) -None foo.put(1.3, 'a') foo.put(2.6, 'b') foo.get(1.5) -'a' foo.get(7.8) -'b' foo.put(5.0, 'c') foo.get(7.8) -'c' In otherwords, by the end here, for foo.get(x), x < 1.3 maps to None, 1.3 <= x < 2.6 maps to 'a', 2.6 <= x < 5.0 maps to 'b', 5.0 <= x maps to 'c'. I know that foo.get() will be called many times for each foo.put(). Is there any way to achieve O(1) performance for foo.get(), maybe via some kind of hash function? Or is the best thing to use some kind of binary search? Thanks for any advice. -Steven Apr 7 '08 #1
6 Replies

 P: n/a I know that foo.get() will be called many times for each foo.put(). Is there any way to achieve O(1) performance for foo.get(), maybe via some kind of hash function? Or is the best thing to use some kind of binary search? If you know something about the density of the input values, O(1) is possible. E.g if there is a guarantee that there will be between 1 and 10 values per unit of input, then truncate the "event time" to the next-lower int, and use that as an index k into a list; the list item will be a list of events between k and k+1. As you know that there is an upper bound to the number of such events (10), you know that searching the list will take bounded (i.e. constant) time. Likewise, as you know that there will be atleast one event per (k,k+1) interval, you know that you have to scan only one list. If you know no such thing, you'll have to use a binary search. Regards, Martin Apr 7 '08 #2

 P: n/a I know that foo.get() will be called many times for each foo.put(). Is there any way to achieve O(1) performance for foo.get(), maybe via some kind of hash function? Or is the best thing to use some kind of binary search? If you know something about the density of the input values, O(1) is possible. E.g if there is a guarantee that there will be between 1 and 10 values per unit of input, then truncate the "event time" to the next-lower int, and use that as an index k into a list; the list item will be a list of events between k and k+1. As you know that there is an upper bound to the number of such events (10), you know that searching the list will take bounded (i.e. constant) time. Likewise, as you know that there will be atleast one event per (k,k+1) interval, you know that you have to scan only one list. If you know no such thing, you'll have to use a binary search. Regards, Martin Apr 7 '08 #3

 P: n/a Few more notes on the code: You may use the @property in such situations (or you may just use attributes, dropping the property). Note that Python doesn't inline functions calls like Java HotSpot does quite often. def __children(self): raise NotImplementedError() children = property(__children) ==(not tested!) @property def __children(self): raise NotImplementedError() If the profiling shows this is s slow, and it's used on lot of items, there are faster O(1) algorithms for this, like this one: http://aspn.activestate.com/ASPN/Coo.../Recipe/466330 @staticmethod def determine_median(numbers): return sorted(numbers)[ len(numbers) / 2 ] I have seen you use lot of double underscores (I think one may often suffice, but usually two aren't a problem): def __children(self): Stripped of comments, docstrings, and empty lines the code is just 280 lines long (it's 1498 with them!), so a translation doesn't seem too much work (expecially if the original version was Java). The code seem really well commented, organized, etc. And I like it, it looks better than 90+% of the commercial Python code I see :-) Bye, bearophile Apr 8 '08 #4

 P: n/a More bits from your code: neighbours = list() ==> neighbours = [] If you have a recent enough version of Python you can use: candidate_is_neighbour = any(distance < n[1] for n in neighbours) Instead of: candidate_is_neighbour = bool([1 for n in neighbours if distance < n[1]]) It's shorter & simpler, and it stops as soon as it finds a true condition. Bye, bearophile Apr 8 '08 #5

 P: n/a * be************@lycos.com: > Please plug such good things. It seems the Python community is an endless source of interesting modules I didn't know about. Your (single) module looks very nice. I'll take a better look later. Could you please send me an email with an existing From: address? I tried to reply to you but apparently your From: is forged. J. -- I can tell a Whopper[tm] from a BigMac[tm] and Coke[tm] from Pepsi[tm]. [Agree] [Disagree] Apr 10 '08 #6

 P: n/a Jochen Schulz: Could you please send me an email with an existing From: address? I tried to reply to you but apparently your From: is forged. Sorry for the delay, I'll send you an email. In the meantime I have created a fast BK-tree implementation for Psyco, on my PC it's about 26+ times faster than mspace (that uses psyco still), but it's less complete. You can find it here: http://aspn.activestate.com/ASPN/Coo.../Recipe/572156 (I have created a D version too, that's about 4.5 times faster than my Python version, so it's about 126 times faster than the mspace with Psyco). Enjoy, bearophile Jun 27 '08 #7

### This discussion thread is closed

Replies have been disabled for this discussion.