By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
464,286 Members | 1,250 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 464,286 IT Pros & Developers. It's quick & easy.

Floats as keys in dict

P: n/a
Hi

I am making a script to optimiza by dynamic programming. I do not know
the vertices and nodes before the calculation, so I have decided to
store the nodes I have in play as keys in a dict.

However, the dict keys are then floats and I have to round the values
of new possible nodes in each step. When profiling I see that the most
time consuming part of my script is rounding.

Is there a faster way than round() or is there a better way to test
than 'in' or should I store the keys in another way than a dict?

tia,
--
Brian (remove the sport for mail)
http://www.et.web.mek.dtu.dk/Staff/be/be.html
http://www.rugbyklubben-speed.dk
Aug 1 '07 #1
Share this Question
Share on Google+
5 Replies

P: n/a
Brian Elmegaard <br***@rkspeed-rugby.dkwrote:
I am making a script to optimiza by dynamic programming. I do not know
the vertices and nodes before the calculation, so I have decided to
store the nodes I have in play as keys in a dict.

However, the dict keys are then floats and I have to round the values
of new possible nodes in each step. When profiling I see that the most
time consuming part of my script is rounding.

Is there a faster way than round() or is there a better way to test
than 'in' or should I store the keys in another way than a dict?
You may want to consider keeping a sorted list and using standard
library module bisect for searches and insertions -- its behavior is
O(log N) for a search, O(N) for an insertion, but it might be that in
your case saving the rounding could be worth it.

Otherwise, you need to consider a different container, based either on
comparisons (e.g. AVL trees, of which there are several 3rd party
implementations as Python extensions) or on a hashing function that will
give the same hash for two numbers that are "close enough" (e.g., hash
ignoring the lowest N bits of the float's mantissa for some N).

round() operates on decimals and that may not be as fast as working on
binary representations, but, to be fast, a helper function giving the
"hash of a binary-rounded float" would have to be coded in C (or maybe
use psyco).
Alex
Aug 1 '07 #2

P: n/a
Alex Martelli wrote:
Brian Elmegaard <br***@rkspeed-rugby.dkwrote:
>I am making a script to optimiza by dynamic programming. I do not know
the vertices and nodes before the calculation, so I have decided to
store the nodes I have in play as keys in a dict.

However, the dict keys are then floats and I have to round the values
of new possible nodes in each step. When profiling I see that the most
time consuming part of my script is rounding.

Is there a faster way than round() or is there a better way to test
than 'in' or should I store the keys in another way than a dict?

You may want to consider keeping a sorted list and using standard
library module bisect for searches and insertions -- its behavior is
O(log N) for a search, O(N) for an insertion, but it might be that in
your case saving the rounding could be worth it.

Otherwise, you need to consider a different container, based either on
comparisons (e.g. AVL trees, of which there are several 3rd party
implementations as Python extensions) or on a hashing function that will
give the same hash for two numbers that are "close enough" (e.g., hash
ignoring the lowest N bits of the float's mantissa for some N).
That might be a bit dangerous in cases close to changes in exponent,
though, where you could get numbers that were very close but had hugely
different mantissa values because their exponents were one different, no?
round() operates on decimals and that may not be as fast as working on
binary representations, but, to be fast, a helper function giving the
"hash of a binary-rounded float" would have to be coded in C (or maybe
use psyco).
Your first suggestion was better, I think. Bisect would do the job. I
have always thought that dict's numerical index semantics were suspect,
but it's probably way too late to alter that now. [I'm assuming that
Py3.0 is retaining the numerical equivalence relations].

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

Aug 1 '07 #3

P: n/a
Steve Holden <st***@holdenweb.comwrites:
Alex Martelli wrote:
[snip]

Thanks a lot for your intersting answers. I will start out taking a
look at bisect.
--
Brian (remove the sport for mail)
http://www.et.web.mek.dtu.dk/Staff/be/be.html
http://www.rugbyklubben-speed.dk
Aug 1 '07 #4

P: n/a
Brian Elmegaard wrote:
However, the dict keys are then floats and I have to round the values
of new possible nodes in each step.
Be careful with this. If you have two values that are
very close together, but on different sides of a rounding
boundary, they will end up as distinct keys even though
they "should" be regarded as equal.

I don't think there's any way to use a dictionary for
this that works properly in all cases. Whatever scheme
you come up with for comparison and hashing will suffer
from this problem one way or another.

So, like Alex said, you need a different data structure.

--
Greg
Aug 2 '07 #5

P: n/a
greg <gr**@cosc.canterbury.ac.nzwrites:
Be careful with this. If you have two values that are
very close together, but on different sides of a rounding
boundary, they will end up as distinct keys even though
they "should" be regarded as equal.
I don't think this is a big problem. It will only give me one more
node.

Wouldn't the same be possible if I use bisect?

--
Brian (remove the sport for mail)
http://www.et.web.mek.dtu.dk/Staff/be/be.html
http://www.rugbyklubben-speed.dk
Aug 2 '07 #6

This discussion thread is closed

Replies have been disabled for this discussion.