444,119 Members | 2,064 Online
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,119 IT Pros & Developers. It's quick & easy.

# Anyway to clarify this code? (dictionaries)

 P: n/a I've been searching thru the library documentation, and this is the best code I can produce for this alogorithm: I'd like to return a dictionary which is a copy of 'another' dictionary whoes values are bigger than 'x' and has the keys 'keys': def my_search (another, keys, x): temp = another.fromkeys(keys) return dict([[k,v] for k in temp.keys() for v in temp.values() if v>=x]) Is there any way to improve this code? I want to avoid converting the dictionary to a list and then to a dictionay. Are there speed penalties for such a conversion? Bye. Nov 23 '05 #1
9 Replies

 P: n/a "javuchi" writes: I've been searching thru the library documentation, and this is the best code I can produce for this alogorithm: I'd like to return a dictionary which is a copy of 'another' dictionary whoes values are bigger than 'x' and has the keys 'keys': def my_search (another, keys, x): temp = another.fromkeys(keys) return dict([[k,v] for k in temp.keys() for v in temp.values() if v>=x]) Is there any way to improve this code? Lots of them. Let's start by pointing out two bugs: You're creating a list that's the "cross product" of keys and values, then handing that to dict. You're handing it a list with entries for the same key. That behavior may be defined, but I wouldn't count on it. fromkeys sets the values in temp to the same value - None. So temp.values() is a list of None's, so v is None every time through the loop. So you could do what that code does with: def my_search(another, keys, x): if None >= x: return another.fromkeys(keys) else: return dict() You probably want something like: def my_search(another, keys, x): return dict([[k,another[k]] for k in keys if another[k] >= x] If you really want to avoid indexing another twice, you could do: def my_search(another, keys, x): return dict([[k, v] for k, v in another.items() if v >= x and k in keys]) But then you're looking through all the keys in another, and searching through keys multiple times, which probably adds up to a lot more wasted work than indexing another twice. I want to avoid converting the dictionary to a list and then to a dictionay. Are there speed penalties for such a conversion? No. But it does take time to do the conversion. I think I'd write it out "longhand": def my_search(another, keys, x): new = dict() for k in keys: if another[k] >= x: new[k] = another[k] return new This makes it clear that you only index another twice if you actually use it. The list comprehension will do the loop in C, but it means you have to scan two lists instead of one. If you're worried about which is faster, measure it on your target platform. http://www.mired.org/home/mwm/ Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information. Nov 23 '05 #2

 P: n/a Mike Meyer wrote: def my_search(another, keys, x): return dict([[k, v] for k, v in another.items() if v >= x and k in keys]) But then you're looking through all the keys in another, and searching through keys multiple times, which probably adds up to a lot more wasted work than indexing another twice. Would you mind clarify ? Do you mean "k in keys" is a scan rather than a lookup ? I find it to be pretty clean and straight forward. I think one way or another, one need to loop through one of them, then index search the other. It may help a bit to take the len() and loop through the shorter one. This seems like a SQL equivalent. select * from a where a.key=b.key and a.v >= x Nov 23 '05 #3

 P: n/a On 22 Nov 2005 17:58:28 -0800, "javuchi" wrote: I've been searching thru the library documentation, and this is thebest code I can produce for this alogorithm:I'd like to return a dictionary which is a copy of 'another' dictionarywhoes values are bigger than 'x' and has the keys 'keys':def my_search (another, keys, x): temp = another.fromkeys(keys) return dict([[k,v] for k in temp.keys() for v in temp.values()if v>=x])Is there any way to improve this code?I want to avoid converting the dictionary to a list and then to adictionay. Are there speed penalties for such a conversion?Bye. another = dict(zip('abcd', iter(random.random, 2))) import random another = dict(zip('abcd', iter(random.random, 2))) for k,v in another.items(): print k,v ... a 0.606494662034 c 0.273998760342 b 0.358066029098 d 0.774406432218 If keys are few compared to the number of keys in another, this may be prefereable: def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ... my_search(another, 'cb', .3) {'b': 0.35806602909756235} my_search(another, 'abcd', .4) {'a': 0.60649466203365532, 'd': 0.77440643221840166} This sounds like homework though ... ? Regards, Bengt Richter Nov 23 '05 #4

 P: n/a Mike Meyer wrote: def my_search(another, keys, x): new = dict() for k in keys: if another[k] >= x: new[k] = another[k] return new BTW, this would raise exception if k is not in another. Nov 23 '05 #5

 P: n/a Bengt Richter wrote: >>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ... >>> my_search(another, 'cb', .3) {'b': 0.35806602909756235} >>> my_search(another, 'abcd', .4) {'a': 0.60649466203365532, 'd': 0.77440643221840166} Do you need to guard the case "k not in another" ? Nov 23 '05 #6

 P: n/a "bo****@gmail.com" writes: Mike Meyer wrote: def my_search(another, keys, x): return dict([[k, v] for k, v in another.items() if v >= x and k in keys]) But then you're looking through all the keys in another, and searching through keys multiple times, which probably adds up to a lot more wasted work than indexing another twice. Would you mind clarify ? Do you mean "k in keys" is a scan rather than a lookup ? I find it to be pretty clean and straight forward. I assumed keys was a simple sequence of some kind, because you passed it to fromkeys. I guess it could be set or a dictionary, in which case "k in keys" would be a lookup. Were you trying to force a lookup by creating a dict with the keys from k via fromkeys? If so, using a set would have the same effect, and be a lot clearer: temp = set(keys) return dict([[k, v] for k, v in another.items() if v >= x and k in temp]) I think one way or another, one need to loop through one of them, then index search the other. It may help a bit to take the len() and loop through the shorter one. First, remember the warnings about premature optimization. The following might be worth looking into: use = set(another) - set(keys) return dict([[k, another[k]] for k in use if another[k] >= x] Though I still think I prefer the longhand version: out = dict() for key in set(another) - set(keys): if another[k] >= x: out[k] = another[k] The set difference is still going to loop through one and do lookups in the other, but it'll happen in C instead of Python. Unless your lists are *very* long, the performance differences will probably negligible, and are liable to change as you change the underlying platform. So I'd recommend you choose the form that's mostt readable to you, and go with that. http://www.mired.org/home/mwm/ Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information. Nov 23 '05 #7

 P: n/a Mike Meyer wrote: First, remember the warnings about premature optimization. Which is why I said the one-liner(your first one) is clean and clear, and bug free in one go. use = set(another) - set(keys) return dict([[k, another[k]] for k in use if another[k] >= x] Though I still think I prefer the longhand version: out = dict() for key in set(another) - set(keys): if another[k] >= x: out[k] = another[k] This is definitely better than the other long hand version as the set operation remove the potential problem of another[k] raise KeyError. Nov 23 '05 #8

 P: n/a javuchi wrote: I want to avoid converting the dictionary to a list and then to a dictionay. Are there speed penalties for such a conversion? You mean, is it faster to write, test, debug and execute slow Python code rather than letting Python's built-in routines written in fast C do the job? I have no idea. Perhaps you should try it and see. Write some code to do it all manually, and time it. Make sure you use realistic test data: if your users will be using dictionaries with 10,000 items, there is no point in testing only dictionaries with 10 items. For accuracy, run (say) 20 tests, and look at the average speed. Of better still, use the timeit module. -- Steven. Nov 23 '05 #9

 P: n/a On 22 Nov 2005 19:52:40 -0800, "bo****@gmail.com" wrote: Bengt Richter wrote: >>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ... >>> my_search(another, 'cb', .3) {'b': 0.35806602909756235} >>> my_search(another, 'abcd', .4) {'a': 0.60649466203365532, 'd': 0.77440643221840166}Do you need to guard the case "k not in another" ? Good catch ;-) What did the OP want as a value if any for that case? None? or no entry at all? Taking a cue from Mike, I like the set method of getting the common keys, to eliminate the entry (untested) def my_search(another, keys, x): return dict((k,another[k]) for k in (set(another)&set(keys)) if another[k]>x) otherwise, to get Nones, maybe (untested) def my_search(another, keys, x): return dict((k,another.get(k)) for k in keys if k not in another or another[k]>x) Regards, Bengt Richter Nov 23 '05 #10

### This discussion thread is closed

Replies have been disabled for this discussion.