By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
444,119 Members | 2,064 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 444,119 IT Pros & Developers. It's quick & easy.

Anyway to clarify this code? (dictionaries)

P: n/a
I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.

Nov 23 '05 #1
Share this Question
Share on Google+
9 Replies


P: n/a
"javuchi" <ja*****@gmail.com> writes:
I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
Lots of them. Let's start by pointing out two bugs:

You're creating a list that's the "cross product" of keys and values,
then handing that to dict. You're handing it a list with entries for
the same key. That behavior may be defined, but I wouldn't count on
it.

fromkeys sets the values in temp to the same value - None. So
temp.values() is a list of None's, so v is None every time through the
loop.

So you could do what that code does with:

def my_search(another, keys, x):
if None >= x:
return another.fromkeys(keys)
else:
return dict()
You probably want something like:

def my_search(another, keys, x):
return dict([[k,another[k]] for k in keys if another[k] >= x]

If you really want to avoid indexing another twice, you could do:

def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?


No. But it does take time to do the conversion. I think I'd write it
out "longhand":

def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new

This makes it clear that you only index another twice if you actually
use it. The list comprehension will do the loop in C, but it means you
have to scan two lists instead of one. If you're worried about which
is faster, measure it on your target platform.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 23 '05 #2

P: n/a

Mike Meyer wrote:
def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])

But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice.

Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.

I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.

This seems like a SQL equivalent.

select * from a where a.key=b.key and a.v >= x

Nov 23 '05 #3

P: n/a
On 22 Nov 2005 17:58:28 -0800, "javuchi" <ja*****@gmail.com> wrote:
I've been searching thru the library documentation, and this is the
best code I can produce for this alogorithm:

I'd like to return a dictionary which is a copy of 'another' dictionary
whoes values are bigger than 'x' and has the keys 'keys':

def my_search (another, keys, x):
temp = another.fromkeys(keys)
return dict([[k,v] for k in temp.keys() for v in temp.values()
if v>=x])

Is there any way to improve this code?
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?

Bye.
another = dict(zip('abcd', iter(random.random, 2)))
import random
another = dict(zip('abcd', iter(random.random, 2)))
for k,v in another.items(): print k,v ...
a 0.606494662034
c 0.273998760342
b 0.358066029098
d 0.774406432218

If keys are few compared to the number of keys in another, this may be prefereable:
def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ... my_search(another, 'cb', .3) {'b': 0.35806602909756235} my_search(another, 'abcd', .4)

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

This sounds like homework though ... ?

Regards,
Bengt Richter
Nov 23 '05 #4

P: n/a

Mike Meyer wrote:
def my_search(another, keys, x):
new = dict()
for k in keys:
if another[k] >= x:
new[k] = another[k]
return new

BTW, this would raise exception if k is not in another.

Nov 23 '05 #5

P: n/a

Bengt Richter wrote:
>>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x) ... >>> my_search(another, 'cb', .3) {'b': 0.35806602909756235} >>> my_search(another, 'abcd', .4)

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

Do you need to guard the case "k not in another" ?

Nov 23 '05 #6

P: n/a
"bo****@gmail.com" <bo****@gmail.com> writes:
Mike Meyer wrote:
def my_search(another, keys, x):
return dict([[k, v] for k, v in another.items() if v >= x and k in keys])
But then you're looking through all the keys in another, and searching
through keys multiple times, which probably adds up to a lot more
wasted work than indexing another twice. Would you mind clarify ? Do you mean "k in keys" is a scan rather than
a lookup ? I find it to be pretty clean and straight forward.


I assumed keys was a simple sequence of some kind, because you passed
it to fromkeys. I guess it could be set or a dictionary, in which case
"k in keys" would be a lookup. Were you trying to force a lookup by
creating a dict with the keys from k via fromkeys? If so, using a set
would have the same effect, and be a lot clearer:

temp = set(keys)
return dict([[k, v] for k, v in another.items() if v >= x and k in temp])
I think one way or another, one need to loop through one of them, then
index search the other. It may help a bit to take the len() and loop
through the shorter one.


First, remember the warnings about premature optimization. The
following might be worth looking into:

use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]

The set difference is still going to loop through one and do lookups
in the other, but it'll happen in C instead of Python.

Unless your lists are *very* long, the performance differences will
probably negligible, and are liable to change as you change the
underlying platform. So I'd recommend you choose the form that's mostt
readable to you, and go with that.

<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Nov 23 '05 #7

P: n/a

Mike Meyer wrote:
First, remember the warnings about premature optimization.
Which is why I said the one-liner(your first one) is clean and clear,
and bug free in one go.

use = set(another) - set(keys)
return dict([[k, another[k]] for k in use if another[k] >= x]

Though I still think I prefer the longhand version:

out = dict()
for key in set(another) - set(keys):
if another[k] >= x:
out[k] = another[k]

This is definitely better than the other long hand version as the set
operation remove the potential problem of another[k] raise KeyError.

Nov 23 '05 #8

P: n/a
javuchi wrote:
I want to avoid converting the dictionary to a list and then to a
dictionay. Are there speed penalties for such a conversion?


You mean, is it faster to write, test, debug and
execute slow Python code rather than letting Python's
built-in routines written in fast C do the job?

I have no idea. Perhaps you should try it and see.
Write some code to do it all manually, and time it.

Make sure you use realistic test data: if your users
will be using dictionaries with 10,000 items, there is
no point in testing only dictionaries with 10 items.
For accuracy, run (say) 20 tests, and look at the
average speed. Of better still, use the timeit module.

--
Steven.

Nov 23 '05 #9

P: n/a
On 22 Nov 2005 19:52:40 -0800, "bo****@gmail.com" <bo****@gmail.com> wrote:

Bengt Richter wrote:
>>> def my_search(another, keys, x): return dict((k,another[k]) for k in keys if another[k]>x)

...
>>> my_search(another, 'cb', .3)

{'b': 0.35806602909756235}
>>> my_search(another, 'abcd', .4)

{'a': 0.60649466203365532, 'd': 0.77440643221840166}

Do you need to guard the case "k not in another" ?

Good catch ;-)
What did the OP want as a value if any for that case? None? or no entry at all?
Taking a cue from Mike, I like the set method of getting the common keys, to eliminate the entry (untested)

def my_search(another, keys, x):
return dict((k,another[k]) for k in (set(another)&set(keys)) if another[k]>x)

otherwise, to get Nones, maybe (untested)

def my_search(another, keys, x):
return dict((k,another.get(k)) for k in keys if k not in another or another[k]>x)

Regards,
Bengt Richter
Nov 23 '05 #10

This discussion thread is closed

Replies have been disabled for this discussion.