Ok, the answer is easy: For historical reasons - built-in sets exist
only since Python 2.4.
Anyway, I was thinking about whether it would be possible and desirable
to change the old behavior in future Python versions and let dict.keys()
and dict.values() both return sets instead of lists.
If d is a dict, code like:
for x in d.keys():
...
or even
for x in sorted(d.keys()):
...
would still work and do the same.
However, the older idiom
k = d.keys()
k.sort()
for x in k:
...
would break (since a set cannot be sorted in place).
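For illustration, a minimal sketch of the breakage, with an explicit set() conversion standing in for what keys() would return under the proposal:
# hypothetical: simulate d.keys() returning a set
d = {1: 'a', 2: 'b'}
k = set(d.keys())          # what keys() would return under the proposal
try:
    k.sort()               # AttributeError: sets have no sort() method
except AttributeError:
    print 'in-place sort no longer works'
for x in sorted(k):        # sorted() still accepts any iterable
    print x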
So it seems not possible to change the behavior since it would break old
code. Maybe keys() and values() could be made deprecated functions,
superseded by keyset() and valueset(), but probably this would not be
worth the effort.
Another related question: Actually, dicts can be considered as
subclasses of sets and thus could inherit a lot of set methods and
operators. Only intersection and union must be treated with care,
because dictionaries must be mappings (i.e. map to exactly one value).
In fact, some inheritance from sets has already been implemented:
For instance, by allowing the set operator "in" for dictionaries,
instead of "has_key".
The len(), clear(), copy() and update() functions can also be
interpreted as inherited from set. The dictionary method popitem() is
actually the inherited set method pop(). (d.pop() without arguments
could actually do the same as d.popitem(); I guess the suffix "item" was
chosen to remind you that it returns a key/value pair, not only a value.)
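A quick demonstration of popitem() (which pair you get back is unspecified):
d = {1: 'a', 2: 'b'}
k, v = d.popitem()   # removes and returns an arbitrary (key, value) pair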
But could other set methods also be useful?
A dictionary could inherit the remove() and discard() method from sets.
d.remove(x) would do the same as del d[x], but d.discard(x) would not
raise a KeyError if x not in d.
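As a sketch, the proposed semantics could be prototyped in a dict subclass (the name SetLikeDict is made up here, not an existing type):
class SetLikeDict(dict):
    def discard(self, key):
        # like "del self[key]", but silent if the key is absent
        try:
            del self[key]
        except KeyError:
            pass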
Or, for instance, if
a = { 1:11, 2:12 }; b = { 2:22, 3:13 }; c = { 2:32 }
then
c.issubset(b) == c <= b == True
b.issuperset(c) == b >= c == True
a.difference(b) == a - b == { 1:11 }
a.symmetric_difference(b) == a ^ b == { 1:11, 3:13 }
a.update(b) == a |= b == a = { 1:11, 2:22, 3:13 }
a.intersection_update(b) == a &= b == a = { 2:22 }
a.difference_update(b) == a -= b == a = { 1:11 }
a.symmetric_difference_update(b) == a ^= b == a = { 1:11, 3:13 }
Of these, a |= b may be particularly interesting as short notation for
a.update(b).
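For illustration, one of these proposed operations sketched as a plain function (dict comprehensions did not exist in the Python of this era, hence dict() over a list comprehension; the function name is made up):
def dict_difference(a, b):
    # a - b: the items of a whose keys do not occur in b
    return dict([(k, v) for (k, v) in a.items() if k not in b])

a = { 1:11, 2:12 }; b = { 2:22, 3:13 }
assert dict_difference(a, b) == { 1:11 }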
-- Christoph
Christoph Zwerschke <ci**@online.de> writes: Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4.
Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in future Python versions and let dict.keys() and dict.values() both return sets instead of lists.
Two (rhetorical, since you dropped the idea) questions:
Are you sure dict.values() should be a set? After all, values aren't
guaranteed to be unique, so dict(a = 1, b = 1).values() currently
returns [1, 1], but would return set([1]) under your proposal.
What about dict.items()?
For instance, by allowing the set operator "in" for dictionaries, instead of "has_key".
"in" already works for dicdtionaries: d = dict(a = 1, b = 1) 'a' in d
True 'f' in d
False
But could other set methods also be useful?
Looks like it.
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Christoph Zwerschke wrote: Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in future Python versions and let dict.keys() and dict.values() both return sets instead of lists.
You answer the question whether it would be possible (no, because of
backwards compatibility); you don't attempt to answer the question
whether it would be desirable. Why do you think that would be
a good idea?
If you want the set of keys of a dictionary d, then do set(d):
>>> d = {1:2, 3:4}
>>> set(d)
set([1, 3])
As Mike Meyer explains, the same is meaningless for .values():
they might not be unique. For .items(), conversion into a set
would appear to be meaningful, but doesn't actually work:
if the values contain unhashable objects, the set of tuples
cannot be constructed (as set members must be immutable).
Regards,
Martin
You can already get a set from a dictionary's keys in an efficient manner:
>>> l = dict.fromkeys(range(10))
>>> set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Jeff
Christoph Zwerschke wrote: Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4.
Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in future Python versions and let dict.keys() and dict.values() both return sets instead of lists.
Definitely not. I believe it's currently guaranteed that the order of
the items in dict.keys() and dict.values() will match (i.e. the index of
any key in its list will be the same as the index of the corresponding
value in its list). This property is almost certainly used in some
code, so it can't be broken without good reason.
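A small demonstration of that property (the order itself is arbitrary, but the two calls agree, so the pairs line up):
d = {'x': 1, 'y': 2, 'z': 3}
pairs = zip(d.keys(), d.values())   # relies on the documented correspondence
assert pairs == d.items()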
-Peter
Peter Hansen wrote: Christoph Zwerschke wrote: Ok, the answer is easy: For historical reasons - built-in sets exist only since Python 2.4.
Anyway, I was thinking about whether it would be possible and desirable to change the old behavior in future Python versions and let dict.keys() and dict.values() both return sets instead of lists.
Definitely not. I believe it's currently guaranteed that the order of the items in dict.keys() and dict.values() will match (i.e. the index of any key in its list will be the same as the index of the corresponding value in its list). This property is almost certainly used in some code, so it can't be broken without good reason.
Interesting, but why is this guaranteed? Functionality-wise they are
two different things, and since a dict is unordered there is no need for
such a guarantee; would it just be an implementation consequence?
This to me is even more uncertain than the zip(it,it) case. bo****@gmail.com wrote: Peter Hansen wrote: I believe it's currently guaranteed that the order of the items in dict.keys() and dict.values() will match (i.e. the index of any key in its list will be the same as the index of the corresponding value in its list).
Interesting, but why is this guaranteed? Functionality-wise they are two different things, and since a dict is unordered there is no need for such a guarantee; would it just be an implementation consequence?
From the docs (http://docs.python.org/lib/typesmapping.html):
"""
Keys and values are listed in an arbitrary order which is non-random,
varies across Python implementations, and depends on the dictionary's
history of insertions and deletions. If items(), keys(), values(),
iteritems(), iterkeys(), and itervalues() are called with no intervening
modifications to the dictionary, the lists will directly correspond.
This allows the creation of (value, key) pairs using zip(): "pairs =
zip(a.values(), a.keys())". The same relationship holds for the
iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(),
a.iterkeys())" provides the same value for pairs. Another way to create
the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]".
"""
Does that explain it adequately?
-Peter
Peter Hansen wrote: bo****@gmail.com wrote: Peter Hansen wrote:I believe it's currently guaranteed that the order of the items in dict.keys() and dict.values() will match (i.e. the index of any key in its list will be the same as the index of the corresponding value in its list).
Interesting, but why is this guaranteed? Functionality-wise they are two different things, and since a dict is unordered there is no need for such a guarantee; would it just be an implementation consequence?
From the docs (http://docs.python.org/lib/typesmapping.html):
""" Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary's history of insertions and deletions. If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond. This allows the creation of (value, key) pairs using zip(): "pairs = zip(a.values(), a.keys())". The same relationship holds for the iterkeys() and itervalues() methods: "pairs = zip(a.itervalues(), a.iterkeys())" provides the same value for pairs. Another way to create the same list is "pairs = [(v, k) for (k, v) in a.iteritems()]". """
Does that explain it adequately?
Thanks, but yes and no.
Which is also my initial puzzle: items() and iteritems() already give
you the tuples, so why such a guarantee for the others? Doesn't that violate
the general idiom that if we can do a certain thing in one way, there
had better be one and only one way?
Are there other usage scenarios that would benefit from this? The
example given is not convincing to me.
One reason might be Practicality. The zip() versions handily beat the
listcomp versions on a 10k-item dict (Python 2.4).
$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.iteritems()]'
100 loops, best of 3: 5.05 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.items()]'
100 loops, best of 3: 7.14 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.itervalues(), d.iterkeys())'
100 loops, best of 3: 3.13 msec per loop
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.values(), d.keys())'
100 loops, best of 3: 3.28 msec per loop
For comparison,
$ timeit.py -s 'd = dict.fromkeys(range(10000))' 'd.items()'
100 loops, best of 3: 2.19 msec per loop
Maybe there are also other reasons to promise this property that I'm not
aware of. Certainly it seems like this property is useful and not hard
to provide for "non-perverse" implementations, much like the
now-documented stability of sort().
Jeff
jep...@unpythonic.net wrote: One reason might be Practicality. The zip() versions handily beat the listcomp versions on a 10kitem dict. (python2.4)
$ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.iteritems()]' 100 loops, best of 3: 5.05 msec per loop $ timeit.py -s 'd = dict.fromkeys(range(10000))' '[(v, k) for (k, v) in d.items()]' 100 loops, best of 3: 7.14 msec per loop $ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.itervalues(), d.iterkeys())' 100 loops, best of 3: 3.13 msec per loop $ timeit.py -s 'd = dict.fromkeys(range(10000))' 'zip(d.values(), d.keys())' 100 loops, best of 3: 3.28 msec per loop
For comparison, $ timeit.py -s 'd = dict.fromkeys(range(10000))' 'd.items()' 100 loops, best of 3: 2.19 msec per loop
Maybe there are also other reasons to promise this property that I'm not aware of. Certainly it seems like this property is useful and not hard to provide for "non-perverse" implementations, much like the now-documented stability of sort().
Jeff
Thanks, but we don't need the list comprehension (or do we)? As the
return of d.iteritems() is already a list of tuples.
bo****@gmail.com wrote: Which is also my initial puzzle: items() and iteritems() already give you the tuples, so why such a guarantee for the others? Doesn't that violate the general idiom that if we can do a certain thing in one way, there had better be one and only one way?
Are there other usage scenarios that would benefit from this? The example given is not convincing to me.
As Jeff's reply emphasizes, the "examples" show tuples with *value* and
then *key*, not the other way around, which is what .items() and
.iteritems() give you.
Anyway, you didn't ask for good examples of use, just "why is it
guaranteed", and that's all I was showing you. Whether it was a good
idea might be an interesting discussion, but probably not a particularly
useful one given once again that it's unlikely this particular feature
could be undone now that code exists (we can assume) which is dependent
on it.
-Peter
Peter Hansen wrote: bo****@gmail.com wrote: Which is also my initial puzzle: items() and iteritems() already give you the tuples, so why such a guarantee for the others? Doesn't that violate the general idiom that if we can do a certain thing in one way, there had better be one and only one way?
Are there other usage scenarios that would benefit from this? The example given is not convincing to me.
As Jeff's reply emphasizes, the "examples" show tuples with *value* and then *key*, not the other way around, which is what .items() and .iteritems() give you.
Anyway, you didn't ask for good examples of use, just "why is it guaranteed", and that's all I was showing you. Whether it was a good idea might be an interesting discussion, but probably not a particularly useful one given once again that it's unlikely this particular feature could be undone now that code exists (we can assume) which is dependent on it.
Please don't take it as a challenge, it was not. If it was, it was
about the need for the guarantee, not about you not giving me the
answer I want. But I am really wondering why.
As for the (k,v) vs (v,k), I still don't think it is a good example. I
can always use an index to access the tuple elements, and most other
functions expect the first element to be the key. For example:
a=d.items()
do something about a
b = dict(a)
But using the imaginary example :
a = zip(d.values(), d.keys())
do something about a
b = dict(a) # probably not what I want.
"bo****@gmail.com" <bo****@gmail.com> writes: Which is also my initial puzzle, items() and iteritems() already gives you the tuples, why such gurantee or the others ? Doesn't that violate the general idiom that if we can do certain thing in one way, there better be one and only one way.
Backwards compatibility. The guarantee on the order of keys() and
values() predates items() (and iteritems()). You can't remove the
guarantee without potentially breaking old code, and that's Not Done -
at least not to code that wasn't broken to begin with.
Maybe dropping the guarantee should be considered for P3K, on the off
chance that either keys or values could be made faster at some point
in the future. But I don't see it as a big deal.
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Mike Meyer wrote: "bo****@gmail.com" <bo****@gmail.com> writes: Which is also my initial puzzle, items() and iteritems() already gives you the tuples, why such gurantee or the others ? Doesn't that violate the general idiom that if we can do certain thing in one way, there better be one and only one way.
Backwards compatability. The guarantee on the order of keys() and values() predates items() (and iteritems()). You can't remove the guarantee without potentially breaking old code, and that's Not Done - at least not to code that wasn't broken to begin with.
Maybe dropping the guarantee should be considered for P3K, on the off chance that either keys or values could be made faster at some point in the future. But I don't see it as a big deal.
Thanks, that explains it. I didn't know that there were no
items()/iteritems() in earlier versions. bo****@gmail.com wrote: Please don't take it as a challenge, it was not. If it was, it was about the need for the guarantee, not about you not giving me the answer I want. But I am really wondering why.
If you really want to know merely the *need* for the guarantee, as in
why was this ever put in in the first place, you will be sure of a
correct answer only if it's provided by those who wrote the code in the
first place. Anything else is guessing or mind-reading. I wasn't
trying to do even that, but just to point to where I had seen the
afore-mentioned guarantee. My responsibility ended there, and with that
I am outta here. :-)
-Peter
(And don't worry, I didn't take it as a challenge. If you thought I
took offense somewhere, please re-read my words without that thought in
mind and I think you'll see merely straight-forward factual responses to
your questions, as they were meant to be. Cheers.) bo****@gmail.com wrote: jep...@unpythonic.net wrote: [...]Maybe there are also other reasons to promise this property that I'm not aware of. Certainly it seems like this property is useful and not hard to provide for "non-perverse" implementations, much like the
Thanks, but we don't need the list comprehension (or do we)? As the return of d.iteritems() is already a list of tuples.
No it isn't: it's a dictionary item iterator. This can be used to
provide a list, but it's self-evidently an iterator, not a list:
>>> d = dict.fromkeys(range(5))
>>> d.iteritems
<built-in method iteritems of dict object at 0x4df604>
>>> d.iteritems()
<dictionary-itemiterator object at 0x4e02e0>
>>> [x for x in d.iteritems()]
[(0, None), (1, None), (2, None), (3, None), (4, None)]
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/
Steve Holden wrote: bo****@gmail.com wrote: jep...@unpythonic.net wrote: [...]Maybe there are also other reasons to promise this property that I'm not aware of. Certainly it seems like this property is useful and not hard to provide for "non-perverse" implementations, much like the
Thanks, but we don't need the list comprehension (or do we)? As the return of d.iteritems() is already a list of tuples. No it isn't: it's a dictionary item iterator. This can be used to provide a list, but it's self-evidently an iterator, not a list:
Oops, typo. I meant to say d.items(). I stand corrected.
Mike Meyer wrote: Backwards compatability. The guarantee on the order of keys() and values() predates items() (and iteritems()).
according to the SVN repository, the type originally had "keys" and
"has_key" only. "values" and "items" were both added in the same
checkin (may 1993).
performance is of course another aspect; if you *need* two parallel
lists, creating a list full of tuples just to pull them apart and throw
them all away isn't exactly the most efficient way to do things.
(if performance didn't matter at all, we could get rid of most dictionary
methods; "iterkeys", "in", and locking should be enough, right?)
Maybe dropping the guarantee should be considered for P3K, on the off chance that either keys or values could be made faster at some point in the future. But I don't see it as a big deal.
the guarantee only means that the type must use the same algorithm
to traverse the dictionary data structure for both "keys" and "values".
I'm not aware of any dictionary algorithm that doesn't maintain a link
between keys and values, so that's not really much of a restriction...
</F>
Fredrik Lundh wrote: performance is of course another aspect; if you *need* two parallel lists, creating a list full of tuples just to pull them apart and throw them all away isn't exactly the most efficient way to do things.
(if performance didn't matter at all, we could get rid of most dictionary methods; "iterkeys", "in", and locking should be enough, right?)
If I need two parallel lists (one key, one value), I can still use the
same [(k,v)] tuples, just accessing them as x[0], x[1].
But the history alone explains it anyway.
Peter Hansen wrote: Definitely not. I believe it's currently guaranteed that the order of the items in dict.keys() and dict.values() will match (i.e. the index of any key in its list will be the same as the index of the corresponding value in its list). This property is almost certainly used in some code, so it can't be broken without good reason.
As I recall, that guarantee is only if the dict is not
modified between retrieving the keys and retrieving the
values.
--
Steven. bo****@gmail.com wrote: Fredrik Lundh wrote: performance is of course another aspect; if you *need* two parallel lists, creating a list full of tuples just to pull them apart and throw them all away isn't exactly the most efficient way to do things.
(if performance didn't matter at all, we could get rid of most dictionary methods; "iterkeys", "in", and locking should be enough, right?)
If I need two parallel lists (one key, one value), I can still use the same [(k,v)] tuples, just accessing them as x[0], x[1].
that's a single list containing tuples, not two parallel lists.
this is two parallel lists:
k = d.keys()
v = d.values()
assert isinstance(k, list)
assert isinstance(v, list)
use(k, v)
</F>
Fredrik Lundh wrote: bo****@gmail.com wrote:
Fredrik Lundh wrote: performance is of course another aspect; if you *need* two parallel lists, creating a list full of tuples just to pull them apart and throw them all away isn't exactly the most efficient way to do things.
(if performance didn't matter at all, we could get rid of most dictionary methods; "iterkeys", "in", and locking should be enough, right?)
If I need two parallel lists (one key, one value), I can still use the same [(k,v)] tuples, just accessing them as x[0], x[1].
that's a single list containing tuples, not two parallel lists.
this is two parallel lists:
k = d.keys() v = d.values() assert isinstance(k, list) assert isinstance(v, list) use(k, v)
I know that is a single list of tuples; I mean that it can be used as
well.
for k, _ in d.items(): print k
for _, v in d.items(): print v
for k in d.keys(): print k
for v in d.values(): print v
Would there be a noticeable performance difference? bo****@gmail.com wrote: I know that is a single list of tuples; I mean that it can be used as well.
for k, _ in d.items(): print k for _, v in d.items(): print v for k in d.keys(): print k for v in d.values(): print v
Would there be a noticeable performance difference ?
Sloppy use of print statements is a great way to produce misleading
benchmarks [1], since the time it takes to print lots of stuff tends to
overshadow the actual algorithm (for bonus points, print to a terminal
window, and use "time" to measure performance, so you get startup
times as well). If you don't necessarily want to print all the stuff, my
original assertion still holds:
"creating a list full of tuples just to pull them apart and throw
them all away isn't exactly the most efficient way to do things"
Consider a dictionary with one million items. The following operations
k = d.keys()
v = d.values()
creates two list objects, while
i = d.items()
creates just over one million objects. In your "equivalent" example,
you're calling d.items() twice to produce two million objects, none
of which you really care about.
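For illustration, the object accounting sketched in code (CPython of this era; the counts are per call):
d = dict.fromkeys(range(1000000))
k = d.keys()     # 1 new list; its slots reference the existing keys
v = d.values()   # 1 new list; its slots reference the existing values
i = d.items()    # 1 new list plus 1000000 new 2-tuples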
</F>
1) careful use of timeit and generator expressions is another one: http://online.effbot.org/2005_01_01_...e.htm#20050125 (code) http://online.effbot.org/2005_01_01_...e.htm#20050127 (explanation) bo****@gmail.com wrote: As for the (k,v) vs (v,k), I still don't think it is a good example. I can always use index to access the tuple elements and most other functions expect the first element to be the key. For example :
a=d.items() do something about a b = dict(a)
But using the imaginary example :
a = zip(d.values(), d.keys()) do something about a b = dict(a) # probably not what I want.
The typical use case for the (v,k) tuple would be when you want to sort the
contents of a dictionary by value, e.g. counting the number of occurrences
of each word in a document.
a = zip(d.values(), d.keys())
a.sort()
a.reverse()
print "Today's top 10:"
print str.join(',', [ k for (v,k) in a[:10]])
Ok, so today you can do that another way, but before there was a key
parameter to sort you had to build the tuple somehow:
print str.join(',', sorted(d.iterkeys(),
key=d.__getitem__, reverse=True)[:10])
and the advantage of the latter is of course that it works even when you've
been counting the number of occurrences of complex numbers in a document :)
Fredrik Lundh wrote: bo****@gmail.com wrote:
I know that is a single list of tuples, I mean that can be used as well.
for k, _ in d.items(): print k for _, v in d.items(): print v for k in d.keys(): print k for v in d.values(): print v
Would there be a noticeable performance difference ?
Sloppy use of print statements is a great way to produce misleading benchmarks [1], since the time it takes to print lots of stuff tends to overshadow the actual algorithm (for bonus points, print to a terminal window, and use "time" to measure performance, so you get startup times as well). If you don't necessarily want to print all the stuff, my original assertion still holds:
"creating a list full of tuples just to pull them apart and throw them all away isn't exactly the most efficient way to do things"
Consider a dictionary with one million items. The following operations
k = d.keys() v = d.values()
creates two list objects, while
i = d.items()
creates just over one million objects. In your "equivalent" example,
Sorry, you lost me here. Could you explain in detail what you mean by
"two list objects" vs one million objects?
Fredrik Lundh wrote: bo****@gmail.com wrote:
I know that is a single list of tuples, I mean that can be used as well.
for k, _ in d.items(): print k for _, v in d.items(): print v for k in d.keys(): print k for v in d.values(): print v
Would there be a noticeable performance difference ?
Sloppy use of print statements is a great way to produce misleading benchmarks [1], since the time it takes to print lots of stuff tends to overshadow the actual algorithm (for bonus points, print to a terminal window, and use "time" to measure performance, so you get startup times as well). If you don't necessarily want to print all the stuff, my original assertion still holds:
BTW, it was not intended as a speed comparison, but more about the
question of whether the four "for"s have a speed difference. The print is just
there to complete the statement; I believe it could also be "pass" for
performance purposes.
Fredrik Lundh wrote: Consider a dictionary with one million items. The following operations
k = d.keys() v = d.values()
creates two list objects, while
i = d.items()
creates just over one million objects. In your "equivalent" example, you're calling d.items() twice to produce two million objects, none of which you really care about.
This is what I get from the doc :
a.items() a copy of a's list of (key, value) pairs (3)
a.keys() a copy of a's list of keys (3)
a.values() a copy of a's list of values
I can't derive what you mean by "two list objects" vs "million
objects". bo****@gmail.com wrote: Consider a dictionary with one million items. The following operations
k = d.keys() v = d.values()
creates two list objects, while
i = d.items()
creates just over one million objects. In your "equivalent" example,
Sorry. I lose you here. Could you explain in detail what do you mean by "two list objects" vs one million objects ?
It should be fairly clear. 'd.keys()' creates one new object (a list).
'd.items()' creates a list of tuples, so that is one new list and one
million new tuples. bo****@gmail.com wrote: creates just over one million objects. In your "equivalent" example, you're calling d.items() twice to produce two million objects, none of which you really care about.
This is what I get from the doc : a.items() a copy of a's list of (key, value) pairs (3) a.keys() a copy of a's list of keys (3) a.values() a copy of a's list of values
I can't derive what you mean by "two list objects" vs "million objects".
The "copy of a's list of" is a list object.
The "pairs" are tuples, which are objects too. For a dictionary with
one million items, the "items" method has to create one million "pair"
tuples before it can return them to you... (the difference between
"items" and "iteritems" is that the latter doesn't create all of them
up front; it still has to create them, though).
In Python, everything that you can put in a variable or otherwise save
for a later time is an object. All objects must be created. Creating a
new object is often very cheap, but the cost is never zero.
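For illustration, a small sketch of the lazy behavior (Python 2 iterator API; the variable names are made up):
d = dict.fromkeys(range(1000000))
it = d.iteritems()   # no pair tuples have been created yet
first = it.next()    # the first (key, value) tuple is created here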
</F>
Fredrik Lundh wrote: bo****@gmail.com wrote:
creates just over one million objects. In your "equivalent" example, you're calling d.items() twice to produce two million objects, none of which you really care about.
This is what I get from the doc : a.items() a copy of a's list of (key, value) pairs (3) a.keys() a copy of a's list of keys (3) a.values() a copy of a's list of values
I can't derive what you mean by "two list objects" vs "million objects".
The "copy of a's list of" is a list object.
The "pairs" are tuples, which are objects too. For a dictionary with one million items, the "items" method has to create one million "pair" tuples before it can return them to you... (the difference between "items" and "iteritems" is that the latter doesn't create all of them up front; it still has to create them, though).
In Python, everything that you can put in a variable or otherwise save for a later time is an object. All objects must be created. Creating a new object is often very cheap, but the cost is never zero.
I have redone the timeit test:
D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "zip(d.keys(),d.values())"
10 loops, best of 3: 158 msec per loop
D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.items()"
10 loops, best of 3: 129 msec per loop
D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.keys()"
10 loops, best of 3: 33.2 msec per loop
D:\Python24\Lib>..\python timeit.py -s "d=dict.fromkeys(range(100000))" "d.values()"
100 loops, best of 3: 19.6 msec per loop
These results make more sense. However, I am still puzzled:
1. Why would d.keys()/d.values() create only one list object each? Isn't
each a list of 1M elements of either the keys or the values, just as
items() is?
2. Back to the original question of the guaranteed sequence of
keys/values:
If I don't need them in sync, the guarantee is a moot point, and in that
case I would use keys/values when appropriate (or the iter versions).
If I need them in sync (meaning I would like to zip() them later), the
zip() time is longer than just taking items(), so I don't gain any
performance advantage.
So only if I use keys()/values() as k[i]/v[i], that is, keeping my
own index, would I get the speed (and probably memory) advantage.
This however seems to be more difficult to maintain than just using the
item tuples.
And would it be best to just use iteritems() in 90% of the cases,
unless of course I need to update the dict or I need a snapshot?
iteritems() doesn't create the list and only retrieves the items when
needed. But then, there may be a performance penalty compared with
creating the list upfront if I need them all. And of course, I cannot
slice it with [:]. bo****@gmail.com wrote: These results make more sense. However, I am still puzzled:
1. Why would d.keys()/d.values() create only one list object each? Isn't each a list of 1M elements of either the keys or the values, just as items() is?
"keys" returns a single list object which contains references to existing
key objects. "items" has to create one million new pair objects (which
in turn hold references to existing key and value objects).
creating a new object is a lot more expensive than adding another
reference to an existing object
(in a fully GC:ed implementation, the cost is zero. if you have a pointer
to an object, you have a reference to it. in CPython, you need to adjust
the reference count, so the cost is a couple of machine instructions)
</F>
Fredrik Lundh wrote: bo****@gmail.com wrote:
These results make more sense. However, I am still puzzled:
1. Why would d.keys()/d.values() create only one list object each? Isn't each a list of 1M elements of either the keys or the values, just as items() is?
"keys" returns a single list object which contains references to existing key objects. "items" has to create one million new pair objects (which in turn hold references to existing key and value objects).
Ah, that is clearer now. So keys()/values() create a list of 1M
references (object identities?) to existing objects; items() also contains
1M references, but to 1M newly created tuples.
How about the other puzzle of whether the separate keys()/values() pair
has a real performance gain in real use?
Mike Meyer wrote: Are you sure dict.values() should be a set? After all, values aren't guaranteed to be unique, so dict(a = 1, b = 1).values() currently returns [1, 1], but would return set([1]) under your proposal.
Good point. Contrary to the keys, the values are not unique. Still, it
would make sense to only return the set (the value set of the mapping)
because this is usually what you are interested in and you'd think the
information about which value belongs to which key is lost anyway.
However (see other postings in this thread) this last statement is
wrong: If you call keys() and values() consecutively, then they are
guaranteed to correspond to each other. If values() would return a set,
this would be completely broken.
What about dict.items()?
Actually, if keys() would be a set, this should return a set, too (the
set of key/value tuples). For instance, by allowing the set operator "in" for dictionaries, instead of "has_key". "in" already works for dictionaries:
I know. I wanted to use it as an example where set operations have
already been made available for dictionaries.
I know, 'in' has existed for lists and tuples long before sets, but
actually it is a native set operation.
From a mathematical point of view, it would have been nice if Python
had defined "set" as a basic data type in the beginning, with lists,
tuples, dictionaries as derived data types.
By the way, I wonder why the common mathematical notation { 1,2,3 } was
not allowed for set((1,2,3)). It would not clash with the dictionary
notation which requires additional colons.
-- Christoph
Martin v. Löwis schrieb: You answer the question whether it would be possible (no, because of backwards compatibility); you don't attempt to answer the question whether it would be desirable. Why do you think that would be a good idea?
It was meant simply as an open question to be discussed here, I did not
say that I think it would be a good idea. Sometimes it's good to
question why certain things are set up the way they are.
I considered the question not completely absurd, because if a language
supports sets and dictionaries, it would be a natural consequence that
the keys of a dictionary are a set. (We also discussed "ordered
dictionaries" here, where it is obvious that the keys are a list.) At
least from a mathematical/aesthetical view.
If you consider compatibility and usability aspects, making them sets
would probably not be a good idea.
If you want the set of keys of a dictionary d, then do set(d): >>> d={1:2,3:4} >>> set(d) set([1, 3])
I know. But this does not answer the question: If the keys *are* already
a set according to their semantics, why aren't they returned as a set
from the beginning?
Better answers:
- Compatibility issues, particularly the keys()/values() match hat trick
- Because lists are still more native/pythonic objects than sets
As Mike Meyer explains, the same is meaningless for .values(): they might not be unique. For .items(), conversion into a set would appear to be meaningful, but doesn't actually work: if the values contain unhashable objects, the set of tuples cannot be constructed (as set members must be immutable).
I would not say that a values() set would be meaningless; it is an
interesting object in itself. It would however lose the information
about the "frequency" of the values.
But your second objection is a valid one. So I can add to the list of
reasons why sets are not used for values() and items():
- Because sets can only contain immutable values
This, of course, in turn raises the question ;-) Would it be desirable
to have an additional, more general set datatype that can contain
mutable objects? I know about the perfomance drawbacks. And I think I
know the answer: It would not make much sense, since the implementation
would be actually not different from a list. So you could take a list
anyway. Which gives the answer why values() and items() return a list.
Probably the only use of such a general set type would be that it could
be considered as an abstract supertype for list and set. And in cases
where the order does not matter, this could be made more clear by using
sets. (Similar as the reason why you use False and True when you could
use 0 and 1 instead.)
-- Christoph
> Martin v. Löwis wrote: If you want the set of keys of a dictionary d, then do set(d): >>> d={1:2,3:4} >>> set(d) set([1, 3])
I know. But this does not answer the question: If the keys *are* already a set according to their semantics, why aren't they returned as a set from the beginning?
Sorry. Your answer was good; I missed the point and thought you wrote
set(d.keys()). Is it documented anywhere that set(d) = set(d.keys())? I
think this should go into the Python Doco where keys() are explained.
I would have expected that set(d) returns set(d.items()), but as has
been discussed, this would cause problems with mutable values.
-- Christoph je****@unpythonic.net schrieb: You can already get a set from a dictionary's keys in an efficient manner: l = dict.fromkeys(range(10)) set(l)
Set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Good point. I expected that set(l) = set(l.items()) and not
set(l.keys()), but the former would not work with mutable values. See
discussion with Martin.
-- Christoph
Christoph Zwerschke <ci**@online.de> writes: - Because sets can only contain immutable values
Not true. Sets can only contain *hashable* objects, which isn't the
same thing.
This, of course, in turn raises the question ;-) Would it be desirable to have an additional, more general set datatype that can contain mutable objects?
You can do that now:
>>> from UserList import UserList
>>> class HashableList(UserList):
...     def __hash__(self): return id(self)
...
>>> s = set([HashableList(), HashableList()])
>>> s
set([[], []])
>>> for x in s:
...     x.append(3)
...
>>> s
set([[3], [3]])
This also illustrates the danger of this approach, in that I've got
two lists that will compare equal in the same set:
>>> x = list(s)
>>> x[0] == x[1]
True
Of course, in this case the two objects aren't the same, which is what
is required for the mathematical definition of a set:
>>> x[0] is x[1]
False
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
>>- Because sets can only contain immutable values
Mike Meyer wrote: Not true. Sets can only contain *hashable* objects, which isn't the same thing.
I trusted the doco which defines a set as "an unordered collection of
immutable values" ( http://docs.python.org/lib/types-set.html).
Anyway, the problem stays the same.
-- Christoph
Christoph Zwerschke <ci**@online.de> wrote: Mike Meyer wrote: Are you sure dict.values() should be a set? After all, values aren't guaranteed to be unique, so dict(a = 1, b = 1).values() currently returns [1, 1], but would return set([1]) under your proposal. Good point. Contrary to the keys, the values are not unique. Still, it would make sense to only return the set (the value set of the mapping) because this is usually what you are interested in and you'd think the
Absolutely not in my use cases. The typical reason I'm asking for
.values() is because each value is a count (e.g. of how many times the
key has occurred) and I want the total, so the typical idiom is
sum(d.values()) -- getting a result set would be an utter disaster, and
should I ever want it, set(d.values()) is absolutely trivial to code.
Note that in both cases I don't need a LIST, just an ITERATOR, so
itervalues would do just as well (and probably faster) except it looks
more cluttered and by a tiny margin less readable -- "the sum of the
values" rolls off the tongue, "the sum of the itervalues" doesn't!-)
So, the direction of change for Python 3000, when backwards
compatibility can be broken, is to return iterators rather than lists.
At that time, should you ever want a list, you'll say so explicitly, as
you do now when you want a set, i.e. list(d.values()) For instance, by allowing the set operator "in" for dictionaries, instead of "has_key". "in" already works for dicdtionaries:
I know. I wanted to use it as an example where set operations have already been made available for dictionaries.
I know, 'in' has existed for lists and tuples long before sets, but actually it is a native set operation.
Historically, the 'in' operator was added to dictionaries (and the
special method __contains__ introduced, to let you support containment
checking in your own types) well before sets were added to Python (first
as a standard library module, only later as a built-in).
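For illustration, a minimal sketch of supporting "in" in your own type via __contains__ (the Bag class is made up here):
class Bag(object):
    def __init__(self, items):
        self.items = list(items)
    def __contains__(self, x):
        # invoked by "x in bag"
        return x in self.items

b = Bag([1, 2, 3])
print 2 in b    # True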
In Python today 'in' doesn't necessarily mean set membership, but some
fuzzier notion of "presence in container"; e.g., you can code
'zap' in 'bazapper'
and get the result True (while 'paz' in 'bazapper' would be False, even
though, if you thought of the strings as SETS rather than SEQUENCES of
characters, that would be absurd). So far, strings (plain and unicode)
are the only containers that read 'in' this way (presence of subsequence
rather than of single item), but one example should be enough to show
that "set membership" isn't exactly what the 'in' operator means.
From a mathematical point of view, it would have been nice if Python had defined "set" as a basic data type in the beginning, with lists, tuples, dictionaries as derived data types.
You appear to have a strange notion of "derived data type". In what
sense, for example, would a list BE-A set? It breaks all kinds of class
invariants, e.g. "number of items is identical to number of distinct
items", commutativity of addition, etc.
Alex
Christoph Zwerschke <ci**@online.de> writes: - Because sets can only contain immutable values Mike Meyer wrote:
Not true. Sets can only contain *hashable* objects, which isn't the same thing.
I trusted the doco which defines a set as "an unordered collection of immutable values" (http://docs.python.org/lib/types-set.html).
If the hash changes, you've screwed up the set. What it really should
say is "collection of objects with fixed hashes", or words to that
effect. Do you want to file a PR on this?
Anyway, the problem stays the same.
How so? As I demonstrated, you can subclass any class that doesn't
have a hash to add one, and then use the subclass, which - except for
having a hash - will have exactly the same behavior as your original
class.
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Alex Martelli wrote: Absolutely not in my use cases. The typical reason I'm asking for .values() is because each value is a count (e.g. of how many time the key has occurred) and I want the total, so the typical idiom is sum(d.values()) -- getting a result set would be an utter disaster, and should I ever want it, set(d.values()) is absolutely trivial to code.
Agree. This reason alone is enough for values() to be a list, besides
the problem that it can contain unhashable values.
Note that in both cases I don't need a LIST, just an ITERATOR, so itervalues would do just as well (and probably faster) except it looks more cluttered and by a tiny margin less readable -- "the sum of the values" rolls off the tongue, "the sum of the itervalues" doesn't!-) So, the direction of change for Python 3000, when backwards compatibility can be broken, is to return iterators rather than lists. At that time, should you ever want a list, you'll say so explicitly, as you do now when you want a set, i.e. list(d.values())
Sounds reasonable.
In Python today 'in' doesn't necessarily mean set membership, but some fuzzier notion of "presence in container"; e.g., you can code
'zap' in 'bazapper'
and get the result True (while 'paz' in 'bazapper' would be False, even though, if you thought of the strings as SETS rather than SEQUENCES of characters, that would be absurd). So far, strings (plain and unicode) are the only containers that read 'in' this way (presence of subsequence rather than of single item), but one example should be enough to show that "set membership" isn't exactly what the 'in' operator means.
In this case, I would interpret the set on which 'in' operates as the
set of all substrings of the given string to have peace for my soul ;-)
You appear to have a strange notion of "derived data type". In what sense, for example, would a list BE-A set? It breaks all kinds of class invariants, e.g. "number of items is identical to number of distinct items", commutativity of addition, etc.
Sorry, you're right. In the computer world, sets are derived from lists
(collections). In mathematics, lists are derived from sets. Here, you
would define the list [1, 1] as the set {1, {1, 1}} (you see, the
cardinality is the same). Yes, mathematics is strange ;-) Actually, in
mathematics, everything is a set and set theory is the "theory of
everything". When I grew up pedagogues here in Germany even believed it
would be best if kids learned set theory and drew Venn diagrams before
they learned arithmetic... We were tortured with that kind of thing in
the first grade. Probably I'm still suffering from the late effects of
that treatment.
-- Christoph
Christoph Zwerschke <ci**@online.de> wrote: mathematics, everything is a set and set theory is the "theory of everything". When I grew up pedagogues here in Germany even believed it would be best if kids learn set theory and draw venn diagrams before
An alternative theory, of course, is "God made the natural numbers; all
else is the work of man" -- and that one is by a German, too (Kronecker,
if I recall correctly). The hope to found all of mathematics on set
theory was primarily a _British_ effort, as I see it (Russell and
Whitehead), and failed a long time ago... I'm not sure what, if
anything, a mathematician of today would propose as the foundational
theory... perhaps modal logic, but I'm really just guessing, being,
myself, an engineer rather than a mathematician (perhaps category
theory? it's hard to say...).
What, if anything, is the theoretically proper foundation, is of course
a separate issue from where is it best to start _teaching_ maths... with
geometry as the Greeks would have it, with arithmetic as in traditional
schooling, or with sets as the modern pedagogues would have it. Me, I'm
partial to arithmetic, but sets are just fine with me if they turn out
to work better (I wonder if proper controlled studies have ever been
done before revolutionizing the teaching of elementary maths,
though...!). Again you see my engineer's bias: "whatever works"!-)
But OO really requires a different mindset, particularly when operating
under a regime of "mutable" objects. "A circle IS-AN ellipse" in
Euclidean geometry... but inheriting Circle from Ellipse doesn't work in
OO if the objects are changeable, since you can, e.g., change
eccentricity in an Ellipse but not in a Circle...
Alex
Mike Meyer schrieb: If the hash changes, you've screwed up the set. What it really should say is "collection of objects with fixed hashes", or words to that effect. Do you want to file a PR on this?
I fear I have not enough understanding of Python's hashing concepts to
make a suggestion for improvement here.
How so? As I demonstrated, you can subclass any class that doesn't have a hash to add one, and then use the subclass, which - except for having a hash - will have exactly the same behavior as your original class.
But you surely wouldn't want to do this to the list of items just to be
able to return it as a set.
-- Christoph
Alex Martelli wrote: An alternative theory, of course, is "God made the natural numbers; all else is the work of man" -- and that one is by a German, too (Kronecker, if I recall correctly).
Yes, it was Kronecker. But even natural numbers are usually constructed
with sets using Peano's axioms.
The hope to found all of mathematics on set theory was primarily a _British_ effort, as I see it (Russell and Whitehead), and failed a long time ago... I'm not sure what, if anything, a mathematician of today would propose as the foundational theory...
Perhaps "string theory" ;-) So probably strings should become the
fundamental datatype in Python (immutable strings of course) :-)
But OO really requires a different mindset, particularly when operating under a regime of "mutable" objects. "A circle IS-AN ellipse" in Euclidean geometry... but inheriting Circle from Ellipse doesn't work in OO if the objects are changeable, since you can, e.g., change eccentricity in an Ellipse but not in a Circle...
Good example.
-- Christoph
Christoph Zwerschke <ci**@online.de> writes: Mike Meyer schrieb: If the hash changes, you've screwed up the set. What it really should say is "collection of objects with fixed hashes", or words to that effect. Do you want to file a PR on this? I fear I have not enough understanding of Python's hashing concepts to make a suggestion for improvement here.
Well, I just made a suggestion. You found the problem - you want to do
the PR, or should I? How so? As I demonstrated, you can subclass any class that doesn't have a hash to add one, and then use the subclass, which - except for having a hash - will have exactly the same behavior as your original class. But you surely wouldn't want to do this to the list of items just to be able to return it as a set.
Well, I don't have a use case for this, but it's not hard. Write a
class whose __new__ either returns the original object (if it's
hashable), or creates an instance of the class, which will proxy for
that object. Something like:
# Untested code (cleaned up: the instance must be created before the
# proxied object is stored, and that store must bypass our own
# __setattr__, which forwards everything to the proxied object)
class Hash(object):
    def __new__(cls, obj):
        if hasattr(obj, '__hash__'):    # already hashable (Python 2.4-era check)
            return obj
        self = object.__new__(cls)
        object.__setattr__(self, '_obj', obj)
        return self
    def __getattr__(self, name):
        return getattr(self._obj, name)
    def __setattr__(self, name, value):
        setattr(self._obj, name, value)
    def __hash__(self):
        return id(self._obj)
and then do:
set([Hash(i) for i in lst])
If you need to turn a list of arbitrary, possibly unhashable, objects
into a set, is there a problem with the above?
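A short usage sketch of the class above, under its identity-hash semantics (two equal but distinct lists both stay in the set):
lst = [[1, 2], [1, 2], 'abc']
s = set([Hash(i) for i in lst])
print len(s)   # 3: the two equal lists are distinct objects, so both remain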
<mike
--
Mike Meyer <mw*@mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
Christoph Zwerschke <ci**@online.de> wrote: Alex Martelli wrote:
An alternative theory, of course, is "God made the natural numbers; all else is the work of man" -- and that one is by a German, too (Kronecker, if I recall correctly).
Yes, it was Kronecker. But even natural numbers are usually constructed with sets using Peano's axioms.
Peano's axioms are perfectly abstract, as far as I recall. Russell and
Whitehead did try to construct naturals from sets (defining, e.g., '5'
as "the set of all sets with five items"), but that was before the
inherent contradictions of set theory were widely known (though Russell
himself had destroyed Frege's attempts at theorization by pointing out
one such contradiction, the one wrt the "set of all sets that don't
include themselves as a member" if I recall correctly). Later, Goedel
showed that any purely formal theory that's powerful enough to model
natural arithmetic cannot be both complete and consistent... > The hope to found all of mathematics on set theory was primarily a > _British_ effort, as I see it (Russell and Whitehead), and failed a > long time ago... I'm not sure what, if anything, a mathematician of > today would propose as the foundational theory...
Perhaps "string theory" ;-) So probably strings should become the
Some physicists, maybe, surely not mathematicians though!
Alex
Christoph Zwerschke wrote: I know. But this does not answer the question: If the keys *are* already a set according to their semantics, why aren't they returned as a set from the beginning?
Notice that this was not the question you originally asked. This
one has a clear answer: because that was always the case, and because
it was never changed.
Your original question was "could it be changed, and should it be
changed?" As the answer to the first part is already "no", the
second part really doesn't arise.
This, of course, in turn raises the question ;-) Would it be desirable to have an additional, more general set datatype that can contain mutable objects? I know about the perfomance drawbacks. And I think I know the answer: It would not make much sense, since the implementation would be actually not different from a list. So you could take a list anyway. Which gives the answer why values() and items() return a list.
Actually, this answer is only partial. The real question is a semantical
one:
In a set, all elements are different. In Python, this means that, for
any two elements X and Y of a set, X!=Y. Now, consider this set:
a = [1]
b = [1,2]
S = set([a, b])   # hypothetical: a set type that allowed mutable elements
a.append(2)
Now, a==b, so the inherent set property breaks. In theory, this should
cause S to contain only a single element; the other one should
disappear. This is not only hard to implement (removal from a set
would have to remove all duplicates, iterating over a set would have
to skip over duplicates) - there is also another semantical issue:
If one element is skipped/dropped, which of these (a or b)?
Regards,
Martin
Christoph Zwerschke wrote: Sorry. Your answer was good; I missed the point and thought you wrote set(d.keys()). Is it documented anywhere that set(d) = set(d.keys())? I think this should go into the Python Doco where keys() are explained.
It follows from what is documented. set(<iterable object>) creates a
set that contains all elements in the iterable object: http://docs.python.org/lib/built-in-funcs.html#l2h-63
Now, is a dictionary an iterable object? Yes, it is: http://www.python.org/peps/pep-0234.html
Together, this gives the property I demonstrated.
Unfortunately, the PEP apparently hasn't made it into the
implementation.
Regards,
Martin al***@mail.comcast.net (Alex Martelli) writes: Peano's axioms are perfectly abstract, as far as I recall. Russell and Whitehead did try to construct naturals from sets (defining, e.g., '5' as "the set of all sets with five items"), but that was before the inherent contradictions of set theory were widely known (though Russell himself had destroyed Frege's attempts at theorization by pointing out one such contradiction, the one wrt the "set of all sets that don't include themselves as a member" if I recall correctly).
That's only what's called "naive set theory". Axiomatic set theory
(the kind they use now) doesn't have these issues. http://en.wikipedia.org/wiki/Naive_set_theory http://en.wikipedia.org/wiki/Axiomatic_set_theory
Later, Goedel showed that any purely formal theory that's powerful enough to model natural arithmetic cannot be both complete and consistent...
Yes, it's not considered terribly troublesome. Set theory (and
arithmetic) are generally accepted as being consistent but incomplete.
So there will always be something for mathematicians to do ;-).
Mike Meyer wrote: I trusted the doco which defines a set as "an unordered collection of immutable values" (http://docs.python.org/lib/types-set.html).
If the hash changes, you've screwed up the set. What it really should say is "collection of objects with fixed hashes", or words to that effect. Do you want to file a PR on this?
It's more than that: you really mean "hashable values", where "hashable"
does not just imply that the hash value is fixed, but also that
equal objects hash the same.
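For illustration, a minimal sketch of that contract with a made-up Point class (equal objects must hash the same):
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        # derived from the same state that __eq__ compares,
        # so equal points always hash the same
        return hash((self.x, self.y))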
Regards,
Martin