Why are there no ordered dictionaries? - Page 17

Christoph Zwerschke

This is probably a FAQ, but I dare to ask it nevertheless since I
haven't found a satisfying answer yet: Why isn't there an "ordered
dictionary" class at least in the standard list? Time and again I am
missing that feature. Maybe there is something wrong with my programming
style, but I rather think it is generally useful.

I fully agree with the following posting where somebody complains why so
very basic and useful things are not part of the standard library:
http://groups.google.com/group/comp....52d2f771a49857

Are there plans to get it into the standard lib sometime?

-- Christoph

Nov 22 '05

Subscribe Reply

210

10560

Christoph Zwerschke

Fuzzyman wrote:

- the internal keys list should be hidden
I disagree. It is exposed so that you can manually change the order
(e.g. to create a "sorted" dict, rather than one ordered by key
insertion). What do you *gain* by hiding it ?
See my other posting. Of course hiding the list can only be done, if
- list methods be mixed in instead

In this case, you can change the order directly by using the list
methods on the dictionary. Sorting would be an example. Instead of

d.sequence = sorted(d.sequen ce)

you could simply write d.sort() which does the same.
Hmm... I did consider this. Which list methods would you consider
appropriate ?

Everything method that does not clash with the use as dictionary. For
instance, both lists and dicts have __getitem__ and __setitem__, so im
this case, the dictionary method must take precedence. But a dictionary
has not __getslice__ and __setslice__, so here the list methods can be
used (__getslice__ is actually deprecated, but you get the idea). In
some cases, like __delitem__, both have it, but there is no clash.

Other interesting methods are sort() and reverse().

Here, we have another problem however: There is not only the list of
keys, but also the list of values, and sometimes, as in the case of
sort() and reverse(), it would be also nice to have it operate on the
list of values. How should this be done? PHP solves it by using two
different methods ksort and asort for keys and values. In this notation:

d = OrderedDict( (1,13), (3,12), (2,11) )

d.ksort() ==> d = ( (1,13), (2,11), (3,12) )
d.asort() ==> d = ( (2,11), (3,12), (1,13) )

Similar for reverse().

If the keys() and values() methods would be extended to be setters, then
d.ksort() = d.keys(sorted(d .keys())) and
d.asort() = d.values(sorted (d.values()))

Anyway, I don't like "ksort" and "asort". If it must be, I'd rather use

d.ksort() = d.sortkeys()
d.asort() = d.sortvalues()

d.sort() could default to one of them (not sure which one).

-- Christoph

Nov 23 '05 #161

Christoph Zwerschke

Fuzzyman wrote:

That's not the only use case. Other use cases are to have a specific
order, not based on entry time.

Simple example :
d1 = OrderedDict(som e_dict.items())
d1.sequence.sor t()
d1 is now an ordered dict with the keys in alphabetic order.
As I said, I would not need to access the sequence, if I can write
d1.sort() or d1.sortkeys()
If you don't want to modify sequence, don't. If you want a copy do :
seq = d1.sequence[:]

This is not needed since you can do the same with: seq = d1.keys()

-- Christoph

Nov 23 '05 #162

Christoph Zwerschke

Carsten Haese schrieb:

Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

Exactly. But I don't think in this case such methods would be needed.
You easily get the i-th value in the ordered dict as d.values()[i].

-- Chris

Nov 23 '05 #163

Christoph Zwerschke

Here is another old posting I just found which again gives the same use
cases and semantics:

http://mail.python.org/pipermail/pyt...ch/051974.html

"Keys are iterated over in the order that they are added. Setting a
value using a key that compares equal to one already in the dict
replaces the value, but not the key, and does not change the iteration
order. Removing a key (and value) then re-adding it moves the key to the
end of the iteration order."

So far all those who would like to have an ordered dict seem to have the
same semantics in mind.

Nov 23 '05 #164

Carsten Haese

On Wed, 23 Nov 2005 23:39:22 +0100, Christoph Zwerschke wrote

Carsten Haese schrieb:
Thus quoth the Zen of Python:
"Explicit is better than implicit."
"In the face of ambiguity, refuse the temptation to guess."

With those in mind, since an odict behaves mostly like a dictionary, []
should always refer to keys. An odict implementation that wants to allow
access by numeric index should provide explicitly named methods for that
purpose.

Exactly. But I don't think in this case such methods would be
needed. You easily get the i-th value in the ordered dict as
d.values()[i].

-- Chris

True enough, but unless the odict has its list of values on hand, you're
asking the odict to build a list of all its values just so that you can get
the i'th element. Having a method that does the equivalent of d[d.sequence[i]]
would be cleaner and more efficient.

-Carsten

Nov 24 '05 #165

Alex Martelli

Christoph Zwerschke <ci**@online.de > wrote:
...

d.ksort() = d.sortkeys()
d.asort() = d.sortvalues()

d.sort() could default to one of them (not sure which one).

Define JUST d.sort, you can trivially implement the other as
d.sort(key=d.ge t).
Alex

Nov 24 '05 #166

Fuzzyman

Alex Martelli wrote:

Fuzzyman <fu******@gmail .com> wrote:
...
- the internal keys list should be hidden
I disagree. It is exposed so that you can manually change the order
(e.g. to create a "sorted" dict, rather than one ordered by key
insertion).

What do you *gain* by hiding it ?

Freedom of implementation, of course. E.g., I can perhaps get better
performance by keeping a red-black tree of (insertiontime, key) pairs, so
for example deleting a key is O(log(N)) rather than O(N) as it would
have to be if the sequence had to be kept as a Python list.

What ELSE do you ever gain by enforcing abstraction and information
hiding? Conceptual integrity and implementation freedom...

The poster I was replying to inferred that we should hide it to protect
him from breaking it. :-)

Your reason is much more valid than the one I inferred. (And probably
worth chanign the code for).

- list methods should be mixed in instead
Hmm... I did consider this. Which list methods would you consider
appropriate ?

The difficulty is that integers are valid keys for a dictionary.
Perhaps a subclass that uses integers as indexes instead...

What about allowing slicing, taking advantage of the fact that slices
are NOT valid dictionary keys?

Hmm... ok, simple enough to implement. What would anyone *use* it for
though ?
Presumably sort and reverse methods are also desired.

Yeah - good idea. Insert is also good - I can't remember if we provide
this or not.

Thanks
Fuzzyman
http://www.voidspac.org.uk/python/index.shtml

You can always perform list operations on the ``sequence`` attribute of
course.

But NOT having to expose .sequence as a real mutable list whose changes
affect ordering has many advantages, as above noticed.
Alex

Nov 24 '05 #167

Fuzzyman

Ok.

That answers a question in the post I've just made. This thread is hard
to follow.

Thanks

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Nov 24 '05 #168

Fuzzyman

Christoph Zwerschke wrote:

Fuzzyman wrote:
So what do you want returned when you ask for d1[1] ? The member keyed
by 1, or the item in position 1 ?
In case of conflict, the ordered dictionary should behave like a
dictionary, not like a list. So d1[1] should be the member keyed by 1,
not the item in position 1. Only in case there is no member keyed by 1,
the item in position 1 could be returned, but I think that would be too
adventurous a hattrick and can lead to big confusion. Better to raise a
KeyError in that case.

I agree - hard to trace bugs lie this way....

[snip..]
Instead of writing d.sequence = new_sequence, I would write
d.setkeys(new_s equence). But I'm not sure what to do if new_sequence is
not a permutation of the old one. Raise a KeyError? Or even swallow
this? For instance

This is an interesting question - I don't know which is the best
behaviour here.

It's probably better to raise a KeyError - that way the error will
occur when it happens.

The simplest way of doing this is to check that the new key list
(sorted) is the same as the original one (sorted). This is potentially
quite an expensive operation.

It might be faster to compare sets with Python 2.4 - but we intend to
retain compatibility with Python 2.2.
d = OrderedDict((1, 11), (2,12))

d.setkeys((2, 1)) ==> d = OrderedDict((2, 11), (1, 12))

d.setkeys((3, 4)) ==> d = OrderedDict((3, 11), (4, 12)) (or KeyError?)

d.setvalues((12 , 11)) ==> d = OrderedDict((1, 12), (2, 11))

d.setvalues((13 , 14)) ==> d = OrderedDict((1, 13), (2, 14)) (always ok)

(Or maybe better set_keys in analogy to has_key?)

I hesitate making keys and values managed properties, because this would
conflict with them being methods in ordinary dicts. Ordered dicts should
resemble ordinary dicts as closely as possible. And giving it a
different name like "sequence" I find confusing and unintuitive.

You really want to be able to set values as a sequence ? I suppose it's
even easier to implement than changing keys, but I'd never considered
it.
A resort could be the following: If keys() is given a sequence as
argument, then use this as the new sequence of keys, and similar with
values(). This way, no new methods need to be introduced.

That's not a bad idea. I'll chat with Nicola Larosa and see what he
thinks.

It does break backawards compatibility of course...

All the best,

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml
-- Christoph

Nov 24 '05 #169

Duncan Booth

Christoph Zwerschke wrote:

Duncan Booth schrieb:
In Javascript Object properties (often used as an associative array)
are defined as unordered although as IE seems to always store them in
the order of original insertion it wouldn't surprise me if there are
a lot of websites depending on that behaviour.

You're right with both. The ECMA language definition says object
properties are an unordered collection, but MSIE and probably other
browsers keep the order in which they were created. Of course one
should not rely on that.

Yes, the real fun bit is arrays. If you create an array using the 'new
Array' or [ ... ] then a for..in loop might well trick you into thinking it
is going to go through the elements in order, but if you assign to elements
directly it breaks down:

a = ['a', 'b', 'c'];
a[4] = 'e';
a[3] = 'd';
for (var k in a) {
alert(a[k]+'='+a[k]);
}

On IE this will go through elements in the order 0, 1, 2, 4, 3.

Also, it is original order of insertion which matters, not the 'last
in/last out' you might have expected. In other words it looks rather as
though IE simply sets deleted properties to undefined (and skips them on
iteration), it never really deletes a property, so anyone who tries to use
a Javascript object as an associative array with lots of rapidly changing
keys could be in for a nasty shock.

Nov 24 '05 #170