Why are there no ordered dictionaries? - Page 9

Christoph Zwerschke

This is probably a FAQ, but I dare to ask it nevertheless since I
haven't found a satisfying answer yet: Why isn't there an "ordered
dictionary" class at least in the standard list? Time and again I am
missing that feature. Maybe there is something wrong with my programming
style, but I rather think it is generally useful.

I fully agree with the following posting where somebody complains why so
very basic and useful things are not part of the standard library:
http://groups.google.com/group/comp....52d2f771a49857

Are there plans to get it into the standard lib sometime?

-- Christoph

Nov 22 '05

Subscribe Reply

210

10547

bonono

Christoph Zwerschke wrote:

Bengt Richter schrieb:
Ok, so if not in the standard library, what is the problem? Can't find what
you want with google and PyPI etc.? Or haven't really settled on what your
_requirements_ are? That seems to be the primary problem people who complain
with "why no sprollificator mode?" questions.

What I don't understand is why legitimate questions such as "why are
there no ordered dictionaries" are immediately interpreted as
*complaints* and not just as questions. If I ask such a question, I am
not complaining but trying to simply figure out *why* there is no such
thing. Probably there are reasons and all I want to know is find these
reasons and learn a little bit more about Python in doing so.

Why can't such questions be discussed in a factual, calm and friendly way?

Using "why can't" is already too much. Even you turn it into "is there
a thing for ordered dict", you would get similar treatment

Get used to it :-)

> They don't know what they really mean when it comes down to a DYFR
> (Define Your Felicitous Requirements) challenge.

I don't think that this was true in this case, and even if this is the
outcome, those who asked the question will have learned something.

I think a discussion group is not there for only presenting mature,
sophisticated thoughts and concepts, but also for "thinking loud"
together with other about these issues. We all know that clarifying our
thoughts works often best if you discuss them with others. And I think
that's one purpose of discussion lists. Asking questions should not be
immediately be discouraged, even silly questions. If it is really a FAQ,
you can simply point to the FAQ or add the answer in the FAQ list if it
is missing there.

Well, different groups has different "personalit y", just don't be
scared.

Nov 22 '05 #81

bonono

Christoph Zwerschke wrote:

Fredrik Lundh wrote:
I'll repeat this one last time: for the use cases presented by Zwerschke
and "bonono", using a list as the master data structure, and creating the
dictionary on demand, is a lot faster than using a ready-made ordered
dict implementation. if you will access things via the dictionary a lot,
you can cache the dictionary somewhere. if not, you can recreate it
several times and still get a net win.

You're right in pointing out that the advantage of ordered dictionaries
(unless you use an omptimized C implementation) is not a performance gain.

But please see my other reply: If the dictionary has more than 3 items
(say 10 or 20), and an effective ordered dict is used, it's not really
"a lot" slower. At least if we are talking about a situation were "on
demand" is "always". So, on the other side there isn't such a big
performance loss when using ordered dictionaries as well.

The advantage of using an ordered dictionary is that you can set up your
ordered dictionary (say, describing your database columns) once, and
then can access it in any way you like in the following: Iterate over it
in a guaranteed order or access item, always refering to the same
object, without needing to care about building and caching auxiliary
objects with different names depending on what you are doing.

Well, I don't think performance is the concern(at least not for me),
but how best to blend with the rest of the code which I have no
interest to explain as I am not here to convincing anyone for such a
thing. I just present a use case, if they see it, fine. If not, that is
fine too.

But I did learn something that creating a dict on a list cost me
nothing, I would be less worry about the performance hit in the future.

Nov 22 '05 #82

A.M. Kuchling

On 22 Nov 2005 01:41:44 -0800,
Kay Schluehr <ka**********@g mx.net> wrote:

Does anyone actually use this site? While the Vaults offered a nice
place and a nice interface the Cheese Shop has the appeal of a code
slum.

Looking at the Cheese Shop's home page at
http://cheeseshop.python.org/pypi, which lists recent package updates,
three packages were updated on 11/22, and three on 11/21. Two on
11/20, six on 11/19 (someone was busy!).

Looking at the Vaults's 'latest' page at
http://py.vaults.ca/apyllo.py?a=l, two packages were updated on 08/23,
and five on 08/06.

What would improve the Cheese Shop's interface for you?

--amk

Nov 22 '05 #83

Kay Schluehr

A.M. Kuchling wrote:

On 22 Nov 2005 01:41:44 -0800,
Kay Schluehr <ka**********@g mx.net> wrote:
Does anyone actually use this site? While the Vaults offered a nice
place and a nice interface the Cheese Shop has the appeal of a code
slum.
Looking at the Cheese Shop's home page at
http://cheeseshop.python.org/pypi, which lists recent package updates,
three packages were updated on 11/22, and three on 11/21. Two on
11/20, six on 11/19 (someone was busy!).

Looking at the Vaults's 'latest' page at
http://py.vaults.ca/apyllo.py?a=l, two packages were updated on 08/23,
and five on 08/06.

What would improve the Cheese Shop's interface for you?

--amk

From the treasures of the vaults to cheese shops - an upside down

evolution :( Personally I find the usability of the Vaults quite well
and it's design o.k. I guess it is a technical detail that causes the
Vaults to be phased out.

If I'd try something more innovative by myself I'd think about
accessing and filtering packages through the Python console itself.
Probably an enhanced standard console enabling SQL commands. Updates of
a local database ( e.g. SQLite ) should be cheap and may be performed
by only one request. Searching, filtering, loading, installing would be
done in a Python+SQL environment.

Kay

Nov 22 '05 #84

Fuzzyman

Of course ours is ordered *and* orderable ! You can explicitly alter
the sequence attribute to change the ordering.

I think we're looking at improving performance based on some of the
suggestions here - as well as possibly widening it to include some of
the alternative use cases. (Or at least Nicola Larosa will do it and
I'll criticise it).

All for another day though...

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml

Nov 22 '05 #85

Alex Martelli

Christoph Zwerschke <ci**@online.de > wrote:

Alex Martelli schrieb:
Perl hashes now keep track of 'order of keys'? That's new to me, they
sure didn't back when I used Perl!
Maybe I shouldn't have talked about Perl when I'm an ignoramus about
that language... You're right, Perl has unordered arrays. That was new
to me since I associate the term "array" always with "ordered" and I
just remembered that PHP's assoc arrays are similar to Perl's but in
fact, and the examples I have read did not mention about that problem.

Perl's _arrays_ are a bit like Python _lists_, and ordered; it's the
_hashes_ that are a bit like Python _dicts_, and unordered. PHP's use
of "array" for both concepts may indeed be a bit confusing.

What about PHP?

You can conclude that PHP's assoc arrays are ordered from the fact that
the language provides a ksort() function to order the keys. And I think
PHP's insertion order is the one I mentioned in my last post.

Anyway, it would be interesting to examine this in detail and how this
is implemented in other languages.

Yep. After just a bit of research I suspect you're right re PHP but
haven't found a specific reference-manual page URL about it.

"first insertion (since the last deletion if any)" is ONE unambiguous
definition, but surely not "_the_ ONE" with emphasis on ``the''.
I see nothing _ambiguous_ (nor _unnatural_) in being interested in the
> *last* insertion, for example; indeed if phrased as "upon insertion, put
> the key at the end of the sequence" (whether it was already elsewhere in
> the sequence of not), with no need for conditionals regarding previous
existence, it might appear more "conceptual ly compact".

But it would not satisfy the concept of "keys of a dictionary" which are
always unique.

Why not? Since keys are unique, any 'sequence' of keys is a permutation
of a set, a perfectly well-defined concept.

BTW, there are some boundary conditions that should be fulfilled for the
insertion order, most obviously:

If you define an ordered dict that way:

d = odict()
d['a'] = 1
d['b'] = 2
d['c'] = 3

The keys should then be orderes as ('a', 'b', 'c').
Yep, but both 'first-insertion' and 'latest-insertion' (and many other
rules) meet that constraint.

That would be also biased (in favour of Python) by the fact that
probably very little people would look for and use the package in the
cheese shop if they were looking for ordered dicts. I for example would
probably use ordered dicts if they were part of the standard lib, but
not if I have to install it as an obscure separate package. So I don't
think that will give us a real clue how many people would like ordered
dicts in the standard lib.
A package on cheese shop need not be "obscure" -- that depends on
announcing it well, etc. And the standard library is so huge that many
things inside IT are in fact obscure -- I find myself often pointing out
standard-library solutions to rather experienced Pythonistas who had
been rolling their own, unaware of the stdlib alternative (thus making
the stdlib even bigger has a cost...). So, I don't think this effect
invalidates the experiment; while not all who download an add-on would
like it in the stdlib, and vice versa, there is surely a correlation
between amount of interest/need for the add-on's functionality, and both
rate of downloads as well as interest in having it in the stdlib.
But anyway, if I find some time, I will research a little bit more about
the issue and create such a package, because it seems to me that the
existing packages and recipes are not really satisfying and you're right
it seems to be reasonably easy. It's on my todo list now...

It presumably should have a C implementation for speed and a Python one
for wide availability and ease of installation (the latter gives all the
advantages of prototyping in fixing the API &c, and is made especially
easy by the existence of UserDict.DictMi xin in the stdlib). I'll be
glad to help out with the C part when it gets to that, just holler...
Alex

Nov 22 '05 #86

Alex Martelli

Fredrik Lundh <fr*****@python ware.com> wrote:
...

But math folks usually name things after the person(s) who came
up with the idea, not just some random implementer. The idea of

Wrong: you're forgetting Stigler's Law of Misonomy (which I imagine must
have NOT been discovered by Stigler...;-). The Poisson distribution was
in fact described earlier by Bernoulli, Gosset's z-test is universally
known as Student's t-test, etc, etc.

Salsburg's delightful "The Lady Tasting Tea" has a lot of fun with
Stigler's law in the footnotes...;-)
Alex

Nov 22 '05 #87

Magnus Lycka

Christoph Zwerschke wrote:

I still believe that the concept of an "ordered dictionary" ("behave
like dict, only keep the order of the keys") is intuitive and doesn't
give you so much scope for ambiguity.

Sure. Others think so too. The problem is that if you and these other
people actually write down exactly how this unambigous ordered dict
should behave, you'll end up with a dozen different sets of semantics,
which can be divided in at least three main groups.

People use different idioms, and often gather collections of classes
and functions that fit their style of coding and their typical problem
domains. There is nothing wrong with that, and they might well be made
publically available if they are more generally useful.

When adding things to the standard library, there are some more things
to consider, particularly for something general such as an odict class:
Is is general enough, and does it add enough value to outweigh the
increased cognitive burden of more classes/functions/types in the
library?

For a beginner, it's easy to believe that things would be easier if
the tool you use had already solved the problem you have at hand for
you, but it turns out that the world has so many problems, that you
would end up with a impenetrable forest of standard library modules
if we actually tried to achieve this.

Nov 22 '05 #88

Stuart McGraw

"A.M. Kuchling" <am*@amk.ca> wrote in message news:9s******** ************@sp eakeasy.net...
[...]

What would improve the Cheese Shop's interface for you?

Getting rid of those damn top level links to old versions.
Seeing a long list of old versions, when 99% of visitors are
only interested in the current version, is just visual noise,
and really lame. Move the old version links onto the page
describing the software.

Nov 22 '05 #89

Anton Vredegoor

Christoph Zwerschke wrote:

But of course, it will always be slower since it is constructed on top
of the built-in dict. In end effect, you always have to maintain a
sequence *plus* a dictionary, which will be always slower than a sheer
dictionary. The ordered dictionary class just hides this uglyness of
having to maintain a dictionary plus a sequence, so it's rather an issue
of convenience in writing and reading programs than a performance issue.

It may be different if the ordered dict would be implemented directly as
an ordered hash table in C.

The problem with hashing is that one is storing data from a possibly
wildly varying range in a set of values with a limited range. That's
where the ordering problems come from in the first place. If one wants
to solve it once and for all one has to give up the speed advantage
that makes hashing so popular. I wonder if optimized C code using
bisect would be very much slower for small ranges?

The current set implementation uses dicts to implement sets, while sets
are a more basic data type than dicts. At least dicts and sets should
be derived from the same type. Just speaking from an intuitive point of
view of course :-)

Here's a sorted dict implementation without using hashes, which maybe
would be fast if implemented in C. The insertion order can be retrieved
using the keys and values lists from the object itself, items() gives a
sorted sequence.

Anton.

NB warning untested code.

When using mutables as keys which is possible by this implementation,
don't change the keys after they're used in the object.

from array import array

class VDict:

def __init__(self,s equence = []):
self.keys = []
self.values = []
self.unranks = array('L',[])
for key,value in sequence:
self[key] = value

def __setitem__(sel f,key,value):
keys = self.keys
values = self.values
unranks = self.unranks
n = len(keys)
i = self.bisect_lef t(key)
if i == n or keys[unranks[i]] != key:
keys.append(key )
values.append(v alue)
unranks.insert( i,n)
else:
values[i] = value

def __getitem__(sel f,key):
i = self.bisect_lef t(key)
return self.values[self.unranks[i]]

def bisect_left(sel f,key, lo=0, hi=None):
keys = self.keys
unranks = self.unranks
if hi is None:
hi = len(keys)
while lo < hi:
mid = (lo+hi)//2
if keys[unranks[mid]] < key:
lo = mid+1
else:
hi = mid
return lo

def __contains__(se lf,key):
keys = self.keys
unranks = self.unranks
n = len(keys)
i = self.bisect_lef t(key)
return i < n and keys[unranks[i]] == key

def __len__(self):
return len(self.keys)

def items(self):
keys = self.keys
values = self.values
unranks = self.unranks
return [(keys[i],values[i]) for i in unranks]

def __iter__(self):
return iter(self.items ())

def remove(self,key ):
keys = self.keys
values = self.values
unranks = self.unranks
n = len(keys)
i = self.bisect_lef t(key)
x = unranks[i]
if i < n and keys[x] == key:
del keys[x]
del values[x]
del unranks[i]
for j,k in enumerate(unran ks):
if k > x:
unranks[j] -= 1

def test():
V = VDict()
L = [1,2,3]
V[L] = 10
print V[L]
V[L] = 20
print V[L]
V.remove(L)
print V.items()
V = VDict(zip('edcb a',range(5)))
print V.items()
print V['d']
V.remove('d')
print V.items()

if __name__=='__ma in__':
test()

Nov 22 '05 #90