
missing? dictionary methods

Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
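
A minimal sketch of the two methods as a dict subclass (illustrative only; the names and the class are the proposal, not an existing API):

```python
class StrictDict(dict):
    """dict with the two proposed strict-assignment methods."""

    def set(self, key, value):
        # Like d[key] = value, but the key must already exist.
        if key not in self:
            raise KeyError(key)
        dict.__setitem__(self, key, value)

    def make(self, key, value):
        # Like d[key] = value, but the key must NOT already exist.
        if key in self:
            raise KeyError('%r already in dict' % (key,))
        dict.__setitem__(self, key, value)


d = StrictDict(a=1)
d.set('a', 2)    # fine: 'a' exists
d.make('b', 3)   # fine: 'b' is new
```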
What do other people think about this?

--
Antoon Pardon
Jul 18 '05 #1
12 Replies


"Antoon Pardon" <ap*****@forel.vub.ac.be> wrote in message
news:sl********************@rcpc42.vub.ac.be...
Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?

--
Antoon Pardon


+1

I'm sure I've needed and implemented this functionality in the past, but it was too simple to
even think of extracting it into functions/methods. In contrast to the recent pre-PEP about dict
accumulating methods, set() and make() (or whatever they might be called) are meaningful for all
dicts, so they're good candidates for being added to the base dict class.

As for naming, I would suggest reset() instead of set(), to emphasize that the key must be there.
make() is ok; other candidates could be add() or put().

George
Jul 18 '05 #2

Antoon Pardon wrote:
Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?


def safeset(dct, key, value):
    if key not in dct:
        raise KeyError(key)
    else:
        dct[key] = value

def make(dct, key, value):
    if key in dct:
        raise KeyError('%r already in dict' % key)
    else:
        dct[key] = value

I don't see a good reason to make these built in to the dict type.

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
Jul 18 '05 #3

Op 2005-03-21, Robert Kern schreef <rk***@ucsd.edu>:
Antoon Pardon wrote:
Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?


def safeset(dct, key, value):
    if key not in dct:
        raise KeyError(key)
    else:
        dct[key] = value

def make(dct, key, value):
    if key in dct:
        raise KeyError('%r already in dict' % key)
    else:
        dct[key] = value

I don't see a good reason to make these built in to the dict type.


I would say the same reason that we have get. There is no
strict reason to have a built-in get; it is easily implemented
like this:

def get(dct, key, default):
    try:
        return dct[key]
    except KeyError:
        return default

I would even go so far as to say there is more reason to have
built-in safeset and make than there is to have a built-in get.

The reason is that a Python implementation of safeset and make
means two accesses into the dictionary: once for the test and
once for the assignment. This double access could be eliminated
with a built-in. get, on the other hand, does only one dictionary
access, so having it implemented in Python is a lesser burden.
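
To make the cost argument concrete, here is a rough timing sketch (absolute numbers vary by machine and interpreter; function-call overhead dominates the pure-Python version, so this only illustrates that the wrapper costs noticeably more than a raw assignment):

```python
import timeit

def safeset(dct, key, value):
    # Pure-Python version: one membership test plus one assignment,
    # i.e. two trips into the hash table.
    if key not in dct:
        raise KeyError(key)
    dct[key] = value

d = {'a': 1}
n = 100000
t_plain = timeit.timeit("d['a'] = 2", globals=globals(), number=n)
t_safe = timeit.timeit("safeset(d, 'a', 2)", globals=globals(), number=n)
print("plain: %.4fs  safeset: %.4fs" % (t_plain, t_safe))
```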

--
Antoon Pardon
Jul 18 '05 #4

Antoon Pardon wrote:
I would say the same reason that we have get. There is no
strict reason to have a built-in get; it is easily implemented
like this:

def get(dct, key, default):
    try:
        return dct[key]
    except KeyError:
        return default

I would even go so far as to say there is more reason to have
built-in safeset and make than there is to have a built-in get.

The reason is that a Python implementation of safeset and make
means two accesses into the dictionary: once for the test and
once for the assignment. This double access could be eliminated
with a built-in. get, on the other hand, does only one dictionary
access, so having it implemented in Python is a lesser burden.


That's not true; they're on more or less the same level
computation-wise. try:...except... doesn't relieve the burden; it's
expensive.

For me, the issue boils down to how often such constructs are used. I
don't think that I've ever run into use cases for safeset() and make().
dct.get(key, default) comes up *a lot*, and in places where speed can
matter. Searching through the standard library can give you an idea how
often.
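
A quick, rough way to do that search (this counts every ".get(" occurrence in the stdlib's .py files, so it overcounts dict.get specifically; the path comes from sysconfig and depends on your installation):

```python
import os
import sysconfig

# Locate this interpreter's standard library source tree.
stdlib = sysconfig.get_paths()["stdlib"]

count = 0
for root, dirs, files in os.walk(stdlib):
    for name in files:
        if not name.endswith(".py"):
            continue
        try:
            with open(os.path.join(root, name), errors="ignore") as f:
                # Count lines containing a .get( call of any kind.
                count += sum(".get(" in line for line in f)
        except OSError:
            pass
print("%d lines in %s mention .get(" % (count, stdlib))
```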

--
Robert Kern
rk***@ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
Jul 18 '05 #5

Op 2005-03-21, Robert Kern schreef <rk***@ucsd.edu>:
Antoon Pardon wrote:
I would say the same reason that we have get. There is no
strict reason to have a built-in get; it is easily implemented
like this:

def get(dct, key, default):
    try:
        return dct[key]
    except KeyError:
        return default

I would even go so far as to say there is more reason to have
built-in safeset and make than there is to have a built-in get.

The reason is that a Python implementation of safeset and make
means two accesses into the dictionary: once for the test and
once for the assignment. This double access could be eliminated
with a built-in. get, on the other hand, does only one dictionary
access, so having it implemented in Python is a lesser burden.
That's not true; they're on more or less the same level
computation-wise. try:...except... doesn't relieve the burden; it's
expensive.


I have always heard that try: ... except is relatively inexpensive
in Python, particularly if no exception is raised.
For me, the issue boils down to how often such constructs are used. I
don't think that I've ever run into use cases for safeset() and make().
dct.get(key, default) comes up *a lot*, and in places where speed can
matter. Searching through the standard library can give you an idea how
often.


It is always hard to compare the popularity/usefulness of two things when
one is already implemented and the other is not. IME it is not that
uncommon to know in some part of the code that the keys you use should
already be in the dictionary, or, conversely, that the key should
not already be in the dictionary.

--
Antoon Pardon
Jul 18 '05 #6

Ron
On 21 Mar 2005 08:21:40 GMT, Antoon Pardon <ap*****@forel.vub.ac.be>
wrote:
Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?

There is a has_key(k) method that helps with these.

Adding these wouldn't be that hard, and they can apply to all
dictionaries with any data.

class newdict(dict):
    def new_key(self, key, value):
        if self.has_key(key):
            raise KeyError('key already exists')
        else:
            self[key] = value

    def set_key(self, key, value):
        if self.has_key(key):
            self[key] = value
        else:
            raise KeyError('key does not exist')

d = newdict()
for x in list('abc'):
    d[x] = x
print d
d.new_key('z', 'z')
d.set_key('a', 'b')
print d

Which is faster? (has_key()) or (key in keys())?
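
For what it's worth, a rough timing sketch of that question (has_key() was later removed in Python 3, so this compares the modern spellings; in Python 2, d.keys() built a whole list, making "key in d.keys()" linear-time and much slower):

```python
import timeit

d = dict.fromkeys(range(1000))
n = 100000
t_in = timeit.timeit("500 in d", globals=globals(), number=n)
t_keys = timeit.timeit("500 in d.keys()", globals=globals(), number=n)
# In Python 3, keys() returns a view with O(1) membership, so the gap
# is mostly the extra method call; in Python 2 it was dramatic.
print("in d: %.4fs  in d.keys(): %.4fs" % (t_in, t_keys))
```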
Jul 18 '05 #7


"Antoon Pardon" <ap*****@forel.vub.ac.be> wrote in message
news:sl********************@rcpc42.vub.ac.be...
For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?


To me, one of the major problems with OOP is that there are an unbounded
number of functions that we can think of to operate on a data structure and
thus a continual pressure to turn functions into methods and thus
indefinitely expand a data structure class. And whatever is the least used
current method, there will always be candidates which are arguably at least
or almost as useful. And the addition of one method will be seen as reason
to add another, and another, and another. I was almost opposed to .get for
this reason. I think dict has about enough 'basic' methods.

So, without support from many people, your two examples strike me as fairly
specialized usages best written, as easily done, as Python functions.

Terry J. Reedy

Jul 18 '05 #8


Antoon Pardon wrote:
Well at least I find them missing.

For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?

--
Antoon Pardon


If (1) gets accepted, I propose the name .change(key, val). It's
simple, logical, and makes sense.

Jul 18 '05 #9

George Sakkis wrote:
As for naming, I would suggest reset() instead of set(), to emphasize that the key must be there.
make() is ok; other candidates could be add() or put().


How about 'new' and 'old'?

--
Greg Ewing, Computer Science Dept,
University of Canterbury,
Christchurch, New Zealand
http://www.cosc.canterbury.ac.nz/~greg
Jul 18 '05 #10

Op 2005-03-21, Terry Reedy schreef <tj*****@udel.edu>:

"Antoon Pardon" <ap*****@forel.vub.ac.be> wrote in message
news:sl********************@rcpc42.vub.ac.be...
For the moment I frequently come across the following cases.

1) Two files, each with key-value pairs for the same dictionary.
However it is an error if the second file contains a key that
was not in the first file.

In treating the second file I miss a 'set' method.
dct.set(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key wasn't
already in the dictionary.
2) One file with key-value pairs. However it is an error
if a key is duplicated in the file.

In treating such files I miss a 'make' method.
dct.make(key, value) would be equivalent to dct[key] = value,
except that it would raise a KeyError if the key was
already in the dictionary.
What do other people think about this?


To me, one of the major problems with OOP is that there are an unbounded
number of functions that we can think of to operate on a data structure and
thus a continual pressure to turn functions into methods and thus
indefinitely expand a data structure class. And whatever is the least used
current method, there will always be candidates which are arguably at least
or almost as useful. And the addition of one method will be seen as reason
to add another, and another, and another. I was almost opposed to .get for
this reason. I think dict has about enough 'basic' methods.

So, without support from many people, your two examples strike me as fairly
specialized usages best written, as easily done, as Python functions.


I don't know if they are so specialized. I would rather say the
map[key] = value semantics is specialized. If we work with a list,
the key already has to exist: if you have a list with 4 elements
and you try to assign to the 6th element, you get an IndexError.
If you want to assign to the 6th element you have to construct
it first. That, and symmetry with var = dct[key],
makes me think that dct[key] = value shouldn't just construct
an entry when it isn't present.
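
The asymmetry is easy to demonstrate (plain Python, nothing proposed here):

```python
lst = [0, 1, 2, 3]       # four elements, indices 0..3
lst[2] = 20              # fine: the index already exists

try:
    lst[5] = 50          # assigning to a missing index is an error...
except IndexError as exc:
    print("IndexError:", exc)

d = {}
d['anything'] = 1        # ...while dict assignment silently creates the key
print(d)
```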

I also was under the impression that a particular part of
my program almost doubled in execution time once I replaced
the naive dictionary assignment with these self-implemented
methods. A rather heavy cost IMO for something that would
require almost no extra work when implemented as a built-in.

But you are right that there doesn't seem to be much support
for this. So I won't press the matter.

--
Antoon Pardon
Jul 18 '05 #11

On 22 Mar 2005 07:40:50 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
[...]
I also was under the impression that a particular part of
my program almost doubled in execution time once I replaced
the naive dictionary assignment with these self-implemented
methods. A rather heavy cost IMO for something that would
require almost no extra work when implemented as a built-in.
I think I see a conflict of concerns between language design
and optimization. I call it "arms-length assembler programming"
when I see language features being proposed to achieve assembler-level
code improvements.

For example, what if subclassing could be optimized to have virtually
zero cost, with some kind of sticky-mro hint etc to the compiler/optimizer?
How many language features would be dismissed with "just do a sticky subclass?"
But you are right that there doesn't seem to be much support
for this. So I won't press the matter.

I think I would rather see efficient general composition mechanisms
such as subclassing, decoration, and metaclassing etc. for program elements,
if possible, than incremental aggregation of efficient elements into the built-in core.

Also, because optimization risks using more computation to optimize than the expression
being optimized, I suspect that some kind of evaluate-expression-once (at def-time or first
execution time) and optimize-particular-expression hints could pay off more in general
than particular useful methods. Maybe Pypy will be an easier place to experiment with
these kinds of things.

Regards,
Bengt Richter
Jul 18 '05 #12

Op 2005-03-22, Bengt Richter schreef <bo**@oz.net>:
On 22 Mar 2005 07:40:50 GMT, Antoon Pardon <ap*****@forel.vub.ac.be> wrote:
[...]
I also was under the impression that a particular part of
my program almost doubled in execution time once I replaced
the naive dictionary assignment with these self-implemented
methods. A rather heavy cost IMO for something that would
require almost no extra work when implemented as a built-in.

I think I see a conflict of concerns between language design
and optimization. I call it "arms-length assembler programming"
when I see language features being proposed to achieve assembler-level
code improvements.

For example, what if subclassing could be optimized to have virtually
zero cost, with some kind of sticky-mro hint etc to the compiler/optimizer?
How many language features would be dismissed with "just do a sticky subclass?"


I'm sorry, you have lost me here. What do you mean by "sticky-mro"?

My feeling about this is the following. A[key] = value,
A.reset(key, value) and A.make(key, value) would do almost
identical things, so identical that it would probably be easy
to unite them into something like A.assign(key, value, flag),
where the flag would indicate which of the three options is wanted.
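
A sketch of such a unified call in Python (the names and flag values here are made up for illustration; a C implementation could share one lookup between the test and the store, which this pure-Python version cannot):

```python
REPLACE, CREATE, EITHER = 'replace', 'create', 'either'

def assign(dct, key, value, mode=EITHER):
    # One conceptual operation with three behaviors.
    present = key in dct   # in C this lookup could be shared with the store
    if mode == REPLACE and not present:
        raise KeyError(key)
    if mode == CREATE and present:
        raise KeyError('%r already in dict' % (key,))
    dct[key] = value
```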

Also, a lot of this code is identical to searching for a key.
Because the implementation doesn't provide some of these
possibilities, I have to duplicate some of the work.

One could argue that hashes are fast enough that this
doesn't matter, but dictionaries are the template for
all mappings in Python. What if you are using a tree
and you have to go through it twice, or what if you
are working with slower media, like one of
the dbm modules, where you have to go through your
structure on disk twice?

You can see it as an assembler-level code improvement, but
you can also see it as an incomplete interface to your
structure. IMO it would be like only providing '<',
so that if people wanted '==' they would have to implement
it as 'not (b < a or a < b)', and in this
case too, this would increase the cost compared with
a directly implemented '=='.

But you are right that there doesn't seem to be much support
for this. So I won't press the matter.

I think I would rather see efficient general composition mechanisms
such as subclassing, decoration, and metaclassing etc. for program elements,
if possible, than incremental aggregation of efficient elements into the built-in core.

Also, because optimization risks using more computation to optimize than the expression
being optimized,


I think that would hardly be the case here. The dictionary code already
has to find out whether the key is already in the hash or not. Instead of
just continuing with the branch it decided on, as is now the case, the code
would test whether the branch is appropriate for the demanded action
and raise an exception if not.

--
Antoon Pardon
Jul 18 '05 #13
