
Are lists at least as efficient as dictionaries?

Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?
Basically, if Python really implements lists as linked lists but
dictionaries as hash tables, it may well be that hashing a key takes
negligible time as compared to comparing it against every list
element.
Oh, and here is (as a non sequitur) something I don't understand
either:

>>> x = ([],)
>>> x[0] += ['something']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment
>>> x
(['something'],)                        <---- complained but did it anyway??
>>> x[0].append('and another thing')    <------- no complaint!
>>> x
(['something', 'and another thing'],)
Jul 18 '05 #1


Narendra C. Tulpule wrote:
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?


Wild guess: I suppose that neither implementation will significantly
affect the overall speed of your application. E.g., if the strings are
*really* large (as opposed to the list of *only* 100 elements), reading
them from disk will take much longer than inserting them into the list,
even at arbitrary positions.

Also, note that no particular order is preserved for the keys in a
dictionary.

And now for something completely different:
>>> x = ([],)
>>> x[0] += ['something']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment


+= calls the list.__iadd__(self, other) method, which seems to be
implemented as

def __iadd__(self, other):
    self.append(other)
    return self

The call of this method succeeds, but the following assignment fails, because
tuples are immutable. This could only be remedied if all assignments of the form

a = a

were silently ignored, or if the += operator did not perform an
assignment, which has the disadvantage that it would no longer work for
immutables, so that

>>> i = 1
>>> i += 1
>>> print i
1   # admittedly faked

You could change __iadd__() to

def __iadd__(self, other):
    return self + other

but then sane behaviour in one special case comes at the cost of always
creating a copy of a potentially large (say more than 100 items :-) list.
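
To see the two steps separately: x[0] += ['something'] is roughly
equivalent to the following (the tmp name is only for illustration):

x = ([],)
tmp = x[0].__iadd__(['something'])  # step 1: extends the list in place, succeeds
x[0] = tmp                          # step 2: tuple item assignment, raises TypeError

The list has already been modified by the time step 2 blows up, which is
why x shows the new contents after the traceback.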

By the way, one topic per post is always a good idea :-)

Peter
Jul 18 '05 #2

Narendra C. Tulpule wrote:
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?
If you use a dictionary, there will be no "intrinsic" ordering -- you
may as well use sets, then. "Insertion" into a list would thus just
be an .append, very fast; "removal" may be O(N) for a list, if you
need to search through it for occurrences.
Basically, if Python really implements lists as linked lists but
dictionaries as hash tables, it may well be that hashing a key takes
negligible time as compared to comparing it against every list
element.


Python's lists are actually vectors, dicts are indeed hash tables.
Hashing a long string does take some time, but the value may be
cached (depending on the Python implementation) so that said time
need be spent only once for a given string object (but separate
string objects will spend the time multiply, even if it so happens
that their values coincide).

All in all, there's no serious alternative to benchmarking both
possibilities in a realistic scenario. Quite likely you may find
out that -- for sensible frequencies of insertion / removal --
the time doesn't matter for your application overall (dominated
by I/O or other issues), and then you can use the simplest rather
than the fastest approach. At least 9 times out of 10 you will
be happiest with simplicity. If you don't care about ordering,
Python 2.3's sets are likely the simplest data structure (they
are implemented in terms of dictionaries, thus pretty fast too).
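
If you do want numbers, a rough sketch with the timeit module is easy
to adapt (the sizes and names below are made up; substitute your real data):

import timeit

setup = (
    "words = ['x' * 500 + str(i) for i in range(100)]\n"
    "as_list = list(words)\n"
    "as_dict = dict.fromkeys(words)\n"
    "target = words[-1]\n"    # worst case for the list scan
)

# membership test: O(N) string comparisons vs. one (cached) hash lookup
print(timeit.Timer("target in as_list", setup).timeit(100000))
print(timeit.Timer("target in as_dict", setup).timeit(100000))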
Alex
Jul 18 '05 #3

On Fri, Aug 29, 2003 at 10:07:20AM +0200, Peter Otten wrote:
Narendra C. Tulpule wrote:
[ snip]
And now for something completely different:
>>> x = ([],)
>>> x[0] += ['something']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object doesn't support item assignment


+= calls the list.__iadd__(self, other) method, which seems to be
implemented as

def __iadd__(self, other):
    self.append(other)
    return self


self.extend(other), actually. But that's the basic idea, yea.

The call of this method succeeds, but the following assignment fails, because
tuples are immutable. This could only be remedied if all assignments of the form

a = a

were silently ignored, or if the += operator did not perform an
assignment, which has the disadvantage that it would no longer work for
immutables, so that

>>> i = 1
>>> i += 1
>>> print i
1   # admittedly faked

+= could simply be syntactic sugar for a call to __add__ and then an
assignment. This would work for mutable and immutable objects.
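
Incidentally, that is already the fallback Python uses when a type
defines no __iadd__ at all: a += b then behaves just like a = a + b.
A quick sketch (the Box class is only for illustration):

class Box:
    def __init__(self, items):
        self.items = items
    def __add__(self, other):
        # no __iadd__ here, so a += b falls back to __add__ plus rebinding
        return Box(self.items + other.items)

a = Box([1, 2])
old_id = id(a)
a += Box([3])
print(a.items)           # [1, 2, 3]
print(id(a) == old_id)   # False -- a brand new Box was created and bound to a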

You could change __iadd__() to

def __iadd__(self, other):
    return self + other

but then sane behaviour in one special case comes at the cost of always
creating a copy of a potentially large (say more than 100 items :-) list.


True, except that list.extend() exists.

Jp

--
"There is no reason for any individual to have a computer in their
home."
-- Ken Olson, President of DEC, World Future Society
Convention, 1977

Jul 18 '05 #4


"Narendra C. Tulpule" <na***@trillium.com> wrote in message
news:78**************************@posting.google.com...
Hi,
if you know the Python internals, here is a newbie question for you.
If I have a list with 100 elements, each element being a long string,
is it more efficient to maintain it as a dictionary (with a key = a
string from the list and value = None) for the purpose of insertion
and removal?


Lists are more efficient for lookup by index; dictionaries are
more efficient for insertion (except insertion at the end of a
list, where Python maintains extra slots for just this case),
removal, and lookup by key. Lists are also much more efficient
for sequential traversal.
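
A rough way to see the append vs. insert-at-the-front difference
yourself (the sizes and repeat counts are made up, tweak as needed):

import timeit

# appending stays cheap thanks to the spare slots kept at the end;
# inserting at the front has to shift every element over by one
print(timeit.Timer("L.append(0); L.pop()", "L = list(range(10000))").timeit(10000))
print(timeit.Timer("L.insert(0, 0); del L[0]", "L = list(range(10000))").timeit(10000))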

John Roth
Jul 18 '05 #5

On Fri, Aug 29, 2003 at 10:24:37AM -0700, Chad Netzer wrote:
On Fri, 2003-08-29 at 07:54, Jp Calderone wrote:
+= could simply be syntactic sugar for a call to __add__ and then an
assignment. This would work for mutable and immutable objects.


But it loses the advantage that some objects would otherwise have of
being able to mutate in place, without allocating a new object (e.g.
very large matrix additions).


But as you removed from my original post, list.extend() exists. All one
has to do to retain the existing functionality of __iadd__ is name the
method something else, then call it. All the advantages, none of the
confusing or difficult to track semantics.

Jp

Jul 18 '05 #6

On Sat, 2003-08-30 at 00:03, Jp Calderone wrote:
But as you removed from my original post, list.extend() exists. All one
has to do to retain the existing functionality of __iadd__ is name the
method something else, then call it. All the advantages, none of the
confusing or difficult to track semantics.


True; however, when working with large matrices, I much prefer A += B
to A.add(B), and the performance advantages of __iadd__ can be
substantial.
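
The in-place difference is easy to see even with plain lists (no matrix
package needed for the illustration):

A = [0] * 1000
B = [1] * 1000

before = id(A)
A += B                    # __iadd__: extends A in place
print(id(A) == before)    # True -- same list object, extended in place

A = A + B                 # __add__: builds a brand new list and rebinds A
print(id(A) == before)    # False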

I prefer the original poster's suggestion that self-assignment be
ignored. But I see your point about the more difficult semantics of
__iadd__.
--
Chad Netzer
Jul 18 '05 #7
