Trouble sorting lists (unicode/locale related?)

Erlend Fuglum

Hi everyone,

I'm having some trouble sorting lists. I suspect this might have
something to do with locale settings and/or character
encoding/unicode.

Consider the following example, text containing norwegian special
characters æ, ø and å.

liste = ["ola", "erlend", "trygve", "Ærlige anders", "Lars", "Øksemorderen", "Åsne", "Akrobatiske Anna", "leidulf"]
liste.sort()
liste

['Akrobatiske Anna', 'Lars', 'erlend', 'leidulf', 'ola', 'trygve',
'\xc5sne', '\xc6rlige anders', '\xd8ksemorderen']

There are a couple of issues for me here:
* The sorting method apparently places strings starting with uppercase
characters before strings staring with lowercase. I would like to
treat them them equally when sorting. OK, this could probably be fixed
by hacking with .toupper() or something, but isn't it possible to
achieve this in a more elegant way?

* The norwegian special characters are sorted in a wrong way.
According to our alphabet the correct order is (...) x, y, z, æ, ø å.
Python does it this way: (...) x, y, z, å, æ, ø ?

I would really appreciate any help and suggestions - I have been
fiddling with this mess for quite some time now :-)

Jul 18 '05 #1

Subscribe Post Reply

5994

Peter Otten

Erlend Fuglum wrote:

Hi everyone,

I'm having some trouble sorting lists. I suspect this might have
something to do with locale settings and/or character
encoding/unicode.

Consider the following example, text containing norwegian special
characters æ, ø and å.
liste = ["ola", "erlend", "trygve", "Ærlige anders", "Lars",
"Øksemorderen", "Åsne", "Akrobatiske Anna", "leidulf"] liste.sort()
liste

['Akrobatiske Anna', 'Lars', 'erlend', 'leidulf', 'ola', 'trygve',
'\xc5sne', '\xc6rlige anders', '\xd8ksemorderen']

There are a couple of issues for me here:
* The sorting method apparently places strings starting with uppercase
characters before strings staring with lowercase. I would like to
treat them them equally when sorting. OK, this could probably be fixed
by hacking with .toupper() or something, but isn't it possible to
achieve this in a more elegant way?

* The norwegian special characters are sorted in a wrong way.
According to our alphabet the correct order is (...) x, y, z, æ, ø å.
Python does it this way: (...) x, y, z, å, æ, ø ?

I would really appreciate any help and suggestions - I have been
fiddling with this mess for quite some time now :-)

Try setting the appropriate locale first:

import locale
locale.setlocale(locale.LC_ALL, ("no", None))

Then for a case-insensitive sort:

wordlist.sort(locale.strcoll)

should do (disclaimer: all untested).

Peter

Jul 18 '05 #2

Erlend Fuglum

On Sun, 21 Sep 2003 14:10:16 +0200, Peter Otten <__*******@web.de>
wrote:

Erlend Fuglum wrote:
Hi everyone,

I'm having some trouble sorting lists. I suspect this might have
something to do with locale settings and/or character
encoding/unicode.
(...)

Try setting the appropriate locale first:

import locale
locale.setlocale(locale.LC_ALL, ("no", None))

Then for a case-insensitive sort:

wordlist.sort(locale.strcoll)

should do (disclaimer: all untested).

Grrrrrreat, that did it! Thank you very much!

Erlend

Jul 18 '05 #3

Jeff Epler

On Sun, Sep 21, 2003 at 02:10:16PM +0200, Peter Otten wrote:

Try setting the appropriate locale first:

import locale
locale.setlocale(locale.LC_ALL, ("no", None))

Then for a case-insensitive sort:

wordlist.sort(locale.strcoll)

should do (disclaimer: all untested).

If this did work, then you can use the DSU pattern like so:

decorated_wordlist = [(locale.strxfrm(w), w) for w in wordlist]
decorated_wordlist.sort()
wordlist[:] = [i[1] for i in decorated_wordlist]

from "man strxfrm"
DESCRIPTION
The strxfrm() function transforms the src string into a form suchthat
the result of strcmp() on two strings that have been transformed with
strxfrm() is the same as the result of strcoll() on the two strings
before their transformation. The first n characters of the transformed
string are placed in dest. The transformation is based on thepro-
gramâ€™s current locale for category LC_COLLATE. (See setlocale(3)).

Jeff

Jul 18 '05 #4

John J. Lee

Jeff Epler <je****@unpythonic.net> writes:

On Sun, Sep 21, 2003 at 02:10:16PM +0200, Peter Otten wrote: [...]
locale.setlocale(locale.LC_ALL, ("no", None))

[...] If this did work, then you can use the DSU pattern like so:

[...]

The advantage of which is that you don't have to mess with the locale.
John

Jul 18 '05 #5

Martin v. Löwis

John J. Lee wrote:

locale.setlocale(locale.LC_ALL, ("no", None))

[...]
If this did work, then you can use the DSU pattern like so:

[...strxfrm...]

The advantage of which is that you don't have to mess with the locale.

No, it doesn't. strxfrm requires that the locale is adjusted to the
desired LC_COLLATE facet. Otherwise, it will collate according to the
C locale (in which case strxfrm is the identity).

Regards,
Martin

Jul 18 '05 #6

John J. Lee

"Martin v. Löwis" <ma****@v.loewis.de> writes:

John J. Lee wrote:
locale.setlocale(locale.LC_ALL, ("no", None)) [...]
If this did work, then you can use the DSU pattern like so:

[...strxfrm...]
The advantage of which is that you don't have to mess with the
locale.

No, it doesn't.

It does set the locale, you mean? So I guess there's no advantage
there at all?

[...] strxfrm requires that the locale is adjusted to the
desired LC_COLLATE facet.

[...]
John

Jul 18 '05 #7

Martin v. Löwis

jj*@pobox.com (John J. Lee) writes:

>If this did work, then you can use the DSU pattern like so:
[...strxfrm...]
The advantage of which is that you don't have to mess with the
locale.
No, it doesn't.

It does set the locale, you mean?

Calling locale.strxfrm does not cause locale.setlocale to be called,
i.e. it does not set the locale. Likewise, calling locale.strcoll
does not cause setlocale to be called.

Instead, the application needs to call setlocale *explicitly* for
proper operation of either function.
So I guess there's no advantage there at all?

Using strxfrm is an advantage if you have to collate the same string
multiple times, e.g. when sorting a list of strings. It is an
advantage because sorting will complete faster.

Regards,
Martin

Jul 18 '05 #8

Peter Otten

Jeff Epler wrote:

If this did work, then you can use the DSU pattern like so:

decorated_wordlist = [(locale.strxfrm(w), w) for w in wordlist]
decorated_wordlist.sort()
wordlist[:] = [i[1] for i in decorated_wordlist]

Everytime that someone posts a naive list.sort(compare), the DSU pattern is
proposed to improve execution speed.

So maybe it's about time to change the sort() method to support a second
argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already proposed
and rejected?

Peter

Jul 18 '05 #9

Alex Martelli

Peter Otten wrote:

Jeff Epler wrote:
If this did work, then you can use the DSU pattern like so:

decorated_wordlist = [(locale.strxfrm(w), w) for w in wordlist]
decorated_wordlist.sort()
wordlist[:] = [i[1] for i in decorated_wordlist]

Everytime that someone posts a naive list.sort(compare), the DSU pattern
is proposed to improve execution speed.

So maybe it's about time to change the sort() method to support a second
argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already
proposed and rejected?

I have not seen this proposed before, and I'm not very clear on what
the "compare" and "mapping" arguments are supposed to be in order to
let you specify any DSU. Basically it seems you would need two
callables, "decorate" to be called with each item in the list (to
return for each item the decorated tuple) and "undecorate" to be
called with each decorated tuple after the sort (to return the item
for the result). How do you turn that into "compare" and "mapping"?
Alex

Jul 18 '05 #10

Duncan Booth

Alex Martelli <al***@aleax.it> wrote in
news:DU**********************@news1.tin.it:

So maybe it's about time to change the sort() method to support a
second argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already
proposed and rejected?

I have not seen this proposed before, and I'm not very clear on what
the "compare" and "mapping" arguments are supposed to be in order to
let you specify any DSU. Basically it seems you would need two
callables, "decorate" to be called with each item in the list (to
return for each item the decorated tuple) and "undecorate" to be
called with each decorated tuple after the sort (to return the item
for the result). How do you turn that into "compare" and "mapping"?

You don't need two callables because the sort function would be doing
the decorating, so it knows also how to undecorate. I think the
suggestion is that the mapping argument returns something that can be
compared.

For example, here is a DSU function that does a not-in-place sort and
takes a suitable mapping argument. Changing it to in-place sort is,
of course, trivial.

def DSU(aList, aMapping): newList = [ (aMapping(item), index, item) for (index, item)
in enumerate(aList) ]
newList.sort()
return [ item[2] for item in newList ]
print DSU(['Alex Martelli', 'Duncan Booth', 'Peter Otten', 'Jeff Epler'], str.lower)
['Alex Martelli', 'Duncan Booth', 'Jeff Epler', 'Peter Otten'] def lastFirst(name): names = name.split()
names = names[-1:] + names[:-1]
return [name.lower() for name in names]
print DSU(['Alex Martelli', 'Duncan Booth', 'Peter Otten', 'Jeff Epler'],

lastFirst)
['Duncan Booth', 'Jeff Epler', 'Alex Martelli', 'Peter Otten']
--
Duncan Booth
du****@rcp.co.uk int month(char
*p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #11

Peter Otten

Alex Martelli wrote:

Everytime that someone posts a naive list.sort(compare), the DSU pattern
is proposed to improve execution speed.

So maybe it's about time to change the sort() method to support a second
argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already
proposed and rejected?

I have not seen this proposed before, and I'm not very clear on what
the "compare" and "mapping" arguments are supposed to be in order to
let you specify any DSU. Basically it seems you would need two
callables, "decorate" to be called with each item in the list (to
return for each item the decorated tuple) and "undecorate" to be
called with each decorated tuple after the sort (to return the item
for the result). How do you turn that into "compare" and "mapping"?

My first idea was to add a .dsu(mapping) where the tuples in the decoration
phase would be generated as (mapping(item), item).
But I would rather enhance the already-existing sort() to provide for both
the traditional and the dsu sorting. Then you have to detect if the
function passed to sort(fun) takes one or two arguments, which is not
reliable when default values come into play. So I resorted to named
arguments. Didn't make it any clearer?

So here is a pure python prototype to shed some light on the matter:

class dsu(list):
def sort(self, compare=None, mapping=None):
if mapping:
decorated = [(mapping(i), i) for i in self]
decorated.sort()
self[:] = [i[1] for i in decorated]
else:
list.sort(self, compare)

sample = dsu("ab Aa AC d e f".split())

# "traditional" sort
sample.sort(lambda s, t: -cmp(s, t))
print sample

# decorate-sort-undecorate
sample.sort(mapping=lambda s: s.lower())
print sample
Peter

Jul 18 '05 #12

Alex Martelli

Duncan Booth wrote:
...

So maybe it's about time to change the sort() method to support a
second argument

list.sort(compare=None, mapping=None)

that, if provided, would perform the DSU magic. Or was that already
proposed and rejected?

I have not seen this proposed before, and I'm not very clear on what
the "compare" and "mapping" arguments are supposed to be in order to
let you specify any DSU. Basically it seems you would need two
callables, "decorate" to be called with each item in the list (to
return for each item the decorated tuple) and "undecorate" to be
called with each decorated tuple after the sort (to return the item
for the result). How do you turn that into "compare" and "mapping"?

You don't need two callables because the sort function would be doing
the decorating, so it knows also how to undecorate. I think the
suggestion is that the mapping argument returns something that can be
compared.

For example, here is a DSU function that does a not-in-place sort and
takes a suitable mapping argument. Changing it to in-place sort is,
of course, trivial.
def DSU(aList, aMapping):

newList = [ (aMapping(item), index, item) for (index, item)

Ah, I see -- the alleged "mapping" is in fact meant to be a
*CALLABLE*, NOT a mapping in the Python sense. Yes, you could
get away with just one callable in this way, of course. It
also seems to me that shoe-horning that second argument (in
practice mutually exclusive with the first one) to the sort
method, when there seem to be no real advantages to keeping
it a method, is not a great idea. Rather, a new built-in
function appears to me to be best here -- something like:

def sort(iterable, decorate=None):
if decorate is None:
aux = [ (item, index, item)
for index, item in enumerate(iterable) ]
else:
aux = [ (decorate(item), index, item)
for index, item in enumerate(iterable) ]
aux.sort()
return [ item for __, __, item in aux ]

so as to have no arbitrary constraints on the iterable being
a list, or the sort being necessarily in-place, when there
is no performance advantage in so doing -- and also serve the
frequent need of a non-in-place sort returning the sorted
list without any actual need for decoration. E.g., I often
use the equivalent of:

for key in sort(somesmalldict):
print key, somesmalldict[key]

and it would be handy to have that little 'sort' function
built-in for all such uses.

The only downside I can see to this proposal is that Python
isn't really all that good at passing the callable decorate
argument -- one would end up with a lot of little ad hoc
functions, or lambdas, and neither is a great solution. It's
the kind of situation where Smalltalk/Ruby shine by letting
an arbitrary block of code (albeit one only) be passed into
any method (in Ruby, this DSU-sort would probably become a
method of the Enumerable "module" [mix-in] -- a rather elegant
solution overall, "architecturally" nicer-looking than Python's,
although "pragmatically" we're probably abot even).
Alex

Jul 18 '05 #13

Duncan Booth

Peter Otten <__*******@web.de> wrote in
news:bk*************@news.t-online.com:

My first idea was to add a .dsu(mapping) where the tuples in the
decoration phase would be generated as (mapping(item), item).

Note that if anyone proposes this seriously, it should generate a 3-tuple
(mapping(item), index, item) rather than the 2-tuple you suggest.

This is because the mapping function could reasonably be used to impose an
ordering on objects that have no natural order, so you need to be sure that
the comparison never falls through the the original object even where the
mapping compares equal. It also has the side effect of making the sort
stable (although if stability is a goal you might want another option to
reverse the sort which would use '-index' as the second element and call
..reverse() on the result).

FWIW, I think something like this belongs in the standard library rather
than as a method on lists or a new builtin.

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #14

Alex Martelli

Duncan Booth wrote:
...

FWIW, I think something like this belongs in the standard library rather
than as a method on lists or a new builtin.

Definitely not a method on lists, we agree on that. Problem is, there
is no module in the standard library to collect "functions that are often
useful but not QUITE often enough to warrant being built-ins" (and if
there was, there are quite a few current built-ins I'd like to relegate
into such an auxiliary/utilities module), so that finding a good place,
other than built-ins, for such utility functions, is often a bother.
Alex

Jul 18 '05 #15

Shu-Hsien Sheu

Hi,

I have a question about the comparison of efficiency of string slicing
and using string.startswith.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):
Thanks!

-shuhsien

Jul 18 '05 #16

Bob Gailer

At 09:17 AM 9/22/2003, Shu-Hsien Sheu wrote:

Hi,

I have a question about the comparison of efficiency of string slicing and
using string.startswith.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

Here's one way to address this kind of question:

import time
def f(): .... st = time.time()
.... for i in range(500000):
.... if aa.startswith('abc'):pass
.... print time.time() - st
.... f() 1.01200008392 def f(): .... st = time.time()
.... for i in range(500000):
.... if aa[:3] == 'abc':pass
.... print time.time() - st
.... f()

1.01100003719

Bob Gailer
bg*****@alum.rpi.edu
303 442 2625
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003

Jul 18 '05 #17

Duncan Booth

Alex Martelli <al***@aleax.it> wrote in
news:VO**********************@news1.tin.it:

Duncan Booth wrote:
...
FWIW, I think something like this belongs in the standard library rather
than as a method on lists or a new builtin.

Definitely not a method on lists, we agree on that. Problem is, there
is no module in the standard library to collect "functions that are often
useful but not QUITE often enough to warrant being built-ins" (and if
there was, there are quite a few current built-ins I'd like to relegate
into such an auxiliary/utilities module), so that finding a good place,
other than built-ins, for such utility functions, is often a bother.

So someone needs to write a PEP proposing such a module.

--
Duncan Booth du****@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Jul 18 '05 #18

Peter Hansen

Shu-Hsien Sheu wrote:

I have a question about the comparison of efficiency of string slicing
and using string.startswith.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

The latter is clearly more readable.

Jul 18 '05 #19

David Eppstein

In article <3F***************@engcorp.com>,
Peter Hansen <pe***@engcorp.com> wrote:

For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

The latter is clearly more readable.

More Pythonic, too, I think. "Readability counts," and "There should be
one-- and preferably only one --obvious way to do it." In this case,
startswith must be the one obvious way, or else why would it exist in
the standard library at all?

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

Jul 18 '05 #20

xtian

David Eppstein <ep******@ics.uci.edu> wrote in message news:<ep****************************@news.service. uci.edu>...

In article <3F***************@engcorp.com>,
Peter Hansen <pe***@engcorp.com> wrote:
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

The latter is clearly more readable.

More Pythonic, too, I think. "Readability counts," and "There should be
one-- and preferably only one --obvious way to do it." In this case,
startswith must be the one obvious way, or else why would it exist in
the standard library at all?

It's also much more maintainable - if in the future that 'abc' needs
to change to 'abcdef', the slicing version requires changes in two
places (just asking for bugs), while startswith requires only one. Of
course, all these things (readability, maintainability and
pythonicism) are fairly closely interrelated.

xtian

Jul 18 '05 #21

Jeremy Fincher

Shu-Hsien Sheu <sh**@bu.edu> wrote in message news:<ma**********************************@python. org>...

For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

Python is about maintainability, and the latter is significantly more
maintainable than the former; if the string you're checking against
changes in size, using .startswith doesn't require you to change your
sourcecode anywhere else.

Jeremy

Jul 18 '05 #22

Paul

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

Paul
Bob Gailer <bg*****@alum.rpi.edu> wrote in message news:<ma**********************************@python. org>...

At 09:17 AM 9/22/2003, Shu-Hsien Sheu wrote:
Hi,

I have a question about the comparison of efficiency of string slicing and
using string.startswith.
For example, which one of the following would be more efficient, or ,
moreover, more pythonic?

if aa[:3] == 'abc':

vs

if aa.startswith('abc'):

Here's one way to address this kind of question:
>>> import time
>>> def f(): ... st = time.time()
... for i in range(500000):
... if aa.startswith('abc'):pass
... print time.time() - st
... >>> f() 1.01200008392 >>> def f(): ... st = time.time()
... for i in range(500000):
... if aa[:3] == 'abc':pass
... print time.time() - st
... >>> f()

1.01100003719

Bob Gailer
bg*****@alum.rpi.edu
303 442 2625

--

Jul 18 '05 #23

Alex Martelli

Paul wrote:

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

timeit.py gives me 0.9 microseconds for the less maintainable
name[:-3].lower()=='jpg' vs 1.7 for name.lower().endswith('jpg')
[and over 3 for re's]. Point is, _do I *care*_? How many millions
of filenames will I check, when the extra overhead of checking
a million filenames in the more maintainable way is less than a
second? How long will it have taken me to get those millions
of filenames into memory in the first place?

If this operation IS on the bottleneck, and a 0.8 microseconds
difference matters, I'll do the slicing -- but 99 times out of 100
I'll do the .endswith instead (I'll _start_ that way 100 times out
of 100, and optimize it iff profiling tells me that matters...).
Alex

Jul 18 '05 #24

Peter Hansen

Paul wrote:

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

Yes, of course. :-)

import os
if os.path.splitext(name)[1].lower() == 'jpg':
pass

That also handles the problem with files named "ThisFileIs.NotAjpg"
being mistreated, as the other solutions do. ;-)

-Peter

Jul 18 '05 #25

David Eppstein

In article <3F**************@engcorp.com>,
Peter Hansen <pe***@engcorp.com> wrote:

Paul wrote:

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

Yes, of course. :-)

import os
if os.path.splitext(name)[1].lower() == 'jpg':
pass

That also handles the problem with files named "ThisFileIs.NotAjpg"
being mistreated, as the other solutions do. ;-)

I was about to post the same answer. One minor nit, though: it should be

os.path.splitext(name)[1].lower() == '.jpg'

It may be a little longer than the other solutions, but it expresses the
meaning more clearly.

--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science

Jul 18 '05 #26

Terry Reedy

"Shu-Hsien Sheu" <sh**@bu.edu> wrote in message
news:ma**********************************@python.o rg...

Hi,

For example, which one of the following would be more efficient, or , moreover, more pythonic?

if aa[:3] == 'abc':
This creates and deletes a temporary object.
if aa.startswith('abc'):

This makes a function call. As Bob showed for his system, the two
overheads are about the same for a three char prefix. If one were
checking a 30000 byte prefix, the call might win.

Terry J. Reedy

Jul 18 '05 #27

Peter Hansen

David Eppstein wrote:

In article <3F**************@engcorp.com>,
Peter Hansen <pe***@engcorp.com> wrote:
Paul wrote:

However, what if you don't want case sensitivity? For example, to
check if a file is a jpg, I do name[-3:].lower() == 'jpg'. This will
work with both foo.jpg and foo.JPG.

Is this slower than name.lower().endswith('jpg')? Is there a better
solution altogether?

Yes, of course. :-)

import os
if os.path.splitext(name)[1].lower() == 'jpg':
pass

That also handles the problem with files named "ThisFileIs.NotAjpg"
being mistreated, as the other solutions do. ;-)

I was about to post the same answer. One minor nit, though: it should be

os.path.splitext(name)[1].lower() == '.jpg'

Oops, thanks! I _always_ make that mistake. It just seems that if
os.path.split() does not return any path separators in the components,
then os.path.splitext() shouldn't return the extension separator...

Luckily we have tests around here to catch that kind of thing. :-)

-Peter

Jul 18 '05 #28

Peter Otten

Duncan Booth wrote:

Note that if anyone proposes this seriously, it should generate a 3-tuple
(mapping(item), index, item) rather than the 2-tuple you suggest.

This is because the mapping function could reasonably be used to impose an
ordering on objects that have no natural order, so you need to be sure
that the comparison never falls through the the original object even where
the mapping compares equal. It also has the side effect of making the sort
stable (although if stability is a goal you might want another option to
reverse the sort which would use '-index' as the second element and call
.reverse() on the result).
So my demo implementation was faulty :-(
Let me try again:

def sort(self, cmpfunc=None, mapfunc=None):
if mapfunc:
list.sort(self, lambda x, y: cmp(mapfunc(x), mapfunc(y)))
else:
list.sort(self, cmpfunc)

Seriously, if sort() were to be thus extended, the dsu pattern would rather
act as a performance enhancement under the hood. I prefer plain C

struct {
PyObject *decorator;
int index; /* iff stability is not guaranteed by the sort algorithm */
PyObject *data;
}

over n-tuples and would hope to reuse the sorting infrastructure already
there in listobject.c to compare only the decorators. I am not familiar
with the C source, so be gentle if that is not feasible.
FWIW, I think something like this belongs in the standard library rather
than as a method on lists or a new builtin.

If you think of the comparison as a two-step process,

(1) extract or calculate an attribute
(2) wrap it into a comparison function

consider the following use cases:

(a) lambda x, y: cmp(y, x)
(b) lambda x, y: cmp(x.attr.lower(), y.attr.lower())

All code following the latter pattern could be rewritten (more clearly, I
think) as

alist.sort(mapfunc=lambda x: attr.lower())

and would automatically benefit from dsu (which could even be turned off
transparently for the client code for very large lists, where the memory
footprint becomes a problem).

The litmus test whether to put it into a utility function or into an already
existing method that needs only a minor and completely backwards-compatible
interface change would then be:

Are (b)-style comparison functions nearly as or more common than (a)-style
functions.

Peter

PS: You and Alex Martelli dismissed my somewhat unfocused idea faster than I
could sort it out. I hope you will find the above useful, though.

Jul 18 '05 #29

Shu-Hsien Sheu

Great thanks for all the answers!
I really enjoy Python and learned a lot here.

-shuhsien

Jul 18 '05 #30

Shu-Hsien Sheu

Hi,

In my understanding, using try/except rather than if/else is more
pythonic. However, sometimes it is difficult to use the later.
For example, I want to search for a sub string in a list composed of
strings. It is considered "possitive" if there is a match, no matter how
many.

my_test = ['something', 'others', 'still others']

case 1: try/except

hit = 0
for i in my_test:
try:
i.index('some')
hit = 1
except ValueError:
pass

case 2: if/else

hit = 0
for i in my_test:
if 'some' in i:
hit = 1
It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on
catching multiple errorrs. However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.

Am I right here?

-shuhsien

Jul 18 '05 #31

Gerrit Holl

Shu-Hsien Sheu wrote:

In my understanding, using try/except rather than if/else is more
pythonic. However, sometimes it is difficult to use the later.
For example, I want to search for a sub string in a list composed of
strings. It is considered "possitive" if there is a match, no matter how
many.

my_test = ['something', 'others', 'still others']

case 1: try/except

hit = 0
for i in my_test:
try:
i.index('some')
hit = 1
except ValueError:
pass case 2: if/else

hit = 0
for i in my_test:
if 'some' in i:
hit = 1
Much faster would be:
def check():
for elem in my_test:
if 'some' in elem:
return True

....this way, it immediatly stops checking all following values once it finds
a single match.
It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on
catching multiple errorrs.
Agreed.
However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.

It would be more similar to use 'if (a in b) and (c in b)',
because that is what the try/except block does. If so, I
think it has the same effect.
I would absolutely prefer the former, because I don't like function
calls who neither change an object whose return value is
thrown away.

regards,
Gerrit.

--
243. As rent of herd cattle he shall pay three gur of corn to the
owner.
-- 1780 BC, Hammurabi, Code of Law
--
Asperger Syndroom - een persoonlijke benadering:
http://people.nl.linux.org/~gerrit/
Het zijn tijden om je zelf met politiek te bemoeien:
http://www.sp.nl/

Jul 18 '05 #32

Stephen Horne

On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh**@bu.edu>
wrote:

Hi,

In my understanding, using try/except rather than if/else is more
pythonic.

When to use an exception can be a difficult issue in any language.

A common suggestion as to when to use them is 'only for errors' - but
what exactly is an error? Really, its just a special case - and if you
are handling that special case it isn't an error any more.

For example, if your program runs out of memory and cannot complete a
requested task, as long as it handles that special case by reporting
that it run out of memory and not by crashing or whatever, it is
probably doing what any decent requirements document would specify -
in case of insufficient memory, report the problem and cleanly abort
the tasks that cannot be completed without terminating or crashing. In
terms of the requirements, no error has occurred - a special case has
simply been handled exactly as per the requirements.

Actually, I'm not really convinced by that argument either. I think I
first read it in Stroustrup, though I can't be bothered checking ATM.
Anyway, what I'm trying to express is that when to use an exception is
more about intuitions and common sense than hard-and-fast rules.

The nearest thing to a hard-and-fast rule, though, is that if you
throw an exception the condition should really be exceptional - a
special case as opposed to a normal case. try/except is not just an
alternative to if/else, and neither is it an alternative to 'break' in
loops (though Icon programmers may be tempted to use it that way).

One major benefit of exceptions is that you don't have to keep
checking 'has any error occured' in multiple if statements in a
function. Any function that does anything even remotely complicated
will have many different possible error conditions. Making sure that
normal-case code is not run when a failure has already occurred can be
a major headache. Enough of a headache that in C, many people think
error handling is the only case where a 'goto' is acceptable (so an
error can trigger a goto straight to the cleanup code at the end of a
function, keeping the rest of the function clean).

There are other issues, of course - exceptions allow your functions to
cope properly with errors that you don't even know about in functions
that you call, for instance.

But basically, if you handle exceptional conditions by raising
exceptions, your logic will get much clearer as for the most part it
only has to deal with the normal case. The exceptional cases get
handled by the exception handlers.

But if you use try/except as an alternative to if/else, people who
have to read your code may well take up dark magics purely so they can
curse you more effectively ;-)
--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Jul 18 '05 #33

Tim Rowe

On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh**@bu.edu>
wrote:

Hi,

In my understanding, using try/except rather than if/else is more
pythonic.
Rule of thumb: when the block of code is still doing what it's
supposed to do, use if/else. If it's failing to do what it's supposed
to do, use try/except. "except" should be an /exception/!
However, sometimes it is difficult to use the later.
For example, I want to search for a sub string in a list composed of
strings. It is considered "possitive" if there is a match, no matter how
many.

my_test = ['something', 'others', 'still others']

case 1: try/except

hit = 0
for i in my_test:
try:
i.index('some')
hit = 1
except ValueError:
pass
I'd reckon that to be a bad use of try/except; the "exception" is a
perfectly normal case.
case 2: if/else

hit = 0
for i in my_test:
if 'some' in i:
hit = 1
My /guess/ is that this would be faster than case 1, as well as
clearer!
It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on
catching multiple errorrs. However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.

Yes. The first is clear and concise, the second is verbose and
unclear! Also the second could mask a genuine ValueError if a, b, or
c is an evaluation rather than a simple variable, so you'd think that
neither a nor c was in b when in fact you have no idea: something went
invisibly wrong and you never actually completed the search.

So try/except /only/ when something has gone wrong and you need to go
into some sort of recovery or termination, /not/ for routine tests.

Jul 18 '05 #34

Dennis Lee Bieber

Shu-Hsien Sheu fed this fish to the penguins on Tuesday 23 September
2003 08:10 am:

It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on
catching multiple errorrs. However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something
Python conditionals short-circuit. If the first clause is true, the
second (for an OR) is never even executed.
try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.
Yes, in that this is equivalent to an AND, rather than an OR

-- ================================================== ============ <
wl*****@ix.netcom.com | Wulfraed Dennis Lee Bieber KD6MOG <
wu******@dm.net | Bestiaria Support Staff <
================================================== ============ <
Bestiaria Home Page: http://www.beastie.dm.net/ <
Home Page: http://www.dm.net/~wulfraed/ <

Jul 18 '05 #35

Bob Gailer

At 09:10 AM 9/23/2003, Shu-Hsien Sheu wrote:

Hi,

In my understanding, using try/except rather than if/else is more pythonic.
If/else and try/except are both Pythonic. In many cases you don't even get
to choose.
However, sometimes it is difficult to use the later.
For example, I want to search for a sub string in a list composed of
strings. It is considered "possitive" if there is a match, no matter how many.

my_test = ['something', 'others', 'still others']

case 1: try/except

hit = 0
for i in my_test:
try:
i.index('some')
hit = 1
except ValueError:
pass

case 2: if/else

hit = 0
for i in my_test:
if 'some' in i:
hit = 1
Consider breaking out of the loop at the first success

for i in my_test:
if 'some' in i:
hit = 1
break
else:
hit = 0

Iist comprehension can also be an alternative:

hit = [i for i in my_test if 'some' in i]
It seems to me that in a simple searching/matching, using if might be
better and the code is smaller. Try/except would have its strengh on
catching multiple errorrs. However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

The above two are very different.

Am I right here?

AFAIAC its a mater of style.

Bob Gailer
bg*****@alum.rpi.edu
303 442 2625
---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.506 / Virus Database: 303 - Release Date: 8/1/2003

Jul 18 '05 #36

Hung Jung Lu

Tim Rowe <tim@remove_if_not_spam.digitig.co.uk> wrote in message news:<u9********************************@4ax.com>. ..

On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh**@bu.edu>
wrote:
In my understanding, using try/except rather than if/else is more
pythonic.
Rule of thumb: when the block of code is still doing what it's
supposed to do, use if/else. If it's failing to do what it's supposed
to do, use try/except. "except" should be an /exception/!

...... So try/except /only/ when something has gone wrong and you need to go
into some sort of recovery or termination, /not/ for routine tests.

You have a valid point of view, which nonetheless is not shared by
everyone. This is a recurring subject in the newsgroup.

Python exceptions have been used for other purposes, as can be seen
from Python FAQ (e.g. "4.22 Why is there no goto?" in
http://www.python.org/doc/faq/general.html)

The "for" loop in Python is also implemented internally with
exceptions. E.g.: http://groups.google.com/groups?selm...&output=gplain,
where it mentioned:

"... In some other languages, 'non failure' mode exceptions may be
unusual, but it's the normal idiom in Python."

regards,

Hung Jung

Jul 18 '05 #37

Hung Jung Lu

Shu-Hsien Sheu <sh**@bu.edu> wrote in message news:<ma**********************************@python. org>...

catching multiple errorrs. However, problems occur if the criteria is
composed of "or" rather than "and". For instance:

if (a in b) or (c in b):
*do something

try:
b.index(a)
b.index(c)
*do something
except ValueError:
pass

For the "or" case, actually the exception trick works rather well.
Usually people raise events by hand:

class Found(Exception): pass

try:
if a in b: raise Found
if c in b: raise Found
except Found:
do something

Of course, one can supply additional information, as in:

try:
if a in b: raise Found('a in b')
if c in b: raise Found('c in b')
except Found, e:
print str(e)

Hung Jung

Jul 18 '05 #38

Tim Rowe

On 27 Sep 2003 01:25:10 -0700, hu********@yahoo.com (Hung Jung Lu)
wrote:

Tim Rowe <tim@remove_if_not_spam.digitig.co.uk> wrote in message news:<u9********************************@4ax.com>. ..
On Tue, 23 Sep 2003 11:10:49 -0400, Shu-Hsien Sheu <sh**@bu.edu>
wrote:
>In my understanding, using try/except rather than if/else is more
>pythonic.

Rule of thumb: when the block of code is still doing what it's
supposed to do, use if/else. If it's failing to do what it's supposed
to do, use try/except. "except" should be an /exception/!

.....
So try/except /only/ when something has gone wrong and you need to go
into some sort of recovery or termination, /not/ for routine tests.

You have a valid point of view, which nonetheless is not shared by
everyone. This is a recurring subject in the newsgroup.

<examples snipped>

That's the nice thing about rules of thumb: they're not binding :-)

Internal constructs don't bother me -- ultimately it all comes down to
machine code which will be a mass of goto's. That doesn't mean that
my code should be (could be!) a mass of goto's.

And the last time I needed a goto for anything I was programming in
BASIC; I don't need to use exceptions to emulate it in a language with
decent flow control constructs (Python qualifies, of course!).

Jul 18 '05 #39

Stephen Horne

On Mon, 29 Sep 2003 10:33:27 +0100, Tim Rowe
<tim@remove_if_not_spam.digitig.co.uk> wrote:

Internal constructs don't bother me -- ultimately it all comes down to
machine code which will be a mass of goto's. That doesn't mean that
my code should be (could be!) a mass of goto's.

And the last time I needed a goto for anything I was programming in
BASIC; I don't need to use exceptions to emulate it in a language with
decent flow control constructs (Python qualifies, of course!).

Ah - a good old goto debate ;-)

I have long claimed that goto is useful in rare circumstances, that it
can be clearer and more maintainable than structured code in those
rare cases, and that while anything can be abused it is still
perfectly possible for a good programmer to use it responsibly and to
refactor when the code becomes too messy. Structured, OO, and indeed
any style of programming can become unreadable and unmaintainable
(hence the fuss about refactoring).

That said, it was quite a shock when I actually found a case where I
needed a goto about six months ago. Quite literally I made several
attempts at writing a function in a structured way and got it wrong
each time. Each time I came back to it the previous version seemed a
horrible mess so I rewrote it. Eventually I figured this was stupid,
especially as it didn't seem so hard in my head. I put that mental
picture onto paper - and as I did so I realised it simply wasn't a
structured job. It was very much a state transition model. But because
it completes all its work in one go, and doesn't need to maintain the
state between calls, it simply hadn't occurred to me to think of it
that way.

I remember telling a college lecturer that if state transition models
are appropriate for some things then so must gotos be appropriate, and
that in a 'short running' state machine a goto would be the logical
way to implement the transitions. At the time, I didn't think it would
be so long before I found an example ;-)

Anyway, this example was in some fairly low level library code I was
writing in C++. I don't see any need for goto in Python, trust that it
will never be added, and don't like to see exceptions abused that way.

Just sharing a rather pointless anecdote.
--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Jul 18 '05 #40

Trouble sorting lists (unicode/locale related?)

Similar topics