469,568 Members | 1,430 Online
Bytes | Developer Community
New Post

Home Posts Topics Members FAQ

Post your question to a community of 469,568 developers. It's quick & easy.

Replacing cmp with key for sorting

I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?

George
Nov 3 '08 #1
9 1683
On Nov 3, 6:49*pm, George Sakkis <george.sak...@gmail.comwrote:
I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
* * * * * * * * * * * * * * * * * * -1 if x.startswith(y) else
* * * * * * * * * * * * * * * * * * +1 if y.startswith(x) else
* * * * * * * * * * * * * * * * * * cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?

George
Your input and output:

s = ['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
r = ['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

To me your lambda looks like an abuse of the inline if expression. So
I suggest to replace it with a true function, that is more readable:

def mycmp(x, y):
if x == y:
return 0
elif x.startswith(y):
return -1
elif y.startswith(x):
return +1
else:
return cmp(x, y)

print sorted(s, cmp=mycmp)

It's a peculiar cmp function, I'm thinking still in what situations it
can be useful.

To use the key argument given a cmp function I use the simple code
written by Hettinger:

def cmp2key(mycmp):
"Converts a cmp= function into a key= function"
class K:
def __init__(self, obj, *args):
self.obj = obj
def __cmp__(self, other):
return mycmp(self.obj, other.obj)
return K
print sorted(s, key=cmp2key(mycmp))

Now I'll look for simpler solutions...

Bye,
bearophile
Nov 3 '08 #2
George Sakkis wrote:
s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))


Probably not what you had in mind ...
>>s
['a', 'bc', 'bd', 'bcb', 'ba', 'ab']
>>maxlen = max(len(si) for si in s)
def k(si): return si+'z'*(maxlen-len(si))
...
>>sorted(s,key=k)
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

Cheers,
Alan Isaac
Nov 3 '08 #3
George Sakkis <ge***********@gmail.comwrites:
I want to sort sequences of strings lexicographically but those with
longer prefix should come earlier, e.g. for s = ['a', 'bc', 'bd',
'bcb', 'ba', 'ab'], the sorted sequence is ['ab', 'a', 'ba', 'bcb',
'bc', 'bd']. Currently I do it with:

s.sort(cmp=lambda x,y: 0 if x==y else
-1 if x.startswith(y) else
+1 if y.startswith(x) else
cmp(x,y))

Can this be done with an equivalent key function instead of cmp ?
Here's an idea:
>>sorted(s, key=lambda x: x+'z'*(3-len(s)))
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

The 3 above is the length of the longest string in the list

Here's another idea, probably more practical:
>>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
['ab', 'a', 'ba', 'bcb', 'bc', 'bd']

HTH

--
Arnaud

Nov 3 '08 #4
Alan G Isaac:
Probably not what you had in mind ...
...
>>maxlen = max(len(si) for si in s)
* * *>>def k(si): return si+'z'*(maxlen-len(si))
This looks a little better:

assert isinstance(s, str)
sorted(s, key=lambda p: p.ljust(maxlen, "\255"))

If the string is an unicode that may not work anymore.
I don't know if there are better solutions.

Bye,
bearophile
Nov 3 '08 #5
Arnaud Delobelle:
Here's another idea, probably more practical:
>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)
Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

Bye,
bearophile
Nov 3 '08 #6
be************@lycos.com writes:
Arnaud Delobelle:
>Here's another idea, probably more practical:
>>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)
Of course that's better! (although mine will work with unicode if yours
does). It's funny how the obvious escapes me so often. Still I think
the idea of the 'double reverse' (one letterwise, the other listwise)
was quite good.

--
Arnaud
Nov 3 '08 #7
Arnaud Delobelle:
It's funny how the obvious escapes me so often.
In this case it's a well known cognitive effect: the mind of humans
clings to first good/working solution, not allowing its final tuning.
For that you may need to think about something else for a short time,
and then look at your solution with a little "fresher" mind.

This (ugly) translation into D + my functional-style libs shows why
Python syntax is a good idea:

import d.all;
void main() {
auto txt = "a bc bd bcb ba ab".split();
putr( sorted(txt, (string s){ return map((char c){return -
cast(int)c;}, s);} ).reverse );
}

Long Live To Python! :-)

Bye,
bearophile
Nov 3 '08 #8
On Nov 3, 1:51*pm, bearophileH...@lycos.com wrote:
Arnaud Delobelle:
Here's another idea, probably more practical:
>>sorted(s, key=lambda x: tuple(256-ord(l) for l in x), reverse=True)

Nice.
A variant that probably works with unicode strings too:

print sorted(s, key=lambda x: [-ord(l) for l in x], reverse=True)

Bye,
bearophile
Awesome! I tested it on a sample list of ~61K words [1] and it's
almost 40% faster, from ~1.05s dropped to ~0.62s. That's still >15
times slower than the default sorting (0.04s) but I guess there's not
much more room for improvement.

George

[1] http://www.cs.pitt.edu/~kirk/cs1501/...ggle/5desk.txt
Nov 3 '08 #9
George Sakkis:
but I guess there's not much more room for improvement.
That's nonsense, Python is a high level language, so there's nearly
always room for improvement (even in programs written in assembly you
can generally find faster solutions).
If speed is what you look for, and your strings are ASCII then this is
much faster:

tab = "".join(map(chr, xrange(256)))[::-1]
s.sort(key=lambda x: x.translate(tab), reverse=True)

Bye,
bearophile
Nov 3 '08 #10

This discussion thread is closed

Replies have been disabled for this discussion.

Similar topics

4 posts views Thread by dont bother | last post: by
7 posts views Thread by Federico G. Babelis | last post: by
19 posts views Thread by Owen T. Soroke | last post: by
4 posts views Thread by Ambica Jain | last post: by
7 posts views Thread by Kamal | last post: by
5 posts views Thread by jrod11 | last post: by
reply views Thread by suresh191 | last post: by
4 posts views Thread by guiromero | last post: by
By using this site, you agree to our Privacy Policy and Terms of Use.