473,394 Members | 1,781 Online
Bytes | Software Development & Data Engineering Community
Post Job

Home Posts Topics Members FAQ

Join Bytes to post your question to a community of 473,394 software developers and data experts.

sort one list using the values from another list

Hello,

I have two lists, one with strings (filenames, actually), and one with a real-number
rank, like:

A=['hello','there','this','that']
B=[3,4,2,5]

I'd like to sort list A using the values from B, so the result would be in this example,

A=['this','hello','there','that']

The sort method on lists does in-place sorting. Is there a way to do what I want here?
thanks,

Brian Blais

--
-----------------

bb****@bryant.edu
http://web.bryant.edu/~bblais

Feb 26 '06 #1
23 7292
Brian Blais wrote:
Hello,

I have two lists, one with strings (filenames, actually), and one with a
real-number
rank, like:

A=['hello','there','this','that']
B=[3,4,2,5]

I'd like to sort list A using the values from B, so the result would be
in this example,

A=['this','hello','there','that']
Here are two ways:
A=['hello','there','this','that']
B=[3,4,2,5]
zip(*sorted(zip(B,A)))[1] ('this', 'hello', 'there', 'that')
[a for b,a in sorted(zip(B,A))]

['this', 'hello', 'there', 'that']

I prefer the second one, I think it is more readable, will use less
memory (doesn't have to create a new list B in the final step) and it's
even faster on my computer:

D:\Projects\CB>python -m timeit -s
"A=['hello','there','this','that'];B=[3,4,2,5]" "zip(*sorted(zip(B,A)))[1]"
100000 loops, best of 3: 6.29 usec per loop

D:\Projects\CB>python -m timeit -s
"A=['hello','there','this','that'];B=[3,4,2,5]" "[a for b,a in
sorted(zip(B,A))]"
100000 loops, best of 3: 5.53 usec per loop

(I'm bored this morning :-)

There's probably a clever way to do it using the key parameter to sort
but I can't think of it...
The sort method on lists does in-place sorting. Is there a way to do
what I want here?


The example above does not sort in place, if you want it to be in place use
A[:] = [a for b,a in sorted(zip(B,A))]

Kent
Feb 26 '06 #2
Brian Blais wrote:
Hello,

I have two lists, one with strings (filenames, actually), and one with a
real-number
rank, like:

A=['hello','there','this','that']
B=[3,4,2,5]

I'd like to sort list A using the values from B, so the result would be
in this example,

A=['this','hello','there','that']

The sort method on lists does in-place sorting. Is there a way to do
what I want here?


If A has no duplicate elements, you could create a hash mapping A's
elements to their respective precedences, then provide a sort criterion
that accessed the hash. Alternatively, you could do something like this:

from operator import itemgetter
result = map(itemgetter(0), sorted(zip(A, B), key=itemgetter(1)))
Feb 26 '06 #3
Brian Blais <bb****@bryant.edu> wrote:
Hello,

I have two lists, one with strings (filenames, actually), and one with a
real-number rank, like:

A=['hello','there','this','that']
B=[3,4,2,5]

I'd like to sort list A using the values from B, so the result would be in
this example,

A=['this','hello','there','that']

The sort method on lists does in-place sorting. Is there a way to do what
I want here?


Sure, many ways, mostly ones based on the "Decorate-Sort-Undecorate"
idiom and its incarnation as the key= optional argument to function
sorted and list method sort. I believe that a more explicit DSU is
ideal (clearer, though maybe a tad slower) in your case:

_aux = zip(B, A)
_aux.sort()
A[:] = [a for b, a in _aux]

Twisting things to use the speedy key= approach seems more "contorted"
and less general in this case, and it also needs an auxiliary structure
(to avoid slowing things down by repeated .index calls) However, things
may be different if you need to consider the possibility that B has some
duplicate items, and deeply care for such a case NOT to result in
comparisons of items of A. The above approach would then have to be
changed, e.g., to:

_aux = zip(B, enumerate(A))
_aux.sort()
A[:] = [a for (b, (i, a)) in _aux]

where I've also added a pair of redundant parentheses to make the
nesting structure of items of _aux more obvious. Of course, each of
these has a variant using sorted instead of sort, and for those you
could use izip from itertools rather than built-in zip, and do
everything within one single statement, etc, etc.
Alex
Feb 26 '06 #4
Brian Blais wrote:
Hello,

I have two lists, one with strings (filenames, actually), and one with a
real-number
rank, like:

A=['hello','there','this','that']
B=[3,4,2,5]

I'd like to sort list A using the values from B, so the result would be
in this example,

A=['this','hello','there','that']


Here's a solution that makes use of the key= argument to sorted():
A = ['hello','there','this','that']
B = [3,4,2,5]
indices = range(len(A))
indices.sort(key=B.__getitem__)
[A[i] for i in indices]

['this', 'hello', 'there', 'that']

Basically, it sorts the indices to A -- [0, 1, 2, 3] -- in the order
given by B, and then selects the items from A in the appropriate order.

Feb 26 '06 #5
Steven Bethard wrote:
Here's a solution that makes use of the key= argument to sorted():
A = ['hello','there','this','that']
B = [3,4,2,5]
indices = range(len(A))
indices.sort(key=B.__getitem__)
[A[i] for i in indices] ['this', 'hello', 'there', 'that']

Basically, it sorts the indices to A -- [0, 1, 2, 3] -- in the order
given by B, and then selects the items from A in the appropriate order.


That's impressive. I'm sure like a lot of other people I looked at the
question and thought it ought to be possible to use key to get this result
without a load of zipping and unzipping but just couldn't quite see it.

If you combine your technique with 'sorted', you get the one line version:
[A[i] for i in sorted(range(len(A)), key=B.__getitem__)]

['this', 'hello', 'there', 'that']

Feb 26 '06 #6
Your solution Steven Bethard looks very intelligent, here is a small
speed test, because sorting a list according another one is a quite
common operation.
(Not all solutions are really the same, as Alex has shown).
from itertools import izip, imap
from operator import itemgetter
from random import randint, choice, randint, shuffle
from string import lowercase

def timeit(function, *arg1, **arg2):
"""timeit(function, *arg1, **arg2): given a function and its
parameters, calls it
and computes its running time, in seconds. It result is
discarded."""
t1 = clock()
function(*arg1, **arg2)
t2 = clock()
return round(t2-t1, 2)

def psort1(s1, s2):
s1[:] = [a for b,a in sorted(zip(s2, s1))]

def psort2(s1, s2):
aux = zip(s2, enumerate(s1))
aux.sort()
s1[:] = [a for b, (i, a) in aux]

def psort3(s1, s2):
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = [s1[i] for i in _indices]

def psort4(s1, s2):
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)

def psort5(s1, s2):
s1[:] = zip(*sorted(zip(s2, s1)))[1]

def psort6(s1, s2):
s1[:] = map(itemgetter(0), sorted(zip(s1, s2), key=itemgetter(1)))

def psort7(s1, s2):
s1[:] = [a for b,a in sorted(izip(s2, s1))]

def psort8(s1, s2):
s1[:] = zip(*sorted(izip(s2, s1)))[1]

def psort9(s1, s2):
s1[:] = map(itemgetter(0), sorted(izip(s1, s2), key=itemgetter(1)))
n = 100000
s1 = ["".join(choice(lowercase) for i in xrange(randint(2,8))) for k in
xrange(n)]
s2 = range(n)
shuffle(s2)

for psort in sorts:
s1c = list(s1)
print timeit(psort, s1c, s2), "s"
Timings on my PC, Python 2.4, PIII 500 MHz:
2.87
3.82
1.6
1.56
4.35
2.49
2.75
4.29
2.35

psort4 is my variant of your solution, and it seems the faster one.
Note: one liners are bad, and expecially bad in Python that is designed
to be a readable language, so let's avoid them as much as possible. I
have used one of them to generate s1, but I'll not use them in
production code or in my libraries, etc.

Bye,
bearophile

Feb 26 '06 #7
be************@lycos.com wrote:
Your solution Steven Bethard looks very intelligent, here is a small
speed test, because sorting a list according another one is a quite
common operation.
(Not all solutions are really the same, as Alex has shown).


Try this one.
def psort10(s1, s2):
d = dict(zip(s2,s1))
s1[:] = (d[n] for n in sorted(d.keys()))
It's faster on my system because d.keys() is already sorted. But that
may not be the case on other versions of python.

Ron

Feb 26 '06 #8
Ron Adam <rr*@ronadam.com> wrote:
be************@lycos.com wrote:
Your solution Steven Bethard looks very intelligent, here is a small
speed test, because sorting a list according another one is a quite
common operation.
(Not all solutions are really the same, as Alex has shown).


Try this one.

def psort10(s1, s2):
d = dict(zip(s2,s1))
s1[:] = (d[n] for n in sorted(d.keys()))

It's faster on my system because d.keys() is already sorted. But that
may not be the case on other versions of python.


If there are duplicates in s2, this solution will silently lose some
items from s1. I would at least include an assert len(s2)==len(d) as
the second statement to get some insurance that this doesn't occur.
Alex
Feb 26 '06 #9
>It's faster on my system because d.keys() is already sorted. But that may not be the case on other versions of python.<

In my version it's a little slower. But what system are you using where
keys is already sorted? IronPython maybe?

Bye,
bearophile

Feb 26 '06 #10
Alex Martelli wrote:
Ron Adam <rr*@ronadam.com> wrote:
be************@lycos.com wrote:
Your solution Steven Bethard looks very intelligent, here is a small
speed test, because sorting a list according another one is a quite
common operation.
(Not all solutions are really the same, as Alex has shown).

Try this one.

def psort10(s1, s2):
d = dict(zip(s2,s1))
s1[:] = (d[n] for n in sorted(d.keys()))

It's faster on my system because d.keys() is already sorted. But that
may not be the case on other versions of python.


If there are duplicates in s2, this solution will silently lose some
items from s1. I would at least include an assert len(s2)==len(d) as
the second statement to get some insurance that this doesn't occur.
Alex


Good point, and a function that only works part time isn't good. ;-)

Feb 27 '06 #11
be************@lycos.com wrote:
It's faster on my system because d.keys() is already sorted. But that may not be the case on other versions of python.<


In my version it's a little slower. But what system are you using where
keys is already sorted? IronPython maybe?

Bye,
bearophile

Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on
win32

I was a bit surprised by them being sorted. I just happend to try
d.keys() in place of s2, and it sped up. I was expecting it to be a bit
slower.

Considering the number time I sort keys after getting them, It's the
behavior I would prefer. Maybe a more dependable dict.sortedkeys()
method would be nice. ;-)

Like Alex pointed out you need to make sure there aren't duplicates in
the list used for keys.

def psort10(s1, s2):
d = dict(zip(s2,s1))
assert len(d) == len(s1)
s1[:] = (d[n] for n in sorted(d.keys()))

Ron
Feb 27 '06 #12
Ron Adam <rr*@ronadam.com> wrote:
...
Considering the number time I sort keys after getting them, It's the
behavior I would prefer. Maybe a more dependable dict.sortedkeys()
method would be nice. ;-)


sorted(d) is guaranteed to do exactly the same thing as sorted(d.keys())
AND to be faster (would be pretty weird if it weren't faster...!).

E.g., ...:

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d.keys())'
100000 loops, best of 3: 6.82 usec per loop

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d)'
100000 loops, best of 3: 5.98 usec per loop
Alex
Feb 27 '06 #13
Alex Martelli wrote:
Ron Adam <rr*@ronadam.com> wrote:
...
Considering the number time I sort keys after getting them, It's the
behavior I would prefer. Maybe a more dependable dict.sortedkeys()
method would be nice. ;-)


sorted(d) is guaranteed to do exactly the same thing as sorted(d.keys())
AND to be faster (would be pretty weird if it weren't faster...!).

E.g., ...:

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d.keys())'
100000 loops, best of 3: 6.82 usec per loop

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d)'
100000 loops, best of 3: 5.98 usec per loop
Alex

Yes, it did decrease it. And simplified it as well. ;)
def psort11(s1, s2):
d = dict(zip(s2, s1))
assert len(d) == len(s1)
sorted(d)
s1[:] = d.values()
psort1 0.554 s
psort2 0.727 s
psort3 0.295 s
psort4 0.293 s
psort5 0.831 s
psort6 0.438 s
psort7 0.575 s
psort8 0.845 s
psort9 0.424 s
psort10 0.235 s
psort11 0.206 s

Feb 27 '06 #14
Ron Adam wrote:
Alex Martelli wrote:
Ron Adam <rr*@ronadam.com> wrote:
...
Considering the number time I sort keys after getting them, It's the
behavior I would prefer. Maybe a more dependable dict.sortedkeys()
method would be nice. ;-)


sorted(d) is guaranteed to do exactly the same thing as sorted(d.keys())
AND to be faster (would be pretty weird if it weren't faster...!).

E.g., ...:

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d.keys())'
100000 loops, best of 3: 6.82 usec per loop

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d)'
100000 loops, best of 3: 5.98 usec per loop
Alex

Yes, it did decrease it. And simplified it as well. ;)
def psort11(s1, s2):
d = dict(zip(s2, s1))
assert len(d) == len(s1)
sorted(d)
s1[:] = d.values()


This probably should be:

def psort11(s1, s2):
d = dict(zip(s2,s1))
assert len(d) == len(s1)
s1[:] = list(d[v] for v in sorted(d))

Feb 27 '06 #15
Kent Johnson wrote:
>>> [a for b,a in sorted(zip(B,A))]

also, [a for _,a in sorted(zip(B,A))]

didn't read refs, tested above python-2.2.3.

--
Sun Yi Ming

you can logout any time you like,
but you can never leave...
Feb 27 '06 #16
Dolmans Sun wrote:
Kent Johnson wrote:
>>> [a for b,a in sorted(zip(B,A))]


also, [a for _,a in sorted(zip(B,A))]

didn't read refs, tested above python-2.2.3.


Is there something here I can't see, or did you just
change a variable name and present that as another
solution?

Feb 27 '06 #17
Ron Adam wrote:
Ron Adam wrote:
Alex Martelli wrote:
Ron Adam <rr*@ronadam.com> wrote:
...
Considering the number time I sort keys after getting them, It's the
behavior I would prefer. Maybe a more dependable dict.sortedkeys()
method would be nice. ;-)

sorted(d) is guaranteed to do exactly the same thing as sorted(d.keys())
AND to be faster (would be pretty weird if it weren't faster...!).

E.g., ...:

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d.keys())'
100000 loops, best of 3: 6.82 usec per loop

helen:~ alex$ python -mtimeit -s'd=dict(enumerate("tarazoplay"))'
'sorted(d)'
100000 loops, best of 3: 5.98 usec per loop
Alex

Yes, it did decrease it. And simplified it as well. ;)

def psort11(s1, s2):
d = dict(zip(s2, s1))
assert len(d) == len(s1)
sorted(d)
s1[:] = d.values()


Dictionaries are not ordered, the "sorted" line does nothing except produce
a sorted list of the dictionary's keys which is ignored.
This probably should be:

def psort11(s1, s2):
d = dict(zip(s2,s1))
assert len(d) == len(s1)
s1[:] = list(d[v] for v in sorted(d))


You could do this to avoid all of those lookups:

def psort_xx(s1, s2):
pairs = sorted(dict(zip(s2, s1)).iteritems())
assert len(pairs) == len(s1)
s1[:] = [s1_value for s2_value, s1_value in pairs]
--
-Scott David Daniels
sc***********@acm.org
Feb 27 '06 #18
Scott David Daniels wrote:
Ron Adam wrote:
Ron Adam wrote: This probably should be:

def psort11(s1, s2):
d = dict(zip(s2,s1))
assert len(d) == len(s1)
s1[:] = list(d[v] for v in sorted(d))


You could do this to avoid all of those lookups:

def psort_xx(s1, s2):
pairs = sorted(dict(zip(s2, s1)).iteritems())
assert len(pairs) == len(s1)
s1[:] = [s1_value for s2_value, s1_value in pairs]


This version takes twice as long on my system, although the result may
vary on other platforms. Dictionary lookups are very fast.

Although replacing zip with izip cut the time in half.
def psort11(s1, s2):
d = dict(izip(s2,s1))
assert len(d) == len(s1)
s1[:] = list(d[v] for v in sorted(d))

Ron
Feb 27 '06 #19
Following Ron Adam solution (and using [] instead of list() in the last
line), this may be a possible solution of the problem, that is often
quite fast:

def psort16(s1, s2):
try:
d = dict(izip(s2, s1))
except TypeError:
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)
else:
if len(d) == len(s1):
s1[:] = [d[v] for v in sorted(d)]
else:
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)

Bye,
bearophile

Feb 28 '06 #20
Magnus Lycka wrote:
Is there something here I can't see, or did you just
change a variable name and present that as another
solution?

Heh, I have use python for a very short time. Base your statements, I test
in python, found `_' is a valid variable name. So I don't know it is a just
a plain variable or like something pattern matching in Haskell.
--
Simon Sun

you can logout any time you like,
but you can never leave...
Feb 28 '06 #21
Simon Sun wrote:
Magnus Lycka wrote:
Is there something here I can't see, or did you just
change a variable name and present that as another
solution?


Heh, I have use python for a very short time. Base your statements, I test
in python, found `_' is a valid variable name. So I don't know it is a just
a plain variable or like something pattern matching in Haskell.


_ is just a plain variable name in Python. It is sometimes when a
variable is needed to receive a value that won't be used. So your
example was correct and idiomatic but functionally identical to mine.

Kent
Feb 28 '06 #22
>_ is just a plain variable name in Python. It is sometimes when a variable is needed to receive a value that won't be used.<

Like in some other interactive systems (Mathematica, etc, but with a
different syntax) _ has a use in the interactive shell, it contains the
last unassigned result:
a = 2 * 5
_ Traceback (most recent call last):
File "<interactive input>", line 1, in ?
NameError: name '_' is not defined 2 * 5 10 _

10

Bye,
bearophile

Feb 28 '06 #23
be************@lycos.com wrote:
Following Ron Adam solution (and using [] instead of list() in the last
line), this may be a possible solution of the problem, that is often
quite fast:

def psort16(s1, s2):
try:
d = dict(izip(s2, s1))
except TypeError:
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)
else:
if len(d) == len(s1):
s1[:] = [d[v] for v in sorted(d)]
else:
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)

Bye,
bearophile


Looks good, but I think It would be simpler to just do.

def psort17(s1, s2):
try:
d = dict(izip(s2, s1))
assert len(d) != len(s2)
s1[:] = [d[v] for v in sorted(d)]
except Exception:
_indices = range(len(s1))
_indices.sort(key=s2.__getitem__)
s1[:] = map(s1.__getitem__, _indices)

We don't need to specify which exception. Any problems will just get
raised again on the second try it also fails.

Cheers,
Ron


Feb 28 '06 #24

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

1
by: Booser | last post by:
// Merge sort using circular linked list // By Jason Hall <booser108@yahoo.com> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <math.h> //#define debug
4
by: Robin Tucker | last post by:
How do I sort the items in a list box? I am using a class derived from IComparer to sort items on columns in a ListView, but the ListBox doesn't support this kind of facility. The "items" in my...
5
by: carrajo | last post by:
Hey, I'm trying to duplicate the following: Select List 1 --- Apple Orange Banana
2
by: John Dalberg | last post by:
I am adding integer values from a datarow collection to a List<int> using the Add method. The values are read in the correct order from the collection but once the List is populated, the values in...
2
subashini Thiyagarajan
by: subashini Thiyagarajan | last post by:
Dear friends, two list boxes are there.when i select any data from first list box second list box should automatically deactivated.how to do? i have populated list box items from database in both...
2
nabh4u
by: nabh4u | last post by:
hi friends, i have a program where i have to sort a double linked list using merge sort. i am having problem with changing the pointers of the list, like the previous and the next pointers of a...
6
by: troy_lee | last post by:
I have a continuous form that has one combo box (cbo1) with the selectable values of "Removal" and "Installation". I would like to change another combo box (cbo2) value list based on the selection...
6
by: Tem | last post by:
Thanks for all your responses. I see why c# is such a powerful language and starting to like it more and more. I also need help sorting this nested list. It is a little tricky. not sure where to...
0
by: Charles Arthur | last post by:
How do i turn on java script on a villaon, callus and itel keypad mobile phone
0
by: ryjfgjl | last post by:
In our work, we often receive Excel tables with data in the same format. If we want to analyze these data, it can be difficult to analyze them because the data is spread across multiple Excel files...
0
BarryA
by: BarryA | last post by:
What are the essential steps and strategies outlined in the Data Structures and Algorithms (DSA) roadmap for aspiring data scientists? How can individuals effectively utilize this roadmap to progress...
1
by: nemocccc | last post by:
hello, everyone, I want to develop a software for my android phone for daily needs, any suggestions?
1
by: Sonnysonu | last post by:
This is the data of csv file 1 2 3 1 2 3 1 2 3 1 2 3 2 3 2 3 3 the lengths should be different i have to store the data by column-wise with in the specific length. suppose the i have to...
0
by: Hystou | last post by:
There are some requirements for setting up RAID: 1. The motherboard and BIOS support RAID configuration. 2. The motherboard has 2 or more available SATA protocol SSD/HDD slots (including MSATA, M.2...
0
marktang
by: marktang | last post by:
ONU (Optical Network Unit) is one of the key components for providing high-speed Internet services. Its primary function is to act as an endpoint device located at the user's premises. However,...
0
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
0
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.