473,480 Members | 2,333 Online
Bytes | Software Development & Data Engineering Community
Create Post

Home Posts Topics Members FAQ

Sorting by item_in_another_list

Hi,

I have two lists, A and B, such that B is a subset of A.

I wish to sort A such that the elements in B are at the beginning of A,
and keep the existing order otherwise, i.e. stable sort. The order of
elements in B will always be correct.

for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
At the moment I have defined a comparator function:

def sort_by_in_list(x,y):
ret = 0
if x in B:
ret -= 1
if y in B:
ret += 1
return ret

and am using:

A.sort(sort_by_in_list)

which does produce the desired results.

I do now have a few questions:

1.) Is this the most efficient method for up to around 500 elements?
If not, what would be better?
2.) This current version does not allow me to choose a different list
for B. Is there a bind_third function of some description that I could
use to define a new function with 3 parameters, feed it the third (the
list to sort by), and have the A.sort(sort_by_in_list) provide the other
2 variables?
Regards to all,

Cameron.
Oct 24 '06 #1
17 1096
Cameron Walsh wrote:
Hi,

I have two lists, A and B, such that B is a subset of A.

I wish to sort A such that the elements in B are at the beginning of A,
and keep the existing order otherwise, i.e. stable sort. The order of
elements in B will always be correct.

for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
At the moment I have defined a comparator function:

def sort_by_in_list(x,y):
ret = 0
if x in B:
ret -= 1
if y in B:
ret += 1
return ret

and am using:

A.sort(sort_by_in_list)

which does produce the desired results.

I do now have a few questions:

1.) Is this the most efficient method for up to around 500 elements? If
not, what would be better?
2.) This current version does not allow me to choose a different list
for B. Is there a bind_third function of some description that I could
use to define a new function with 3 parameters, feed it the third (the
list to sort by), and have the A.sort(sort_by_in_list) provide the other
2 variables?
Regards to all,

Cameron.

Well I found an answer to the second question with the following:
>>A=[0,1,2,3,4,5,6,7,8,9,10]
B=[2,3,7,8]
def sort_by_in_list(in_list):
def ret_function(x,y):
ret = 0
if x in in_list:
ret -= 1
if y in in_list:
ret += 1
return ret
return ret_function
>>A.sort(sort_by_in_list(B))
A
[2, 3, 7, 8, 0, 1, 4, 5, 6, 9, 10]
Hope it helps someone,

Cameron.
Oct 24 '06 #2
"Cameron Walsh" <ca***********@gmail.comwrote in message
news:eh**********@enyo.uwa.edu.au...
Hi,

I have two lists, A and B, such that B is a subset of A.

I wish to sort A such that the elements in B are at the beginning of A,
and keep the existing order otherwise, i.e. stable sort. The order of
elements in B will always be correct.

for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
At the moment I have defined a comparator function:

def sort_by_in_list(x,y):
ret = 0
if x in B:
ret -= 1
if y in B:
ret += 1
return ret

and am using:

A.sort(sort_by_in_list)

which does produce the desired results.

I do now have a few questions:

1.) Is this the most efficient method for up to around 500 elements? If
not, what would be better?
2.) This current version does not allow me to choose a different list for
B. Is there a bind_third function of some description that I could use to
define a new function with 3 parameters, feed it the third (the list to
sort by), and have the A.sort(sort_by_in_list) provide the other 2
variables?
Think in Python. Define a function to take the list, and have that function
return the proper comparison function. This gives me the chance to also
convert the input list to a set, which will help in scaling up my list to
hundreds of elements. See below.

-- Paul
def sort_by_in_list(reflist):
reflist = set(reflist)
def sort_by_in_list_(x,y):
ret = 0
if x in reflist: ret -= 1
if y in reflist: ret += 1
return ret
return sort_by_in_list_

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]
A.sort( sort_by_in_list(B) )
print A

Gives:
[2, 3, 7, 8, 0, 1, 4, 5, 6, 9, 10]
Oct 24 '06 #3
Paul McGuire wrote:
"Cameron Walsh" <ca***********@gmail.comwrote in message
news:eh**********@enyo.uwa.edu.au...
>Hi,

I have two lists, A and B, such that B is a subset of A.

I wish to sort A such that the elements in B are at the beginning of A,
and keep the existing order otherwise, i.e. stable sort. The order of
elements in B will always be correct.

for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
At the moment I have defined a comparator function:

def sort_by_in_list(x,y):
ret = 0
if x in B:
ret -= 1
if y in B:
ret += 1
return ret

and am using:

A.sort(sort_by_in_list)

which does produce the desired results.

I do now have a few questions:

1.) Is this the most efficient method for up to around 500 elements? If
not, what would be better?
2.) This current version does not allow me to choose a different list for
B. Is there a bind_third function of some description that I could use to
define a new function with 3 parameters, feed it the third (the list to
sort by), and have the A.sort(sort_by_in_list) provide the other 2
variables?

Think in Python. Define a function to take the list, and have that function
return the proper comparison function. This gives me the chance to also
convert the input list to a set, which will help in scaling up my list to
hundreds of elements. See below.

-- Paul
def sort_by_in_list(reflist):
reflist = set(reflist)
def sort_by_in_list_(x,y):
ret = 0
if x in reflist: ret -= 1
if y in reflist: ret += 1
return ret
return sort_by_in_list_

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]
A.sort( sort_by_in_list(B) )
print A

Gives:
[2, 3, 7, 8, 0, 1, 4, 5, 6, 9, 10]

Looks like our answers crossed-over. Must learn about sets in Python...

Thanks very much,

Cameron.
Oct 24 '06 #4
Cameron Walsh <ca***********@gmail.comwrites:
for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
How about:

desired_result = B + sorted(x for x in A if x not in B)
Oct 24 '06 #5
ZeD
Paul Rubin wrote:
>A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]

How about:

desired_result = B + sorted(x for x in A if x not in B)
this. is. cool.

--
Under construction
Oct 25 '06 #6
ZeD <vi***********@gmail.comwrites:
desired_result = B + sorted(x for x in A if x not in B)
this. is. cool.
Actually it's better to use a set if B is large:

B2 = set(B)
desired_result = B + sorted(x for x in A if x not in B2)

That avoids a linear search through B for each element of A.
Oct 25 '06 #7
Paul Rubin wrote:
>for example:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]

How about:

desired_result = B + sorted(x for x in A if x not in B)
assuming that "keep the existing order" means what it says, you might as
well replace "sorted" with a list comprehension.

</F>

Oct 25 '06 #8
Fredrik Lundh <fr*****@pythonware.comwrites:
desired_result = B + sorted(x for x in A if x not in B)
assuming that "keep the existing order" means what it says, you might
as well replace "sorted" with a list comprehension.
Hmm. I didn't read it that way, but yeah, if that's what the OP meant
then the sort could be skipped. I thought he just wanted a stable
sort, which the sorting primitives now promise.
Oct 25 '06 #9
ZeD wrote:
Paul Rubin wrote:
>>A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
How about:

desired_result = B + sorted(x for x in A if x not in B)

this. is. cool.
Cool, yes, but I'm not entirely sure it does what the OP wanted. Partly
because I'm not entirely sure what the OP wanted. Counter example:

Given these variables:

A = [0,1,2,3,4,5,6,8,9,10] # Note 7 is missing
B = [2,3,7,8]

which of the following should the function yield?

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
desired_result2 = [2,3,8,0,1,4,5,6,9,10]

The fact that we are ostensibly sorting A makes me thing it should be
the latter, but the example given was ambiguous. If we are in fact
looking for desired_result2, maybe we should use:

result = [ x for x in B if x in A ] + [ x for x in A if X not in B ]

or like the sibling post suggests: substitute set(A) and set(B) for the
"in" argument in each comprehension.

Cheers,
Cliff
Oct 25 '06 #10
"J. Clifford Dyer" <jc*@sdf.lonestar.orgwrote in message
news:eh**********@aioe.server.aioe.org...
ZeD wrote:
>Paul Rubin wrote:
>>>A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]

desired_result = [2,3,7,8,0,1,4,5,6,9,10]
How about:

desired_result = B + sorted(x for x in A if x not in B)

this. is. cool.

Cool, yes, but I'm not entirely sure it does what the OP wanted. Partly
because I'm not entirely sure what the OP wanted. Counter example:

Given these variables:

A = [0,1,2,3,4,5,6,8,9,10] # Note 7 is missing
B = [2,3,7,8]

which of the following should the function yield?
From the original post:

"I have two lists, A and B, such that B is a subset of A."

So this is not a case that needs to be supported.

I envisioned something like the OP had a sequence of items to start with,
did a random sampling from the list, and wanted to move the sampled items to
the front of the list.

-- Paul

Oct 25 '06 #11
"Paul McGuire" <pt***@austin.rr._bogus_.comwrote in message
news:rk******************@tornado.texas.rr.com...
"J. Clifford Dyer" <jc*@sdf.lonestar.orgwrote in message
news:eh**********@aioe.server.aioe.org...
>ZeD wrote:
>>Paul Rubin wrote:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]
>
desired_result = [2,3,7,8,0,1,4,5,6,9,10]
How about:

desired_result = B + sorted(x for x in A if x not in B)

this. is. cool.

Cool, yes, but I'm not entirely sure it does what the OP wanted. Partly
because I'm not entirely sure what the OP wanted. Counter example:

Given these variables:

A = [0,1,2,3,4,5,6,8,9,10] # Note 7 is missing
B = [2,3,7,8]

which of the following should the function yield?

From the original post:

"I have two lists, A and B, such that B is a subset of A."

So this is not a case that needs to be supported.

I envisioned something like the OP had a sequence of items to start with,
did a random sampling from the list, and wanted to move the sampled items
to the front of the list.

-- Paul
Here is a little subclass of list that has some set-ish operators added.

-- Paul

# ListWithMath does set-like operations on lists, keeping list order
# stable where applicable
class ListWithMath(list):
def isSuperset(self,other):
return len(self ^ other) == len(other)

def __iadd__(self,other):
self.extend(other)
return self

def __isub__(self,other):
if self.isSuperset(other):
for i in other:
self.remove(i)
return self
else:
raise ValueError

def __ixor__(self,other):
for a in self[::-1]:
if not a in other:
self.remove(a)
return self

def __add__(self,other):
temp = ListWithMath(self)
temp += other
return temp

def __sub__(self,other):
temp = ListWithMath(self)
temp -= other
return temp

def __xor__(self,other):
temp = ListWithMath(self)
temp ^= other
return temp

def __radd__(self,other):
return ListWithMath(other) + self
def __rsub__(self,other):
return ListWithMath(other) - self
def __rxor__(self,other):
return ListWithMath(other) ^ self

A = ListWithMath( [0,1,2,3,4,5,6,7,8,9,10] )
B = ListWithMath( [2,3,7,10,8] )

print "%s+(%s-%s)" % (B,A,B)
C = B+(A-B) # union and exclusion
print C

D = ListWithMath( [1,2,3,6,8,11] )
print "%s ^ %s" % (B,D)
print B^D # intersection

print "%s ^= %s" % (B,D)
B ^= D # in-place intersection
print B

print C
C += []
print C
C -= []
print C
C ^= []
print C

Prints:
[2, 3, 7, 10, 8]+([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]-[2, 3, 7, 10, 8])
[2, 3, 7, 10, 8, 0, 1, 4, 5, 6, 9]
[2, 3, 7, 10, 8] ^ [1, 2, 3, 6, 8, 11]
[2, 3, 8]
[2, 3, 7, 10, 8] ^= [1, 2, 3, 6, 8, 11]
[2, 3, 8]
[2, 3, 7, 10, 8, 0, 1, 4, 5, 6, 9]
[2, 3, 7, 10, 8, 0, 1, 4, 5, 6, 9]
[2, 3, 7, 10, 8, 0, 1, 4, 5, 6, 9]
[]
Oct 25 '06 #12
Right you are. I was a bit hasty.

Many thanks.

Cliff
Paul McGuire wrote:
>
From the original post:

"I have two lists, A and B, such that B is a subset of A."
Oct 25 '06 #13
Paul McGuire wrote:
"J. Clifford Dyer" <jc*@sdf.lonestar.orgwrote in message
news:eh**********@aioe.server.aioe.org...
>ZeD wrote:
>>Paul Rubin wrote:

A = [0,1,2,3,4,5,6,7,8,9,10]
B = [2,3,7,8]
>
desired_result = [2,3,7,8,0,1,4,5,6,9,10]
How about:

desired_result = B + sorted(x for x in A if x not in B)
this. is. cool.
Cool, yes, but I'm not entirely sure it does what the OP wanted. Partly
because I'm not entirely sure what the OP wanted. Counter example:

Given these variables:

A = [0,1,2,3,4,5,6,8,9,10] # Note 7 is missing
B = [2,3,7,8]

which of the following should the function yield?

From the original post:

"I have two lists, A and B, such that B is a subset of A."

So this is not a case that needs to be supported.

I envisioned something like the OP had a sequence of items to start with,
did a random sampling from the list, and wanted to move the sampled items to
the front of the list.

-- Paul
The sequence of items I had was a list of files to be processed in a
particular order. I noted that the order in which the files were placed
into the list might not match the desired order of processing, and that
the filenames might not be sortable by filename into this desired order
(such are the vagaries of human file naming.)

So yes, Paul is correct.

However, Cliff is also correct in that the solution provided is not what
I (the OP) wanted. The solution provided is half right, in that it
places the elements also in B before the elements only in A, but it
sorts the remaining elements only in A instead of keeping the original
order.

A more correct solution (at least in terms of answering this particular
problem) would be:

desired_result = B + list(x for x in A if x not in B)
rather than
desired_result = B + sorted(x for x in A if x not in B)

This solution is also guaranteed to work if for some reason the sort
algorithm cannot be guaranteed to be stable in the future or on some
other implementation of Python.
Which brings me to the question, would this solution:

B = set(B)
A = B + list(x for x in A if x not in B)

be faster than this solution:

B = set(B)
A.sort(key=B.__contains__, reverse=True)
My guess is yes, since while the __contains__ method is only run once
for each object, the list still does not require sorting.

Thanks everyone for all your helpful replies,

Cameron.
Oct 26 '06 #14
Cameron Walsh wrote:
Which brings me to the question, would this solution:

B = set(B)
A = B + list(x for x in A if x not in B)

be faster than this solution:

B = set(B)
A.sort(key=B.__contains__, reverse=True)

My guess is yes, since while the __contains__ method is only run once
for each object, the list still does not require sorting.
You're probably going to need to time that for your particular data.
Here's what it looks like on an artificial data set:

$ python -m timeit -s "A = range(100); B = range(40, 60, 2); B_set =
set(B)" "A = B + list(x for x in A if x not in B_set)"
10000 loops, best of 3: 54.5 usec per loop

$ python -m timeit -s "A = range(100); B = range(40, 60, 2); B_set =
set(B)" "A.sort(key=B_set.__contains__, reverse=True)"
10000 loops, best of 3: 39.7 usec per loop

That said, I'd probably still use the first solution -- it's more
immediately obvious why that one works.

STeVe
Oct 26 '06 #15
Steven Bethard <st************@gmail.comwrites:
Cameron Walsh wrote:
Which brings me to the question, would this solution:
B = set(B)
A = B + list(x for x in A if x not in B)
be faster than this solution:
B = set(B)
A.sort(key=B.__contains__, reverse=True)
[timings deleted]
That said, I'd probably still use the first solution -- it's more
immediately obvious why that one works.
Wait a minute, the first example looks wrong, B has gotten replaced by
a set and then it's added to a list.

Anyway how about timing

C = set(A) - set(B)
A = B + filter(C.__contains__, A)

This scans A twice, but it does more of the work in native code,
without sorting.
Oct 26 '06 #16
Paul Rubin wrote:
Steven Bethard <st************@gmail.comwrites:
>Cameron Walsh wrote:
>>Which brings me to the question, would this solution:
B = set(B)
A = B + list(x for x in A if x not in B)
be faster than this solution:
B = set(B)
A.sort(key=B.__contains__, reverse=True)
[timings deleted]
That said, I'd probably still use the first solution -- it's more
immediately obvious why that one works.

Wait a minute, the first example looks wrong, B has gotten replaced by
a set and then it's added to a list.
Yep. If you look back at the lines I actually timed, they were::

B_set = set(B)
A = B + list(x for x in A if x not in B_set)

and::

B_set = set(B)
A.sort(key=B_set.__contains__, reverse=True)

As you noted, you'll get an error if you try to concatenate B as a set
to the list.

Steve
Oct 26 '06 #17
Steven Bethard wrote:
>
As you noted, you'll get an error if you try to concatenate B as a set
to the list.

Steve
Whoops, remind me to check things work before I type them. In the mean
time, here are some more interesting timing results:

With a larger data set, 500 elements instead of 100, the times were
almost the same:

python -m timeit -s "A=range(500); B=range(40,60,2); B_set=set(B)"
"A=B+list(x for x in A if x not in B_set)"
10000 loops, best of 3: 103 usec per loop

python -m timeit -s "A=range(500); B=range(40,60,2); B_set = set(B)"
"A.sort(key=B_set.__contains__, reverse=True)"
10000 loops, best of 3: 99.2 usec per loop

But for the last recommended one:

python -m timeit -s "A=range(500); B=range(40,60,2); C = set(A) -set(B)"
"A=B+filter(C.__contains__, A)"
10000 loops, best of 3: 38.3 usec per loop

Much faster! But that does not take into account the set creation and
set difference calculations. Let's try including that:

python -m timeit -s "A=range(500); B=range(40,60,2)" "C=set(A)-set(B)"
"a=B+filter(C.__contains__, A)"
10000 loops, best of 3: 105 usec per loop

and compare to our other two versions including set creation:

python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B);
A=B+list(x for x in A if X not in B_set"
10000 loops, best of 3: 105 usec per loop

python -m timeit -s "A=range(500); B=range(40,60,2)" "B_set = set(B);
A.sort(key=B_set.__contains__, reverse=True)"
10000 loops, best of 3: 101 usec per loop

Pretty similar again.

However, converting each element to a string first ("A=list(str(x) for x
in range(500)" etc.), making it more similar to my data, gave:

102usec per loop for the A.sort() method
132usec per loop for the A=B+filter() method
104usec per loop for the A=B+list() method

It is interesting to note the increased time taken by the set
difference/filter method. It appears not to like strings as much as
ints. We can also shave another 7 usec off the filter time:

python -m timeit -s "A=list(str)x) for x in range(500)); B=list(str(x)
for x in range(40,60,2)); not_in_B_set=lambda x: x not in B_set"
"B_set=set(B); A=B+filter(not_in_B_set, A)"
10000 loops, best of 3: 125 usec per loop

In conclusion, it appears I should use the clearest of all the methods,
since for the data sizes I am dealing with, they are of approximately
the same speed, and the time taken is negligible anyway.

Thanks for all your help,

Cameron.
Oct 27 '06 #18

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

4
2513
by: dont bother | last post by:
This is really driving me crazy. I have a dictionary feature_vectors{}. I try to sort its keys using #apply sorting on feature_vectors sorted_feature_vector=feature_vectors.keys()...
7
3227
by: Federico G. Babelis | last post by:
Hi All: I have this line of code, but the syntax check in VB.NET 2003 and also in VB.NET 2005 Beta 2 shows as unknown: Dim local4 As Byte Fixed(local4 = AddressOf dest(offset)) ...
19
25421
by: Owen T. Soroke | last post by:
Using VB.NET I have a ListView with several columns. Two columns contain integer values, while the remaining contain string values. I am confused as to how I would provide functionality to...
10
2747
by: Sjaakie | last post by:
Hi, I'm, what it turns out to be, fooling around with 3-tier design. At several websites people get really enthusiastic about using custom dataobjects instead of datasets/-tables. While trying to...
4
3079
by: Ambica Jain | last post by:
Hi, I want custom sorting on some of the columns in the datagrid. And i am able to do the same by overriding MouseDown event. However, i need to rebind my datatable to reflect the changes in...
7
4791
by: Kamal | last post by:
Hello all, I have a very simple html table with collapsible rows and sorting capabilities. The collapsible row is hidden with css rule (display:none). When one clicks in the left of the...
1
7162
KevinADC
by: KevinADC | last post by:
Introduction In part one we discussed the default sort function. In part two we will discuss more advanced techniques you can use to sort data. Some of the techniques might introduce unfamiliar...
5
3163
by: lemlimlee | last post by:
hello, this is the task i need to do: For this task, you are to develop a Java program that allows a user to search or sort an array of numbers using an algorithm that the user chooses. The...
5
4897
by: jrod11 | last post by:
hi, I found a jquery html table sorting code i have implemented. I am trying to figure out how to edit how many colums there are, but every time i remove code that I think controls how many colums...
0
6915
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can...
0
7054
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers,...
0
7097
jinu1996
by: jinu1996 | last post by:
In today's digital age, having a compelling online presence is paramount for businesses aiming to thrive in a competitive landscape. At the heart of this digital strategy lies an intricately woven...
1
6750
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows...
0
6993
tracyyun
by: tracyyun | last post by:
Dear forum friends, With the development of smart home technology, a variety of wireless communication protocols have appeared on the market, such as Zigbee, Z-Wave, Wi-Fi, Bluetooth, etc. Each...
0
5353
agi2029
by: agi2029 | last post by:
Let's talk about the concept of autonomous AI software engineers and no-code agents. These AIs are designed to manage the entire lifecycle of a software development project—planning, coding, testing,...
0
4493
by: conductexam | last post by:
I have .net C# application in which I am extracting data from word file and save it in database particularly. To store word all data as it is I am converting the whole word file firstly in HTML and...
0
3003
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The...
0
193
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.