By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
443,907 Members | 2,039 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 443,907 IT Pros & Developers. It's quick & easy.

generic xmerge ?

P: n/a
Hi,

I was reading this recipe and am wondering if there is a generic
version of it floating around ? My list is a tuple (date, v1, v2, v3)
and I would like it to sort on date. The documentation doesn't mention
how the items are compared and the example only use integers.

http://aspn.activestate.com/ASPN/Coo.../Recipe/141934

Oct 17 '05 #1
Share this Question
Share on Google+
5 Replies


P: n/a
bo****@gmail.com <bo****@gmail.com> wrote:
Hi,

I was reading this recipe and am wondering if there is a generic
version of it floating around ? My list is a tuple (date, v1, v2, v3)
and I would like it to sort on date. The documentation doesn't mention
how the items are compared and the example only use integers.

http://aspn.activestate.com/ASPN/Coo.../Recipe/141934


I'm not sure what "my list is a tuple" mean (list and tuple being
different types) nor what this has to do with the recipe. Anyway...
sequences are compared lexicographically -- first items first, then
second items if the first items are equal, and so on. So, if you have a
list X whose items tuples and want X sorted on the tuples' first items,
X.sort() will suffice -- if the tuples never have equal first-items, or
if you're OK with second-items getting compared when the first-items are
equal. If you want to sort on first-items ONLY, leaving the tuples in
the same order in the list when their first-items are equal:

import operator
X.sort(key=operator.itemgetter(0))
Alex
Oct 17 '05 #2

P: n/a
oops, sorry. I meant

l1=[(date,v1,v2,v3), ...]
l2=[ another set of tuples ]
Thanks. so I have to concat the multiple lists first(all of them are
sorted already) ?

Alex Martelli wrote:
I'm not sure what "my list is a tuple" mean (list and tuple being
different types) nor what this has to do with the recipe. Anyway...
sequences are compared lexicographically -- first items first, then
second items if the first items are equal, and so on. So, if you have a
list X whose items tuples and want X sorted on the tuples' first items,
X.sort() will suffice -- if the tuples never have equal first-items, or
if you're OK with second-items getting compared when the first-items are
equal. If you want to sort on first-items ONLY, leaving the tuples in
the same order in the list when their first-items are equal:

import operator
X.sort(key=operator.itemgetter(0))
Alex


Oct 17 '05 #3

P: n/a
bo****@gmail.com <bo****@gmail.com> wrote:
oops, sorry. I meant

l1=[(date,v1,v2,v3), ...]
l2=[ another set of tuples ]

Thanks. so I have to concat the multiple lists first(all of them are
sorted already) ?


You can do it either way -- simplest, and pretty fast, is to concatenate
them all and sort the result (the sort method is very good at taking
advantage of any sorting that may already be present in some parts of
the list it's sorting), but you can also try a merging approach. E.g.:
import heapq

def merge_by_heap(*lists):
sources = [[s.next(), i, s.next]
for i, s in enumerate(map(iter,lists))]
heapq.heapify(sources)
while sources:
best_source = sources[0]
yield best_source[0]
try: best_source[0] = best_source[-1]()
except StopIteration: heapq.heappop(sources)
else: heapq.heapreplace(sources, best_source)

Now, L=list(merge_by_heap(l1, l2, l3)) should work, just as well as,
say, L = sorted(l1+l2+l3). I suspect the second approach may be faster,
as well as simpler, but it's best to _measure_ (use the timeit.py module
from the standard library) if this code is highly speed-critical for
your overall application.
Alex
Oct 17 '05 #4

P: n/a
million thanks. So the default compare funcion of heapq also do it like
sort ?

The size of the list is not very large but has the potential of being
run many times(web apps). So I believe second one should be faster(from
the app perspective) as it goes into the optimized code quickly without
all the overheads in the merge case.

Alex Martelli wrote:
You can do it either way -- simplest, and pretty fast, is to concatenate
them all and sort the result (the sort method is very good at taking
advantage of any sorting that may already be present in some parts of
the list it's sorting), but you can also try a merging approach. E.g.:
import heapq

def merge_by_heap(*lists):
sources = [[s.next(), i, s.next]
for i, s in enumerate(map(iter,lists))]
heapq.heapify(sources)
while sources:
best_source = sources[0]
yield best_source[0]
try: best_source[0] = best_source[-1]()
except StopIteration: heapq.heappop(sources)
else: heapq.heapreplace(sources, best_source)

Now, L=list(merge_by_heap(l1, l2, l3)) should work, just as well as,
say, L = sorted(l1+l2+l3). I suspect the second approach may be faster,
as well as simpler, but it's best to _measure_ (use the timeit.py module
from the standard library) if this code is highly speed-critical for
your overall application.
Alex


Oct 17 '05 #5

P: n/a
bo****@gmail.com <bo****@gmail.com> wrote:
million thanks. So the default compare funcion of heapq also do it like
sort ?
By defaults, all comparisons in Python occur by the same mechanisms: by
preference, specific comparison operators such as < , <= , and so on
(corresponding to special methods __lt__, __le__, and so on) -- missing
that, the three-way comparison done by built-in function cmp
(corresponding to speial method __cmp__). For built-in sequences, in
particular (both tuples and lists), the comparisons are lexicographical.

Some (but not all) occasions that imply comparisons let you specify
something else than the default, for example by such mechanisms as the
cmp= optional argument to .sort (which tends to have unpleasant
performance impacts) and the key= optional argument (which tends to have
good performance impact). heapq does not offer this feature in today's
Python (i.e., 2.4), but I believe it's planned to have it in the future
2.5 release.

But in your case the default comparisons appear to be OK, so you should
have no problem either way.

The size of the list is not very large but has the potential of being
run many times(web apps). So I believe second one should be faster(from
the app perspective) as it goes into the optimized code quickly without
all the overheads in the merge case.


Yes, the simpler solution may well perform better. Note that:

L = list(l1)
L.extend(l2)
L.extend(l3)
L.sort()

may perform better than L = sorted(l1+l2+l3) -- if speed matters a lot,
be sure to try (and measure!) both versions.
Alex
Oct 17 '05 #6

This discussion thread is closed

Replies have been disabled for this discussion.