473,837 Members | 1,543 Online
Bytes | Software Development & Data Engineering Community
+ Post

Home Posts Topics Members FAQ

A Sort Optimization Technique: decorate-sort-dedecorate

Last year, i've posted a tutorial and commentary about Python and
Perl's sort function. (http://xahlee.org/perl-python/sort_list.html)

In that article, i discussed a technique known among juvenile Perlers
as the Schwartzian Transform, which also manifests in Python as its
“key” optional parameter.

Here, i give a more detailed account on why and how of this construct.

----

Language and Sort Optimization: decorate-sort-dedecorate

There are many algorithms for sorting. Each language can chose which to
use. See wikipedia for a detailed list and explanation:
Sorting_algorit hm↗ .

However, does not matter which algorithm is used, ultimately it will
need the order-deciding function on the items to be sorted. Suppose
your items are (a,b,c,d,...), and your order-deciding function is F.
Various algorithms will try to minimize the number of times F is
called, but nevertheless, F will be applied to a particular element in
the list multiple times. For example, F(a,b) may be called to see which
of “a” or “b” comes first. Then, later the algorithm might need
to call F(m,a), or F(a,z). The point here is that, F will be called
many times on arbitrary two items in your list, even if one of the
element has been compared to others before.

Now suppose, you are sorting some polygons in 3D space, or personal
records by the person's address's distance from a location, or sorting
matrixes by their eigen-values in some math application, or ordering
files by number of occurrences of some text in the file.

In general, when you define your decision function F(x,y), you will
need to extract some property from the elements to be sorted. For
example, when sorting points in space by a criterion of distance, one
will need to compute the distance for the point. When sorting personal
records from database by the person's location, the decision function
will need to retrieve the person's address from the database, then find
the coordinate of that address, that compute the distance from there to
a given coordinate. In sorting matrixes in math by eigen-values, the
order-decision will first compute the eigen-value of the matrix. A
common theme from all of the above is that they all need to do some
non-trivial computation on each element.

As we can see, the order-decision function F may need to do some
expensive computation on each element first, and this is almost always
the case when sorting elements other than simple numbers. Also, we know
that a sorting algorithm will need to call F(x,y) many times, even if
one of x or y has been compared to others before. So, this may result
high inefficiency. For example, you need to order people by their
location to your home. So when F(Mary,Jane) is called, Mary's address
is first retrieved from a database across a network, the coordinate of
her address is looked up in another database. Then the distance to your
home are computed using spherical geometry. The exact same thing is
done for Jane. But later on, it may call F(Mary,Chrissy) ,
F(Mary,Jonesy), F(Mary,Pansy) and so on, and the entire record
retrieval for Mary is repeated many times.

One solution, is to do the expensive extraction one time for each
element, then associate that with the corresponding elements. Suppose
this expensive extraction function is called gist(). So, you create a
new list ([Mary,gist(Mary)], [Jane,gist(Jane)], [John,gist(John)],
[Jenny,gist(Jenn y)], ...) and sort this list instead, when done, remove
associated gist. This technique is sometimes called
decorate-sort-dedecorate.

In Perl programing, this decorate-sort-dedecorate technique is sillily
known as Schwartzian Transform as we have demonstrated previously. In
Python, they tried to incorporate this technique into the language, by
adding the “key” optional parameter, which is our gist() function.

----------
This post is archived at:
http://xahlee.org/perl-python/sort_list.html

I would be interested in comments about how Common Lisp, Scheme, and
Haskell deal with the decorate-sort-dedecorate technique. In
particular, does it manifest in the language itself? If not, how does
one usually do it in the code? (note: please reply to approprate groups
if it is of no general interest. Thanks) (am also interested on how
Perl6 or Python3000 does this, if there are major changes to their sort
function)

Thanks.

Xah
xa*@xahlee.org
http://xahlee.org/

Aug 27 '06 #1
18 2350
Well you cross-posted this enough, including a Java group, and didn't
even ask about us... What a pity.

In Java, classes can implement the Comparable interface. This interface
contains only one method, a compareTo(Objec t o) method, and it is
defined to return a value < 0 if the Object is considered less than the
one being passed as an argument, it returns a value 0 if considered
greater than, and 0 if they are considered equal.

The object implementing this interface can use any of the variables
available to it (AKA address, zip code, longitude, latitude, first
name, whatever) to return this -1, 0 or 1. This is slightly different
than what you mention as we don't have to "decorate" the object. These
are all variables that already exist in the Object, and if fact make it
what it is. So, of course, there is no need to un-decorate at the end.

There are several built-in objects and methods available to sort
Objects that are Comparable, even full Arrays of them.

Aug 27 '06 #2
xa****@gmail.co m wrote:
I would be interested in comments about how Common Lisp, Scheme, and
Haskell deal with the decorate-sort-dedecorate technique.
%w(FORTRAN LISP COBOL).sort_by{ |s| s.reverse}
==>["COBOL", "FORTRAN", "LISP"]

--
Common Lisp did kill Lisp. Period. ... It is to Lisp what
C++ is to C. A monstrosity that totally ignores the basics
of language design, simplicity, and orthogonality to begin
with. --- Bernard Lang

Aug 28 '06 #3
In <11************ *********@m79g2 000cwm.googlegr oups.com>, Tom Cole wrote:
In Java, classes can implement the Comparable interface. This interface
contains only one method, a compareTo(Objec t o) method, and it is
defined to return a value < 0 if the Object is considered less than the
one being passed as an argument, it returns a value 0 if considered
greater than, and 0 if they are considered equal.

The object implementing this interface can use any of the variables
available to it (AKA address, zip code, longitude, latitude, first
name, whatever) to return this -1, 0 or 1. This is slightly different
than what you mention as we don't have to "decorate" the object. These
are all variables that already exist in the Object, and if fact make it
what it is. So, of course, there is no need to un-decorate at the end.
Python has such a mechanism too, the special `__cmp__()` method
has basically the same signature. The problem the decorate, sort,
un-decorate pattern solves is that this object specific compare operations
only use *one* criteria.

Let's say you have a `Person` object with name, surname, date of birth and
so on. When you have a list of such objects and want to sort them by name
or by date of birth you can't use the `compareTo()` method for both.

Ciao,
Marc 'BlackJack' Rintsch
Aug 28 '06 #4
In article <pa************ *************** *@gmx.net>, Marc 'BlackJack'
Rintsch <bj****@gmx.net wrote:
In <11************ *********@m79g2 000cwm.googlegr oups.com>, Tom Cole wrote:
In Java, classes can implement the Comparable interface. This interface
contains only one method, a compareTo(Objec t o) method, and it is
defined to return a value < 0 if the Object is considered less than the
one being passed as an argument, it returns a value 0 if considered
greater than, and 0 if they are considered equal.

The object implementing this interface can use any of the variables
available to it (AKA address, zip code, longitude, latitude, first
name, whatever) to return this -1, 0 or 1. This is slightly different
than what you mention as we don't have to "decorate" the object. These
are all variables that already exist in the Object, and if fact make it
what it is. So, of course, there is no need to un-decorate at the end.

Python has such a mechanism too, the special `__cmp__()` method
has basically the same signature. The problem the decorate, sort,
un-decorate pattern solves is that this object specific compare operations
only use *one* criteria.
I can't believe I am getting drawn into a thread started by xahlee, but
here goes anyway:

The problem addressed by what is know in Perl as the 'Schwartzian
Transform' is that the compare operation can be an expensive one,
regardless of the whether the comparison uses multiple keys. Since in
comparison sorts, the compare operation will be executed N(logN) times,
it is more efficient to pre-compute a set of keys, one for each object
to be sorted. That need be done only N times. The sort can then use
these pre-computed keys to sort the objects. See, for example:

http://en.wikipedia.org/wiki/Schwartzian_transform

--
Jim Gibson

Posted Via Usenet.com Premium Usenet Newsgroup Services
----------------------------------------------------------
** SPEED ** RETENTION ** COMPLETION ** ANONYMITY **
----------------------------------------------------------
http://www.usenet.com
Aug 28 '06 #5
Jim Gibson schreef:
The problem addressed by what is know in Perl as the 'Schwartzian
Transform' is that the compare operation can be an expensive one,
regardless of the whether the comparison uses multiple keys. Since in
comparison sorts, the compare operation will be executed N(logN)
times, it is more efficient to pre-compute a set of keys, one for
each object to be sorted. That need be done only N times. The sort
can then use these pre-computed keys to sort the objects.
Basically it first builds, than sorts an index.

The pre-computed (multi-)keys can often be optimized, see Uri's
Sort::Maker http://search.cpan.org/search?query=Sort::Maker
for facilities.

--
Affijn, Ruud

"Gewoon is een tijger."
Aug 28 '06 #6
Jim Gibson schrieb:
>
The problem addressed by what is know in Perl as the 'Schwartzian
Transform' is that the compare operation can be an expensive one,
regardless of the whether the comparison uses multiple keys. Since in
comparison sorts, the compare operation will be executed N(logN) times,
it is more efficient to pre-compute a set of keys, one for each object
to be sorted. That need be done only N times.
Wikipedia says it's going from 2NlogN to N. If a sort is massively
dominated by the comparison, that could give a speedup of up to 100%
(approximately - dropping the logN factor is almost irrelevant, what
counts is losing that factor of 2).

Regards,
Jo
Aug 29 '06 #7
Joachim Durchholz <jo@durchholz.o rgwrote:
Jim Gibson schrieb:

The problem addressed by what is know in Perl as the 'Schwartzian
Transform' is that the compare operation can be an expensive one,
regardless of the whether the comparison uses multiple keys. Since in
comparison sorts, the compare operation will be executed N(logN) times,
it is more efficient to pre-compute a set of keys, one for each object
to be sorted. That need be done only N times.

Wikipedia says it's going from 2NlogN to N. If a sort is massively
dominated by the comparison, that could give a speedup of up to 100%
(approximately - dropping the logN factor is almost irrelevant, what
counts is losing that factor of 2).
It seems to me that ln 1,000,000 is 13.8, and that 13.8 is quite a bit
greater than 2.

Cheers,

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB
Aug 29 '06 #8
At Tuesday 29/8/2006 07:50, Joachim Durchholz wrote:
>Wikipedia says it's going from 2NlogN to N. If a sort is massively
dominated by the comparison, that could give a speedup of up to 100%
(approximate ly - dropping the logN factor is almost irrelevant, what
counts is losing that factor of 2).
In fact it's the other way - losing a factor of 2 is irrelevant,
O(2N)=O(N). The logN factor is crucial here.

Gabriel Genellina
Softlab SRL

_______________ _______________ _______________ _____
Pregunt. Respond. Descubr.
Todo lo que queras saber, y lo que ni imaginabas,
est en Yahoo! Respuestas (Beta).
Probalo ya!
http://www.yahoo.com.ar/respuestas

Aug 29 '06 #9
Gabriel Genellina schrieb:
At Tuesday 29/8/2006 07:50, Joachim Durchholz wrote:
>Wikipedia says it's going from 2NlogN to N. If a sort is massively
dominated by the comparison, that could give a speedup of up to 100%
(approximate ly - dropping the logN factor is almost irrelevant, what
counts is losing that factor of 2).

In fact it's the other way - losing a factor of 2 is irrelevant,
O(2N)=O(N). The logN factor is crucial here.
That's just a question of what you're interested in.

If it's asymptotic behavior, then the O(logN) factor is a difference.

If it's practical speed, a constant factor of 2 is far more relevant
than any O(logN) factor.

(I'm not on the list, so I won't see responses unless specifically CC'd.)

Regards,
Jo
Aug 29 '06 #10

This thread has been closed and replies have been disabled. Please start a new discussion.

Similar topics

8
1651
by: Jacek Generowicz | last post by:
Given fncs = args = How should one spell results = map(lambda f,a: f(a), fncs, args) in order to get the result most quickly ?
41
2883
by: John Marshall | last post by:
How about the following, which I am almost positive has not been suggested: ----- class Klass: def __init__(self, name): self.name = name deco meth0: staticmethod def meth0(x):
11
1782
by: Bonj | last post by:
When performance benchmark testing one of my own functions against the equivalent runtime function, I found that with /Ox on it optimized away its own function so that it didn't call it *at all* after the first loop. <psuedocode> #ifdef USE_MY_FUNC #define testfunc(x) myfunc(x) #else #define testfunc(x) runtimefunc(x) #endif QPC(start);
9
2406
by: Rune | last post by:
Is it best to use double quotes and let PHP expand variables inside strings, or is it faster to do the string manipulation yourself manually? Which is quicker? 1) $insert = 'To Be'; $sentence = "$insert or not $insert. That is the question."; or
2
1503
by: Ruggiero | last post by:
Sorry for my bad english What is the optimization technique used by Access dbms? Thanks
0
1067
by: Umut Tezduyar | last post by:
I really hate working with .net framework design time attributes. I have read asp.net server controls and components (Nikhil Kothari, Vandana Datye) book, chapter 15 but still i can't handle the complex tasks on designer. The thing i want to do is so simple (I hope so). I want my control to understand the Menu and SubMenus under Menu. Ex: <cc1:MyConrol runat="server" id="MyControl1"> <Menu name="level1"> <SubMenus> <Menu name="level2">
3
1492
by: alexquisi | last post by:
Hi, I am newbie in optimization techniques and I need a little of help. I want to take adventage of the open bank and open row policy available in the processor (it can keep 4 banks open per bank, for a total of 12 banks at a time) that I am using (AU1xxx), by locating code, data, etc. on separete SDRAM bank and row boundaries. The processor's documentation says that it can be done using the
1
3048
by: Don Li | last post by:
Hi, Env: MS SQL Server 2000 DB Info (sorry no DDL nor sample data): tblA has 147249 rows -- clustered index on pk (one key of datatype(int)) and has two clumns, both are being used in joins; intersecTbl4AB has 207016 rows -- clustered index on two fks and
2
2728
by: AJ | last post by:
Hi, I am trying to optimize gcc for power consumption. Technique I am using is rescheduling instruction based on hamming distance between two consecutive instructions. Now I want to create a new pass after register allocation and final jump optimization. I am new to gcc development and am having trouble finding out how to create a new pass. Also I understand my scheduler will have to operate on RTL code. what would be the data...
20
2368
by: Ravikiran | last post by:
Hi Friends, I wanted know about whatt is ment by zero optimization and sign optimization and its differences.... Thank you...
0
9678
by: Hystou | last post by:
Most computers default to English, but sometimes we require a different language, especially when relocating. Forgot to request a specific language before your computer shipped? No problem! You can effortlessly switch the default language on Windows 10 without reinstalling. I'll walk you through it. First, let's disable language synchronization. With a Microsoft account, language settings sync across devices. To prevent any complications,...
0
10863
Oralloy
by: Oralloy | last post by:
Hello folks, I am unable to find appropriate documentation on the type promotion of bit-fields when using the generalised comparison operator "<=>". The problem is that using the GNU compilers, it seems that the internal comparison operator "<=>" tries to promote arguments from unsigned to signed. This is as boiled down as I can make it. Here is my compilation command: g++-12 -std=c++20 -Wnarrowing bit_field.cpp Here is the code in...
1
10609
by: Hystou | last post by:
Overview: Windows 11 and 10 have less user interface control over operating system update behaviour than previous versions of Windows. In Windows 11 and 10, there is no way to turn off the Windows Update option using the Control Panel or Settings app; it automatically checks for updates and installs any it finds, whether you like it or not. For most users, this new feature is actually very convenient. If you want to control the update process,...
1
7798
isladogs
by: isladogs | last post by:
The next Access Europe User Group meeting will be on Wednesday 1 May 2024 starting at 18:00 UK time (6PM UTC+1) and finishing by 19:30 (7.30PM). In this session, we are pleased to welcome a new presenter, Adolph Dupr who will be discussing some powerful techniques for using class modules. He will explain when you may want to use classes instead of User Defined Types (UDT). For example, to manage the data in unbound forms. Adolph will...
0
5663
by: TSSRALBI | last post by:
Hello I'm a network technician in training and I need your help. I am currently learning how to create and manage the different types of VPNs and I have a question about LAN-to-LAN VPNs. The last exercise I practiced was to create a LAN-to-LAN VPN between two Pfsense firewalls, by using IPSEC protocols. I succeeded, with both firewalls in the same network. But I'm wondering if it's possible to do the same thing, with 2 Pfsense firewalls...
0
5838
by: adsilva | last post by:
A Windows Forms form does not have the event Unload, like VB6. What one acts like?
1
4468
by: 6302768590 | last post by:
Hai team i want code for transfer the data from one system to another through IP address by using C# our system has to for every 5mins then we have to update the data what the data is updated we have to send another system
2
4034
muto222
by: muto222 | last post by:
How can i add a mobile payment intergratation into php mysql website.
3
3120
bsmnconsultancy
by: bsmnconsultancy | last post by:
In today's digital era, a well-designed website is crucial for businesses looking to succeed. Whether you're a small business owner or a large corporation in Toronto, having a strong online presence can significantly impact your brand's success. BSMN Consultancy, a leader in Website Development in Toronto offers valuable insights into creating effective websites that not only look great but also perform exceptionally well. In this comprehensive...

By using Bytes.com and it's services, you agree to our Privacy Policy and Terms of Use.

To disable or enable advertisements and analytics tracking please visit the manage ads & tracking page.