The Schwartzian Transform
The Schwartzian Transform (referred to as ST from now on) is a good balance between implementation and efficiency making it an attractive option for sorting data. For the curious, the name of this technique is in honor of its originator, Randal Schwartz. This article is a detailed look at how the ST works.
ACME INC.
First we need some data to sort. At the end of this article a data file is attached (employees.txt). It's a tab separated values list of employees of ACME INC. It's a list of ten employees with some fields that describe their employment status with ACME INC. The fields are:
EmployeeID, Name, HireDate, Position, Department, Salary
Expand|Select|Wrap|Line Numbers
- EmployeeID Name HireDate Position Department Salary
- M011 Smith,John,T 01-12-1981 Welder Maintenance 20.00 p/h
- M102 Hart,Thomas,J 06-30-1982 Supervisor Maintenance 28.50 p/h
- J309 Jones,Steve,W 02-23-1990 Janitor Janitorial 15.50 p/h
- A124 White,Mary,H 03-15-1990 Assembler Assembly 8.75 p/h
- Q365 Miles,Frank,R 12-01-1999 Inspector Quality 16.25 p/h
- A316 Roberts,Andy,P 07-30-2006 Assembler Assembly 8.00 p/h
- S554 William,Terry,K 03-3-2005 Expeditor Stock Room 9.00 p/h
- R078 Norris,Chris,K 04-17-2003 Clerk Shipping/Recieving 9.00 p/h
- X832 Anderson,Jane,M 03-23-1992 VP Operations Mangement 33.00 p/h
- X111 Kelly,Mark,D 05-29-1989 Engineer Engineering 26.75 p/h
Expand|Select|Wrap|Line Numbers
- #!/usr/bin/perl
- use strict;
- use warnings;
- use Data::Dumper; # for printing a "picture" of the data
- my %employees;
- open (my $in, 'path/to/employees.txt') or die "$!";
- readline <$in>; #skip file header
- while (<$in>) {
- chomp;
- my ($id,$name,$hdate,$pos,$dept,$sal) = split/\t/;
- $employees{$id} = {NAME => $name, HIREDATE => $hdate, POSITION => $pos, DEPT => $dept, SALARY => $sal};
- }
- close $in;
- print Dumper \%employees;
Expand|Select|Wrap|Line Numbers
- 'R078' => {
- 'DEPT' => 'Shipping/Recieving',
- 'POSITION' => 'Clerk',
- 'NAME' => 'Norris,Chris,K',
- 'HIREDATE' => '04-17-2003',
- 'SALARY' => '9.00 p/h'
- },
Expand|Select|Wrap|Line Numbers
- #!/usr/bin/perl
- use strict;
- use warnings;
- use Data::Dumper;
- my %employees;
- open (my $in, 'path/to/employees.txt') or die "$!";
- readline <$in>; #skip header
- while (<$in>) {
- chomp;
- my ($id,$name,$hdate,$pos,$dept,$sal) = split/\t/;
- $employees{$id} = {NAME => $name, HIREDATE => $hdate, POSITION => $pos, DEPT => $dept, SALARY => $sal};
- }
- close $in;
- my @sorted = map {$employees{$_->[0]}{NAME} was hired on $employees{$_->[0]}{HIREDATE}}
- sort {$a->[1] cmp $b->[1]}
- map {my ($m,$d,$y) = split/-/,$employees{$_}{HIREDATE};
- [$_, "$y$m$d"] } keys %employees;
- print Dumper \@sorted;
'Smith,John,T was hired on 01-12-1981',
'Hart,Thomas,J was hired on 06-30-1982',
'Kelly,Mark,D was hired on 05-29-1989',
'Jones,Steve,W was hired on 02-23-1990',
'White,Mary,H was hired on 03-15-1990',
'Anderson,Jane,M was hired on 03-23-1992',
'Miles,Frank,R was hired on 12-01-1999',
'Norris,Chris,K was hired on 04-17-2003',
'William,Terry,K was hired on 03-3-2005',
'Roberts,Andy,P was hired on 07-30-2006'
Lets take a closer look at the ST with most of the code stripped away to reveal what's going on.
Expand|Select|Wrap|Line Numbers
- my @sorted = map {}
- sort {}
- map {} keys %employees;
What Makes the Schwartzian Transform Fly
What makes the ST fly with so little code and no temporary variables is the ability to chain together list processing functions and Perls support of anonymous data storage.
It's maybe easier to visualize the ST technique linearly. As a flow chart it would look like this with the arrows indicating input/output direction:
Expand|Select|Wrap|Line Numbers
- END output <-- map {} <-- sort {} <-- map {} <-- input START;
The Map Function
A map block is similar to a subroutine:
Expand|Select|Wrap|Line Numbers
- map {
- my ($m,$d,$y) = split/-/,$employees{$_}{HIREDATE};
- [$_, "$y$m$d"]
- } keys %employees;
The map block is also very much like a loop, but not the same as for/foreach/while/until loops. Map will process all the elements of a list, like grep() does. You can't use loop controls like "last" or "next" or "redo" with map.
In our example we used only one comparison to sort the data:
Expand|Select|Wrap|Line Numbers
- sort {$a->[1] cmp $b->[1]}
Expand|Select|Wrap|Line Numbers
- sort {$a->[1] cmp $b->[1] ||
- $b->[2] <=> $a->[2] ||
- $a->[3] cmp $b->[3]}
The last map block can also munge the final output however you need it to for your particular application. In our example we just added a little text to make the output more readable. But you can do whatever you want in that map block: omit some of the data, add some markup code (like html code), change the order of the fields, whatever you want.
Why Use a Schwartzian Transform?
The ST is a good option when the sort keys do not actually exist within the data and the sort keys will only be used to sort the data and not for any other purpose. Like in my example we sorted the list by the hire date, but the sort key had to be made on-the-fly by splitting the date and reordering the American style date of MM-DD-YYYY into a sortable ASCII string, YYYYMMDD. The date could have been sorted numerically but sorting it as a string is really easier.
Another good use of the ST is when you have to sort some data just to test some sort keys that you will use in a larger more robust program.
Why Not Use a Schwartzian Transform?
If you have data with keys that are already is a sortable format there is no need to use the ST technique but you still could if you wanted to. You may also consider using a database that can sort data much more efficiently than a Perl script can.
Conclusion
The Schwartzian Transform is a powerful yet convenient method of sorting data. If after reading the article you are still a bit confused, try experimenting with some code to get a feel for how it works. Once you master this technique you will find yourself using it all the time.
Kevin (aka KevinADC)
Online Resources
Perldoc Website All the Perl documentation online.
perlreftut - Mark's very short tutorial about references
perldsc - data structure complex data structure struct
perllol - Manipulating Arrays of Arrays in Perl
map - apply a change to a list to get back a new list with the changes
This article is protected under the Creative Commons License.