Sorting data with perl - Part One

KevinADC
Expert
Join Date: Jan 2007
Location: Southern California USA
Posts: 4,091
#1   Dec 23 '07
Introduction

This discussion of the sort function is targeted at beginners to perl coding. More experienced perl coders will find nothing new or useful. Sorting lists or arrays is a very common requirement of programs. If you don't know the difference between a list and an array, don't worry about it. For the purposes of this article, a list is just an array without a name. Both hold scalars, such as numbers and strings. I will use the terms "list" and "array" to mean the same thing.

The sort() Function

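The sort() function sorts a copy of the original list and returns a new list. This means it is pointless to call sort() in a void context, the way you can with some of Perl's other built-in functions, because the sorted list is simply thrown away (perl warns about a useless use of sort in void context when warnings are enabled):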

    sort(@array); # bad - void context

You have to assign the results to an array using the assignment operator "=":

    @sorted = sort(@array);

But you can use the same name as the original array to sort the original array:

    @array = sort(@array);

Keep in mind that the original unsorted values of @array are gone when you do that.
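
A minimal sketch of the difference between keeping and overwriting the original array. The fruit names are made-up sample data, not from the article:

```perl
use strict;
use warnings;

my @array  = qw(pear apple fig);
my @sorted = sort @array;      # sort returns a new, sorted list

print "original: @array\n";    # original: pear apple fig
print "sorted:   @sorted\n";   # sorted:   apple fig pear

@array = sort @array;          # overwrite the original with the sorted list
print "now:      @array\n";    # now:      apple fig pear
```

After the last assignment the unsorted order can only be recovered if you saved a copy beforehand.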

Used without a comparison block, Perl's built-in sort() is generally not much use with real-world data, because the default sort is a standard string comparison. To the uninitiated perl coder this can be quite confusing. Let's look at an example.

    @array = qw(@foo 1 32 11 4 2 44 22 !bar Mary mary Adam ant xxx XXX);
    @sorted = sort (@array);
    print "$_\n" for @sorted;

The sorted output would be:

    !bar
    1
    11
    2
    22
    32
    4
    44
    @foo
    Adam
    Mary
    XXX
    ant
    mary
    xxx

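The output is sorted in ascending ASCII order: characters such as ! come first, then digits, then @, then upper-case letters, then lower-case letters. Numbers are compared as strings, so 11 comes before 2. That is not very useful for most practical applications. But if you are sorting normalized data, all lower-case alpha strings for example, the default sort will work without you having to do anything else.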
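
One way to see the point about normalized data is to lower-case the strings first and then use the default sort. This is only a sketch: it assumes you don't need to keep the original capitalization, and the variable names are mine:

```perl
use strict;
use warnings;

my @words = qw(Mary mary Adam ant xxx XXX);

# Normalize every string to lower case with the built-in lc() ...
my @lower = map { lc($_) } @words;

# ... then the default string sort gives plain alphabetical order.
my @normalized = sort @lower;

print "$_\n" for @normalized;   # adam, ant, mary, mary, xxx, xxx (one per line)
```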

To sort data in descending order prefix the sort function with the reverse function:

    @sorted = reverse sort (@array);

Another method is to use a code block and Perl's "cmp" string comparison operator:

    @sorted = sort {$b cmp $a} @array; # descending order same as reverse sort above
    @sorted = sort {$a cmp $b} @array; # ascending order same as the default sort

I did not arbitrarily decide to use $a and $b in the above example. $a and $b are special package variables that perl sets to the two values being compared each time the sort block runs. You should avoid using them in your perl programs for anything else, and they should not be declared with "my", even when using the "strict" pragma.
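
As a quick check, here is a small sketch that runs under "strict" without declaring $a or $b and shows both descending methods giving the same result. The sample data is made up:

```perl
use strict;
use warnings;

my @array = qw(pear apple fig);

my @desc_reverse = reverse sort @array;
my @desc_block   = sort { $b cmp $a } @array;   # no "my $a" or "my $b" needed

print "@desc_reverse\n";   # pear fig apple
print "@desc_block\n";     # pear fig apple
```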

Part Two will discuss more advanced ways to sort data using perl.

This article is protected under the Creative Commons License.



KevinADC
#2   Jan 17 '08

As of perl 5.10 this comment is no longer true:

Quote:
The sort() function sorts a copy of the original list and returns a new list. This means you can't use the sort() function in a void context like you can with some of Perl's other built-in functions:

Perl no longer sorts a copy of an array when you assign the result back to the same array (in-place sorting):

    @array = sort @array;

But you still can't use the sort() function in a void context. There have been other changes to the sort() function in perl 5.10 which can be read on the perldoc.perl.org website on the History/Changes page.

Kelicula
Expert
Join Date: Jul 2007
Posts: 169
#3   Feb 24 '08

Also for numbers..


    @array = qw(7 8 59 58 4  5 6 2 59);
    @array2 = sort {$a <=> $b } @array;

Only the comparison operator is changed.
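
To show what changing the operator does, here is a small sketch that sorts the same array both ways. The names @as_strings and @as_numbers are mine, chosen for illustration:

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);

my @as_strings = sort { $a cmp $b } @array;   # compares character by character
my @as_numbers = sort { $a <=> $b } @array;   # compares numeric values

print "cmp: @as_strings\n";   # cmp: 2 4 5 58 59 59 6 7 8
print "<=>: @as_numbers\n";   # <=>: 2 4 5 6 7 8 58 59 59
```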

cmp for strings
<=> for numerical

rohitbasu77
Member
Join Date: Feb 2008
Posts: 88
#4   Feb 25 '08

Quote:
Originally Posted by Kelicula

Also for numbers..

    @array = qw(7 8 59 58 4  5 6 2 59);
    @array2 = sort {$a <=> $b } @array;

Only the comparison operator is changed.

cmp for strings
<=> for numerical

I am a little confused about what it prints for $a and $b, and how the comparison is really happening. Can you explain?

#!/usr/bin/perl

@array = qw(7 8 59 58 4 5 6 2 59);

@array2 = sort {$a <=> $b; print "$a and $b.... comp.. \n"; } @array;


print "$_\n" for @array2;

Kelicula
#5   Feb 26 '08

Quote:

Originally Posted by rohitbasu77

I am a little confused about what it prints for $a and $b, and how the comparison is really happening. Can you explain?

#!/usr/bin/perl

@array = qw(7 8 59 58 4 5 6 2 59);

@array2 = sort {$a <=> $b; print "$a and $b.... comp.. \n"; } @array;


print "$_\n" for @array2;

$a and $b are "internal" global perl variables, as mentioned earlier in Kevin's article. I did not choose those names. When testing for comparison, the "block" or function will return one of three values: -1, 0, or 1.

Example:

    $test = 31;
    $test2 = 32;

    print $test <=> $test2;

The output is: -1.

Here is how I think of it: $test2 must "go back" to reach the value of $test.

If $test were larger than $test2:

    $test = 40;
    $test2 = 32;

    print $test <=> $test2;

The output is: 1.

Then $test2 must "go forward" to reach the value of $test.

If they are the same:

    $test = 31;
    $test2 = 31;

    print $test <=> $test2;

The output is: 0.

$test2 doesn't need to go "anywhere" to reach $test value.

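{} is known in perl as an anonymous "block".
When using sort, the block must return a negative number, 0, or a positive number (in practice -1, 0, or 1, which is what cmp and <=> return) as the value of its last statement.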
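
This also explains the trouble with the code in the question: there, print is the last statement in the block, so the block returns print's result (1 on success) instead of the comparison. A sketch of one way to trace the comparisons while still sorting correctly, assuming you just want to watch which pairs perl compares (the counter $comparisons is my addition):

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);
my $comparisons = 0;

my @sorted = sort {
    $comparisons++;
    print "comparing $a and $b\n";   # trace first ...
    $a <=> $b;                       # ... so the comparison is the block's return value
} @array;

print "@sorted\n";                   # 2 4 5 6 7 8 58 59 59
print "$comparisons comparisons\n";
```

Which pairs get compared, and how many times, depends on perl's internal sort algorithm, so don't rely on the trace looking the same across perl versions.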

You can actually write your own subroutine in that place, as long as it returns a negative number, 0, or a positive number. Example:

    @sorted = sort sortit @array;

    sub sortit {
        $a <=> $b
    }

This can be very useful for sorting multilevel arrays and/or hashes.
Here is a sub that will sort a hash based on its values, and if two keys have the same value it sorts according to the key.

Let's say you own a department store and want to keep track of the various departments' sales for each month. You would like the departments that earn the most money listed first, and departments that sell the same amount to be sorted alphabetically by name.

    #!/usr/bin/perl -Tw

    use strict;

    my %sale = ( 'Sound'       => 45,
                 'Lighting'    => 25,
                 'Electronics' => 89,
                 'Extra'       => 25,
                 'Clothing'    => 45,
                 'Home Goods'  => 67 );

    for my $dept ( sort sort_num_name keys %sale ) {
        print "$dept sold: \$$sale{$dept}\n";
    }

    # Highest sales first; departments with equal sales are ordered by name.
    sub sort_num_name {
        $sale{$b} <=> $sale{$a} ||
        $a cmp $b;
    }

This will output:

    Electronics sold: $89
    Home Goods sold: $67
    Clothing sold: $45
    Sound sold: $45
    Extra sold: $25
    Lighting sold: $25

This works because 0 is false to perl: when two sales values are identical, <=> returns 0, so the expression on the right of the "||" (or) operator, $a cmp $b, is evaluated and decides the order.

You should never alter the values of $a or $b. The values being compared are also not passed into the sub via "@_"; perl sets $a and $b itself, as package globals, for the duration of the sort call. You also cannot leave the sort sub with "next" or "last", and older versions of perl do not allow the sort sub to be recursive. (As written, the sub looks values up in %sale, so to sort another hash you would need a sub that refers to that hash.)
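
If you want the same ordering rule for more than one hash, one possible approach is to build the comparison sub from a hash reference. This is only a sketch: by_value_then_key is a hypothetical helper of my own, and %returns is made-up data:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the posts above): builds a comparison sub
# for any hash, highest value first, ties broken by key.
sub by_value_then_key {
    my ($hash) = @_;
    return sub { $hash->{$b} <=> $hash->{$a} || $a cmp $b };
}

my %sale    = ( Sound => 45, Lighting => 25, Electronics => 89 );
my %returns = ( Sound => 3,  Lighting => 3,  Electronics => 1  );

my $by_sale    = by_value_then_key( \%sale );
my $by_returns = by_value_then_key( \%returns );

# sort accepts a scalar holding a code reference in place of a sub name.
my @sale_order    = sort $by_sale keys %sale;
my @returns_order = sort $by_returns keys %returns;

print "@sale_order\n";      # Electronics Sound Lighting
print "@returns_order\n";   # Lighting Sound Electronics
```

The returned sub still reads $a and $b as package globals, so it has to be created in the same package as the sort call that uses it.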


Read more here: Perldoc sort

KevinADC
#6   Feb 27 '08

Quote:
Originally Posted by Kelicula

Also for numbers..

    @array = qw(7 8 59 58 4  5 6 2 59);
    @array2 = sort {$a <=> $b } @array;

Only the comparison operator is changed.

cmp for strings
<=> for numerical

Good comment. Thanks for adding it to the article.