Sorting data with perl - Part One

KevinADC
4,059 Expert 2GB
Introduction

This discussion of the sort function is aimed at beginners to perl coding; more experienced perl coders will find nothing new here. Sorting lists or arrays is a very common requirement of programs. If you don't know the difference between a list and an array, don't worry about it. For this article, a list is just an array without a name. Both hold scalars, such as strings and numbers. I will use the terms "list" and "array" to mean the same thing.

The sort() Function

The sort() function returns a new, sorted list and leaves the original list untouched. This means calling sort() in a void context accomplishes nothing, unlike some of perl's other built-in functions that modify their argument:

    sort(@array); # bad - void context

You have to assign the results to an array using the assignment operator "=":

    @sorted = sort(@array);

But you can use the same name as the original array to sort the original array:

    @array = sort(@array);

Keep in mind that the original unsorted values of @array are gone when you do that.
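
A minimal sketch of the two cases above, using a small hypothetical array @names that is not part of the article: sort() hands back a new list, and the original only changes if you assign the result back to it.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the article.
my @names = qw(pear apple fig);

# sort() returns a new list; @names itself is left alone.
my @sorted = sort @names;
print "original: @names\n";    # pear apple fig
print "sorted:   @sorted\n";   # apple fig pear

# Assigning back to the same array replaces the unsorted values.
@names = sort @names;
print "in place: @names\n";    # apple fig pear
```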

Used without a comparison block, perl's built-in sort is generally not much use with real-world data. The default sort is by standard string comparison, which can be quite confusing to the uninitiated perl coder. Let's look at an example.

    @array = qw(@foo 1 32 11 4 2 44 22 !bar Mary mary Adam ant xxx XXX);
    @sorted = sort (@array);
    print "$_\n" for @sorted;

The sorted output would be:

    !bar
    1
    11
    2
    22
    32
    4
    44
    @foo
    Adam
    Mary
    XXX
    ant
    mary
    xxx

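The output is sorted in ascending ASCII order. The numbers are compared as strings, character by character, so 11 comes before 2, and every upper-case letter sorts before any lower-case letter. That is not very useful for most practical applications. But if you are sorting normalized data, all lower-case alpha strings for example, the default sort will work without you having to do anything else.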
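
To see why 11 lands before 2 in that output, it can help to compare two values directly. The snippet below is only an illustration: the default sort compares the way perl's string operator cmp does, while the numeric operator <=> compares by value.

```perl
use strict;
use warnings;

# String comparison: the character "1" sorts before "2", so "11" comes first.
my $as_strings = '11' cmp '2';
# Numeric comparison: 11 is greater than 2.
my $as_numbers = 11 <=> 2;

print "cmp gives $as_strings, <=> gives $as_numbers\n";   # cmp gives -1, <=> gives 1

# Upper-case letters have lower ASCII codes than lower-case ones,
# which is why "XXX" sorts ahead of "ant".
my $case = 'XXX' cmp 'ant';
print "'XXX' cmp 'ant' gives $case\n";                    # -1
```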

To sort data in descending order, prefix the sort function with the reverse function:

    @sorted = reverse sort (@array);

Another method is to use a code block and perl's "cmp" string comparison operator:

    @sorted = sort {$b cmp $a} @array; # descending order, same as reverse sort above
    @sorted = sort {$a cmp $b} @array; # ascending order, same as the default sort

I did not arbitrarily decide to use $a and $b in the above example. $a and $b are special global variables that perl sets to the two values being compared during a sort. You should avoid using them for anything else in your perl programs, and they should not be declared with "my", even when using the "strict" pragma.
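
As a small sketch of that last point, the following runs cleanly under "strict" without declaring $a or $b anywhere. The array @words is hypothetical sample data.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @words = qw(pear apple fig);

# $a and $b need no "my" here, even under strict:
# perl exempts them because sort uses them.
my @descending = sort { $b cmp $a } @words;
my @reversed   = reverse sort @words;

print "@descending\n";   # pear fig apple
print "@reversed\n";     # pear fig apple
```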

Part Two will discuss more advanced ways to sort data using perl.

This article is protected under the Creative Commons License .
Dec 23 '07 #1
KevinADC
4,059 Expert 2GB
As of perl 5.10 this comment is no longer true:

"The sort() function sorts a copy of the original list and returns a new list. This means you can't use the sort() function in a void context like you can with some of perls other built-in functions:"

Perl no longer sorts a copy of an array if you sort the original array (in-place sorting):

    @array = sort @array;

But you still can't use the sort() function in a void context. There have been other changes to the sort() function in perl 5.10 which can be read on the perldoc.perl.org website on the History/Changes page.
Jan 17 '08 #2
Kelicula
176 Expert 100+
Also for numbers..


    @array = qw(7 8 59 58 4  5 6 2 59);

    @array2 = sort {$a <=> $b } @array;

Only the comparison operator is changed.

cmp for strings
<=> for numbers
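
A short sketch of the difference on the same data. The variable names are only illustrative.

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);

my @as_strings = sort @array;                 # default string comparison
my @ascending  = sort { $a <=> $b } @array;   # numeric, smallest first
my @descending = sort { $b <=> $a } @array;   # numeric, largest first

print "string:     @as_strings\n";   # 2 4 5 58 59 59 6 7 8
print "ascending:  @ascending\n";    # 2 4 5 6 7 8 58 59 59
print "descending: @descending\n";   # 59 59 58 8 7 6 5 4 2
```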
Feb 24 '08 #3
I am a little confused about what it prints for $a and $b and how the comparison is really happening. Can you explain?

    #!/usr/bin/perl

    @array = qw(7 8 59 58 4 5 6 2 59);

    @array2 = sort {$a <=> $b; print "$a and $b.... comp.. \n"; } @array;

    print "$_\n" for @array2;
Feb 25 '08 #4
Kelicula
176 Expert 100+
$a and $b are "internal" global perl variables, as mentioned earlier in Kevin's article. I did not choose those names. When testing for comparison, the "block" or function returns one of three values: -1, 0, or 1.

example:
    $test = 31;
    $test2 = 32;

    print $test <=> $test2;

The output is: -1.

Here is how I think of it: $test2 must "go back" to reach $test's value.

If $test were larger than $test2:
    $test = 40;
    $test2 = 32;

    print $test <=> $test2;

The output is: 1.

Then $test2 must "go forward" to reach $test's value, hence "1".

If they are the same:
    $test = 31;
    $test2 = 31;

    print $test <=> $test2;

The output is: 0.

$test2 doesn't need to go "anywhere" to reach $test value.

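{} is known in perl as an anonymous "block".
When using sort, the block must return a negative number, zero, or a positive number; <=> and cmp give -1, 0, or 1. The value of the block is the value of the last expression evaluated in it.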
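
To watch the comparisons happen, one option is to print inside the block while keeping the comparison as the final expression. In the version posted in the question, print is the last statement, so the block returns print's result instead of the comparison. This is a sketch; the counter $comparisons is only added for illustration, and the order and number of comparisons depend on perl's sort implementation.

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);
my $comparisons = 0;   # illustrative counter, not part of the original code

my @sorted = sort {
    $comparisons++;
    print "comparing $a and $b\n";   # trace first
    $a <=> $b;                       # last expression: the block's return value
} @array;

print "@sorted\n";   # 2 4 5 6 7 8 58 59 59
print "comparisons made: $comparisons\n";
```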

You can actually write your own subroutine in that place, as long as it returns a negative number, zero, or a positive number. Example:

    @sorted = sort sortit @array;

    sub sortit {
        $a <=> $b
    }

This can be very useful for sorting multilevel arrays and/or hashes.
Here is a sub that sorts a hash based on its values; if two keys have the same value, it sorts by key.

Let's say you own a department store and want to keep track of each department's sales for the month. You would like the departments that earn the most money listed first, and departments that sell the same amount sorted alphabetically by name.

    #!/usr/bin/perl -Tw

    use strict;

    my %sale = ( 'Sound'       => 45,
                 'Lighting'    => 25,
                 'Electronics' => 89,
                 'Extra'       => 25,
                 'Clothing'    => 45,
                 'Home Goods'  => 67 );

    for my $dept ( sort sort_num_name keys %sale ) {
        print "$dept sold: \$$sale{$dept}\n";
    }

    sub sort_num_name {
        $sale{$b} <=> $sale{$a} ||
        $a cmp $b;
    }

This will output:
    Electronics sold: $89
    Home Goods sold: $67
    Clothing sold: $45
    Sound sold: $45
    Extra sold: $25
    Lighting sold: $25

This works because 0 is false to perl: when two sales values are identical, <=> returns 0, so the "||" (or) operator goes on to evaluate the "cmp" comparison of the names.
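
The fall-through can be seen outside of sort as well. This is only an illustration using literal values taken from the table above.

```perl
use strict;
use warnings;

# Equal sales: <=> gives 0 (false), so || falls through to cmp.
my $tie = (45 <=> 45) || ('Clothing' cmp 'Sound');

# Different sales: <=> gives a non-zero (true) value, so cmp is never evaluated.
my $decided = (89 <=> 67) || ('Electronics' cmp 'Home Goods');

print "tie broken by name: $tie\n";       # -1
print "decided by sales:   $decided\n";   # 1
```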

You should never alter the values of $a or $b. They are also not passed into the sub via "@_". Nor can you use "next", "last", or recursive subroutines. Perl sets $a and $b itself for each comparison while sort runs. (You can reuse this same sub in later sort calls, though as written it always looks up values in %sale.)
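
The sort_num_name sub above always reads %sale. One possible way to apply the same ordering to a different hash, shown only as a sketch, is to build the comparison sub from a hash reference and hand it to sort in a scalar variable. The names by_value_then_key and %returns are hypothetical and not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a comparison sub for whichever hash it is given.
sub by_value_then_key {
    my ($hash) = @_;
    return sub { $hash->{$b} <=> $hash->{$a} || $a cmp $b };
}

# Hypothetical second data set.
my %returns = ( 'Sound' => 3, 'Extra' => 7, 'Clothing' => 3 );

# sort accepts a scalar variable holding a sub reference in place of a sub name.
my $cmp   = by_value_then_key(\%returns);
my @depts = sort $cmp keys %returns;

print "@depts\n";   # Extra Clothing Sound
```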


Read more here: Perldoc sort
Feb 26 '08 #5
KevinADC
4,059 Expert 2GB
Good comment. Thanks for adding it to the article.
Feb 27 '08 #6
