Sorting data with perl - Part One

KevinADC
Introduction

This discussion of the sort function is aimed at beginners to Perl coding; more experienced Perl coders will find nothing new here. Sorting lists or arrays is a very common requirement of programs. If you don't know the difference between a list and an array, don't worry about it. For the purposes of this article, a list is just an array without a name. Both hold the same kind of data: scalars, such as numbers and strings. I will use the terms "list" and "array" to mean the same thing.

The sort() Function

The sort() function sorts a copy of the original list and returns a new list. This means you can't use the sort() function in a void context like you can with some of Perl's other built-in functions:

    sort(@array); # bad - void context

You have to assign the results to an array using the assignment operator "=":

    @sorted = sort(@array);

But you can assign the results back to the original array to sort the original array:

    @array = sort(@array);

Keep in mind that the original unsorted values of @array are gone when you do that.
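
To make the difference concrete, here is a minimal sketch (the fruit names are made-up sample data, not from the article) showing that sort leaves its argument alone until you assign the result back:

```perl
use strict;
use warnings;

my @array  = qw(pear apple fig);
my @sorted = sort @array;       # sort returns a new, sorted list

print "original: @array\n";     # original: pear apple fig
print "sorted:   @sorted\n";    # sorted:   apple fig pear

@array = sort @array;           # overwrite the original with the sorted list
print "now:      @array\n";     # now:      apple fig pear
```

After the last assignment the unsorted order can no longer be recovered from @array.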

Used without a comparison block, Perl's built-in sort function is often not much use with real-world data, because the default sort is by standard string comparison. To the uninitiated Perl coder this can be quite confusing. Let's look at an example.

    @array = qw(@foo 1 32 11 4 2 44 22 !bar Mary mary Adam ant xxx XXX);
    @sorted = sort (@array);
    print "$_\n" for @sorted;

The sorted output would be:

    !bar
    1
    11
    2
    22
    32
    4
    44
    @foo
    Adam
    Mary
    XXX
    ant
    mary
    xxx

The output is sorted by ascending ASCII value, which is why 11 comes before 2 and every upper-case word comes before every lower-case one. That is not very useful for most practical applications. But if you are sorting normalized data, all lower-case alphabetic strings for example, the default sort will work without you having to do anything else.
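
As a rough illustration of the "normalized data" case, the sketch below lower-cases a few of the words from the example first and then uses the plain default sort. The variable names @words, @lower and @normalized are just for this example, and note that lower-casing throws away the original capitalization:

```perl
use strict;
use warnings;

my @words = qw(Mary mary Adam ant xxx XXX);

# Normalize first: lower-case every element.
my @lower = map { lc } @words;

# Now the default string sort gives plain alphabetical order.
my @normalized = sort @lower;

print "$_\n" for @normalized;   # adam ant mary mary xxx xxx (one per line)
```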

To sort data in descending order, prefix the sort function with the reverse function:

    @sorted = reverse sort (@array);

Another method is to use a code block and Perl's "cmp" string comparison operator:

    @sorted = sort {$b cmp $a} @array; # descending order same as reverse sort above
    @sorted = sort {$a cmp $b} @array; # ascending order same as the default sort

I did not arbitrarily decide to use $a and $b in the above example. $a and $b are special package variables that Perl sets to the two elements being compared. You should avoid using them in your Perl programs for anything else. They are exempt from the "strict" pragma, so you do not need to declare them, and you should not declare them with "my", because sort would then no longer see them.
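
Here is a small sketch, using made-up sample data, that runs both descending methods under "use strict" without declaring $a or $b anywhere:

```perl
use strict;
use warnings;

my @array = qw(pear apple fig);

# $a and $b are never declared: sort sets them for each comparison.
my @desc_reverse = reverse sort @array;
my @desc_block   = sort { $b cmp $a } @array;

print "@desc_reverse\n";   # pear fig apple
print "@desc_block\n";     # pear fig apple
```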

Part Two will discuss more advanced ways to sort data using perl.

This article is protected under the Creative Commons License.
Dec 23 '07 #1
KevinADC
As of Perl 5.10, this comment from the article is no longer entirely true:

"The sort() function sorts a copy of the original list and returns a new list. This means you can't use the sort() function in a void context like you can with some of Perl's other built-in functions:"

Perl no longer sorts a copy of an array if you assign the sorted result back to the original array (in-place sorting):

    @array = sort @array;

But you still can't use the sort() function in a void context. There have been other changes to the sort() function in Perl 5.10, which you can read about on the History/Changes page of the perldoc.perl.org website.
Jan 17 '08 #2
Kelicula
Also for numbers:

    @array = qw(7 8 59 58 4 5 6 2 59);

    @array2 = sort {$a <=> $b } @array;

Only the comparison operator is changed:

cmp for strings
<=> for numbers
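
To see why the operator matters, this sketch sorts the same list both ways (the names @as_strings and @as_numbers are only for illustration):

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);

my @as_strings = sort @array;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @array;   # numeric comparison

print "@as_strings\n";   # 2 4 5 58 59 59 6 7 8
print "@as_numbers\n";   # 2 4 5 6 7 8 58 59 59
```

With the string sort, 58 and 59 land before 6 because only the characters are compared, one at a time.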
Feb 24 '08 #3
I am a little confused about what it prints for $a and $b, and how the comparison really happens. Can you explain?

    #!/usr/bin/perl

    @array = qw(7 8 59 58 4 5 6 2 59);

    @array2 = sort {$a <=> $b; print "$a and $b.... comp.. \n"; } @array;

    print "$_\n" for @array2;

Feb 25 '08 #4
Kelicula
$a and $b are special global (package) variables that Perl sets for each comparison, as mentioned earlier in Kevin's article; I did not choose those names. For each comparison, the block or function returns one of three values: -1, 0, or 1.

Example:

    $test = 31;
    $test2 = 32;

    print $test <=> $test2;

The output is: -1.

Here is how I think of it: $test2 must "go back" to reach the value of $test.

If $test were larger than $test2:

    $test = 40;
    $test2 = 32;

    print $test <=> $test2;

The output is: 1.

Then $test2 must "go forward" to reach the value of $test, hence "1".

If they are the same:

    $test = 31;
    $test2 = 31;

    print $test <=> $test2;

The output is: 0.

$test2 doesn't need to go "anywhere" to reach the value of $test.
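
If you want to watch the comparisons happen, keep in mind that the value of a sort block is the value of its last statement. In the code from the question, print is the last statement, so sort gets print's return value instead of the comparison result. A sketch of one way to trace it (the counter $comparisons is just an illustrative extra):

```perl
use strict;
use warnings;

my @array = qw(7 8 59 58 4 5 6 2 59);

my $comparisons = 0;
my @sorted = sort {
    $comparisons++;
    print "comparing $a and $b\n";   # trace first ...
    $a <=> $b;                       # ... comparison last, so it is the block's value
} @array;

print "@sorted\n";                   # 2 4 5 6 7 8 58 59 59
print "$comparisons comparisons\n";
```

Which pairs get compared, and how many times, depends on Perl's internal sort algorithm, so don't rely on the trace looking the same everywhere.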

{} is known in Perl as an anonymous "block".
When using sort, the block must return a number less than, equal to, or greater than zero; <=> and cmp give you exactly -1, 0, or 1.

You can actually write your own subroutine in that place, as long as it returns a value of that kind. Example:

    @sorted = sort sortit @array;

    sub sortit {
        $a <=> $b
    }

This can be very useful for sorting multilevel arrays and/or hashes.
Here is a sub that will sort a hash based on its values, and if two keys have the same value it sorts according to the key.

Let's say you own a department store and want to keep track of the various departments' sales for each month. You would like the departments that earn the most money listed first, and departments that sell the same amount to be sorted alphabetically by name.

    #!/usr/bin/perl -Tw

    use strict;

    my %sale = ( 'Sound'       => 45,
                 'Lighting'    => 25,
                 'Electronics' => 89,
                 'Extra'       => 25,
                 'Clothing'    => 45,
                 'Home Goods'  => 67 );

    for my $dept ( sort sort_num_name keys %sale ) {
        print "$dept sold: \$$sale{$dept}\n";
    }

    # Highest sales first; ties are broken alphabetically by department name.
    sub sort_num_name {
        $sale{$b} <=> $sale{$a} ||
        $a cmp $b;
    }

This will output:
    Electronics sold: $89
    Home Goods sold: $67
    Clothing sold: $45
    Sound sold: $45
    Extra sold: $25
    Lighting sold: $25

This works because 0 is false to Perl: when two sales values are identical, <=> returns 0, so the right-hand side of the "||" (or) operator is evaluated and the names are compared with cmp.
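
The same "||" trick works on multilevel arrays. Below is a sketch with hypothetical data, a list of [department, sales] pairs rather than a hash, sorted by sales first and name second:

```perl
use strict;
use warnings;

# Hypothetical data: each row is [department name, sales].
my @rows = (
    [ 'Sound',    45 ],
    [ 'Lighting', 25 ],
    [ 'Clothing', 45 ],
    [ 'Extra',    25 ],
);

my @ordered = sort {
    $b->[1] <=> $a->[1]      # sales, highest first
        ||
    $a->[0] cmp $b->[0]      # then name, A to Z
} @rows;

print "$_->[0]: $_->[1]\n" for @ordered;
```

This should list Clothing and Sound (45 each) ahead of Extra and Lighting (25 each), with each tie broken by name.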

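You should never alter the values of $a or $b. The two elements are also not passed into the sub via "@_"; they arrive only as $a and $b. You cannot leave the block or sub with "next", "last", or any other loop-control statement, and older versions of Perl did not allow the comparison sub to be recursive. $a and $b are package variables that Perl sets for the duration of the sort call; they are not lexical ("my") variables. (You can use this same sub again in later sort calls.)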
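
One caveat on reuse: a sub that names %sale directly can only ever compare values from %sale. If you want the same logic for several hashes, one possible approach (not from the posts above) is a small helper that builds the comparison sub for whichever hash you hand it. The helper name by_value_then_key and the %stock data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a comparison sub bound to the given hash.
sub by_value_then_key {
    my ($hash) = @_;
    return sub { $hash->{$b} <=> $hash->{$a} || $a cmp $b };
}

my %sale  = ( Sound => 45, Lighting => 25, Clothing => 45 );
my %stock = ( Sound => 3,  Lighting => 9,  Clothing => 3 );

my $by_sale  = by_value_then_key( \%sale );
my $by_stock = by_value_then_key( \%stock );

# sort accepts a scalar holding a code reference in place of a sub name.
my @sale_order  = sort $by_sale keys %sale;
my @stock_order = sort $by_stock keys %stock;

print "@sale_order\n";    # Clothing Sound Lighting
print "@stock_order\n";   # Lighting Clothing Sound
```

This assumes the helper and the sort calls live in the same package, since $a and $b are package variables.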


Read more here: Perldoc sort
Feb 26 '08 #5
KevinADC
Good comment. Thanks for adding it to the article.
Feb 27 '08 #6
