sorting of array for duplicacy

18
I need a way to sort the elements of an array and remove any duplicates.
I tried sort -u, which works on a file, but I need to do this on an array.
Please help if this is possible.
With regards
Oct 1 '06 #1
hi,

Have you tried using the sort function?

sort takes the given array and returns a sorted copy of it.

Suppose you have an array named @array1 containing duplicate data in unsorted order.

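my @array2 = sort(@array1);
open(my $fp1, '>', 'file.txt') || die "could not open the file for writing: $!";
# write the sorted array: uniq only collapses adjacent duplicate lines
foreach my $element (@array2)
{
print $fp1 $element;
print $fp1 "\n";
}
close($fp1);
# run the uniq command on the file and redirect the output to a new file
system("uniq file.txt > file1.txt");
# read the contents of the new file into an array
open(my $fp2, '<', 'file1.txt') || die "could not open the file for reading: $!";
my @array3 = <$fp2>;
close($fp2);
chomp(@array3); # strip the trailing newline from each element

# @array3 now contains the unique, sorted data
# you can also use other logic to pick the unique elements from the array.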

Let me know if this approach solves your problem.
Oct 3 '06 #2
sstouk
3
my(@array1) = ("1 one","1 one","2 two","3 three","3 three","4 four");
my(@array2);
my(%hash); # starts empty; "= undef" would add an empty-string key to the hash
foreach (@array1) {$hash{$_}++};
foreach (sort keys %hash) {push @array2, $_};
print "\@array1 = @array1\n";
print "\@array2 = @array2\n";
Oct 4 '06 #3
What should I do if I want the array elements to stay in the order they had before sorting, but with the duplicates removed?
Oct 12 '06 #4
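One common way to keep the original order, sketched here rather than taken from any reply in this thread, is to filter the array with grep and a "seen" hash instead of sorting the hash keys. The name %seen and the sample data are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data, deliberately unsorted and with duplicates
my @array1 = ("3 three", "1 one", "3 three", "2 two", "1 one", "4 four");

my %seen;
# grep walks @array1 in order. $seen{$_}++ returns the old count,
# so the test is true only the first time an element appears.
my @array2 = grep { !$seen{$_}++ } @array1;

print "\@array1 = @array1\n";
print "\@array2 = @array2\n";   # 3 three 1 one 2 two 4 four
```

Because nothing is sorted, the first occurrence of each element stays where it was and later repeats are dropped.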
mamoon
18
hi,
Thanks. I tried this script earlier, but it is too long.
I needed a short script that sorts the array without a lot of file handling and redirection.
Well, the alternative is:

system("sort -u file.txt >file1.txt");

The above command sorts file.txt into file1.txt and removes duplicates, BUT the order will change.
bye
Oct 13 '06 #5
mamoon
18
hi sstouk,
Thanks a lot. I got so many approaches to this problem, but the puzzle still persists.
I am giving both the input and the output. Can you suggest a Perl script?
input
imp_185#0.0063
imp_184#0.018
imp_185#0.59
imp_184#0.59
amla_33#2.5
imp_378#2.4
imp_83#6.9
output
amla_33#2.5
imp_184#0.018
imp_185#0.0063
imp_378#2.4
imp_83#6.9
I want to remove duplicates on the left-hand side of the #, but according to the values on the right-hand side of the #.
Waiting eagerly.
Oct 13 '06 #6
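A possible in-memory reading of the request above, as a sketch only: it assumes that "according to the values on the right hand side" means keeping the smallest value for each key, which is consistent with the sample output but is not stated outright in the post. The names @records, %lowest and @unique are made up for this example.

```perl
use strict;
use warnings;

# The sample input from the post above
my @records = ('imp_185#0.0063', 'imp_184#0.018', 'imp_185#0.59',
               'imp_184#0.59', 'amla_33#2.5', 'imp_378#2.4', 'imp_83#6.9');

my %lowest;   # key (left of '#') => smallest value seen so far (right of '#')
foreach my $record (@records) {
    my ($key, $value) = split /#/, $record, 2;
    next unless defined $value;          # skip lines without a '#'
    $lowest{$key} = $value
        if !exists $lowest{$key} || $value < $lowest{$key};
}

# Rebuild the records, ordered by key using the default string sort
my @unique = map { "$_#$lowest{$_}" } sort keys %lowest;
print "$_\n" for @unique;
```

With the sample input this prints the five lines listed as the desired output. If the intended rule is different (for example, keep the first occurrence), only the condition on the assignment needs to change.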
miller
1,089 Expert 1GB
#!/usr/bin/perl

use strict;
use warnings;

my $inFile = $ARGV[0] or die "no file specified";
my $outFile = $inFile . '.unique';
my $dupFile = $inFile . '.dup';

# Localize the same filehandles that are opened below
local (*IN, *OUT, *DUP);

open(IN, '<', $inFile) or die "open $inFile: $!";
open(OUT, '>', $outFile) or die "open >$outFile: $!";
open(DUP, '>', $dupFile) or die "open >$dupFile: $!";

my %beenSeen;
while (my $line = <IN>) {
    next unless $line =~ m{(.*)#.*}; # Skip lines without a '#', including empty lines

    # $1 is the key to the left of the '#'
    if (! $beenSeen{$1}++) {
        print OUT $line or die "write $outFile: $!";
    } else {
        print DUP $line or die "write $dupFile: $!";
    }
}

close(IN) or die "close $inFile: $!";
close(OUT) or die "close $outFile: $!";
close(DUP) or die "close $dupFile: $!";

1;

__END__

The above script takes a file name parameter and writes two new files: one with the extension ".unique", the other with the extension ".dup". The first line found for each key in the source file is routed to the unique output, and all subsequent lines with the same key are routed to the dup output. This should be more than enough code for you to figure out how to more accurately match whatever your specifications are.

<rant>
Please note that next time you ask a question, it would help if you more accurately stated what your problem is. The subject of this request "sorting of array for duplicacy" was actually not what you wanted. Instead what you desired was "Removing duplicates from a file". Obviously the sort utility in unix almost achieves this, but fixation on this attempted solution clouded your request and actually introduced new problems. Instead, be sure to state exactly what you want next time, and you'll be more likely to get a solution in a timely manner.
</rant>
Oct 16 '06 #7
Dear Sstouk,

I came across your piece of code in this thread. The two lines below help me perfectly to get the highest group id from "/etc/group" after having spunged it with "endpwent()". However I have not been able to figure out exactly what happens with the arrays and hash hidden in there somewhere ... :)

foreach (@array1) {$hash{$_}++};
foreach (sort keys %hash) {push @array2, $_};

Would you be so kind to explain in a nutshell what these do?

Thanx a lot!

Regards,

Gerard.
Dec 1 '06 #8
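As a rough walk-through of the two lines asked about above: the first loop uses each array element as a hash key and counts its occurrences, and the second loop copies the sorted keys, which are unique by construction, into a new array. The sample values below are hypothetical and were picked to show one caveat: a plain sort compares strings, so numeric data such as group ids may need a numeric comparison.

```perl
use strict;
use warnings;

my @array1 = ("100", "9", "100", "20");   # hypothetical sample data
my (%hash, @array2);

# Step 1: every element becomes a hash key; the value counts its occurrences.
foreach (@array1) { $hash{$_}++ }
# %hash now holds 100 => 2, 9 => 1, 20 => 1, so duplicates have collapsed

# Step 2: a hash cannot hold the same key twice, so the sorted keys are unique.
foreach (sort keys %hash) { push @array2, $_ }
print "string sort:  @array2\n";    # 100 20 9

# The default sort compares strings; for numbers use <=> instead.
my @numeric = sort { $a <=> $b } keys %hash;
print "numeric sort: @numeric\n";   # 9 20 100
print "highest: $numeric[-1]\n";    # 100
```

With a string sort the last element is "9", not "100", so taking the last element as the highest number is only safe after a numeric sort.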
Miller,
I came across the following code that you posted a few days back. I want to do something similar to what you are doing here. I did not want to copy and use the code blindly. I am trying to understand what you are doing here.
I am having trouble understanding the key line:

if (! $beenSeen{$1}++) {

I am confused about where you assign any value to %beenSeen before you use it in the if statement. What does {$1} mean in this context?

I will really appreciate if you could explain.
Thanks
M

Dec 6 '06 #9
miller
1,089 Expert 1GB
my %beenSeen;
while (my $line = <IN>) {
    # This matches his specific record type, ex: "imp_185#0.0063"
    # - It extracts the value he wants to filter by, and assigns that to $1
    next unless $line =~ m{(.*)#.*}; # Skip lines without a '#', including empty lines

    # The only challenge to understanding this line is to respect the order
    # of operations.  The ++ in this statement is a post-increment, meaning
    # the expression yields the old value and only then increments it.
    # %beenSeen never needs to be assigned beforehand: a missing key reads as
    # undef (false), so the negated test is true the first time a value of $1
    # is seen. On every later line with the same $1 the stored count is 1 or
    # more, so the test is false.
    if (! $beenSeen{$1}++) {
        print OUT $line or die "write $outFile: $!";
    } else {
        print DUP $line or die "write $dupFile: $!";
    }
}

Dec 7 '06 #10
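To watch the post-increment at work without any files, here is a small self-contained sketch. It runs the same test over three hard-coded lines and records the count that was stored before each test; the names $key, $before and @log are assumptions added for this illustration and are not part of the script above.

```perl
use strict;
use warnings;

my %beenSeen;   # starts empty; nothing is assigned to it in advance
my @log;

foreach my $line ("imp_185#0.0063\n", "imp_184#0.018\n", "imp_185#0.59\n") {
    next unless $line =~ m{(.*)#.*};    # $1 now holds the text before '#'
    my $key    = $1;
    my $before = $beenSeen{$key} // 0;  # count stored before this line is tested

    # The ++ yields the old count and then stores old count + 1
    if (! $beenSeen{$key}++) {
        push @log, "$key first (count was $before)";
    } else {
        push @log, "$key duplicate (count was $before)";
    }
}

print "$_\n" for @log;
# imp_185 first (count was 0)
# imp_184 first (count was 0)
# imp_185 duplicate (count was 1)
```

The // operator needs Perl 5.10 or later; it is used here only to print 0 instead of undef for a key that has not been seen yet.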
