sorting confusion

P: 89
I have a tab separated text file which is as follows:

    000017        chr4
    000034        chr11
    000035        chrY
    000038        chr4
    000040        chr4
    000041        chr20
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrX_random
    000051        chrX_random
    000051        chrX_random
    023826        chr2
    023827        chrY
    023998        chr7
    024100        chr6
    024157        chr15
    024245        chrY
    024446        chrY
    025091        chr2
    025204        chrY
    025431        chr19
    024637        chr12
    024834        chrY
    024940        chrY
    025747        chrY
    026217        chr5
    026398        chr18
    026912        chr14

Basically, for each record I want to check whether the left-column value (e.g. 026398) has chrY in the right column, and whether that value occurs only once in the whole left column.

For example, 026912 has only chr14, so it is a unique record, whereas 000051 has many hits in the right-hand column. I tried some uniqueness commands, but found them confusing.
Please let me know the logic.
Dec 1 '08 #1
8 Replies


nithinpes
Expert 100+
P: 410
You can read the file line by line and split each line on the tab character. Then check whether the second element of the resulting array is the right-column value you are looking for.
To determine uniqueness, you can use a hash keyed on the left-column value ($hash{$leftvalue}++). For any further assistance, please post the code that you tried.
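
A minimal sketch of that counting idea, assuming tab-separated input lines. The helper name count_keys and the sample lines are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each left-column value occurs.
sub count_keys {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($left) = split /\t/, $line;   # first tab-separated field
        $count{$left}++;
    }
    return %count;
}

# Illustrative sample lines (assumed data, tab separated).
my @sample = ("000017\tchr4", "000051\tchrY", "000051\tchrX_random", "026912\tchr14");
my %count  = count_keys(@sample);

for my $key (sort keys %count) {
    my $label = $count{$key} == 1 ? "unique" : "multiple";
    print "$key: $count{$key} ($label)\n";
}
```

A key whose count is 1 appears only once in the left column. Note that this only counts occurrences; it does not remember which right-column values were seen.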
Dec 1 '08 #2

Ganon11
Expert 2.5K+
P: 3,652
Going a bit further with nithinpes' idea, you could have a hash whose keys are the left-hand-side numbers (e.g. 026912) and whose values are lists of results. For example:

    # Keys and values are quoted: a bare 026912 would be parsed as an octal literal.
    my %hash = (
        '026912' => ['chr14'],    # 026912 had 1 hit
        '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'],    # 000051 had 5 hits
        # etc...
    );
To check uniqueness, just check the size of the array - if it is 1, that element is unique.
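
As a rough sketch of that size check, assuming the hash values are array references as in the example above. The helper name is_unique and the sample data are illustrative only:

```perl
use strict;
use warnings;

# Sample data in the shape described above; keys are quoted strings so the
# leading zeros are preserved.
my %hash = (
    '026912' => ['chr14'],
    '000051' => ['chrY', 'chrY', 'chrX_random'],
);

# Hypothetical helper: true when the array reference holds exactly one hit.
sub is_unique {
    my ($hits) = @_;    # $hits is an array reference
    return scalar(@{$hits}) == 1;
}

for my $key (sort keys %hash) {
    my $n = scalar @{ $hash{$key} };    # number of hits stored for this key
    print "$key has $n hit(s): ", (is_unique($hash{$key}) ? "unique" : "not unique"), "\n";
}
```

Putting an array in scalar context, here with scalar @{ ... }, yields its element count, which is the "size of the array" mentioned above.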
Dec 1 '08 #3

P: 89
Thanks. I managed to add the values to a hash, and it works fine up to that point. This is the first time I am handling a hash. Regarding hash manipulation: how do I check whether the array size of a $value element is more than one? I was only able to find the size of the whole hash table and couldn't find a way to work with an individual hash value or its size. Kindly let me know. Thanks.

    #!/usr/bin/perl
    my %hashcount = ();

    while (<>) {
        chomp $_;
        s/\s+/\t/g;
        my (@v) = split(/\t/, $_);
        $hashcount{$v[0]} .= $v[1];
    }

    while (my ($k, $v) = each %hashcount) {
        print "key: $k, value: $v\n";
    }

Dec 2 '08 #4

P: 89
I tried adding the values to an array and then finding the array size, but it always prints the array size as 1.

    @value = $v;
    $value = @value;
    print "The array size: $value\n";
    if ($value > 1) {
        print "MULTIPLE HITS \n";
    }

Hence I think the way I add elements to the hash is itself the problem. How do I overcome this? Thanks.
Dec 2 '08 #5

P: 89
Hi, I managed to solve the problem. I append a comma after each value when adding it to the hash, and use the split function to get an array of elements when retrieving it, as below.

    #!/usr/bin/perl
    my %hashcount = ();

    while (<>) {
        chomp $_;
        s/\s+/\t/g;
        my (@v) = split(/\t/, $_);
        # Append each hit followed by a comma so the hits can be split apart later.
        $hashcount{$v[0]} .= $v[1] . ",";
    }

    while (my ($k, $v) = each %hashcount) {
        print "key: $k, value: $v\n";
        my @value = split(/,/, $v);
        my $value = @value;    # an array in scalar context gives its size
        print "The array size: $value\n";
        if ($value == 1) {
            if ($value[0] eq "chrX") {
                print "UNIQUE HIT AT X\n";
            } else {
                print "UNIQUE HIT AT NON X\n";
            }
        } else {
            print "MULTIPLE HITS\n";
        }
    }

Thanks. Now my objective is to examine the elements in @value. I need to check whether all the elements in @value are chrX, a mix of chrY and others, or contain no chrX at all. Is there a shortcut for this? Thanks.
Dec 2 '08 #6

KevinADC
Expert 2.5K+
P: 4,059
Easier to use a hash of arrays:

    my %hashcount = ();

    while (<>) {
        chomp;
        my @v = split(/\s+/);
        push @{$hashcount{$v[0]}}, $v[1];
    }

    while (my ($k, $v) = each %hashcount) {
        print "key: $k, value: @{$v}\n";
    }

Each hash key will have an array as its value instead of a string. You can check the size of the array associated with each hash key and/or loop/grep through the arrays to find whatever it is you need to find in them.
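
One possible way to do the grep step, sketched under the assumption that each hash value is an array reference as in the code above. The helper name classify_hits, its 'all'/'mixed'/'none' labels, and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: classify an array of hits relative to one target value.
sub classify_hits {
    my ($hits, $target) = @_;
    # grep in scalar context returns the number of matching elements.
    my $matches = grep { $_ eq $target } @{$hits};
    return 'none' if $matches == 0;
    return 'all'  if $matches == @{$hits};
    return 'mixed';
}

# Illustrative sample data (assumed, not read from the real file).
my %hashcount = (
    '000051' => [qw(chrY chrY chrX_random)],
    '026912' => ['chr14'],
    '024245' => ['chrY'],
);

for my $k (sort keys %hashcount) {
    print "$k: ", classify_hits($hashcount{$k}, 'chrY'), "\n";
}
```

The target is compared with eq, so 'chrX' would not match 'chrX_random'. If those should count as the same chromosome, a regex match inside the grep block would be needed instead.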
Dec 2 '08 #7

KevinADC
Expert 2.5K+
P: 4,059
If you don't know about references yet then this code will look a little odd to you but they are not hard to learn.

    my %hashcount = ();

    while (<>) {
        chomp $_;
        my @v = split(/\s+/, $_);
        push @{$hashcount{$v[0]}}, $v[1];
    }

    while (my ($k, $v) = each %hashcount) {
        print "key: $k, value: @{$v}\n";
        print "The array size: ", scalar @{$v}, "\n";
        if (@{$v} == 1) {
            # $v is an array reference, so $v->[0] is its first element.
            if ($v->[0] eq "chrX") {
                print "UNIQUE HIT AT X\n";
            } else {
                print "UNIQUE HIT AT NON X\n";
            }
        } else {
            print "MULTIPLE HITS\n";
        }
    }

Dec 2 '08 #8

P: 89
Thanks Kevin. I learnt something new today.
Dec 2 '08 #9
