sorting confusion

89
I have a tab separated text file which is as follows:

    000017        chr4
    000034        chr11
    000035        chrY
    000038        chr4
    000040        chr4
    000041        chr20
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrY
    000051        chrX_random
    000051        chrX_random
    000051        chrX_random
    023826        chr2
    023827        chrY
    023998        chr7
    024100        chr6
    024157        chr15
    024245        chrY
    024446        chrY
    025091        chr2
    025204        chrY
    025431        chr19
    024637        chr12
    024834        chrY
    024940        chrY
    025747        chrY
    026217        chr5
    026398        chr18
    026912        chr14

Basically, for each record I want to check whether the left-column value (e.g. 026398) has chrY in the right column, and whether that value occurs only once in the whole left column.

For example, 026912 has only chr14, so it is a unique record, whereas 000051 has many hits on the right-hand side. I tried using some uniqueness commands, but found them a bit confusing.
Please let me know the logic.
Dec 1 '08 #1
nithinpes
410 Expert 256MB
You can read the contents of the file line by line and split each line on the tab character. You can then check whether the second element of the resulting array is the right-column value you are looking for.
To determine uniqueness, you can use a hash with the left-column value as the key ($hash{$leftvalue}++). For any further assistance, please post the code that you tried.
Dec 1 '08 #2
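
A minimal sketch of the counting idea described above, assuming the input lines are tab-separated as in the question. The helper name count_left_values and the sample rows are illustrative only and are not from the thread.

```perl
use strict;
use warnings;

# Count how often each left-column value occurs in a list of
# tab-separated lines. count_left_values() is a hypothetical helper.
sub count_left_values {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;          # skip blank lines
        my ($left) = split /\t/, $line;    # first field only
        $count{$left}++;
    }
    return %count;
}

# A few rows in the same shape as the question's data.
my @sample = ("000017\tchr4", "000051\tchrY", "000051\tchrX_random", "026912\tchr14");
my %count  = count_left_values(@sample);

for my $left (sort keys %count) {
    my $status = $count{$left} == 1 ? "unique" : "repeated";
    print "$left: $count{$left} ($status)\n";
}
```

A count of 1 means the left-column value is a single record; this approach does not remember which right-column values were seen.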
Ganon11
3,652 Expert 2GB
Going a bit further with nithinpes' idea, you could probably have a hash whose keys are the left hand side numbers (i.e. 026912) and whose values are lists of results. For example:

    my %hash = (
       '026912' => ['chr14'], # 026912 had 1 hit
       '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'], # 000051 had 5 hits
       # etc...
    );

To check uniqueness, just check the size of the array - if it is 1, that element is unique.
Dec 1 '08 #3
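
As a rough illustration of "check the size of the array", the sketch below assumes a hash shaped like the one in the post above; the variable name %hits is made up for the example. The keys are quoted because a bare 026912 would be read by Perl as an octal number, and 9 is not a valid octal digit.

```perl
use strict;
use warnings;

# Hash of array references: left-column value => list of hits.
my %hits = (
    '026912' => ['chr14'],
    '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'],
);

for my $id (sort keys %hits) {
    # Dereference the array and evaluate it in scalar context to get its size.
    my $n = scalar @{ $hits{$id} };
    print "$id has $n hit(s)", ($n == 1 ? " - unique" : ""), "\n";
}
```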
lilly07
89
Thanks, I managed to add the values into a hash and it works fine up to that point. This is the first time I am handling a hash. Regarding the hash manipulation, how do I check whether the array size of a $value element is more than one? I was only able to find the size of the hash table itself and couldn't find a way to work with an individual hash value or its size. Kindly let me know. Thanks.

    #!/usr/bin/perl
    my %hashcount = ();

    while(<>) {
            chomp $_;
            s/\s+/\t/g;
            my(@v) = split(/\t/,$_);
            $hashcount{$v[0]} .= $v[1];
    }

    while( my ($k, $v) = each %hashcount ) {
            print "key: $k, value: $v\n";
    }

Dec 2 '08 #4
lilly07
89
I tried adding the values to an array and then finding the array size, but it always prints the array size as 1.

    @value = $v;
    $value = @value;
    print "The array size: $value\n";
    if($value > 1) {
            print "MULTIPLE HITS \n";
    }

Hence I think the way I add elements to the hash is itself the problem. How do I overcome this? Thanks.
Dec 2 '08 #5
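
A small sketch of why the size comes out as 1, assuming the values were appended with .= as in the earlier listing. The key and hit values are sample data taken from the question.

```perl
use strict;
use warnings;

my %hashcount;
# Appending with .= fuses every hit for a key into one string.
$hashcount{'000051'} .= $_ for ('chrY', 'chrY', 'chrX_random');

my $v     = $hashcount{'000051'};   # "chrYchrYchrX_random"
my @value = $v;                     # assigning one scalar makes a one-element array
my $size  = @value;                 # array in scalar context gives its length

print "value: $v\n";
print "The array size: $size\n";    # 1, no matter how many hits were appended
```

Because the hash value is a single string, the individual hits can only be recovered if a separator is stored with them or if a list is stored instead of a string.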
lilly07
89
Hi, I managed to solve the problem. I append a comma after each value when adding it to the hash, and use the split function to get an array of elements when retrieving it, as below.

    #!/usr/bin/perl
    my %hashcount = ();

    while(<>) {
            chomp $_;
            s/\s+/\t/g;
            my(@v) = split(/\t/,$_);
            $hashcount{$v[0]} .= $v[1]."," ;
    }

    while( my ($k, $v) = each %hashcount ) {
            print "key: $k, value: $v\n";
            my @value = split(/,/,$v);
            my $value = @value;
            print "The array size: $value\n";
            if($value == 1){
                    if( $value[0] eq "chrX") {
                            print "UNIQUE HIT AT X\n";
                    }else{
                            print "UNIQUE HIT AT NON X\n";
                    }
            }else {
                    print "MULTIPLE HITS\n";
            }
    }

Thanks. Now my objective is to examine the elements in @value. I need to check whether all the elements in @value are chrX, a mix of chrY and others, or contain no chrX at all. Is there any shortcut to achieve this? Thanks.
Dec 2 '08 #6
KevinADC
4,059 Expert 2GB
Easier to use a hash of arrays:

    my %hashcount = ();

    while(<>) {
            chomp;
            my @v = split(/\s+/);
            push @{$hashcount{$v[0]}}, $v[1];
    }

    while( my ($k, $v) = each %hashcount ) {
            print "key: $k, value: @{$v}\n";
    }

Each hash key will have an array as its value instead of a string. You can check the size of the array associated with each hash key and/or loop/grep through the arrays to find whatever it is you need to find in them.
Dec 2 '08 #7
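
One possible way to "grep through the arrays", sketched under the assumption that the data is already in a hash of arrays as above. The helper classify_hits and its return labels ('all', 'mixed', 'none') are hypothetical names chosen for this example. The comparison is an exact string match, so chrX_random does not count as chrX.

```perl
use strict;
use warnings;

# classify_hits() is a hypothetical helper. It reports whether all,
# some, or none of the hits equal $target (default 'chrX').
sub classify_hits {
    my ($hits, $target) = @_;
    $target = 'chrX' unless defined $target;
    # grep in scalar context returns the number of matching elements.
    my $matches = grep { $_ eq $target } @$hits;
    return 'none'  if $matches == 0;
    return 'all'   if $matches == @$hits;
    return 'mixed';
}

# Sample hash of arrays in the same shape as %hashcount above.
my %hashcount = (
    '000017' => ['chr4'],
    '000035' => ['chrY'],
    '000051' => ['chrY', 'chrY', 'chrX_random'],
);

for my $k (sort keys %hashcount) {
    print "$k: ", classify_hits($hashcount{$k}, 'chrY'), "\n";
}
```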
KevinADC
4,059 Expert 2GB
If you don't know about references yet then this code will look a little odd to you but they are not hard to learn.

    my %hashcount = ();

    while(<>) {
            chomp $_;
            my @v = split(/\s+/,$_);
            push @{$hashcount{$v[0]}}, $v[1];
    }

    while( my ($k, $v) = each %hashcount ) {
            print "key: $k, value: @{$v}\n";
            print "The array size: ", scalar @{$v}, "\n";
            if (@{$v} == 1){
                    if( $v->[0] eq "chrX") {
                            print "UNIQUE HIT AT X\n";
                    }else{
                            print "UNIQUE HIT AT NON X\n";
                    }
            }else {
                    print "MULTIPLE HITS\n";
            }
    }

Dec 2 '08 #8
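
Tying this back to the original question, here is a hedged sketch that lists the left-column values occurring exactly once whose single hit is a given chromosome. The helper name unique_on and the sample rows are assumptions for illustration; a real script would read the lines from the file instead.

```perl
use strict;
use warnings;

# unique_on() is a hypothetical helper: it returns the IDs that occur
# exactly once in the left column and whose only hit equals $chrom.
sub unique_on {
    my ($chrom, @lines) = @_;
    my %hits;
    for my $line (@lines) {
        my ($id, $value) = split ' ', $line;   # split on any whitespace
        next unless defined $value;            # skip blank or malformed lines
        push @{ $hits{$id} }, $value;
    }
    my @ids = grep { @{ $hits{$_} } == 1 && $hits{$_}[0] eq $chrom }
              sort keys %hits;
    return @ids;
}

# Sample rows in the same shape as the question's data.
my @rows = (
    "000035\tchrY",
    "000051\tchrY",
    "000051\tchrX_random",
    "024245\tchrY",
    "026912\tchr14",
);

print "$_\n" for unique_on('chrY', @rows);   # prints 000035 and 024245
```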
lilly07
89
Thanks Kevin. I learnt something new today.
Dec 2 '08 #9