sorting confusion

I have a tab-separated text file that looks like this:

000017        chr4
000034        chr11
000035        chrY
000038        chr4
000040        chr4
000041        chr20
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrY
000051        chrX_random
000051        chrX_random
000051        chrX_random
023826        chr2
023827        chrY
023998        chr7
024100        chr6
024157        chr15
024245        chrY
024446        chrY
025091        chr2
025204        chrY
025431        chr19
024637        chr12
024834        chrY
024940        chrY
025747        chrY
026217        chr5
026398        chr18
026912        chr14

Basically, for each record I want to check whether the left-column value (e.g. 026398) has chrY in the right column, and whether that value occurs only once in the whole left column.

For example, 026912 contains only chr14 and is therefore a unique record, whereas 000051 has many hits on the right-hand side. I tried using some unique commands but found them a bit confusing.
Please let me know the logic.

Dec 1 '08 #1

nithinpes

You can read the file line by line and split each line on the tab character. You can then check whether the second element of the resulting array is the right-column value that you are looking for.
To determine uniqueness, you can use a hash with the left-column value as the key ($hash{$leftvalue}++). For further assistance, please post the code that you have tried.
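
A minimal sketch of that suggestion, assuming tab-separated input. The inline sample rows and the names %count and %chrom are made up for illustration and are not from the original posts:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample rows standing in for the real file (assumption: tab-separated).
my @lines = (
    "000017\tchr4",
    "000051\tchrY",
    "000051\tchrX_random",
    "024245\tchrY",
);

my %count;   # left-column value => number of times it appears
my %chrom;   # left-column value => last right-column value seen

for my $line (@lines) {
    chomp $line;
    my ($left, $right) = split /\t/, $line;
    $count{$left}++;
    $chrom{$left} = $right;
}

# Report values that occur exactly once and sit on chrY.
for my $left (sort keys %count) {
    if ($count{$left} == 1 && $chrom{$left} eq 'chrY') {
        print "$left is a unique chrY record\n";
    }
}
```

With a real file, the loop over @lines would become a while (<>) loop over the input.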

Dec 1 '08 #2

Ganon11

Going a bit further with nithinpes' idea, you could probably have a hash whose keys are the left hand side numbers (i.e. 026912) and whose values are lists of results. For example:

my %hash = (
    # Keys and values are quoted: a bare 026912 would be read as an octal number.
    '026912' => ['chr14'],                                               # 026912 had 1 hit
    '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'],  # 000051 had 5 hits
    # etc...
);

To check uniqueness, just check the size of the array - if it is 1, that element is unique.
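
As a rough sketch of that size check (the two entries are the illustrative ones from this post, not the full data set): evaluating the dereferenced array in scalar context yields its element count.

```perl
use strict;
use warnings;

my %hash = (
    '026912' => ['chr14'],
    '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'],
);

for my $id (sort keys %hash) {
    # An array evaluated in scalar context gives its number of elements.
    my $hits = scalar @{ $hash{$id} };
    print "$id: $hits hit(s), ", ($hits == 1 ? "unique" : "not unique"), "\n";
}
```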

Dec 1 '08 #3

lilly07

Thanks, I managed to add the values to a hash and it works fine up to that point. This is the first time I am handling a hash. Regarding hash manipulation, how do I check whether the array size of a $value element is more than one? I could only find how to get the size of the hash table itself, not a way to work with a hash value or its size. Kindly let me know. Thanks.

#!/usr/bin/perl

my %hashcount = ();

while (<>) {
    chomp $_;
    s/\s+/\t/g;
    my (@v) = split(/\t/, $_);
    $hashcount{$v[0]} .= $v[1];
}

while ( my ($k, $v) = each %hashcount ) {
    print "key: $k, value: $v\n";
}

Dec 2 '08 #4

lilly07

I tried adding the values to an array and then finding the array size, but it always prints the array size as 1.

@value = $v;
$value = @value;
print "The array size: $value\n";
if ($value > 1) {
    print "MULTIPLE HITS \n";
}

So I think the way I add elements to the hash is itself the problem. How do I overcome this? Thanks.
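
A small sketch of what is probably happening here, assuming $v holds the string built with .= in the earlier script: assigning one string to an array always produces a one-element array, whereas pushing each hit onto an array reference keeps the hits separate. The name %hits and the sample values are made up for illustration.

```perl
use strict;
use warnings;

my $v = "chrYchrYchrX_random";   # what .= builds: one concatenated string

my @value = $v;                  # a one-element array, whatever the string holds
my $size  = @value;
print "size from string: $size\n";

# Pushing onto an array reference per key keeps each hit as its own element.
my %hits;
push @{ $hits{'000051'} }, $_ for qw(chrY chrY chrX_random);
my $real_size = @{ $hits{'000051'} };
print "size from array: $real_size\n";
```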

Dec 2 '08 #5

lilly07

Hi, I managed to solve the problem. I append a comma after each value when adding it to the hash, and use the split function to get an array of elements when retrieving, as below.

#!/usr/bin/perl

my %hashcount = ();

while (<>) {
    chomp $_;
    s/\s+/\t/g;
    my (@v) = split(/\t/, $_);
    $hashcount{$v[0]} .= $v[1] . ",";    # comma-separated list of hits
}

while ( my ($k, $v) = each %hashcount ) {
    print "key: $k, value: $v\n";
    my @value = split(/,/, $v);
    my $value = @value;                  # number of hits for this key
    print "The array size: $value\n";
    if ($value == 1) {
        if ($value[0] eq "chrX") {       # $value[0], not @value[0], for a single element
            print "UNIQUE HIT AT X\n";
        } else {
            print "UNIQUE HIT AT NON X\n";
        }
    } else {
        print "MULTIPLE HITS \n";
    }
}

Thanks. Now my objective is to examine the elements in @value. I need to check whether all the elements in @value are chrX, a mix of chrY and others, or contain no chrX at all. Is there any shortcut to achieve this? Thanks.

Dec 2 '08 #6

KevinADC

Easier to use a hash of arrays:

my %hashcount = ();

while (<>) {
    chomp;
    my @v = split(/\s+/);
    push @{$hashcount{$v[0]}}, $v[1];
}

while ( my ($k, $v) = each %hashcount ) {
    print "key: $k, value: @{$v}\n";
}

Each hash key will have an array as its value instead of a string. You can check the size of the array associated with each hash key and/or loop/grep through the arrays to find whatever it is you need to find in them.
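
One possible way to do the grep step, as a sketch only: the classify helper and the three sample keys are hypothetical, not part of the posted code. Note that the exact comparison with eq means chrX_random does not count as chrX.

```perl
use strict;
use warnings;

# Hypothetical helper: classify the hits stored for one key.
sub classify {
    my ($hits) = @_;                          # array reference
    my $x = grep { $_ eq 'chrX' } @{$hits};   # grep in scalar context counts matches
    return 'no chrX'  if $x == 0;
    return 'all chrX' if $x == @{$hits};
    return 'mixed';
}

# Made-up sample data in the same hash-of-arrays shape.
my %hashcount = (
    '000001' => ['chrX', 'chrX'],
    '000002' => ['chrX', 'chrY'],
    '000003' => ['chr14'],
);

for my $k (sort keys %hashcount) {
    print "$k: ", classify($hashcount{$k}), "\n";
}
```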

Dec 2 '08 #7

KevinADC

If you don't know about references yet then this code will look a little odd to you but they are not hard to learn.

my %hashcount = ();

while (<>) {
    chomp $_;
    my @v = split(/\s+/, $_);
    push @{$hashcount{$v[0]}}, $v[1];    # each key holds a reference to an array of hits
}

while ( my ($k, $v) = each %hashcount ) {
    print "key: $k, value: @{$v}\n";
    print "The array size: ", scalar @{$v}, "\n";
    if (@{$v} == 1) {
        if ($v->[0] eq "chrX") {
            print "UNIQUE HIT AT X\n";
        } else {
            print "UNIQUE HIT AT NON X\n";
        }
    } else {
        print "MULTIPLE HITS \n";
    }
}

Dec 2 '08 #8

lilly07

Thanks Kevin. I learnt something new today.

Dec 2 '08 #9
