I have a tab-separated text file that looks like this:
000017 chr4
000034 chr11
000035 chrY
000038 chr4
000040 chr4
000041 chr20
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrY
000051 chrX_random
000051 chrX_random
000051 chrX_random
023826 chr2
023827 chrY
023998 chr7
024100 chr6
024157 chr15
024245 chrY
024446 chrY
025091 chr2
025204 chrY
025431 chr19
024637 chr12
024834 chrY
024940 chrY
025747 chrY
026217 chr5
026398 chr18
026912 chr14
Basically, for each record I want to check whether the left-column ID (e.g. 026398) has chrY in the right column, and whether that ID occurs only once in the whole left column.
For example, 026912 has only chr14, so it is a unique record, whereas 000051 has many hits in the right-hand column. I tried using some "unique" commands, but found them a bit confusing.
Please let me know the logic.
You can read the file line by line and split each line on the tab. Then check whether the second element of the resulting array is the right-column value you are looking for.
To determine uniqueness, you can use a hash with the left-column value as the key ($hash{$leftvalue}++). For further help, please post the code you have tried.
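A minimal sketch of that counting idea, assuming the columns are separated by whitespace. The sample lines are copied from the file above, and the helper name count_ids is made up for illustration; in real use you would feed it the lines read from <>.

```perl
use strict;
use warnings;

# Count how often each left-column ID occurs.
# count_ids is a hypothetical helper name used only for this illustration.
sub count_ids {
    my %count;
    for my $line (@_) {
        my ($id, $chr) = split ' ', $line;   # ' ' splits on any run of whitespace
        next unless defined $chr;            # skip blank or malformed lines
        $count{$id}++;                       # IDs stay strings, so leading zeros survive
    }
    return %count;
}

my @sample = ("000017\tchr4", "000051\tchrY", "000051\tchrY", "026912\tchr14");
my %count  = count_ids(@sample);

for my $id (sort keys %count) {
    my $status = $count{$id} == 1 ? 'single record' : "$count{$id} records";
    print "$id: $status\n";
}
```

Any ID whose count is 1 is a single record; the chrY check is then just a comparison on the second field.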
Going a bit further with nithinpes' idea, you could have a hash whose keys are the left-hand numbers (e.g. 026912) and whose values are lists of hits. For example:
%hash = (
    '026912' => ['chr14'],                                              # 026912 had 1 hit
    '000051' => ['chrY', 'chrY', 'chrY', 'chrX_random', 'chrX_random'],  # 000051 had 5 hits
    # etc... (keys are quoted so the leading zeros are kept)
);
To check uniqueness, just check the size of the array: if it is 1, that key is unique.
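As a sketch of how that size check looks in Perl: dereference the array stored in the hash and evaluate it in scalar context. The hit counts below match the sample file (000051 has 11 chrY and 3 chrX_random lines); the helper name hit_count is hypothetical.

```perl
use strict;
use warnings;

# Hash of arrays: each ID maps to a reference to the list of its hits.
my %hits = (
    '026912' => ['chr14'],
    '000051' => [ ('chrY') x 11, ('chrX_random') x 3 ],
);

# Number of hits for one ID: @{ ... } dereferences, scalar gives the length.
sub hit_count {
    my ($hits, $id) = @_;
    return scalar @{ $hits->{$id} };
}

for my $id (sort keys %hits) {
    my $n = hit_count(\%hits, $id);
    print "$id: $n hit(s)", ($n == 1 ? " (unique)\n" : "\n");
}
```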
Thanks, I managed to add the values into a hash, and that part works fine. This is the first time I am using a hash. How do I check whether the array for a given $value has more than one element? I could only find how to get the size of the hash itself, not how to work with an individual hash value or its size. Kindly let me know. Thanks.
#!/usr/bin/perl
my %hashcount = ();

while (<>) {
    chomp $_;
    s/\s+/\t/g;
    my (@v) = split(/\t/, $_);
    $hashcount{$v[0]} .= $v[1];
}

while (my ($k, $v) = each %hashcount) {
    print "key: $k, value: $v\n";
}
I tried putting the values into an array and then getting the array size, but it always prints the array size as 1.
@value = $v;
$value = @value;
print "The array size: $value\n";
if ($value > 1) {
    print "MULTIPLE HITS \n";
}
So I think the way I add elements to the hash is causing the problem. How do I overcome this? Thanks.
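The size is always 1 because .= glues every hit onto one string per key, and assigning a single scalar to an array always gives a one-element array. A small sketch contrasting that with pushing onto an array reference (the hash names here are only illustrative):

```perl
use strict;
use warnings;

# With .= the hash value becomes one long string.
my %concat;
$concat{'000051'} .= $_ for qw(chrY chrY chrX_random);

my @value = $concat{'000051'};    # one scalar assigned to an array
print "concatenated: $concat{'000051'}, array size: ", scalar(@value), "\n";

# With push onto an array reference, each hit stays a separate element.
my %lists;
push @{ $lists{'000051'} }, $_ for qw(chrY chrY chrX_random);
print "list: @{ $lists{'000051'} }, array size: ", scalar @{ $lists{'000051'} }, "\n";
```

This prints "concatenated: chrYchrYchrX_random, array size: 1" followed by "list: chrY chrY chrX_random, array size: 3".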
Hi, I managed to solve the problem. I append a comma after each value when adding it to the hash, and then use the split function to get an array of elements when reading the values back, as below:
#!/usr/bin/perl
use strict;
use warnings;

my %hashcount = ();

while (<>) {
    chomp $_;
    s/\s+/\t/g;
    my (@v) = split(/\t/, $_);
    $hashcount{$v[0]} .= $v[1] . ",";
}

while (my ($k, $v) = each %hashcount) {
    print "key: $k, value: $v\n";
    my @value = split(/,/, $v);
    my $value = @value;    # array in scalar context gives its size
    print "The array size: $value\n";
    if ($value == 1) {
        if ($value[0] eq "chrX") {
            print "UNIQUE HIT AT X\n";
        } else {
            print "UNIQUE HIT AT NON X\n";
        }
    } else {
        print "MULTIPLE HITS \n";
    }
}
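This works partly because Perl's split discards trailing empty fields, so the comma left after the last hit does not create an extra empty element. A quick check of that behaviour (the string is an illustrative value of the kind the loop above builds):

```perl
use strict;
use warnings;

my $joined = "chrY,chrY,chrX_random,";   # value built with .= $v[1] . ","

my @default = split /,/, $joined;        # trailing empty field is dropped
my @keep    = split /,/, $joined, -1;    # a negative LIMIT keeps it

print scalar(@default), " elements: @default\n";
print scalar(@keep), " elements with LIMIT -1\n";
```

This prints "3 elements: chrY chrY chrX_random" and then "4 elements with LIMIT -1".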
Thanks. Now I need to examine the elements in @value: I need to check whether all the elements in @value are chrX, a mix of chrY and others, or contain no chrX at all. Is there a shortcut to achieve this? Thanks.
Easier to use a hash of arrays:
my %hashcount = ();

while (<>) {
    chomp;
    my @v = split(/\s+/);
    push @{$hashcount{$v[0]}}, $v[1];
}

while (my ($k, $v) = each %hashcount) {
    print "key: $k, value: @{$v}\n";
}
Each hash key will have an array as its value instead of a string. You can check the size of the array associated with each hash key and/or loop/grep through the arrays to find whatever it is you need to find in them.
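As a sketch of the grep approach for the earlier question (all chrX, a mix, or no chrX at all): grep in scalar context returns the number of matching elements, which you can compare with the array size. The helper name classify, its labels, and the IDs 000100 and 000200 are made up for illustration. The test is an exact match on chrX, so chrX_random counts as non-X; use a pattern such as /^chrX/ if you want it included.

```perl
use strict;
use warnings;

# classify is a hypothetical helper: returns a label for one ID's hit list.
sub classify {
    my @hits = @_;
    my $x = grep { $_ eq 'chrX' } @hits;   # scalar context: number of matches
    return 'all chrX' if $x == @hits;
    return 'no chrX'  if $x == 0;
    return 'mixed';
}

my %hashcount = (
    '000051' => [ 'chrY', 'chrY', 'chrX_random' ],
    '000100' => [ 'chrX', 'chrX' ],        # hypothetical IDs, not from the file
    '000200' => [ 'chrX', 'chrY' ],
);

for my $k (sort keys %hashcount) {
    print "$k: ", classify(@{ $hashcount{$k} }), "\n";
}
```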
If you don't know about references yet, this code will look a little odd to you, but they are not hard to learn.
my %hashcount = ();

while (<>) {
    chomp $_;
    my @v = split(/\s+/, $_);
    push @{$hashcount{$v[0]}}, $v[1];
}

while (my ($k, $v) = each %hashcount) {
    print "key: $k, value: @{$v}\n";
    print "The array size: ", scalar @{$v}, "\n";
    if (@{$v} == 1) {
        if ($v->[0] eq "chrX") {
            print "UNIQUE HIT AT X\n";
        } else {
            print "UNIQUE HIT AT NON X\n";
        }
    } else {
        print "MULTIPLE HITS \n";
    }
}
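Going back to the original question (which IDs occur exactly once, with that single hit on chrY), the same hash of arrays can answer it directly. This sketch loops over sort keys instead of each, since Perl hash order is not predictable and sorted output is easier to compare between runs. The helper names are hypothetical, and the inline sample lines stand in for the real file you would read with <>.

```perl
use strict;
use warnings;

# Build ID => [hits] from whitespace-separated lines.
sub build_hits {
    my %hits;
    for my $line (@_) {
        my ($id, $chr) = split ' ', $line;
        next unless defined $chr;          # skip blank lines
        push @{ $hits{$id} }, $chr;
    }
    return \%hits;
}

# IDs with exactly one hit, where that hit is chrY (in sorted order).
sub unique_chrY {
    my ($hits) = @_;
    return grep { @{ $hits->{$_} } == 1 && $hits->{$_}[0] eq 'chrY' } sort keys %$hits;
}

my @sample = ("000035\tchrY", "000051\tchrY", "000051\tchrY", "026912\tchr14", "025747\tchrY");
my $hits = build_hits(@sample);
print "$_\n" for unique_chrY($hits);
```

With these sample lines it prints 000035 and 025747: 000051 is excluded because it has two hits, and 026912 because its single hit is chr14.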
Thanks Kevin. I learnt something new today.