Hi,
The first part of my script works fine. Basically, the script reads a file of IDs that I want to look up in a flat-file database and pulls information for them. I set up a tilde-delimited output file and then read in the two tab-delimited files, located here:
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGD_features.tab
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/go_slim_mapping.tab
The SGD_features.tab part of the script works great and does everything I need. However, the go_slim_mapping.tab part is where I am having some trouble. If you check out the link, you can see that there are multiple rows with the same SGDID.
What I need to do is look up each SGDID and read its go_aspect code (one of three letters: F, C or P) and the associated term in the go_slim column. If an SGDID has multiple rows with the same aspect (let's say C), those terms should be joined together with a | delimiter.
If you look up S000004664 in the linked file, you can see about 8 rows with the same SGDID, each with one go_aspect letter and one go_slim term. For this example, the C values would be combined so the result looks like this:
cytoplasm|membrane|mitochondrial envelope|mitochondrion
which will then be placed in the CSV file for that SGDID under the Cellular_Component column. I need the same process done for the F and P values and their respective columns, Molecular_Function and Biological_Process.
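A minimal sketch of that grouping, assuming the go_slim_mapping.tab layout quoted later in the thread (ORF, gene, SGDID, aspect, GO Slim term, GO ID, feature type, separated by tabs). A plain hash such as `%go_slim` keeps only the last row seen for each SGDID, so here each SGDID maps to a nested hash keyed by aspect letter, holding an array of terms. The sample rows are the S000004664 lines from the thread; the sub name `group_go_slim` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group GO Slim terms by SGDID and aspect letter (F, C or P).
# Returns a hash reference shaped like $go->{$sgdid}{$aspect} = [ term, ... ]
sub group_go_slim {
    my @lines = @_;
    my %go;
    for my $line (@lines) {
        # Split on tabs only: the GO Slim term column can contain spaces.
        my (undef, undef, $sgdid, $aspect, $term) = split /\t/, $line;
        next unless defined $term;    # skip blank or short lines
        push @{ $go{$sgdid}{$aspect} }, $term;
    }
    return \%go;
}

# Sample rows for S000004664, tab-separated as assumed for go_slim_mapping.tab
my @rows = (
    "YMR060C\tSAM37\tS000004664\tC\tcytoplasm\tGO:0005737\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmembrane\tGO:0016020\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmitochondrial envelope\tGO:0005740\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmitochondrion\tGO:0005739\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tF\tprotein binding\tGO:0005515\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tP\tanatomical structure morphogenesis\tGO:0009653\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tP\tmembrane organization\tGO:0016044\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tP\torganelle organization\tGO:0006996\tORF|Verified",
);

my $go = group_go_slim(@rows);

# Join each aspect's terms with '|'; an aspect with no rows becomes ''.
my %joined;
for my $aspect (qw(F P C)) {
    my $terms = $go->{'S000004664'}{$aspect} || [];
    $joined{$aspect} = join '|', @$terms;
}

print "$_: $joined{$_}\n" for qw(F P C);
# C: cytoplasm|membrane|mitochondrial envelope|mitochondrion
```

In the script itself, `@slim` would be passed in place of `@rows`. Terms come out in file order, so the joined string matches the order of rows in go_slim_mapping.tab.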
If you don't understand what I am talking about, please ask and I'll try to explain it again.
Thanks for the help,
Hans

#!/usr/bin/perl
use strict;
use warnings;

# Read the list of SGDIDs to look up, one per line
open(IDS, '<', 'partsgdids.txt') || die "Cannot open partsgdids.txt: $!\n";
chomp(my @ids = <IDS>);
close(IDS);

## Tilde-delimited output file
open(MYFILE, '>', 'data.csv') || die "Cannot open data.csv: $!\n";
print MYFILE "SGDID~ORF~Standard_Name~Alias~Description~Name_Description~Molecular_Function~Biological_Process~Cellular_Component~Define~Mutant_Phenotype\n";

open(SGDFEAT, '<', 'SGD_features.tab') || die "Cannot open SGD_features.tab: $!\n";
chomp(my @sgdfeats = <SGDFEAT>);
close(SGDFEAT);

open(SLIMMAP, '<', 'go_slim_mapping.tab') || die "Cannot open go_slim_mapping.tab: $!\n";
chomp(my @slim = <SLIMMAP>);
close(SLIMMAP);

## List of columns: $sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias,
## $parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos,
## $coord_ver, $seq_vers, $desc

my %feat_type = ();
my %feat_qual = ();
my %feat_name = ();
my %stnd_name = ();
my %desc = ();
my %alias = ();
my %parent = ();
my %sec_sgdid = ();
my %chrom = ();
my %start_coord = ();
my %stop_coord = ();
my %strand = ();
my %genetic_pos = ();
my %coord_ver = ();
my %seq_vers = ();

foreach my $i (@sgdfeats) {
    my ($sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias, $parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos, $coord_ver, $seq_vers, $desc) = split(/\t/, $i);
    $feat_type{$sgdid} = $feat_type;
    $feat_qual{$sgdid} = $feat_qual;
    $feat_name{$sgdid} = $feat_name;
    $stnd_name{$sgdid} = $stnd_name;
    $desc{$sgdid} = $desc;
    $alias{$sgdid} = $alias;
    $parent{$sgdid} = $parent;
    $sec_sgdid{$sgdid} = $sec_sgdid;
    $chrom{$sgdid} = $chrom;
    $start_coord{$sgdid} = $start_coord;
    $stop_coord{$sgdid} = $stop_coord;
    $strand{$sgdid} = $strand;
    $genetic_pos{$sgdid} = $genetic_pos;
    $coord_ver{$sgdid} = $coord_ver;
    $seq_vers{$sgdid} = $seq_vers;
}

## List of columns for go_slim: $orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type

## NEED HELP HERE
my %orf = ();
my %gene = ();
#my %sgdid = ();
my %go_aspect = ();
my %go_slim = ();
my %goid = ();
my %feature_type = ();

foreach my $p (@slim) {
    my ($orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type) = split(/\t/, $p);
    #$orf{$sgdid} = $orf;
    #$gene{$sgdid} = $gene;
    #$go_aspect{$sgdid} = $go_aspect;
    #$go_slim{$sgdid} = $go_slim;
    #$goid{$sgdid} = $goid;
    #$feature_type{$sgdid} = $feature_type;
}

foreach my $ids (@ids) {
    print MYFILE "$ids~$feat_name{$ids}~$stnd_name{$ids}~$alias{$ids}~$desc{$ids}~\n";
}

close(MYFILE);
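One way the final output loop could be filled in, sketched under the assumption that the GO Slim rows have already been grouped into a structure like `$go->{$sgdid}{$aspect}` (array references of terms). The Molecular_Function, Biological_Process and Cellular_Component columns correspond to aspects F, P and C. Because `use warnings` reports interpolating an undefined hash value, missing values default to empty strings via `//` (Perl 5.10 or later). The helper `format_row` and the `$feat` hash reference are hypothetical stand-ins for the script's per-column hashes; Name_Description, Define and Mutant_Phenotype are left empty because the script does not read a source for them yet.

```perl
use strict;
use warnings;

# Aspect letters in the order of the output header columns:
# Molecular_Function (F), Biological_Process (P), Cellular_Component (C)
my @aspect_order = qw(F P C);

# Build one tilde-delimited row (11 columns, matching the header) for an SGDID.
# $feat: hashref of feature fields for this ID (hypothetical shape)
# $go:   hashref shaped like $go->{$sgdid}{$aspect} = [ term, ... ]
sub format_row {
    my ($id, $feat, $go) = @_;
    my $by_aspect = $go->{$id} || {};    # avoids autovivifying $go->{$id}
    my @go_cols;
    for my $aspect (@aspect_order) {
        my $terms = $by_aspect->{$aspect} || [];
        push @go_cols, join('|', @$terms);
    }
    my @fields = (
        $id,
        $feat->{feat_name} // '',
        $feat->{stnd_name} // '',
        $feat->{alias}     // '',
        $feat->{desc}      // '',
        '',          # Name_Description (not read yet)
        @go_cols,    # Molecular_Function, Biological_Process, Cellular_Component
        '', '',      # Define, Mutant_Phenotype (not read yet)
    );
    return join('~', @fields) . "\n";
}

# Usage with made-up partial data for one ID
my %go   = ('S000004664' => { C => ['cytoplasm', 'membrane'], F => ['protein binding'] });
my %feat = (feat_name => 'YMR060C', stnd_name => 'SAM37');
print format_row('S000004664', \%feat, \%go);
# S000004664~YMR060C~SAM37~~~~protein binding~~cytoplasm|membrane~~
```

In the script, the `$feat` argument could be built per ID from the existing hashes, for example `{ feat_name => $feat_name{$ids}, stnd_name => $stnd_name{$ids}, alias => $alias{$ids}, desc => $desc{$ids} }`, and the returned string printed to MYFILE.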
Looking at the lines you mentioned:

YMR060C SAM37 S000004664 C cytoplasm GO:0005737 ORF|Verified
YMR060C SAM37 S000004664 C membrane GO:0016020 ORF|Verified
YMR060C SAM37 S000004664 C mitochondrial envelope GO:0005740 ORF|Verified
YMR060C SAM37 S000004664 C mitochondrion GO:0005739 ORF|Verified
YMR060C SAM37 S000004664 F protein binding GO:0005515 ORF|Verified
YMR060C SAM37 S000004664 P anatomical structure morphogenesis GO:0009653 ORF|Verified
YMR060C SAM37 S000004664 P membrane organization GO:0016044 ORF|Verified
YMR060C SAM37 S000004664 P organelle organization GO:0006996 ORF|Verified
From the code you posted, I take it that the SGDID is the third field: S000004664.
The lines appear to have 7 fields of data, but the fifth field can contain spaces, for example "mitochondrial envelope".
Is what I have said correct?
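The point about the fifth field matters for how each line is split: splitting on a tab keeps "mitochondrial envelope" as one field, while splitting on whitespace breaks it in two and shifts every later column. A small check, using one of the quoted rows and assuming the real file has tabs between fields (the forum rendering shows spaces):

```perl
use strict;
use warnings;

my $line = "YMR060C\tSAM37\tS000004664\tC\tmitochondrial envelope\tGO:0005740\tORF|Verified";

my @by_tab        = split /\t/, $line;    # 7 fields; field 5 keeps its space
my @by_whitespace = split ' ', $line;     # 8 fields; the term is split in two

printf "tab split: %d fields, term = '%s'\n", scalar @by_tab, $by_tab[4];
printf "whitespace split: %d fields, term = '%s'\n", scalar @by_whitespace, $by_whitespace[4];
# tab split: 7 fields, term = 'mitochondrial envelope'
# whitespace split: 8 fields, term = 'mitochondrial'
```

The script already uses `split(/\t/, ...)`, which is the variant that keeps the 7-field layout intact.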