Hi,
The first part of my script works fine. Basically, the script reads a file of IDs that I want to look up in a flat-file database and pulls the matching information. It sets up a tilde-delimited output file and then reads the two tab-delimited files, which are located here:
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGD_features.tab
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/go_slim_mapping.tab
The SGD_features.tab part of the script works great and does everything I need it to. However, the go_slim_mapping.tab part is where I am having trouble. If you check out the link, you can see that there are multiple rows with the same SGDID.
What I need to do is look up each SGDID and collect its go_aspect letter (three choices: F, C, P) along with the associated description under go_slim. If an SGDID has multiple rows for the same go_aspect (let's say C), the go_slim descriptions for that aspect should be joined together with a | delimiter.
If you look up S000004664 in the linked file, you can see about 8 rows with the same SGDID, but each row has one go_aspect letter and one go_slim description. So for this example, let's say it takes the C values and combines them so the result looks like this:
cytoplasm|membrane|mitochondrial envelope|mitochondrion
which will then be placed in the csv file for that particular SGDID under the Cellular Component column. I need the same process for the F and P values and their respective columns, Molecular Function and Biological Process.
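To make the grouping concrete, here is a rough sketch of one way it might be done with a hash of hashes of arrays, keyed by SGDID and then by go_aspect. It is only an illustration under assumptions: the sample rows are made up (only the SGDID and the four C terms come from the example above, while the ORF, gene, GOID, and F/P values are placeholders), the names %terms and joined_terms are hypothetical, and sorting and de-duplicating the terms is my own choice rather than a requirement.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample rows in go_slim_mapping.tab column order:
# orf, gene, sgdid, go_aspect, go_slim, goid, feature_type.
# Only the SGDID and the C terms come from the post; everything else is a placeholder.
my @slim = (
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'C', 'mitochondrion',          'GOID1', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'C', 'cytoplasm',              'GOID2', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'F', 'placeholder function',   'GOID3', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'C', 'membrane',               'GOID4', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'P', 'placeholder process',    'GOID5', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'C', 'mitochondrial envelope', 'GOID6', 'ORF'),
    join("\t", 'ORF1', 'GENE1', 'S000004664', 'C', 'cytoplasm',              'GOID2', 'ORF'),
);

# %terms maps SGDID -> go_aspect (F, C, or P) -> list of go_slim descriptions.
my %terms;
foreach my $row (@slim) {
    my ($orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type) = split /\t/, $row;
    next unless defined $go_slim;    # skip blank or short lines
    push @{ $terms{$sgdid}{$go_aspect} }, $go_slim;
}

# Return the descriptions for one SGDID and aspect joined with "|",
# sorted and without duplicates; returns an empty string if there are none.
sub joined_terms {
    my ($sgdid, $aspect) = @_;
    my $list = ($terms{$sgdid} && $terms{$sgdid}{$aspect}) || [];
    my %seen;
    return join '|', sort grep { !$seen{$_}++ } @$list;
}

print joined_terms('S000004664', 'C'), "\n";
# prints: cytoplasm|membrane|mitochondrial envelope|mitochondrion
```

In the real script the same lookup could presumably be called once per aspect (F, P, C) while writing each output row, so that an SGDID with no rows for an aspect simply gets an empty field.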
If you don't understand what I am talking about, please ask and I'll try to explain it again.
Thanks for the help,
Hans
#!/usr/bin/perl
use strict;
use warnings;

# Read the list of SGDIDs to look up, one per line.
open IDS, "<partsgdids.txt" or die "Cannot open partsgdids.txt: $!\n";
chomp (my @ids = <IDS> );
close(IDS);

##Tilde Delimited File
open (MYFILE, '>data.csv') || die "Cannot open data.csv: $!\n";
print MYFILE "SGDID~ORF~Standard_Name~Alias~Description~Name_Description~Molecular_Function~Biological_Process~Cellular_Component~Define~Mutant_Phenotype\n";

open (SGDFEAT, "SGD_features.tab") || die "File not found\n";
chomp (my @sgdfeats=<SGDFEAT>);
close (SGDFEAT);

open (SLIMMAP, "go_slim_mapping.tab") || die "File not found\n";
chomp (my @slim=<SLIMMAP>);
close (SLIMMAP);

##List of columns $sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias,
##$parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos,
##$coord_ver, $seq_vers, $desc

my %feat_type = ();
my %feat_qual = ();
my %feat_name = ();
my %stnd_name = ();
my %desc = ();
my %alias = ();
my %parent = ();
my %sec_sgdid = ();
my %chrom = ();
my %start_coord = ();
my %stop_coord = ();
my %strand = ();
my %genetic_pos = ();
my %coord_ver = ();
my %seq_vers = ();

foreach my $i (@sgdfeats) {
    my ($sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias, $parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos, $coord_ver, $seq_vers, $desc) = split(/\t/, $i);
    $feat_type{$sgdid} = $feat_type;
    $feat_qual{$sgdid} = $feat_qual;
    $feat_name{$sgdid} = $feat_name;
    $stnd_name{$sgdid} = $stnd_name;
    $desc{$sgdid} = $desc;
    $alias{$sgdid} = $alias;
    $parent{$sgdid} = $parent;
    $sec_sgdid{$sgdid} = $sec_sgdid;
    $chrom{$sgdid} = $chrom;
    $start_coord{$sgdid} = $start_coord;
    $stop_coord{$sgdid} = $stop_coord;
    $strand{$sgdid} = $strand;
    $genetic_pos{$sgdid} = $genetic_pos;
    $coord_ver{$sgdid} = $coord_ver;
    $seq_vers{$sgdid} = $seq_vers;
}

##List of Columns for go_slim: $orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type

##NEED HELP HERE
my %orf = ();
my %gene = ();
#my %sgdid = ();
my %go_aspect = ();
my %go_slim = ();
my %goid = ();
my %feature_type = ();

foreach my $p (@slim)
{
    my ($orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type) = split(/\t/, $p);
    #$orf{$sgdid} = $orf;
    #$gene{$sgdid} = $gene;
    #$go_aspect{$sgdid} = $go_aspect;
    #$go_slim{$sgdid} = $go_slim;
    #$goid{$sgdid} = $goid;
    #$feature_type{$sgdid} = $feature_type;
}

foreach my $ids (@ids) {
    print MYFILE "$ids~$feat_name{$ids}~$stnd_name{$ids}~$alias{$ids}~$desc{$ids}~\n";
}

close (MYFILE);