Perl script to join Tab Delimited File elements

Hi,

The first part of my script works fine. Basically, the script reads a file of IDs that I want to look up in a flat-file database and pull information for. I set up a tilde-delimited output file and then read in the two tab-delimited files, located here:
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/SGD_features.tab

ftp://genome-ftp.stanford.edu/pub/yeast/data_download/literature_curation/go_slim_mapping.tab

The SGD_features.tab part of the script works great and it does everything that I need it to. However, the go_slim_mapping.tab part of the script is where I am having some trouble. If you check out the link, you can see that there are multiple rows of the same SGDID.

What I need to do is look up each SGDID and reference its go_aspect term (three choices: F, C, P) and the associated description under go_slim. If an SGDID has multiple entries spread across F, C, and P, join the entries for each aspect (let's say C) together with a | delimiter.

If you look up S000004664 in the linked file, you can see about 8 rows with the same SGDID, but each row has one associated go_aspect letter and one go_slim definition. So for this example, let's say it takes the C values and combines them so it looks like this:

cytoplasm|membrane|mitochondrial envelope|mitochondrion

which will then be placed in the CSV file for that particular SGDID under the Cellular Component column. I would need this same process to be done for the F and P values and their respective columns, Molecular Function and Biological Process.
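A minimal sketch of that grouping, not a finished solution: it collects the go_slim terms into a hash of hashes of arrays keyed by SGDID and go_aspect, then joins each list with |. The helper names group_go_slim and joined_terms are made up for illustration, and the sample rows assume the real file separates its seven fields with single tab characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build $terms{$sgdid}{$go_aspect} = [ go_slim terms ]
# from go_slim_mapping.tab-style lines (assumed tab-separated, 7 fields).
sub group_go_slim {
    my @lines = @_;
    my %terms;
    foreach my $line (@lines) {
        # Only the SGDID, go_aspect and go_slim fields are needed here.
        my (undef, undef, $sgdid, $go_aspect, $go_slim) = split /\t/, $line;
        next unless defined $go_slim;    # skip blank or short lines
        push @{ $terms{$sgdid}{$go_aspect} }, $go_slim;
    }
    return \%terms;
}

# Hypothetical helper: join one aspect's terms with "|".
# Returns an empty string when the SGDID or aspect has no entries.
sub joined_terms {
    my ($terms, $sgdid, $aspect) = @_;
    my $by_aspect = $terms->{$sgdid} || {};
    my $list      = $by_aspect->{$aspect} || [];
    return join '|', @$list;
}

# Sample rows for S000004664, written with explicit tabs.
my @sample = (
    "YMR060C\tSAM37\tS000004664\tC\tcytoplasm\tGO:0005737\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmembrane\tGO:0016020\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmitochondrial envelope\tGO:0005740\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tC\tmitochondrion\tGO:0005739\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tF\tprotein binding\tGO:0005515\tORF|Verified",
    "YMR060C\tSAM37\tS000004664\tP\tmembrane organization\tGO:0016044\tORF|Verified",
);

my $terms = group_go_slim(@sample);

# Prints: cytoplasm|membrane|mitochondrial envelope|mitochondrion
print joined_terms($terms, 'S000004664', 'C'), "\n";
```

In the full script, the same lookup with 'F' and 'P' would supply the Molecular_Function and Biological_Process columns. Terms come out in file order, so sort the list before joining if a fixed order matters.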

If you don't understand what I am talking about, please ask and I'll try to explain it again.

Thanks for the help,
Hans

#!/usr/bin/perl
use strict;
use warnings;

## File of SGDIDs to look up, one per line
open (IDS, "<partsgdids.txt") || die "File not found\n";
chomp (my @ids = <IDS> );
close(IDS);

## Tilde-delimited output file
open (MYFILE, '>data.csv') || die "Cannot open data.csv for writing\n";
print MYFILE "SGDID~ORF~Standard_Name~Alias~Description~Name_Description~Molecular_Function~Biological_Process~Cellular_Component~Define~Mutant_Phenotype\n";

open (SGDFEAT, "SGD_features.tab") || die "File not found\n";
chomp (my @sgdfeats=<SGDFEAT>);
close (SGDFEAT);

open (SLIMMAP, "go_slim_mapping.tab") || die "File not found\n";
chomp (my @slim=<SLIMMAP>);
close (SLIMMAP);

## List of columns: $sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias,
## $parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos,
## $coord_ver, $seq_vers, $desc

## One hash per column, each keyed by SGDID
my %feat_type = ();
my %feat_qual = ();
my %feat_name = ();
my %stnd_name = ();
my %desc = ();
my %alias = ();
my %parent = ();
my %sec_sgdid = ();
my %chrom = ();
my %start_coord = ();
my %stop_coord = ();
my %strand = ();
my %genetic_pos = ();
my %coord_ver = ();
my %seq_vers = ();

foreach my $i (@sgdfeats) {
    my ($sgdid, $feat_type, $feat_qual, $feat_name, $stnd_name, $alias, $parent, $sec_sgdid, $chrom, $start_coord, $stop_coord, $strand, $genetic_pos, $coord_ver, $seq_vers, $desc) = split(/\t/, $i);
    $feat_type{$sgdid} = $feat_type;
    $feat_qual{$sgdid} = $feat_qual;
    $feat_name{$sgdid} = $feat_name;
    $stnd_name{$sgdid} = $stnd_name;
    $desc{$sgdid} = $desc;
    $alias{$sgdid} = $alias;
    $parent{$sgdid} = $parent;
    $sec_sgdid{$sgdid} = $sec_sgdid;
    $chrom{$sgdid} = $chrom;
    $start_coord{$sgdid} = $start_coord;
    $stop_coord{$sgdid} = $stop_coord;
    $strand{$sgdid} = $strand;
    $genetic_pos{$sgdid} = $genetic_pos;
    $coord_ver{$sgdid} = $coord_ver;
    $seq_vers{$sgdid} = $seq_vers;
}

## List of columns for go_slim: $orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type

## NEED HELP HERE
my %orf = ();
my %gene = ();
#my %sgdid = ();
my %go_aspect = ();
my %go_slim = ();
my %goid = ();
my %feature_type = ();

foreach my $p (@slim) {
    my ($orf, $gene, $sgdid, $go_aspect, $go_slim, $goid, $feature_type) = split(/\t/, $p);
    #$orf{$sgdid} = $orf;
    #$gene{$sgdid} = $gene;
    #$go_aspect{$sgdid} = $go_aspect;
    #$go_slim{$sgdid} = $go_slim;
    #$goid{$sgdid} = $goid;
    #$feature_type{$sgdid} = $feature_type;
}

foreach my $ids (@ids) {
    print MYFILE "$ids~$feat_name{$ids}~$stnd_name{$ids}~$alias{$ids}~$desc{$ids}~\n";
}

close(MYFILE);

Oct 8 '08 #1

KevinADC

Looking at the lines you mentioned:

YMR060C    SAM37    S000004664    C    cytoplasm    GO:0005737    ORF|Verified
YMR060C    SAM37    S000004664    C    membrane    GO:0016020    ORF|Verified
YMR060C    SAM37    S000004664    C    mitochondrial envelope    GO:0005740    ORF|Verified
YMR060C    SAM37    S000004664    C    mitochondrion    GO:0005739    ORF|Verified
YMR060C    SAM37    S000004664    F    protein binding    GO:0005515    ORF|Verified
YMR060C    SAM37    S000004664    P    anatomical structure morphogenesis    GO:0009653    ORF|Verified
YMR060C    SAM37    S000004664    P    membrane organization    GO:0016044    ORF|Verified
YMR060C    SAM37    S000004664    P    organelle organization    GO:0006996    ORF|Verified

From the code you posted, I take it that the SGDID is the third field: S000004664

And the lines appear to have 7 fields of data but the fifth field can have spaces in the data, for example: "mitochondrial envelope".

Is what I have said correct?
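If that reading is right, splitting each line on tabs only should keep a multi-word value such as "mitochondrial envelope" intact in the fifth field. A small sketch to check this, assuming the real file uses single tab characters between fields (the listing above shows them as spaces); parse_slim_line is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one go_slim_mapping.tab line into its fields.
# Splits on tab only, so spaces inside a field are preserved.
sub parse_slim_line {
    my ($line) = @_;
    chomp $line;
    return split /\t/, $line, -1;    # -1 keeps trailing empty fields
}

# One sample row, written with explicit tabs.
my $line = "YMR060C\tSAM37\tS000004664\tC\tmitochondrial envelope\tGO:0005740\tORF|Verified\n";
my @fields = parse_slim_line($line);

print scalar(@fields), " fields\n";    # 7 fields
print "SGDID:   $fields[2]\n";         # third field
print "go_slim: $fields[4]\n";         # fifth field, embedded space intact
```

Splitting on whitespace (for example /\s+/) would break "mitochondrial envelope" into two fields, which is why the pattern is restricted to \t.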

Oct 9 '08 #2
