I may be wrong, but from what I've seen, other folks with genetic sequence data tend to have very large files with huge numbers of lines to process. If that's the case for you, then you may well wish to avoid the previous code's suggestion of pulling the entire file into an array, as it could consume a huge amount of memory.
Also, the act of putting all this into a hash will ruin any chance of preserving the original order of the sequences unless you preserve it in an additional array... and if you need the additional array for that, then you might as well keep the array in the first place. On the other hand, it may well be that the original order was in fact the order produced by sort, or that you would prefer to sort them rather than preserve the original order. Still, I hate to make assumptions if I don't have to.
So with that in mind, your original approach of reading in a single line at a time in your loop seems reasonable enough to start. However, we could count the Gs, As, Ts, and Cs along the way and avoid having to store the whole sequence in an array at all. This should avoid holding a huge array in memory and generally be more efficient. If it were necessary due to inordinately long lines, you could even take the approach of pulling in a single character at a time.
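To make that concrete, here's a minimal sketch of the streaming idea: count each line's bases as it is read, and never keep the sequences around. The in-memory filehandle and the sample data are just stand-ins for your real file.

```perl
use strict;
use warnings;

# Stand-in for the real sequence file: an in-memory filehandle.
open my $fh, '<', \"GATTACA\nCCGG\n" or die "open: $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;
    foreach my $base ('G', 'A', 'T', 'C') {
        # $line is our own copy, so destroying it with s/// is harmless;
        # s///gi returns the number of substitutions, i.e. occurrences.
        $count{$base} += ($line =~ s/$base//gi) || 0;
    }
}
print "$_: $count{$_}\n" for ('G', 'A', 'T', 'C');
# prints G: 3, A: 3, T: 2, C: 3 for the sample data above
```

The running totals in %count are all that stays in memory, no matter how long the file is.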
Unfortunately, you seem to be printing out both @names and @seq just before printing the counts. It is my hope that this is for debugging purposes, but if it is not, then you're doomed to store it all in these arrays anyhow. If they're not needed, you'll be able to remove them from the code below as commented.
In the searches that you are using to do the counts, I've removed the capture and the replacement text, since they are unneeded. Once you've stored the line in the array, you can be as destructive with it as you like.
I've further taken the liberty of storing your counts in an array of hashes rather than 4 scalars. This makes things look a bit more complicated, but it lets you loop over the 4 bases rather than having multiple prints or searches that look almost exactly the same. Besides, by moving the counting up into the first loop, we'd have needed to store them in four arrays at the very least.
The downside to one of these changes though is that I'm using $base inside the regular expression used for counting. This causes the regular expression to be recompiled every time we pass through it so that $base can be interpolated. That could be solved by reverting to individual regexes for each of the four bases, or it could be solved cleverly by using pre-compiled regular expressions defined before the start of the main loop. I'm not feeling clever enough at the moment to attempt it. If things run too slowly, try it with four individual regexes first, and only go to the trouble of pre-compiled regexes if it gets you a significant savings.
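For what it's worth, the pre-compiled version isn't actually much code. Something like this sketch would do it, building one qr// pattern per base once, before any looping (the sample string here is made up):

```perl
use strict;
use warnings;

my @bases = ('G', 'A', 'T', 'C');

# Compile one pattern per base, once, up front; qr//i bakes the
# case-insensitivity into the pattern itself.
my %re = map { $_ => qr/$_/i } @bases;

my $line = "GgAtTc";    # made-up sample sequence
my %count;
foreach my $base (@bases) {
    $count{$base} = ($line =~ s/$re{$base}//g) || 0;
}
print "$_:$count{$_} " for @bases;
print "\n";
# prints G:2 A:1 T:2 C:1
```

As an aside, if all you need is a count, tr/// will get you one without any regex machinery and without modifying the string at all: my $gs = ($line =~ tr/Gg//);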
I've also marked some points where you could conceivably place an outer loop should you intend to process files that contain more than just one pass through this format.
use strict;
use warnings;

my @bases = ('G', 'A', 'T', 'C');

open FILE, '<', 'YBL091C.data' or die "Cannot open YBL091C.data: $!";

# Start of potential looping point
my ($NoSeq, $size) = split(/ /, <FILE>);
print "Starter NumSeq:$NoSeq Length:$size";

my @names  = ();
my @seq    = ();    # Remove if the print of @seq below is not needed.
my @counts = ();

for (my $index = 0; $index < $NoSeq; $index++) {
    my $line = <FILE>;
    push @names, $line;

    $line = <FILE>;
    push @seq, $line;    # Remove if the print of @seq below is not needed.

    foreach my $base (@bases) {
        # s/// returns the number of substitutions made, or the empty
        # string if there were none -- hence the || 0.
        $counts[$index]->{$base} = ($line =~ s/$base//gi) || 0;
    }
}

print "Names @names\n";    # Remove if not needed.
print "Seq @seq\n";        # Remove if not needed.

# If the above two print calls are not needed, then this loop could be made
# part of the first loop.
for (my $index = 0; $index < $NoSeq; $index++) {
    print "This yeast is $names[$index]";

    foreach my $base (@bases) {
        print "There are $counts[$index]->{$base} ${base}'s.\n";
    }

    print "\n";
}
# End of potential looping point

close FILE;