I am stuck and would really appreciate some pointers.
I have to run a script that compares one element from each of two different files: a BLAST file and a CDF file.
I also need to preserve the structure of the data.
For this I chose the following strategy:
- dump the files into two arrays
- pattern-match between the two files
- if a line does not match, remove it
- if a line has a different structure, keep it
Here is the part of my script that takes the most time:
    foreach my $line (@CDF)
    {
        # Capture the 12th tab-separated field of the CDF line.
        if ($line =~ /^(?:[^\t]*\t){11}([^\t]*)\t/)
        {
            my $wanted = $1;
            print "repeat again\n";
            print $wanted."\n";
            foreach my $lineB (@Blast)
            {
                # \Q...\E keeps regex metacharacters in $wanted literal.
                if ($lineB =~ /^\Q$wanted\E\s/)
                {
                    print $wanted."\n";
                    print OUTPUTFILEHANDLE $line;
                }
            }
        }
    }
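For comparison, the nested loop rescans every BLAST line for each CDF line. A common alternative is to index the leading token of each BLAST line in a hash once and then do a single lookup per CDF line. The following is only a minimal sketch under assumptions: the sample data and the names `%in_blast` and `@kept` are made up for illustration, the identifiers are assumed to contain no whitespace, and, following the strategy listed earlier, lines with a different structure are kept.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two real files.
my @CDF = (
    join("\t", ('x') x 11, 'gene1', 'rest') . "\n",
    join("\t", ('x') x 11, 'gene2', 'rest') . "\n",
    "short line with another structure\n",
);
my @Blast = ("gene1\t98.5\n", "gene3\t77.0\n");

# Build the lookup once: the leading non-whitespace token of each BLAST line.
my %in_blast;
for my $lineB (@Blast) {
    my ($id) = $lineB =~ /^(\S+)\s/;
    $in_blast{$id} = 1 if defined $id;
}

my @kept;
for my $line (@CDF) {
    my @fields = split /\t/, $line, -1;
    if (@fields > 12) {
        # Standard structure: keep only if the 12th field occurs in the BLAST data.
        push @kept, $line if $in_blast{ $fields[11] };
    }
    else {
        # Different structure: keep the line unchanged.
        push @kept, $line;
    }
}

print @kept;
```

Unlike the nested loop, this keeps each matching CDF line only once, even if several BLAST lines start with the same identifier.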
Here are my questions:
To work on subsets of the data instead of the complete 90 MB files, I have tried to address elements by coordinates using an array, like this:
    my @array;
    print $array[0];
I have also tried to understand hashes.
So far I have read that arrays might be faster than hashes, so my question is:
Could anyone give me a clue about how to define my file as a grid, so that I can use x,y coordinates to get my subsets and then do my comparison?
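For illustration, a grid of this kind is usually built in Perl as an array of array references, one per line. This is only a sketch, assuming tab-separated columns; the sample lines and the names `@grid`, `$cell` and `@column` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines standing in for a file's contents.
my @lines = ("a\tb\tc\n", "d\te\tf\n", "g\th\ti\n");

# Each row becomes an array reference holding that line's columns.
my @grid = map { chomp(my $l = $_); [ split /\t/, $l, -1 ] } @lines;

# A single cell: row y, column x (both zero-based).
my ($x, $y) = (2, 1);
my $cell = $grid[$y][$x];

# A subset: column 1 of rows 0 and 1.
my @column = map { $_->[1] } @grid[0 .. 1];

print "$cell\n";
print "@column\n";
```

Note that splitting every line of a 90 MB file up front holds all fields in memory at once, so it may be preferable to split only the lines that are actually needed.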
I also thought about using hashes to link keys to values, which would make up the subsets I need, but I am stuck with this approach too.
I know that I could do this in an object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.
Any help is very welcome as I've been stuck for a while on this...