Bytes IT Community

Using hashes or arrays for file parsing

hi everyone,

I am kind of stuck and therefore would really appreciate some clues:

I have to run a script that compares one element from each of two different files: a BLAST file and a CDF file.
I also need to keep the data structure.
For this I chose the following strategy:

-dump the files into two arrays
-do a pattern match between the two files
-if a line doesn't match, remove it
-if a line has a different structure, keep it

Here is the part of my script that takes the most time:
    foreach my $line(@CDF)
    {
        my $wanted;

        # Capture the 12th tab-separated field of the CDF line.
        if ($line =~ /^.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t(.*?)\t/)
        {
            print "repeat again\n";
            $wanted = ($1);
            print $wanted."\n" ;
            # Scan every BLAST line for one that starts with that field.
            foreach my $lineB(@Blast)
            {
                if ($lineB =~ /^($wanted)\s/)
                {
                    print $wanted."\n";
                    print OUTPUTFILEHANDLE "$line";
                }
            }
        }
    }

It takes hours to run it and obtain my output file.

Here are my questions:
To work only with subsets of the files instead of the complete 90 MB files, I tried to address elements by coordinates in an array, like this:

    my @array;
    print $array[0];

This ends up printing the first line of the file, whereas I want the 12th element of each line for the comparison.
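A minimal sketch of reaching the 12th element of one line, assuming the fields are tab-separated as the regex in the script suggests. The sample line is made up for illustration; `$array[0]` holds a whole line, so the line itself still has to be split into fields.

```perl
use strict;
use warnings;

# Hypothetical sample line: 13 tab-separated fields, invented for illustration.
my $line = join("\t", map { "f$_" } 1 .. 13) . "\n";

chomp(my $copy = $line);             # work on a copy so $line keeps its newline
my @fields = split /\t/, $copy, -1;  # -1 keeps trailing empty fields
my $wanted = $fields[11];            # index 11 is the 12th field

print "$wanted\n";                   # prints "f12"
```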

I also tried to understand hashes.

So far I have read that arrays might be faster than hashes, so:

Could anyone give me a clue about how to define my file as a grid, so that I could use x,y coordinates to get my subsets and then do my comparison?
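One way such a grid might look in Perl is an array of array references, sketched below. The two sample lines are invented, and tab-separated fields are an assumption carried over from the script; with real 90 MB files, holding every field in memory may be costly.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the file's lines (made-up data).
my @lines = ("a\tb\tc\n", "d\te\tf\n");

# Build the grid: one array reference per row.
my @grid;
foreach my $l (@lines) {
    chomp(my $copy = $l);
    push @grid, [ split /\t/, $copy, -1 ];
}

# $grid[$y][$x]: row $y, column $x, both counted from 0.
print $grid[1][2], "\n";             # second row, third column: prints "f"

# A column subset: the second field of every row.
my @column = map { $_->[1] } @grid;
print "@column\n";                   # prints "b e"
```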

I also thought about using hashes to link keys to the values that would make up the subsets I need, but I am stuck this way too.
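A hedged sketch of the hash idea: index the BLAST lines once by their first whitespace-delimited token, then test each CDF line with a single lookup instead of rescanning `@Blast`. The sample data and the names `%seen` and `@kept` are hypothetical, and the sketch assumes the 12th CDF field contains no whitespace. Unlike the nested loops, it keeps each matching CDF line once, even if several BLAST lines share the same identifier.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two files (made-up data).
my @Blast = ("id2 95.0 hit\n", "id7 88.1 hit\n");
my @CDF   = map { join("\t", ("x") x 11, $_, "tail") . "\n" } qw(id1 id2 id7);

# One pass over @Blast: remember the first token of each line.
my %seen;
foreach my $lineB (@Blast) {
    next unless $lineB =~ /^(\S+)/;
    $seen{$1} = 1;
}

# One pass over @CDF: a hash lookup replaces the inner loop.
my @kept;
foreach my $line (@CDF) {
    my @fields = split /\t/, $line, -1;
    next unless @fields > 12;        # a tab must follow the 12th field
    push @kept, $line if exists $seen{ $fields[11] };
}

print @kept;                         # the id2 and id7 lines, unchanged
```

With this layout the work grows with the sum of the two file sizes rather than their product, which is usually where nested loops over large files lose their time.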

I know that I could use the object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.

Any help is very welcome as I've been stuck for a while on this...
Jun 10 '08 #1