Using hashes or arrays for file parsing

Hi everyone,

I am kind of stuck and would really appreciate some clues.

I have to run a script that compares two elements from two different files, a BLAST file and a CDF file.
I also need to keep the data structure.
For this I chose the following strategy:

- dump the files into two arrays
- pattern-match between the two files
- if a line doesn't match, remove it
- if a line has a different structure, keep it

Here is the part of my script which takes the most time:

    foreach my $line (@CDF)
    {
        # Capture the 12th tab-separated field (it must be followed by a tab).
        if ($line =~ /^(?:[^\t]*\t){11}([^\t]*)\t/)
        {
            print "repeat again\n";
            my $wanted = $1;
            print $wanted."\n";
            # Scan every Blast line for one that starts with the wanted value.
            foreach my $lineB (@Blast)
            {
                # \Q...\E keeps regex metacharacters in the value literal.
                if ($lineB =~ /^\Q$wanted\E\s/)
                {
                    print $wanted."\n";
                    print OUTPUTFILEHANDLE $line;
                }
            }
        }
    }

It takes hours to run it and obtain my output file.

Here are my questions:
I am trying to use only subsets of the files instead of the complete 90 MB files.
I have tried to use coordinates with an array like this:

    my @array;
    print $array[0];

but this ends up printing the first line of the file, whereas I want the 12th element of the line for the comparison.
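
The reason `$array[0]` prints a whole line is that each array element holds one complete line of the file; the line still has to be broken into fields. Here is a minimal sketch, assuming the columns are tab-separated as in the regex in the script. The helper name `nth_field` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Return the Nth tab-separated field (1-based) of a line, or undef if the
# line has fewer fields. nth_field is a hypothetical helper name.
sub nth_field {
    my ($line, $n) = @_;
    chomp $line;                          # works on a copy, caller's line is untouched
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return $fields[$n - 1];
}

# Made-up sample line with 13 tab-separated columns.
my $sample = join("\t", map { "col$_" } 1 .. 13) . "\n";
print nth_field($sample, 12), "\n";       # prints "col12"
```

With the file lines in `@array`, `nth_field($array[0], 12)` would then give the 12th element of the first line.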

I have also tried to understand hashes.

So far I have read that it might be faster to use arrays than hashes, so:

Could anyone give me some clues about how to define my file as a grid, where I could use x,y coordinates to get my subsets and then do my comparison?
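
One way such a grid is commonly built in Perl is an array of array references: one inner array per line, one element per column. The sketch below assumes tab-separated input. `build_grid` and the three sample lines are invented for illustration and stand in for the real file contents.

```perl
use strict;
use warnings;

# Build a grid (array of array references) from tab-separated lines.
# $grid->[$row][$col] then addresses one cell; both indices are 0-based.
# build_grid is a hypothetical helper name.
sub build_grid {
    my @lines = @_;
    my @grid;
    for my $line (@lines) {
        chomp(my $copy = $line);              # strip the newline from a copy
        push @grid, [ split /\t/, $copy, -1 ];
    }
    return \@grid;
}

# Made-up three-line input.
my $grid = build_grid("a\tb\tc\n", "d\te\tf\n", "g\th\ti\n");
print $grid->[1][2], "\n";                      # row 2, column 3: prints "f"
print join(",", map { $_->[0] } @$grid), "\n";  # first column: prints "a,d,g"
```

Note that a grid of two 90 MB files holds every field in memory at once, so it may be worth keeping only the column that is actually compared.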

I also thought about using hashes to link keys to values, which would constitute the subsets I need, but I am stuck this way too.
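
A hash could replace the inner loop over `@Blast`. The first token of every Blast line is stored as a key once, and each CDF line then needs a single lookup instead of a scan of the whole Blast array. The sketch below rests on assumptions: the compared value contains no whitespace, and each matching CDF line is wanted once (the nested loop prints it once per matching Blast line). The name `filter_cdf` and the sample data are made up.

```perl
use strict;
use warnings;

# Keep the CDF lines whose 12th tab-separated field appears as the first
# whitespace-delimited token of some Blast line.
# filter_cdf is a hypothetical name; both arguments are array references.
sub filter_cdf {
    my ($cdf, $blast) = @_;

    # One pass over Blast: remember every leading token as a hash key.
    my %seen;
    for my $line (@$blast) {
        my ($id) = $line =~ /^(\S+)\s/;
        $seen{$id} = 1 if defined $id;
    }

    # One pass over CDF: a hash lookup replaces the inner loop.
    my @kept;
    for my $line (@$cdf) {
        my @fields = split /\t/, $line, -1;
        # Like the posted regex, require a tab after the 12th field;
        # lines with a different structure are skipped here.
        next unless @fields > 12;
        push @kept, $line if $seen{ $fields[11] };
    }
    return @kept;
}

# Made-up data: two CDF lines, only the first has a partner in Blast.
my @cdf   = (join("\t", (("x") x 11), "geneA", "tail") . "\n",
             join("\t", (("x") x 11), "geneB", "tail") . "\n");
my @blast = ("geneA\t98.5\n", "geneC\t77.0\n");
print filter_cdf(\@cdf, \@blast);   # prints only the geneA line
```

The work grows with the sum of the two file sizes rather than their product, which is where most of the running time of the nested loop goes.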

I know that I could use the object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.

Any help is very welcome as I've been stuck for a while on this...
Jun 10 '08 #1