Using hashes or arrays for file parsing

Hi everyone,

I am stuck and would really appreciate some pointers.

I have to run a script that compares two elements taken from two different files, a BLAST file and a CDF file. I also need to keep the data structure. For this I chose the following strategy:

- dump the files into two arrays
- do a pattern match between the two files
- if a line doesn't match, remove it
- if a line has a different structure, keep it

Here is the part of my script that takes the most time:

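foreach my $line (@CDF) {
    # Capture the 12th tab-separated field of the CDF line.
    if ($line =~ /^.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t.*?\t(.*?)\t/) {
        print "repeat again\n";
        my $wanted = $1;
        print "$wanted\n";

        # Scan every BLAST line for each CDF line (this is the slow part).
        foreach my $lineB (@Blast) {
            # \Q...\E makes $wanted match literally, even if it contains
            # regex metacharacters such as "." or "|".
            if ($lineB =~ /^\Q$wanted\E\s/) {
                print "$wanted\n";
                print OUTPUTFILEHANDLE $line;
            }
        }
    }
}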

It takes hours to run and produce my output file.

Here are my questions:
I am trying to use only subsets of the files instead of the complete 90 MB files. I have tried to address the data by coordinates using an array, like this:

my @array;

print $array[0];

But this just prints the first line of the file, whereas I want the 12th element of the line to do the comparison.
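For reference, an array filled from a file holds one whole line per element, so $array[0] is the first line. Getting the 12th element means splitting a line into fields first. A minimal sketch, assuming the fields are tab-separated as in the regex above; nth_field is a hypothetical helper name and the sample line is made up:

```perl
use strict;
use warnings;

# Return the Nth tab-separated field (1-based) of a line,
# or undef if the line has fewer than N fields.
sub nth_field {
    my ($line, $n) = @_;
    chomp $line;                          # works on a copy of the caller's line
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return $fields[$n - 1];
}

# Made-up line with 13 tab-separated fields: f1, f2, ..., f13.
my $line = join("\t", map { "f$_" } 1 .. 13) . "\n";
print nth_field($line, 12), "\n";         # prints f12
```

Because Perl arrays start at index 0, the 12th field is $fields[11].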

I have also tried to understand hashes.

So far I have read that arrays might be faster than hashes. So, could anyone give me a clue about how to define my file as a grid, where I could use x,y coordinates to get my subsets and then do my comparison?
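One way such a grid might look in Perl is an array of array references, where the first index is the row (line) and the second is the column (field). This is only a sketch, assuming tab-separated input; build_grid is a hypothetical name and the two sample lines are made up:

```perl
use strict;
use warnings;

# Build a grid (array of array references) from tab-separated lines.
sub build_grid {
    my @lines = @_;
    my @grid;
    for my $line (@lines) {
        chomp(my $copy = $line);                 # strip the newline on a copy
        push @grid, [ split /\t/, $copy, -1 ];   # one array ref per line
    }
    return \@grid;
}

my $grid = build_grid("a\tb\tc\n", "d\te\tf\n");
print $grid->[1][2], "\n";   # row 2, column 3 (indices start at 0): prints f
```

Holding every field of a 90 MB file this way will probably use considerably more memory than the file itself, so it may be worth storing only the columns that are actually compared.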

I also thought about using hashes to link keys to values, which would constitute the subsets I need, but I am stuck with that approach too.
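A hash-based version of the comparison could look roughly like the sketch below. It assumes the identifier is the first whitespace-delimited token of each BLAST line and the 12th tab-separated field of each CDF line; filter_cdf is a hypothetical name. Unlike the nested loops, it emits each matching CDF line once, and it drops lines with fewer than 12 fields rather than keeping them.

```perl
use strict;
use warnings;

# Keep the CDF lines whose 12th tab-separated field appears as the
# leading token of at least one BLAST line.
sub filter_cdf {
    my ($cdf_lines, $blast_lines) = @_;

    # One pass over BLAST: remember every leading identifier.
    my %seen;
    for my $line (@$blast_lines) {
        my ($id) = split ' ', $line;
        $seen{$id} = 1 if defined $id;
    }

    # One pass over CDF: a hash lookup replaces the inner loop.
    my @kept;
    for my $line (@$cdf_lines) {
        my @fields = split /\t/, $line, -1;
        my $wanted = $fields[11];
        push @kept, $line if defined $wanted && exists $seen{$wanted};
    }
    return @kept;
}

# Made-up sample data: 11 filler fields, then the identifier.
my @cdf   = (join("\t", (("x") x 11), "gene1", "rest") . "\n",
             join("\t", (("x") x 11), "gene2", "rest") . "\n");
my @blast = ("gene1 hit 99.0\n");
print filter_cdf(\@cdf, \@blast);   # prints only the gene1 line
```

Each file is read through once, so the work grows with the sum of the two file sizes rather than their product.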

I know that I could do this the object-oriented way, but after having a look at it I think it is even more difficult, so I would prefer to use one of the two previous methods.

Any help is very welcome, as I've been stuck on this for a while...

Jun 10 '08 #1
