Bytes IT Community

How to reduce the processing time when reading gzipped files

Hi all,

In a directory there are about 10 gzipped files, totalling roughly 15 GB.

From each file I have to retrieve every line that doesn't contain the text "ORA" and write those lines to another big file.

I got it working, but it takes about 5 minutes to complete the process, and I have to process 7 directories at a time, so in total it is taking far too long.

I wrote the code as:

#!/usr/bin/perl
use strict;
use warnings;

# Collect every gzipped file in the directory.
my @filenames = </home/dir/*.gz>;

open(my $out, '>', 'bigfile') or die "Cannot open bigfile: $!";
foreach my $file (@filenames) {
    # Decompress on the fly rather than unpacking to disk.
    open(my $in, '-|', "gzcat $file") or die "Cannot read $file: $!";
    while (my $line = <$in>) {
        # Skip lines beginning with "ORA" and blank lines.
        # (The original pattern /^ORA | ^$/ contained literal
        # spaces, so it never matched the intended lines.)
        next if $line =~ /^ORA|^\s*$/;
        print $out $line;
    }
    close $in;
}
close $out;
This is only for one directory; there are seven directories like this.

If anyone knows a better way to do this, in order to reduce the run time, please help me, as I am new to Perl.

Thanks & regards,
Manogna.
Mar 5 '08 #1
2 Replies


I realize this isn't exactly a Perl answer, but why not simply:

zgrep -vh ^ORA dirname/*.gz > bigfile

or if you don't have zgrep:

gzip -dc dirname/*.gz | grep -v ^ORA > bigfile

As to having multiple directories, it is not clear if you want each to be processed in sequence and appended to the single bigfile, or if you want them each to be processed in parallel and put into their own bigfile.
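If the directories are to be processed in parallel, each into its own big file, one minimal sketch is to background one pipeline per directory and then wait for them all. The directory names below (dir1 ... dir7) are placeholders, not the poster's real paths:

```shell
#!/bin/sh
# Run one filter pipeline per directory in the background, each writing
# its own bigfile, then block until every job has finished.
for d in dir1 dir2 dir3 dir4 dir5 dir6 dir7; do
    (
        gzip -dc "$d"/*.gz | grep -v '^ORA' > "$d/bigfile"
    ) &
done
wait    # all seven pipelines have completed at this point
```

Since the work is dominated by decompression and I/O, running the directories concurrently helps mainly when they sit on independent disks or the machine has spare CPU for gzip.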
Mar 5 '08 #2

Thank you very much for your response.

I want to write the data by parallel execution, each directory's files going to its own big file.

I tried your code and it works properly, but I want to use regular expressions in it.
I tried the following, but it is not working properly:

zegrep -vh "^ORA |^\s*$ |read_time|" dirname/*.gz > bigfile
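The likely problem is the pattern itself: the spaces before each `|` are matched literally, and the trailing `|` creates an empty alternative that matches every line, so with `-v` nothing at all is written. A corrected form might look like this (using `[[:space:]]` instead of `\s`, since `\s` is a GNU extension that not every egrep supports):

```shell
# Exclude lines starting with "ORA", blank/whitespace-only lines,
# and lines containing "read_time".  No stray spaces, no trailing "|".
zegrep -vh '^ORA|^[[:space:]]*$|read_time' dirname/*.gz > bigfile
```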

I need to write all the data from each directory to its respective big file as fast as possible.

I also want to parse the lines and keep only the first three fields before writing to the big file.

Is it possible?
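Assuming the fields are whitespace-separated (the thread doesn't show a sample line, so that is a guess), awk can do both the filtering and the field extraction in one pass:

```shell
# Drop "ORA" lines and blank lines, then print only the first
# three whitespace-separated fields of each surviving line.
gzip -dc dirname/*.gz \
    | awk '!/^ORA/ && NF { print $1, $2, $3 }' > bigfile
```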


Please help me.

Thanks and regards,
Manogna.
Mar 6 '08 #3
