By using this site, you agree to our updated Privacy Policy and our Terms of Use. Manage your Cookies Settings.
438,834 Members | 2,265 Online
Bytes IT Community
+ Ask a Question
Need help? Post your question and get tips & solutions from a community of 438,834 IT Pros & Developers. It's quick & easy.

Program Performance

P: 11
Hi,

I have a program that does the following: given a directory as it's only argument, it goes to that directory and looks for directories inside that contain 2 special types of very large files. If it finds these files, it goes through each file line by line looking to see if they contain certain strings. As mentioned, these two files are very big and contain a lot of information, and I recently ran this program on a certain directory and it took 35 minutes to complete. I've optimized the algorithms as much as I can using binary search to compare each line for each value that I'm looking for (don't want to implement a hash table for this particular program). My question is this, would it be faster for me to use the 'grep' command to look for the information im looking for, rather than opening each file and going line by line, or would it even make a difference?

KiSSz
Oct 8 '08 #1
Share this Question
Share on Google+
10 Replies


KevinADC
Expert 2.5K+
P: 4,059
what do you mean by "using binary search"?
Oct 8 '08 #2

P: 11
I have a list of items that I'm searching for in the big files and they are stored in an array and they are sorted. A binary search is when you first look at the middle value to compare with the value you're looking for, then if it's less you go to middle value of the left half of the array, if it's greater you go to the middle value of the right half of the array and you keep doing this till you find the value. With each comparison you eliminate half the values so it's much faster than searching from the beginning of the array to the end.
Oct 8 '08 #3

eWish
Expert 100+
P: 971
I would suggest that do a Benchmark using each method (binary search and grep). Then compare the times and see which one is quicker.

--Kevin
Oct 8 '08 #4

KevinADC
Expert 2.5K+
P: 4,059
I have a list of items that I'm searching for in the big files and they are stored in an array and they are sorted. A binary search is when you first look at the middle value to compare with the value you're looking for, then if it's less you go to middle value of the left half of the array, if it's greater you go to the middle value of the right half of the array and you keep doing this till you find the value. With each comparison you eliminate half the values so it's much faster than searching from the beginning of the array to the end.

Is the sorted list the search items you look for in the file? From what I know, a binary search is for finding the items in the sorted list, not using the sorted list to find items in a file. But maybe you have some unusual circumstance that makes your search method a good choice.
Oct 8 '08 #5

P: 11
Is the sorted list the search items you look for in the file? From what I know, a binary search is for finding the items in the sorted list, not using the sorted list to find items in a file. But maybe you have some unusual circumstance that makes your search method a good choice.
I come across a line in the file, then I go search my sorted list of elements to see if it matches one of them.

KiSsZ
Oct 8 '08 #6

KevinADC
Expert 2.5K+
P: 4,059
I come across a line in the file, then I go search my sorted list of elements to see if it matches one of them.

KiSsZ
ahhh... I see. Unless you post your code for evaluation I would say you are doing the best you can.
Oct 9 '08 #7

P: 11
I'm going to paste my code and I was wondering if any of you guys can see where I can be more efficient. I know a lot of it is sloppy because I'm a Perl beginner and I went quickly through this. Any feedback would be much appreciated. Here is my code:

Expand|Select|Wrap|Line Numbers
  1. #!/usr/bin/perl -w
  2. use Cwd;
  3. use Time::Local;
  4.  
  5. main();
  6.  
  7. sub partition
  8. {
  9.     my $x = $testNames[$_[0]];
  10.      my $i = $_[0] - 1;
  11.     my $j = $_[1] + 1;
  12.  
  13.     do
  14.     {
  15.         do      
  16.         {
  17.             $j--;
  18.         }
  19.         while ($x lt $testNames[$j]);
  20.  
  21.         do  
  22.         {
  23.             $i++;
  24.         } 
  25.         while ($x gt $testNames[$i]);
  26.  
  27.         if ($i < $j)
  28.         { 
  29.             $temp1 = $testNames[$i];   
  30.             $testNames[$i] = $testNames[$j];
  31.             $testNames[$j] = $temp1;
  32.  
  33.             $temp2 = $passFailBoolean[$i];
  34.             $passFailBoolean[$i] = $passFailBoolean[$j];
  35.             $passFailBoolean[$j] = $temp2;
  36.  
  37.             $temp3 = $elapsedTime[$i];
  38.             $elapsedTime[$i] = $elapsedTime[$j];
  39.             $elapsedTime[$j] = $temp3;
  40.  
  41.             $temp4 = $directorySizes[$i];
  42.             $directorySizes[$i] = $directorySizes[$j];
  43.             $directorySizes[$j] = $temp4;
  44.         }
  45.     }
  46.     while ($i < $j);     
  47.     return $j;           
  48. }
  49.  
  50. sub quicksort
  51. {
  52.     my $middle;
  53.     my $first = $_[0];
  54.     my $last = $_[1];
  55.     if ($first < $last)
  56.         {
  57.         $middle = partition($first, $last);
  58.         quicksort($first, $middle);   
  59.         quicksort($middle + 1, $last);    
  60.     }
  61. }
  62.  
  63. sub calculateElapsedTime
  64. {
  65.     my @parameters = @_;
  66.  
  67.     my %months = 
  68.     (
  69.         "Jan" => 0,
  70.         "Feb" => 1,
  71.         "Mar" => 2,
  72.         "Apr" => 3,
  73.         "May" => 4,
  74.         "Jun" => 5,
  75.         "Jul" => 6,
  76.         "Aug" => 7,
  77.         "Sep" => 8,
  78.         "Oct" => 9,
  79.         "Nov" => 10,
  80.         "Dec" => 11
  81.     );
  82.  
  83.     my @time;
  84.     foreach $parameter (@parameters)
  85.     {
  86.         my @words = split(/\s+/, $parameter);
  87.         my @numbers = split(/:/, $words[7]);
  88.         push(@time, timelocal($numbers[2], $numbers[1], $numbers[0], $words[6], $months{$words[5]}, $words[9])); 
  89.     }
  90.  
  91.     return ($time[1] - $time[0]);
  92. }
  93.  
  94. sub binarySearch
  95. {
  96.     @parameters = @_;
  97.     my $low = 0;
  98.     my $high = @passFailNames - 1;
  99.     my $middle;
  100.  
  101.     while ($low <= $high)
  102.     {
  103.         $middle = $low + $high;
  104.         if ($parameters[0] eq $passFailNames[$middle])
  105.         {
  106.             $tempIndex = $middle;
  107.             return 1;
  108.         }
  109.         elsif ($parameters[0] lt $passFailNames[$middle])
  110.         {
  111.             $high = $middle - 1;
  112.         }
  113.         else
  114.         {
  115.             $low = $middle + 1;
  116.         }
  117.     }
  118.  
  119.     return 0;
  120. }
  121.  
  122. sub main 
  123. {
  124.     @passFailNames = 
  125.     (
  126.         "CSI Errors : 0",
  127.         "CSIBFM_TX_ISM Errors : 0",
  128.         "CSI_CREDIT_TRK Errors : 0",
  129.         "CSI_FLITTRK Errors : 0",
  130.         "CSI_ISM Errors : 0",
  131.         "GFX Checker Errors : 0",
  132.         "GFX Checkers not done : 0",
  133.         "HSB_SD0 Errors : 0",
  134.         "HSB_SD1 Errors : 0",
  135.         "HT_SB_ISOCH_TRK Errors : 0",
  136.         "HT_SB_PRIM_TRK Errors : 0",
  137.         "HT_SB_SEC_TRK Errors : 0",
  138.         "MCHCORE Errors : 0",
  139.         "SAGP Errors : 0",    
  140.         "Test Completed : Yes",
  141.         "cpu1_bfm Errors : 0",
  142.         "flex_config Errors : 0",
  143.         "ioorb_isr_trk Errors : 0",    
  144.         "sig_trk_idwb Errors : 0",
  145.         "system_cfg_addr Errors : 0",
  146.         "system_noncfg_addr Errors : 0"
  147.     );
  148.  
  149.     if ($ARGV[0])
  150.     {
  151.         $dir1 = $ARGV[0];
  152.     }
  153.     else
  154.     {
  155.         $dir1 = getcwd;
  156.     }
  157.  
  158.     opendir(DIR1, $dir1) or die "[!] Can't open directory: $dir1";
  159.     $passes = 0;
  160.     while ($file1 = readdir(DIR1)) 
  161.     {
  162.         if (-d "$dir1/$file1")
  163.         {
  164.             my $passFailTotal = 0;
  165.             my $passFailBool = "false";
  166.             my $runSimFlag = "FALSE";
  167.             my $transcriptFlag = "FALSE";
  168.             opendir(DIR2, "$dir1/$file1") or die "[!] Can't open directory: $dir1/$file1";
  169.             while ($file2 = readdir(DIR2))
  170.             {
  171.                 my $misrErrorTotal = 0;
  172.                  if (($file2 eq "runsim.log") or ($file2 eq "runsim.log.gz"))
  173.                 {
  174.                     $runSimFlag = "TRUE";
  175.                     my $gunzipFlag = "false";
  176.                     if ($file2 eq "runsim.log.gz")
  177.                     {
  178.                         system("gunzip $dir1/$file1/runsim.log.gz");
  179.                         $file2 = "runsim.log";
  180.                         $gunzipFlag = "true";
  181.                     }
  182.  
  183.                     my $startedFlag = "FALSE";
  184.                     my $endedFlag = "FALSE";
  185.                     @temp = split(/\s+/, `du $dir1/$file1`);
  186.                     push(@directorySizes, $temp[@temp - 2]);
  187.                     push(@testNames, $file1);
  188.                     open(FH1, "$dir1/$file1/$file2") or die "Can't open file: $dir1/$file1/$file2";
  189.                     while ($line1 = <FH1>)
  190.                     {
  191.                         if ($startedFlag eq "FALSE" and $line1 =~ m/Simulation started at:/)
  192.                         {
  193.                             push(@startTimes, $line1);
  194.                             $startedFlag = "TRUE";
  195.                         }
  196.                         elsif ($endedFlag eq "FALSE" and $line1 =~ m/Simulation ended at:/)
  197.                         {
  198.                             push(@endTimes, $line1);
  199.                             $endedFlag = "TRUE";            
  200.                         }
  201.                         elsif ($passFailTotal < 21)
  202.                         {
  203.                             my @passFailValues = (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0);
  204.                             $line1 =~ s/^\s+//;
  205.                                 $line1 =~ s/\s+$//;
  206.                             if (binarySearch($line1, $tempIndex) == 1)
  207.                             {
  208.                                 if ($passFailValues[$tempIndex] == 0)
  209.                                 {    
  210.                                     $passFailValues[$tempIndex] = 1;
  211.                                     $passFailTotal = $passFailTotal + 1;
  212.                                 }
  213.                             }    
  214.                         }
  215.  
  216.  
  217.                     }
  218.                     close(FH1);
  219.                     if ($endedFlag eq "FALSE" or $startedFlag eq "FALSE")
  220.                     {
  221.                         push(@startTimes, "-");
  222.                         push(@endTimes, "-");
  223.                         $noStartEndTimeTotal++;
  224.                     }
  225.  
  226.                     if ($passFailTotal == @passFailNames)
  227.                     {
  228.                         push(@passFailBoolean, "Pass");
  229.                         $passes++;
  230.                     }
  231.                     else
  232.                     {
  233.                         push(@passFailBoolean, "Fail");
  234.                     }
  235.  
  236.                     if ($gunzipFlag eq "true")
  237.                     {
  238.                         system("gzip $dir1/$file1/runsim.log");
  239.                     }
  240.                 }
  241.                  elsif (($file2 eq "transcript.log") or ($file2 eq "transcript.log.gz"))
  242.                 {
  243.                     $transcriptFlag = "TRUE";
  244.                     my $gunzipFlag = "false";
  245.  
  246.                     if ($file2 eq "transcript.log.gz")
  247.                     {
  248.                         system("gunzip $dir1/$file1/transcript.log.gz");
  249.                         $file2 = "transcript.log";
  250.                         $gunzipFlag = "true";
  251.                     }
  252.                     open(FH1, "$dir1/$file1/$file2") or die "Can't open file: $dir1/$file1/$file2";
  253.                     my $tclFlag1 = 0, $tclFlag2 = 0;
  254.                     while ($line1 = <FH1>)
  255.                     {
  256.  
  257.                         if ($tclFlag1 == 0 and ($line1 =~ m/# TclNOTE:      [0-9]+ ps: Translated address from 0x0004E008 to 0x4027004d data is 0x00000003/))
  258.                         {
  259.                             $tclFlag1 = 1;
  260.                         }
  261.                         elsif ($tclFlag2 == 0 and ($line1 =~ m/# TclNOTE:      [0-9]+ ps: Translated address from 0x0004E008 to 0x4027004d data is 0x00000002/))    
  262.                         {
  263.                             $tclFlag2 = 1;
  264.                         }
  265.                         elsif ($tclFlag1 == 1 and $tclFlag2 == 1 and ($line1 =~ m/Error: ERROR!! X in data: misr_data_qual/))
  266.                         {
  267.                             $misrErrorTotal++;
  268.                         }
  269.  
  270.                     }
  271.                     if ($gunzipFlag eq "true")
  272.                     {
  273.                         system("gzip $dir1/$file1/transcript.log");
  274.                     }
  275.                 }
  276.  
  277.                 if ($runSimFlag eq "TRUE" and $runSimFlag ne "FALSE")
  278.                 {
  279.                     push(@misrErrors, $misrErrorTotal); 
  280.                 }
  281.                 elsif($runSimFlag eq "TRUE" and $runSimFlag ne "TRUE")
  282.                 {
  283.                     last;
  284.                 }
  285.  
  286.             }
  287.             closedir(DIR2);
  288.         }
  289.     }
  290.  
  291.  
  292.     if (@testNames > 0)
  293.     {
  294.         for ($i = 0; $i < @startTimes; $i++)
  295.         {
  296.             if ($startTimes[$i] ne "-" and $endTimes[$i] ne "-")
  297.             {
  298.                 $simulationTime = calculateElapsedTime($startTimes[$i], $endTimes[$i]);
  299.                 $timeTotal += $simulationTime;
  300.                 $localHour = int ($simulationTime / 3600);
  301.                 $simulationTime -= (3600 * $localHour);
  302.                 $localMinute = int($simulationTime / 60);
  303.                 $simulationTime -= (60 * $localMinute);
  304.                 $localSecond = sprintf("%.2f", $simulationTime);
  305.                 $simulationTime = "$localHour:$localMinute:$localSecond";
  306.                 push(@elapsedTime, $simulationTime);
  307.             }
  308.             else
  309.             {
  310.                 push(@elapsedTime, "-");
  311.             }
  312.         } 
  313.  
  314.         for ($j = 0; $j < @directorySizes; $j++)
  315.         {
  316.             $sizeTotal += $directorySizes[$j];
  317.         }
  318.  
  319.         $sizeAverage = sprintf("%.2f", $sizeTotal / @directorySizes);
  320.         $totalSimulations = @testNames;
  321.         $fails = $totalSimulations - $passes;
  322.         $timeAverage = $timeTotal / (@startTimes - $noStartEndTimeTotal);
  323.         $hour = int ($timeAverage / 3600);
  324.         $timeAverage -= (3600 * $hour);
  325.         $minute = int($timeAverage / 60);
  326.         $timeAverage -= (60 * $minute);
  327.         $second = sprintf("%.2f", $timeAverage);
  328.         open (OUTPUTFILE, '>output.txt');
  329.         print OUTPUTFILE "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n";
  330.         print OUTPUTFILE "@       Netbatch_Util Output       @\n";
  331.         print OUTPUTFILE "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n\n";
  332.         print OUTPUTFILE "Regression Directory: $dir1\n";    
  333.         print OUTPUTFILE "Total Simulations: $totalSimulations\n";
  334.         print OUTPUTFILE "Total Passes: $passes\n";
  335.         print OUTPUTFILE "Total Fails: $fails\n";
  336.         print OUTPUTFILE "Average Simulation Time: $hour:$minute:$second\n";
  337.         print OUTPUTFILE "Average Directory Size: $sizeAverage\n\n";
  338.         print OUTPUTFILE "@@@@@@@@@@@@@@@\n";
  339.         print OUTPUTFILE "@    Fails    @\n";
  340.         print OUTPUTFILE "@@@@@@@@@@@@@@@\n\n";
  341.  
  342.         quicksort(0, @testNames-1);
  343.         for ($r = 0; $r < @testNames; $r++)
  344.         {
  345.             if ($passFailBoolean[$r] eq "Fail")
  346.             {
  347.                 print OUTPUTFILE "$testNames[$r]\n";
  348.             }
  349.         }
  350.  
  351.         print OUTPUTFILE "\n@@@@@@@@@@@@@@@\n";
  352.         print OUTPUTFILE "@  All Tests  @\n";
  353.         print OUTPUTFILE "@@@@@@@@@@@@@@@\n\n";
  354.  
  355.         for ($l = 0; $l < @testNames; $l++)
  356.         {
  357.             print OUTPUTFILE "Name: $testNames[$l]\n";
  358.             print OUTPUTFILE "Status: $passFailBoolean[$l]\n";
  359.             print OUTPUTFILE "Time: $elapsedTime[$l]\n";
  360.             print OUTPUTFILE "Size: $directorySizes[$l]\n";
  361.             print OUTPUTFILE "Misr Errors: $misrErrors[$l]\n\n";
  362.         }
  363.     }
  364.     else
  365.     {
  366.         open (OUTPUTFILE, '>output.txt');
  367.         print OUTPUTFILE "@@@@@@@@@@@@@@@@@@@@@@@@\n";
  368.         print OUTPUTFILE "@ Netbatch_Util Output @\n";
  369.         print OUTPUTFILE "@@@@@@@@@@@@@@@@@@@@@@@@\n\n";
  370.         print OUTPUTFILE "@ Regression Directory: $dir1\n";    
  371.         print OUTPUTFILE "@ Total Simulations: $totalSimulations\n\n";
  372.     }        
  373.     close (OUTPUTFILE);     
  374.     closedir(DIR1);
  375.     exit 0;
  376. }
  377.  
Oct 9 '08 #8

KevinADC
Expert 2.5K+
P: 4,059
Nothing bad is jumping out. Looks OK to me. Hard to say if it could be improved without being to run the code, but I don't want to run it.

in general:

Expand|Select|Wrap|Line Numbers
  1. do {
  2.  ...
  3. } while()
  4.  
can be replaced with a simpler loop:

Expand|Select|Wrap|Line Numbers
  1. while() {
  2.  ...
  3. }
  4.  
It might also be a little more efficient, although its not going to be dramatic.
Oct 9 '08 #9

P: 11
Thanks Kevin. I guess what initially lead me to think that I could improve the efficiency is whenever Im at the console and I do a grep it finds the information so quickly, even for big files. I guess if ran through the Perl interpreter this would be a lot slower?
Oct 9 '08 #10

KevinADC
Expert 2.5K+
P: 4,059
Thanks Kevin. I guess what initially lead me to think that I could improve the efficiency is whenever Im at the console and I do a grep it finds the information so quickly, even for big files. I guess if ran through the Perl interpreter this would be a lot slower?

I'm not sure. Perls only disadvantage that I am aware of is the time it takes to compile the code versus running a shell command that doesn't have to be compiled. But that can't account for a dramatic difference, especially not for a small script like you have.

Hopefully someone on Devshed will have more insight. You can also try www.perlmonks.com
Oct 9 '08 #11

Post your reply

Sign in to post your reply or Sign up for a free account.