
perl vs Unix grep

Hi all,
I've been working on a problem that I thought might be of interest: I'm
trying to replace some Korn shell scripts that search source code files with
Perl scripts, to gain certain features such as:

More powerful regular expressions available in Perl
The ability to print lines before and after matches (GNU grep supports this, but it is not available on our Digital Unix and AIX platforms)
Searches that are case insensitive by default (yes, I know this can be done with grep, but the shell scripts that use grep don't do this)

We're talking about approximately 5000 files spread over 15 directories. To date
it has proven quite difficult (for me) to match the performance of the Korn
shell scripts with Perl scripts while still obtaining the line number and
context information needed. The crux of the problem is that I have seen the
best performance from Perl when I match with the /g modifier on a string
holding the current file's slurped contents:

local $/;                             # undef the record separator to slurp
my $curStr = <FH>;                    # read the whole file as one string
while ($curStr =~ /$compiledRegex/g)
{
    # write matches to file for eventual paging
}
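For reference, a self-contained version of that loop might look like the sketch below. The file name and pattern are hypothetical stand-ins, and the @- array is used to record each match's starting offset for later use:

#!/usr/bin/perl
use strict;
use warnings;

my $file          = 'src/example.c';    # hypothetical file name
my $compiledRegex = qr/TODO/i;          # hypothetical pattern

open my $fh, '<', $file or die "Can't open $file: $!";
my $curStr = do { local $/; <$fh> };    # slurp the whole file
close $fh;

while ($curStr =~ /$compiledRegex/g) {
    my $offset = $-[0];                 # byte offset where this match starts
    print "match at offset $offset\n";  # stand-in for writing to the paging file
}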

This works well, except that when each match is found I need the line number
on which it occurs. As far as I can tell from reading and research, there is
no variable that holds this information, since I am not reading from the file
at this point. I can get the information in other ways, such as:

1. Reading each file a line at a time, testing for a match, and keeping a line counter or using $NR.
2. Reading the file into an array and processing it a line at a time.
3. Creating index files for the source files that store line offsets, and using them with the slurp method in the paragraph above.
4. Creating an in-memory index for each file that contains a match, and using it for subsequent matches in that file.

Options 1, 2, and 4 suffer performance degradation relative to Unix grep.
Option 3 provides good performance and is the method I am currently using, but
it requires creating and maintaining index files (a sketch of the in-memory
variant, option 4, follows below). I was wondering if I could tie a scalar to
a file and use the slurping loop above; then perhaps $NR and $. would contain
the current line number, since the file would be read as the loop is
traversed. Any other ideas would be welcome.
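For what it's worth, here is a minimal sketch of option 4, using the same hypothetical file name and pattern as in the earlier sketch: an in-memory index of line-start offsets is built the first time a file matches, and each match offset is binary-searched against it:

#!/usr/bin/perl
use strict;
use warnings;

my $file          = 'src/example.c';   # hypothetical file name
my $compiledRegex = qr/TODO/i;         # hypothetical pattern

open my $fh, '<', $file or die "Can't open $file: $!";
my $curStr = do { local $/; <$fh> };   # slurp the whole file
close $fh;

my @starts;                            # line-start offsets, built only if needed
while ($curStr =~ /$compiledRegex/g) {
    @starts = line_starts($curStr) unless @starts;
    my $line = offset_to_line(\@starts, $-[0]);
    print "$file:$line\n";             # stand-in for writing to the paging file
}

# Return the byte offset at which every line begins.
sub line_starts {
    my ($str) = @_;                    # copy, so pos() on the caller's string is untouched
    my @s = (0);
    push @s, pos($str) while $str =~ /\n/g;
    return @s;
}

# Binary-search the offset list for the last line start <= $off.
sub offset_to_line {
    my ($starts, $off) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($starts->[$mid] <= $off) { $lo = $mid } else { $hi = $mid - 1 }
    }
    return $lo + 1;                    # line numbers are 1-based
}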

Al
Jul 19 '05 #1
1 reply, 17,558 views
Hello Al Belden,

I have had a similar problem: getting the index number of the matching
element when searching for elements in an array. It was fruitless. I used a
hash map, but that was a burden on the system. In another possible
implementation, I used a separate indexCount variable over the array,
reinitialized every time; a sketch of that approach follows.
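A minimal sketch of that counter approach (the array contents and pattern are hypothetical):

#!/usr/bin/perl
use strict;
use warnings;

my @elements   = ('foo', 'bar baz', 'quux');  # hypothetical array to search
my $pattern    = qr/bar/;                     # hypothetical pattern
my $indexCount = 0;                           # reinitialized before every search

for my $elem (@elements) {
    print "match at index $indexCount\n" if $elem =~ $pattern;
    $indexCount++;
}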

That's it.
Perl is a language to make things work at any cost. All the best.
Thanks.
Giridhar Nandigam
"Al Belden" <ab*****@comcast.net> wrote in message news:<Bv********************@comcast.com>...

Jul 19 '05 #2

