Need help with a Regular Expression

Hi,
I am trying to understand a concept in Perl regexes: how do I write a regex so that the * metacharacter is not greedy?

Here is my code:-

#!/usr/bin/perl

use strict;

my $sentence = "Perl is a dynamic programming language created by Larry Wall and first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time.";

my $b;

if ($sentence =~ /and(.*)\./s)
{
    $b = $1;
    print "The following is the output:-\n";
    print "$b\n";
}

#Output:-
The following is the output:-
first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time

The * operator is very greedy, so I get the output shown above.
I want the output to run only from the last occurrence of "and" up to the ".", like the following:-

lack of the arbitrary limitations
of many scripting languages at the time

So how do I achieve that? I tried using the repetition modifier {} after "and", but that does not work either.
I would appreciate it if you could help me with this.

Thanks in advance,
Sangith

Jan 10 '08 #1


KevinADC

Regular expressions are probably one of the more complicated things about Perl (and many languages) that the casual Perl coder will have to learn. A significant thing to note is that a regular expression will try to match a pattern as early as it can in a string. The word "and" occurs several times in the string, and Perl will match the first occurrence, just after Larry Wall: "Larry Wall and".
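A minimal sketch of that leftmost-match behaviour, using a shortened made-up string rather than the full one from the post. It suggests why switching to a non-greedy quantifier alone would not help here: both forms start matching at the first "and".

```perl
use strict;
use warnings;

# Shortened sample text (an assumption for illustration, not the original string).
my $text = "Larry Wall and first released in 1987,\nAWK and C,\nand lack of limits.";

# Non-greedy: .*? stops at the first period it can reach,
# but the match still begins at the first "and" in the string.
my ($lazy) = $text =~ /and(.*?)\./s;

# Greedy: same starting point; .* runs to the last period.
my ($greedy) = $text =~ /and(.*)\./s;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```

Because this sample contains only one period, the two captures come out identical. Greediness controls where a match ends, not where it starts.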

In order to match the last occurrence you actually want to use greedy matching:

/.*and (.*)\./

The first '.*' will match up to the last occurrence of "and " (and-space). So you have to learn how to take advantage of greedy matching, and when to use it and when not to. But your problem is further complicated because the string spans multiple lines (at least it looks that way in your post). To let the match cross line breaks, use the "s" modifier at the end of the regexp. By default "." matches any character except a newline; with "s" it matches newlines too, so ".*" can run across the whole string as if it were one long line.
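To see what the "s" modifier changes, here is a small sketch on a made-up three-line string (an assumption for illustration) where the final period sits on a later line than the last "and ":

```perl
use strict;
use warnings;

# Hypothetical sample: the only period is on the last line.
my $text = "first and second\nand lack of limits\nat the time.";

# Without /s, "." cannot match "\n", so no single line holds both
# "and " and a following period, and the match fails.
my ($no_s) = $text =~ /.*and (.*)\./;

# With /s, "." also matches "\n"; the greedy leading .* backs off
# only as far as the last "and ".
my ($with_s) = $text =~ /.*and (.*)\./s;

print "without /s: ", (defined $no_s ? "[$no_s]" : "no match"), "\n";
print "with /s:    [$with_s]\n";
```

Note that "s" is separate from the "m" modifier, which changes how ^ and $ behave and is not needed for this pattern.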

This is one way it could be done:


#!/usr/bin/perl

use strict;

my $sentence = "Perl is a dynamic programming language created by Larry Wall and first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time.";

my $r;

# The greedy leading .* consumes everything up to the last "and ";
# /s lets "." match the newlines inside the string.
if ($sentence =~ /.*and (.*)\./s)
{
   $r = $1;
   print "The following is the output:-\n";
   print "$r\n";
}

This is a bit contrived to fit the string you posted. The pattern you want to match appears to start at the beginning of a line within the string. But if you did not know where the pattern started in the string you would probably have to use a different search pattern to avoid substring matches like "land" or "sand".
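One possible "different search pattern" is to anchor "and" with \b word boundaries. This is only a sketch on a made-up string (an assumption, not your data), chosen so that the last "and " in it is actually the tail of "land":

```perl
use strict;
use warnings;

# Hypothetical sample in which "land " would fool the plain pattern.
my $text = "rock and gravel\nfrom the land survey.";

# Plain pattern: the last "and " is inside "land ", so the capture is too short.
my ($plain) = $text =~ /.*and (.*)\./s;

# \b requires a word boundary on both sides, so only the standalone "and" matches.
my ($bounded) = $text =~ /.*\band\b\s+(.*)\./s;

print "plain:   [$plain]\n";
print "bounded: [$bounded]\n";
```

Here the plain pattern captures just "survey", while the bounded one captures everything after the standalone "and". Using \s+ instead of a literal space also lets the word be followed by a tab or a line break.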

Here is a link that might help you:

http://perldoc.perl.org/perlretut.html

Take it a little at a time if it's confusing.

Jan 10 '08 #2

sangith

Hi Kevin,
Thank you so much for your help! Your approach works just great!
I am using this perl code for parsing my text file. The string that I am searching for in the file is a fixed one and will not occur as a part of any other string, so this approach is the best one for me.

Thanks again,
Sangith

Jan 10 '08 #3
