Need help with a Regular Expression

Hi,
I am trying to understand a concept in Perl regular expressions: how do I write a regex so that the metacharacter * is not greedy?

Here is my code:-
#!/usr/bin/perl
use strict;
my $sentence = "Perl is a dynamic programming language created by Larry Wall and first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time.";

my $b;
if ($sentence =~ /and(.*)\./s)
{
    $b = $1;
    print "The following is the output:-\n";
    print "$b\n";
}

Output:-
The following is the output:-
first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time

The * operator is greedy, which is why I get that output.
I want the output to run only from the last occurrence of "and" up to the ".", like this:

lack of the arbitrary limitations
of many scripting languages at the time

So how do I achieve that? I tried using the repetition quantifier {} after "and", but that does not work either.
I would appreciate it if you could help me with this.

Thanks in advance,
Sangith
Jan 10 '08 #1
KevinADC
Regular expressions are probably one of the more complicated things about Perl (and many other languages) that the casual Perl coder will have to learn. A significant thing to note is that the regex engine tries to match a pattern as early in the string as it can. The word "and" occurs several times in the string, and Perl matches the first occurrence, just after Larry Wall: "Larry Wall and". Making the * non-greedy does not change where the match starts.
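To see why switching to a non-greedy quantifier does not help here, a small sketch may be useful. The sample text and the helper name lazy_capture are made up for illustration; only the pattern mirrors the one in the question, with .* changed to .*?.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shortened sample; it contains three standalone "and"s
# and a single "." at the very end.
my $text = "Wall and first released in 1987, and was widely adopted and lack of limitations.";

# Non-greedy version of the pattern from the question.
# Returns the captured text, or undef if nothing matches.
sub lazy_capture {
    my ($s) = @_;
    return $s =~ /and(.*?)\./s ? $1 : undef;
}

# The match still starts at the FIRST "and"; .*? only stops at the first "."
# after it, which in this sample is the final one.
print lazy_capture($text), "\n";
# prints " first released in 1987, and was widely adopted and lack of limitations"
```

Laziness only controls how far a quantifier extends to the right. It has no say in where the overall match begins.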

In order to match the last occurrence, you actually want to use greedy matching:

/.*and (.*)\./

The first '.*' will match as much as it can and then back off until it reaches the last occurrence of "and " (and-space). So you have to learn how to take advantage of greedy matching, and when to use it and when not to. Your problem is further complicated because the string spans multiple lines (at least it looks that way in your post). By default '.' matches any character except a newline, so to let the pattern cross lines you add the "s" modifier at the end of the regexp. This tells Perl to treat the string as one long line: '.' then matches every character, newlines included.
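As a rough illustration of both points (the greedy prefix and the "s" modifier), here is a minimal sketch. The three-line sample string and the sub names last_and_clause and same_line_clause are assumptions made up for this example, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample standing in for the longer string in the post.
my $text = "Larry Wall and first released in 1987,\nand lack of limits\nat the time.";

# With /s, "." also matches "\n", so the greedy prefix can run across lines
# and then backtrack to the last "and ".
sub last_and_clause {
    my ($s) = @_;
    return $s =~ /.*and (.*)\./s ? $1 : undef;
}

# Without /s, "." never matches a newline, so "and " and the closing "."
# would have to sit on the same line. In this sample they never do.
sub same_line_clause {
    my ($s) = @_;
    return $s =~ /.*and (.*)\./ ? $1 : undef;
}

print last_and_clause($text), "\n";   # prints "lack of limits" and "at the time" on two lines

my $m = same_line_clause($text);
print defined($m) ? $m : "(no match)", "\n";   # prints "(no match)"
```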

This is one way it could be done:

#!/usr/bin/perl
use strict;
use warnings;

my $sentence = "Perl is a dynamic programming language created by Larry Wall and first released in 1987,
Perl is based on the brace-delimited block style of AWK and C,
and was widely adopted for its strengths in text processing
and lack of the arbitrary limitations
of many scripting languages at the time.";

my $r;
# The greedy leading .* consumes as much as possible, so "and " ends up
# matching its last occurrence; /s lets . match the newlines in the string.
if ($sentence =~ /.*and (.*)\./s)
{
   $r = $1;
   print "The following is the output:-\n";
   print "$r\n";
}
This is a bit contrived to fit the string you posted. The pattern you want to match appears to start at the beginning of a line within the string. But if you did not know where the pattern started in the string, you would probably have to use a different search pattern, such as one using the word-boundary anchor \b, to avoid substring matches like "land" or "sand".
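One possible way to guard against those substring matches is the word-boundary assertion \b. The sketch below is only an illustration: the sample sentence and the sub names after_last_and and after_last_word_and are invented for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample in which "land" and "sand" come after the real "and".
my $text = "salt and pepper on the land near the sand dunes.";

# Plain pattern: the greedy prefix backtracks to the "and " inside "sand ".
sub after_last_and {
    my ($s) = @_;
    return $s =~ /.*and (.*)\./s ? $1 : undef;
}

# \b on both sides means only the standalone word "and" can match.
sub after_last_word_and {
    my ($s) = @_;
    return $s =~ /.*\band\b\s+(.*)\./s ? $1 : undef;
}

print after_last_and($text), "\n";        # prints "dunes"
print after_last_word_and($text), "\n";   # prints "pepper on the land near the sand dunes"
```

Using \s+ instead of a literal space also lets the word be followed by a newline or a tab, which may or may not be what you want for your file.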

Here is a link that might help you:

http://perldoc.perl.org/perlretut.html

Take it a little at a time if it's confusing.
Jan 10 '08 #2
sangith
Hi Kevin,
Thank you so much for your help! Your approach works just great!
I am using this perl code for parsing my text file. The string that I am searching for in the file is a fixed one and will not occur as a part of any other string, so this approach is the best one for me.

Thanks again,
Sangith


Jan 10 '08 #3
