Bytes IT Community

Perl pattern matching

P: 6
I have a string of characters, e.g.: fhjhejfherhfehkehkeh

I want to retain the first "h" and delete all the other "h" characters. How do I do it?
May 6 '10 #1
13 Replies

Expert Mod 2.5K+
P: 3,503
What have you tried thus far? We would like to see what road you are going down instead of just giving you the code. It's a better way to learn. Then we can guide you.


May 7 '10 #2

P: 6
To start with, I am a newbie in Perl.
I am aware of how to delete the first "h" character and keep the rest as it is, using ^
But I wanted to know how to do the inverse of this, i.e. keep the first "h" character and remove the remaining "h" characters.
May 7 '10 #3

Expert 100+
P: 785
regex: s/(.*h[^h]*)h/$1/g
Algorithm: match up to an "h" that has at least one other "h" before it, then copy everything except that "h". Because .* is greedy, each pass removes the last "h" in the string.
Repeat this regex until nothing is replaced anymore.
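
A minimal sketch of the looping approach, assuming plain Perl 5 and the sample string from the question. The sub name remove_extra_h is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first "h", drop every later one.
sub remove_extra_h {
    my ($s) = @_;
    # Each pass deletes one "h" that has another "h" somewhere before it.
    # The loop stops as soon as the substitution no longer matches.
    1 while $s =~ s/(.*h[^h]*)h/$1/;
    return $s;
}

print remove_extra_h('fhjhejfherhfehkehkeh'), "\n";   # prints fhjejferfekeke
```

The /g flag is left out here because one pass can only ever remove a single "h"; the loop does the repeating.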

There is also another way: a single regex with a lookbehind that needs no looping: look backward from the current 'h'. If there is an 'h' to the left of it, delete the current one; otherwise leave it (that is, the regex will not match).
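
One possible single-pass version, sketched here as a variation rather than as the exact regex meant above: the lookbehind is anchored on the previous "h" (a fixed-length lookbehind), and the text between the two "h" characters is carried over in $1. The sub name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: one s///g pass, no explicit loop.
sub remove_extra_h_lookbehind {
    my ($s) = @_;
    # (?<=h)  : the match must start right after an "h"
    # ([^h]*) : the non-"h" text that follows, kept via $1
    # h       : the next "h", which is dropped
    $s =~ s/(?<=h)([^h]*)h/$1/g;
    return $s;
}

print remove_extra_h_lookbehind('fhjhejfherhfehkehkeh'), "\n";   # prints fhjejferfekeke
```

This relies on s///g matching against the original string, so an "h" that was just removed still counts as the "h to the left" for the next match.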

Also a third way:
step 1: replace every run of h with a double one: h+ --> hh
step 2: replace the first 'hh' with a single one: ^(.*?h)h --> $1
step 3: replace all remaining hh with nothing: hh -->
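
The three steps can be sketched in Perl roughly as follows. Note that this sketch uses a non-greedy .*? in step 2 so that the first pair is the one shortened (a greedy .* would reach for the last pair instead). The sub name is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper implementing the three substitution steps.
sub remove_extra_h_three_step {
    my ($s) = @_;
    $s =~ s/h+/hh/g;        # step 1: every run of "h" becomes exactly "hh"
    $s =~ s/^(.*?h)h/$1/;   # step 2: the first "hh" becomes a single "h"
    $s =~ s/hh//g;          # step 3: every remaining "hh" is removed
    return $s;
}

print remove_extra_h_three_step('fhjhejfherhfehkehkeh'), "\n";   # prints fhjejferfekeke
```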

Also a fourth way:
step 1: find the first h and store its position.
step 2: apply a regex after this position to replace all h with nothing.
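
A sketch of the position-based approach, assuming the built-in index and substr functions are acceptable for step 1. The sub name is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: split at the first "h", clean only the tail.
sub remove_extra_h_by_position {
    my ($s) = @_;
    my $pos = index($s, 'h');              # step 1: position of the first "h"
    return $s if $pos < 0;                 # no "h" at all: nothing to do
    my $head = substr($s, 0, $pos + 1);    # everything up to and including it
    my $tail = substr($s, $pos + 1);       # everything after it
    $tail =~ s/h//g;                       # step 2: drop every later "h"
    return $head . $tail;
}

print remove_extra_h_by_position('fhjhejfherhfehkehkeh'), "\n";   # prints fhjejferfekeke
```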

Fifth way:

There are many other ways that come to mind, but I am tired of writing them all down. Just give a hint about which one you prefer.

Most likely you want to know how to do the regex with the lookaround (the shortest code), right?
Here is an example of how lookaround assertions are used, so you can learn and do it yourself. (If you still have problems, come back.)

This Regex:
(?<!\.)(\d+?)(?=(\d{3})+\D) Replace all with: $1,
didn't work as expected. Why? (It should format a number inside a text like "distance 1234567.1234567 meter" into "distance 1,234,567.1234567 meter".)

Try to correct it yourself; you will learn a lot.
I corrected it to this one and it worked:
(?<!\.\d{0,100})(\d+?)(?=(\d{3})+($|\D)) Replace all with: $1,

Is this also possible without the restriction of 100 maximum characters after the dot?
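
One way to sidestep the lookbehind length limit altogether, sketched here as an alternative technique and not as a fix of the regex above: match each number together with its optional fractional part, and insert commas only into the integer part. The sub names commify and format_numbers are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: add thousands separators to a string of digits.
sub commify {
    my $n = reverse shift;          # work from the right-hand end
    $n =~ s/(\d{3})(?=\d)/$1,/g;    # comma after every 3 digits that has more digits after it
    return scalar reverse $n;
}

# Hypothetical helper: format every number found inside a text.
sub format_numbers {
    my ($text) = @_;
    # $1 is the integer part, $2 the optional ".digits" part, left unchanged.
    $text =~ s/(\d+)((?:\.\d+)?)/commify($1) . $2/ge;
    return $text;
}

print format_numbers('distance 1234567.1234567 meter'), "\n";
# prints distance 1,234,567.1234567 meter
```

Because the fractional digits are consumed as part of the same match, they are never examined on their own, so no limit on their count is needed.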
May 12 '10 #4

P: 6
Can you please give some pointers on the basics of regexes with negative lookahead? It would really help.

My main application of Perl is pattern matching; a tutorial on this would also help.
May 20 '10 #5

Expert Mod 2.5K+
P: 3,503
@pkn876 There are plenty of Perl Regular Expression tutorials out there. All you have to do is search Google and you will find them.

You say you're a newbie and want to do negative lookaheads. You really need to start practicing with regexes and working with them. They take practice to understand. To get to and understand lookaheads and lookbehinds, you will need a good understanding of how regexes work.

I would start with the link I provided above.


May 21 '10 #6

Expert Mod 100+
P: 589
Mastering Regular Expressions
May 21 '10 #7

P: 6
I started by writing some simple programmes to understand the lookahead and lookbehind concepts.

I basically have a file which has a lot of lines of characters; I loop through each line for some pattern matching.
while(<IN1>) {
    chomp($_);
    if (/cat(?=\s)/) {
        s/cat/dog/;
    }
}
For example:
catacatbcatccatdagduefgvdcat sfjjgja

The code is written such that the output should have been:
catacatbcatccatdagduefgvddog sfjjgja

But the output I am seeing is:

dogacatbcatccatdagduefgvdcat sfjjgja

It is replacing the first cat.

Is anything wrong with this piece of code?
May 23 '10 #8

Expert Mod 100+
P: 589
use strict;
use warnings;
use 5.010;

my $str = 'catacatbcatccatdagduefgvdcat sfjjgja';

if ($str =~ s/cat(?=\s)/dog/) {
    say $str;
}
else {
    say "no match";
}
May 23 '10 #9

P: 6
I tried it in vi with a simple regular expression, and it says "pattern not found".
My file has:
catacatbcatccatdagduefgvdcat sfjjgja

I used:


Getting the following error:

E486: Pattern not found:cat(?=\s)
May 23 '10 #10

Expert Mod 100+
P: 589
Please post a short but complete script that demonstrates the problem.

Here's my example that uses your sample data.

D:\perl>type
#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $str = 'catacatbcatccatdagduefgvdcat sfjjgja';

if ($str =~ s/cat(?=\s)/dog/) {
    say $str;
}
else {
    say "no match";
}
catacatbcatccatdagduefgvddog sfjjgja
May 23 '10 #11

P: 6
This is the code.
The input file passed as the argument contains the string assigned to $str.
#!/usr/bin/perl -w

my $testlist = $ARGV[0];
open(IN, "$testlist") || die "cannot open test list:$testlist";
open(OUT, ">$outfile") || die "cannot open output file:$outfile";

while(<IN>) {
    chomp($_);
    if (/cat(?=\s)/){
        s/cat/dog/;}

printf OUT <<EOF
$_
EOF
;
}
close(IN);
close(OUT);
May 23 '10 #12

Expert Mod 100+
P: 589
I fail to see why you think using two different regexes would accomplish your goal.

Did you try the regex in my example?
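
To make the difference concrete, here is a small sketch comparing the two approaches on the sample line. The sub names two_step and one_step are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: test with one regex, substitute with another.
sub two_step {
    my ($line) = @_;
    if ($line =~ /cat(?=\s)/) {
        # This substitution knows nothing about the match above;
        # it simply replaces the first "cat" in the line.
        $line =~ s/cat/dog/;
    }
    return $line;
}

# Hypothetical helper: the lookahead is part of the substitution itself.
sub one_step {
    my ($line) = @_;
    $line =~ s/cat(?=\s)/dog/;   # only a "cat" followed by whitespace is replaced
    return $line;
}

my $line = 'catacatbcatccatdagduefgvdcat sfjjgja';
print two_step($line), "\n";   # prints dogacatbcatccatdagduefgvdcat sfjjgja
print one_step($line), "\n";   # prints catacatbcatccatdagduefgvddog sfjjgja
```

The first output is the one reported from the file-loop script, and the second is the one the single substitution produces.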
May 23 '10 #13

Expert Mod 2.5K+
P: 3,503
@pkn876 You really need to learn to use CODE TAGS!

They are required around any and all code that you post into the forums. I have replaced it in your posts here, but in the future you need to use them. This is your only warning.


May 24 '10 #14
