
Negative regular expression not working at end of string

nithinpes
I tried the following regular expression in a script to extract lines that do not begin with a capital letter and do not end with a digit.
    if ($str =~ /^[^A-Z].+[^0-9]$/)       ## similarly /^[^A-Z].+\D$/
    {
        ######
    }
This didn't work. The pattern condition was applied only to the beginning of the string. The lines returned included both lines ending in digits and lines ending in non-digits.

However, I could achieve the task as follows:
    if (($str !~ /^[A-Z]/) && ($str !~ /[0-9]$/))
    {
        ######
    }
But I need to know why the previous expression didn't work. I understand the pitfalls of using negated patterns for matching, but I am unable to find any conceptual or logical error in the first expression.
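For comparison, the working two-condition test can also be written as a single pattern with lookarounds. This is only a sketch: the helper name `wanted_line` is hypothetical, and it assumes each string holds one line that may or may not still carry its trailing newline. It anchors with `\z` (true end of string) and allows an optional `\n` before it.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line neither starts with a capital
# letter nor ends with a digit. One optional trailing newline is tolerated.
sub wanted_line {
    my ($str) = @_;
    # (?![A-Z])   : the first character is not A-Z
    # (?<![0-9])  : the character before the line ending is not a digit
    # \n?\z       : optional newline, then the true end of the string
    return $str =~ /^(?![A-Z]).*(?<![0-9])\n?\z/ ? 1 : 0;
}

for my $line ("this is a test\n", "this is a test 99\n", "This is a test\n") {
    print $line if wanted_line($line);
}
```

For single-line input this should accept the same lines as the pair of `!~` tests, though the two-condition form is arguably easier to read.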
To boil down the above query,
    if (/^[^A-Z]/)
    { print $_; }

will successfully return lines not beginning with capital letters, while

    if (/[^0-9]$/)
    { print $_; }

fails to return only the lines not ending with digits. It returns all lines.

Any help would be greatly appreciated.
Jan 21 '08 #1
2 Replies


KevinADC
The end-of-string anchor "$" is supposed to match at the end of the string, or just before a newline at the end of the string/line. And it does in this example:

    while(<DATA>){
       print if ($_ !~ /[0-9]$/);
    }

    __DATA__
    This is a test
    This is a test 96
    this is a test
    this is a test 99
    this is another test
The lines with digits on the end do not get printed. But when I change it to a negated character class the newline is not being ignored:

    while(<DATA>){
       print if ($_ =~ /[^0-9]$/);
    }

    __DATA__
    This is a test
    This is a test 96
    this is a test
    this is a test 99
    this is another test
and all the lines are printed. But if I chomp() each line the ones with digits on the end are not printed:

    while(<DATA>){
       chomp;                                # remove the trailing newline
       print "$_\n" if ($_ =~ /[^0-9]$/);    # add the newline back for output
    }

    __DATA__
    This is a test
    This is a test 96
    this is a test
    this is a test 99
    this is another test
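Evidently a negated character class tied to the end of the line/string does not ignore the record separator (a newline in this case). The newline is itself a non-digit, so [^0-9] matches it, "$" then matches at the very end of the string, and every line passes the test. I could not find this documented anywhere though.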
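A small sketch can make this visible by capturing the character that the negated class actually consumed. The helper names `last_nondigit` and `ends_without_digit` are hypothetical, and the second one shows one possible alternative to chomp(): excluding the newline from the class as well.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the single character that [^0-9] matched,
# or undef when the pattern does not match at all.
sub last_nondigit {
    my ($str) = @_;
    return $str =~ /([^0-9])$/ ? $1 : undef;
}

# Hypothetical helper: the class also excludes the newline, so it can only
# match a visible character sitting right before the line ending.
sub ends_without_digit {
    my ($str) = @_;
    return $str =~ /[^0-9\n]$/ ? 1 : 0;
}

my $line = "This is a test 96\n";
my $hit  = last_nondigit($line);
print "the class matched the newline\n" if defined $hit && $hit eq "\n";

print "kept\n"    if ends_without_digit("this is a test\n");
print "skipped\n" if !ends_without_digit($line);
```

One caveat with `[^0-9\n]$`: it needs at least one character to match, so an empty line is rejected even though it does not end with a digit. The `!~ /[0-9]$/` form does not have that limitation.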
Jan 21 '08 #2

nithinpes
Expert 100+
P: 410
Thanks a lot!
This concept is not documented in any of the books that I referred to or in the online references I checked.
Jan 21 '08 #3
