preg's 'negative lookbehind' -- broken?

I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

To detect the leading backslash once I find the underscore, I've been
trying to use a 'negative lookbehind' in preg_replace. It seems a
perfect use for it: if we find an underscore but it was preceded by a
backslash, don't filter.

I can't use a positive filter because I have another filter that
strips the backslashes, leaving only the underscores. So the order in
which they run is important.
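The ordering constraint can be sketched as a two-stage pipeline. The function names `emphasize()` and `unescape()`, and the choice of `<em>...</em>` as the filter's output, are hypothetical; the thread never says what the real filter produces. The emphasis step reuses the lookbehind pattern that appears later in the thread.

```php
<?php
// Hypothetical filter 1: turn _foo_ into <em>foo</em>, but only when the
// opening underscore is NOT preceded by a backslash (negative lookbehind).
function emphasize(string $text): string
{
    return preg_replace('@(?<!\\\\)_(.*)_@U', '<em>$1</em>', $text);
}

// Hypothetical filter 2: strip the protecting backslash, so \_ becomes _.
function unescape(string $text): string
{
    return str_replace('\\_', '_', $text);
}

$input = 'text _text_ text \_text\_ text';

// Correct order: emphasize first, then remove the escapes.
$good = unescape(emphasize($input));

// Wrong order: the escapes are gone before the lookbehind can see them.
$bad = emphasize(unescape($input));

echo $good, "\n"; // text <em>text</em> text _text_ text
echo $bad, "\n";  // text <em>text</em> text <em>text</em> text
```

Running the escape-stripping filter last is what lets the lookbehind see the backslashes at all.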

Unfortunately, I can't seem to get the lookbehind to work...the filter
runs whether or not there are backslashes. So either the lookbehind
is broken or I'm having another problem finding the right number of
backslashes. As I mentioned in my earlier post, passing
'\_sometext\_' via a form causes the string to become '\\_sometext\\_'
with all 4 backslashes being counted as separate literals--which
requires 8 backslashes in the filter! Quite unexpected, also
annoying.
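The backslash counting is easier to follow if each layer is checked separately. In a single-quoted PHP string, `\\` stands for one backslash, so a pattern literal typed with four backslashes reaches PCRE as `(?<!\\)`, which tests for exactly one backslash. The sketch below assumes the doubled form input comes from something like PHP's old `magic_quotes_gpc` setting (removed in PHP 5.4); that cause is an assumption, and the doubling is simulated here with `addslashes()`. Because a negative lookbehind inspects only the single character immediately before the underscore, the same four-backslash pattern still finds a backslash there in the doubled string.

```php
<?php
// PHP source literal -> actual string. In single quotes, \\ is one backslash.
$patternSource = '(?<!\\\\)_';
echo strlen('\\\\'), "\n";   // 2: four typed backslashes become two
echo $patternSource, "\n";   // prints (?<!\\)_ : "_" not preceded by one backslash

// Input as typed by the user, then doubled the way magic quotes would do it
// (simulated with addslashes(); an assumption about where the doubling comes from).
$typed   = '\_sometext\_';
$doubled = addslashes($typed);   // \\_sometext\\_
echo $doubled, "\n";

$rx = '/' . $patternSource . '/';

// The lookbehind examines only the one character right before each "_",
// which is a backslash in both strings, so no underscore qualifies.
var_dump(preg_match($rx, $typed));        // int(0)
var_dump(preg_match($rx, $doubled));      // int(0)
var_dump(preg_match($rx, '_sometext_'));  // int(1)
```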

Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.

Any insights?

Thanks in advance!
Margaret
--
(To mail me, please change .not.invalid to .net, first.
Apologies for the inconvenience.)
Jul 17 '05 #1
Margaret MacDonald wrote:
> I'm trying to write a filter that will ignore text of the form
> '\_foo\_' while filtering text of the form '_foo_'. In other words,
> a backslash is meant to protect against the operation of this
> particular filter.
> ....
> Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
> I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------
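Pedro's pattern protects only the opening underscore; the closing `_` has no lookbehind, so an escaped underscore inside the emphasized text can end the match early. A possible variant, not taken from the thread and offered only as a sketch, places the same lookbehind before the closing underscore too. The input `_a\_b_` is a made-up example.

```php
<?php
$input = '_a\_b_';

// Pedro's pattern: only the opening underscore is protected.
$openOnly = '@(?<!\\\\)_(.*)_@U';

// Variant: the closing underscore must also not follow a backslash.
$both = '@(?<!\\\\)_(.*)(?<!\\\\)_@U';

echo preg_replace($openOnly, '_filtered_', $input), "\n"; // _filtered_b_
echo preg_replace($both, '_filtered_', $input), "\n";     // _filtered_
```

With the thread's own input string both patterns give the same result; they differ only when an escaped underscore sits between two unescaped ones.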

--
USENET would be a better place if everybody read: | to email me: use |
http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
http://www.netmeister.org/news/learn2quote2.html | header, textonly |
http://www.expita.com/nomime.html | no attachments. |
Jul 17 '05 #2
I wrote:
> I'm trying to write a filter that will ignore text of the form
> '\_foo\_' while filtering text of the form '_foo_'. In other words,
> a backslash is meant to protect against the operation of this
> particular filter.
> [...]


I'd been trying this syntax:

preg_replace( '/_(?<!\\\\)test_(?<!\\\\)/', ....)

based on my understanding of the example in Lerdorf & Tatroe. My
mental model was that it would find the underscore, step the counter
and find the lookbehind, then, to satisfy the lookbehind, rewind the
counter and see whether it could find a backslash. If it does, then
the test is false and it continues searching.

But that didn't work and nothing I could do would make it work.

Then I tried switching the order:

preg_replace( '/(?<!\\\\)_test(?<!\\\\)_/', ....)

and that DOES seem to work, though it doesn't seem to make sense in
terms of how the construct is described (lookbehind, etc) and I don't
know whether it's working by accident or design.
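The difference between the two orderings can be checked directly. A lookbehind is zero-width: it tests the characters immediately before the point where it appears in the pattern. Written as `_(?<!\\\\)`, it runs after the underscore has been consumed, so the character it inspects is that underscore, which is never a backslash, and the assertion always passes. Written as `(?<!\\\\)_`, it inspects the character before the underscore. The sketch assumes the reordered attempt was meant to end in a trailing underscore; the subject `\_test_` is a made-up example with an escaped opening underscore.

```php
<?php
// First attempt: each lookbehind follows an underscore, so by the time it
// runs it inspects that underscore, which is never a backslash.
$first  = '/_(?<!\\\\)test_(?<!\\\\)/';

// Reordered attempt (trailing "_" assumed): the opening lookbehind now
// inspects the character just before the opening underscore.
$second = '/(?<!\\\\)_test(?<!\\\\)_/';

$plain   = '_test_';
$escaped = '\_test_';   // escaped opening underscore

var_dump(preg_match($first, $plain));    // int(1)
var_dump(preg_match($first, $escaped));  // int(1): escape ignored
var_dump(preg_match($second, $plain));   // int(1)
var_dump(preg_match($second, $escaped)); // int(0): escape respected
```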

So my model for this would be that the so-called 'lookbehind' is
actually buffering the n chars (in my case 1) before the counter so
that, when the interpreter finds something under the counter (in my
case the _ ), the 'lookbehind' buffer already has the extra
information needed to finish the evaluation--the counter is never
rewound.

Does anyone else have any experience that would confirm or refute
this? I really hate to rely on something that might only be working
by accident.
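One way to probe the "buffer" model is the `$offset` argument of `preg_match()`. The PHP manual notes that passing an offset is not equivalent to passing `substr($subject, $offset)`, because assertions such as a lookbehind can examine characters before the offset, and the PCRE documentation describes a lookbehind as temporarily moving the current position back by the assertion's fixed length and trying to match there. If the engine only buffered characters it had already scanned, starting directly on the underscore would leave nothing to look at; the sketch below suggests the lookbehind still sees the backslash at position 0. The subject string is a made-up example.

```php
<?php
$rx      = '/(?<!\\\\)_/';   // "_" not preceded by a backslash
$subject = '\_x';            // backslash, underscore, x

// Start matching at offset 1, directly on the underscore. Matching begins
// after offset 0 in this call, yet the lookbehind still checks that character.
var_dump(preg_match($rx, $subject, $m, 0, 1));   // int(0)

// Cutting the string first really does remove the backslash from view.
var_dump(preg_match($rx, substr($subject, 1)));  // int(1)
```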

Margaret
Jul 17 '05 #3
Pedro Graca wrote:
> [...]
> works for me. The output is:
>
> ========
> text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
> --------


Ha! Thanks, Pedro. That seems to confirm my suspicions.

Margaret
Jul 17 '05 #4
Margaret MacDonald wrote:
> That seems to confirm my suspicions.


No need to be suspicious about it :)

It's all documented in the manual:
http://www.php.net/manual/en/pcre.pa...nce.assertions
Jul 17 '05 #5
