preg's 'negative lookbehind' -- broken?

I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

To detect the leading backslash once I find the underscore, I've been
trying to use a 'negative lookbehind' in preg_replace. It seems a
perfect use for it: if we find an underscore but it was preceded by a
backslash, don't filter.

I can't use a positive filter because I have another filter that
strips the backslashes, leaving only the underscores. So the order in
which they run is important.

Unfortunately, I can't seem to get the lookbehind to work...the filter
runs whether or not there are backslashes. So either the lookbehind
is broken or I'm having another problem finding the right number of
backslashes. As I mentioned in my earlier post, passing
'\_sometext\_' via a form causes the string to become '\\_sometext\\_'
with all 4 backslashes being counted as separate literals--which
requires 8 backslashes in the filter! Quite unexpected, also
annoying.

Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.
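
A note on the backslash counting described above, as a minimal sketch. It assumes the doubling of form input comes from PHP's `magic_quotes_gpc` setting (removed in PHP 5.4), which the post does not confirm. The variable names are illustrative only.

```php
<?php
// In a single-quoted PHP literal, '\\' is ONE backslash character,
// so '\\\\' is two characters in memory.
$twoChars = '\\\\';
echo strlen($twoChars), "\n";

// PCRE reads those two characters as an escaped backslash,
// so the pattern below matches exactly one literal backslash.
$matchesOne = preg_match('/^\\\\$/', '\\');
echo $matchesOne, "\n";

// Assumption: the form doubled each backslash (magic_quotes_gpc style).
// stripslashes() undoes that escaping before any filter runs.
$fromForm = '\\\\_sometext\\\\_';    // the script sees: \\_sometext\\_
$restored = stripslashes($fromForm); // back to:         \_sometext\_
echo $restored, "\n";
```

So four backslashes in a single-quoted pattern are needed to match one backslash in the subject; eight would demand two consecutive backslashes.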

Any insights?

thanks in advance!
Margaret
--
(To mail me, please change .not.invalid to .net, first.
Apologies for the inconvenience.)
Jul 17 '05 #1
Margaret MacDonald wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.
....
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------
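
One caveat with the pattern above, shown as a hedged sketch rather than a correction: only the opening underscore is guarded, so a protected underscore inside the span can still close the match. The variant below guards both ends and uses a lazy `.*?` instead of the `U` modifier. The sample input and variable names are made up for illustration.

```php
<?php
$input = 'text _text_ text \_text\_ text _a\_b_ end';

// Pattern from the post: lookbehind on the opening underscore only.
$openOnly = '@(?<!\\\\)_(.*)_@U';

// Variant (an assumption about the intended behaviour):
// neither the opening nor the closing underscore may follow a backslash.
$bothEnds = '@(?<!\\\\)_(.*?)(?<!\\\\)_@';

$outOpenOnly = preg_replace($openOnly, '_filtered_', $input);
$outBothEnds = preg_replace($bothEnds, '_filtered_', $input);

echo $outOpenOnly, "\n"; // text _filtered_ text \_text\_ text _filtered_b_ end
echo $outBothEnds, "\n"; // text _filtered_ text \_text\_ text _filtered_ end
```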

--
USENET would be a better place if everybody read: | to email me: use |
http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
http://www.netmeister.org/news/learn2quote2.html | header, textonly |
http://www.expita.com/nomime.html | no attachments. |
Jul 17 '05 #2
I wrote:
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


I'd been trying this syntax:

preg_replace( '/_(?<!\\\\)test_(?<!\\\\)/', ....)

based on my understanding of the example in Lerdorf & Tatroe. My
mental model was that it would find the underscore, step the counter
and find the lookbehind, then, to satisfy the lookbehind, rewind the
counter and see whether it could find a backslash. If it does, then
the test is false and it continues searching.

But that didn't work and nothing I could do would make it work.

Then I tried switching the order:

preg_replace( '/(?<!\\\\)_test(?<!\\\\)_/', ....)

and that DOES seem to work, though it doesn't seem to make sense in
terms of how the construct is described (lookbehind, etc) and I don't
know whether it's working by accident or design.

So my model for this would be that the so-called 'lookbehind' is
actually buffering the n chars (in my case 1) before the counter so
that, when the interpreter finds something under the counter (in my
case the _ ), the 'lookbehind' buffer already has the extra
information needed to finish the evaluation--the counter is never
rewound.

Does anyone else have any experience that would confirm or refute
this? I really hate to rely on something that might only be working
by accident.
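
A small sketch of why the placement matters. A lookbehind is a zero-width assertion: it consumes nothing and tests the characters immediately before the current match position. The input string here is invented for the demonstration.

```php
<?php
$input = 'a_b\_c'; // the second underscore is protected by a backslash

// Assertion AFTER the underscore: the engine has already consumed "_",
// so the character just behind the current position is "_" itself.
// That is never a backslash, so the assertion always succeeds.
$after = preg_replace('/_(?<!\\\\)/', '-', $input);

// Assertion BEFORE the underscore: the character just behind the
// current position is whatever precedes "_", which is what we want to test.
$before = preg_replace('/(?<!\\\\)_/', '-', $input);

echo $after, "\n";  // a-b\-c  (both underscores replaced)
echo $before, "\n"; // a-b\_c  (protected underscore left alone)
```

So the second ordering works by design: nothing is buffered or rewound, the assertion simply looks at the text behind wherever the match position currently is.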

Margaret
Jul 17 '05 #3
Pedro Graca wrote:
...
works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------


ha! Thanks, Pedro. That seems to confirm my suspicions.

Margaret
Jul 17 '05 #4
Margaret MacDonald wrote:
That seems to confirm my suspicions.


No need to be suspicious about it :)

It's all documented in the manual:
http://www.php.net/manual/en/pcre.pa...nce.assertions
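
To tie this back to the ordering concern in the original question, here is a hedged sketch of a two-stage pipeline. Both function names and the `<em>` replacement are hypothetical; the thread does not say what the real filters emit.

```php
<?php
// Stage 1 (hypothetical): act on _foo_ but not on \_foo\_.
function filterUnderscores(string $text): string
{
    return preg_replace('/(?<!\\\\)_(.*?)(?<!\\\\)_/', '<em>$1</em>', $text);
}

// Stage 2 (hypothetical): drop the protective backslashes afterwards.
function stripProtectiveBackslashes(string $text): string
{
    return str_replace('\\_', '_', $text);
}

$input = 'see _this_ but not \_that\_';

// Correct order: filter first, then strip the backslashes.
$right = stripProtectiveBackslashes(filterUnderscores($input));

// Reversed order: the protection is gone before the filter runs.
$wrong = filterUnderscores(stripProtectiveBackslashes($input));

echo $right, "\n"; // see <em>this</em> but not _that_
echo $wrong, "\n"; // see <em>this</em> but not <em>that</em>
```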
Jul 17 '05 #5
