Bytes | Software Development & Data Engineering Community

preg's 'negative lookbehind' -- broken?

I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

To detect the leading backslash once I find the underscore, I've been
trying to use a 'negative lookbehind' in preg_replace. It seems a
perfect use for it: if we find an underscore but it was preceded by a
backslash, don't filter.
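A minimal sketch of that idea, assuming PHP 7.4+ (for the arrow function) and a made-up sample string. The negative lookbehind sits in front of the underscore, so every match is an underscore whose preceding character is not a backslash:

```php
<?php
// Sketch: offsets of underscores that are NOT preceded by a backslash.
$text = 'a_b\_c_d'; // in single quotes, \_ stays a backslash plus underscore

// The PHP literal '\\\\' holds two characters, \\, which the regex
// engine reads as one literal backslash.
$pattern = '/(?<!\\\\)_/';

preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
$offsets = array_map(fn(array $m): int => $m[1], $matches[0]);

echo implode(',', $offsets), "\n"; // prints 1,6
```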

I can't use a positive filter because I have another filter that
strips the backslashes, leaving only the underscores. So the order in
which they run is important.
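To illustrate why the order matters, here is a sketch with two hypothetical filters, filterUnderscores() and stripProtection(). The "[...]" marker is only a stand-in for whatever the real filter produces:

```php
<?php
// Hypothetical filter: turn _foo_ into [foo] unless the opening underscore is escaped.
function filterUnderscores(string $s): string
{
    return preg_replace('/(?<!\\\\)_(.*?)_/', '[$1]', $s);
}

// Hypothetical filter: turn each protective \_ back into a plain _.
function stripProtection(string $s): string
{
    return str_replace('\\_', '_', $s);
}

$in = '_a_ and \_b\_';

$wrongOrder = filterUnderscores(stripProtection($in)); // protection lost
$rightOrder = stripProtection(filterUnderscores($in)); // protection honoured

echo $wrongOrder, "\n"; // prints [a] and [b]
echo $rightOrder, "\n"; // prints [a] and _b_
```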

Unfortunately, I can't seem to get the lookbehind to work...the filter
runs whether or not there are backslashes. So either the lookbehind
is broken or I'm having another problem finding the right number of
backslashes. As I mentioned in my earlier post, passing
'\_sometext\_' via a form causes the string to become '\\_sometext\\_ '
with all 4 backslashes being counted as separate literals--which
requires 8 backslashes in the filter! Quite unexpected, also
annoying.
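The backslash counting involves separate layers of escaping. This sketch assumes the doubling came from magic_quotes_gpc (which escaped form data the way addslashes() does and was removed in PHP 5.4), so it is simulated here with addslashes() and undone with stripslashes(). Since the lookbehind only examines the single character right before the underscore, that character is a backslash in both forms:

```php
<?php
// Assumption: the doubled backslashes came from magic_quotes_gpc (PHP < 5.4).
$submitted = '\_sometext\_';                // what was typed: 2 backslashes
$withMagicQuotes = addslashes($submitted);  // \\_sometext\\_ : 4 backslashes
echo strlen($submitted), ' ', strlen($withMagicQuotes), "\n"; // prints 12 14

// Undo the extra escaping before filtering.
$clean = stripslashes($withMagicQuotes);
var_dump($clean === $submitted);            // bool(true)

// Escaping layer in the pattern: the PHP literal '\\\\' is two characters,
// which PCRE reads as one literal backslash.
$pattern = '/(?<!\\\\)_/';
echo $pattern, "\n";                        // prints /(?<!\\)_/

// Every underscore in both strings is preceded by a backslash, so no match.
var_dump(preg_match($pattern, $clean), preg_match($pattern, $withMagicQuotes)); // int(0), int(0)
```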

Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.

Any insights?

thanks in advance!
Margaret
--
(To mail me, please change .not.invalid to .net, first.
Apologies for the inconvenience.)
Jul 17 '05 #1
Margaret MacDonald wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.
...
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------
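The pattern above guards only the opening underscore, so in a subject like '_a\_b_' the match stops at the escaped underscore. A variant sketch that also guards the closing underscore and keeps the enclosed text via the capture group; the <em> replacement is only an assumed use for the filter:

```php
<?php
// Variant sketch: guard BOTH underscores and keep the enclosed text.
$input = 'text _a\_b_ text \_c\_ text';

// (.+?) is lazy, so the match ends at the first unescaped underscore.
$rx = '@(?<!\\\\)_(.+?)(?<!\\\\)_@';

$output = preg_replace($rx, '<em>$1</em>', $input);
echo $output, "\n"; // prints text <em>a\_b</em> text \_c\_ text
```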

--
USENET would be a better place if everybody read: | to email me: use |
http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
http://www.netmeister.org/news/learn2quote2.html | header, textonly |
http://www.expita.com/nomime.html | no attachments. |
Jul 17 '05 #2
I wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.
...
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


I'd been trying this syntax:

preg_replace( '/_(?<!\\\\)test_(?<!\\\\)/', ....)

based on my understanding of the example in Lerdorf & Tatroe. My
mental model was that it would find the underscore, step the counter
and find the lookbehind, then, to satisfy the lookbehind, rewind the
counter and see whether it could find a backslash. If it did, the
assertion would fail and the search would continue.

But that didn't work and nothing I could do would make it work.

Then I tried switching the order:

preg_replace( '/(?<!\\\\)_test(?<!\\\\)/', ....)

and that DOES seem to work, though it doesn't seem to make sense in
terms of how the construct is described (lookbehind, etc) and I don't
know whether it's working by accident or design.
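A lookbehind is zero-width and is tested at the spot where it appears in the pattern. Written after the underscore, (?<!\\) checks the character just before that spot, which is the underscore that was just matched, so it can never see a backslash. A sketch comparing both placements on a made-up subject:

```php
<?php
// Sketch: the same subject, with the assertion in the two positions tried above.
$subject = '_x_ \_y\_';

$after  = '/_(?<!\\\\)(.+?)_/'; // checks the char before the spot AFTER '_': always '_'
$before = '/(?<!\\\\)_(.+?)_/'; // checks the char right before '_' itself

$nAfter  = preg_match_all($after, $subject);
$nBefore = preg_match_all($before, $subject);

echo $nAfter, ' ', $nBefore, "\n"; // prints 2 1
```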

So my model for this would be that the so-called 'lookbehind' is
actually buffering the n chars (in my case 1) before the counter so
that, when the interpreter finds something under the counter (in my
case the _ ), the 'lookbehind' buffer already has the extra
information needed to finish the evaluation--the counter is never
rewound.
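On the buffering question: the PCRE documentation describes a lookbehind as being implemented by temporarily moving the current position back by the (fixed) length of the assertion and trying to match there, against the subject string itself. One way to see this, as a sketch, is preg_match()'s offset argument. Matching starts at offset 1, so offset 0 is never scanned, yet the assertion still sees the backslash there (the PHP manual notes that an offset is not equivalent to passing substr() for this reason):

```php
<?php
$subject = '\_x';             // backslash, underscore, x
$pattern = '/(?<!\\\\)_/';

// Start matching at offset 1: offset 0 is never scanned, but the
// lookbehind still steps back to it and finds the backslash.
$withOffset = preg_match($pattern, $subject, $m, 0, 1);

// Cutting the string off instead really removes the backslash.
$onSubstr = preg_match($pattern, substr($subject, 1));

echo $withOffset, ' ', $onSubstr, "\n"; // prints 0 1
```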

Does anyone else have any experience that would confirm or refute
this? I really hate to rely on something that might only be working
by accident.

Margaret
Jul 17 '05 #3
Pedro Graca wrote:
Margaret MacDonald wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

...
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------


Ha! Thanks, Pedro. That seems to confirm my suspicions.

Margaret
Jul 17 '05 #4
Margaret MacDonald wrote:
That seems to confirm my suspicions.


No need to be suspicious about it :)

It's all documented in the manual:
http://www.php.net/manual/en/pcre.pa...nce.assertions
Jul 17 '05 #5