preg's 'negative lookbehind' -- broken?

I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

To detect the leading backslash once I find the underscore, I've been
trying to use a 'negative lookbehind' in preg_replace. It seems a
perfect use for it: if we find an under but it was preceded by a
backslash, don't filter.

I can't use a positive filter because I have another filter that
strips the backslashes, leaving only the underscores. So the order in
which they run is important.

Unfortunately, I can't seem to get the lookbehind to work...the filter
runs whether or not there are backslashes. So either the lookbehind
is broken or I'm having another problem finding the right number of
backslashes. As I mentioned in my earlier post, passing
'\_sometext\_' via a form causes the string to become '\\_sometext\\_'
with all 4 backslashes being counted as separate literals--which
requires 8 backslashes in the filter! Quite unexpected, also
annoying.

Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.

Any insights?

thanks in advance!
Margaret
--
(To mail me, please change .not.invalid to .net, first.
Apologies for the inconvenience.)
Jul 17 '05 #1
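
The backslash counts in the question can be checked directly. The sketch below assumes the doubled backslashes come from automatic slash-escaping of submitted form data (the post does not say what causes it), and the variable names are illustrative only. It shows that each backslash the regex should match costs four in single-quoted PHP source: the string literal halves them, and PCRE halves them again.

```php
<?php
// Layer 1: the PHP string literal. In a single-quoted string, \\ in the
// source is stored as ONE backslash character.
$oneBackslash   = '\\';     // 1 character
$twoBackslashes = '\\\\';   // 2 characters

// Layer 2: the regex engine. PCRE needs the two characters \\ to mean
// "one literal backslash", so the PHP source needs four.
$matchOne = '/^\\\\$/';         // pattern seen by PCRE: /^\\$/
$matchTwo = '/^\\\\\\\\$/';     // pattern seen by PCRE: /^\\\\$/

// Assumed form input after slash-doubling: the string \\_sometext\\_
$doubled  = '\\\\_sometext\\\\_';
$restored = stripslashes($doubled);   // back to \_sometext\_

echo strlen($oneBackslash), ' ', strlen($twoBackslashes), "\n";
echo preg_match($matchOne, $oneBackslash), ' ',
     preg_match($matchTwo, $twoBackslashes), "\n";
echo $doubled, ' ==> ', $restored, "\n";
```

Eight backslashes in the source are only needed if the pattern must match both doubled backslashes. A lookbehind that checks just the single character before the underscore still needs only four.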
Margaret MacDonald wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

...
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------

--
USENET would be a better place if everybody read: | to email me: use |
http://www.catb.org/~esr/faqs/smart-questions.html | my name in "To:" |
http://www.netmeister.org/news/learn2quote2.html | header, textonly |
http://www.expita.com/nomime.html | no attachments. |
Jul 17 '05 #2
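
Building on the pattern above, the two-stage setup described in the question (filter first, strip the protective backslashes afterwards) might look like the sketch below. The function names are hypothetical, since the thread never shows the real filters, and `.*?` is used as the explicit ungreedy form in place of the `U` modifier.

```php
<?php
// Hypothetical helper names; the original posts do not show the real filters.

// Stage 1: rewrite _text_ unless the opening underscore is escaped.
function filterUnderscores(string $text): string
{
    // (?<!\\\\) is a negative lookbehind for one literal backslash.
    return preg_replace('@(?<!\\\\)_(.*?)_@', '_filtered_', $text);
}

// Stage 2: drop the protective backslashes, leaving plain underscores.
function stripProtection(string $text): string
{
    return str_replace('\\_', '_', $text);
}

$input = 'text _text_ text \_text\_ text';

// Intended order: filter, then strip.
echo stripProtection(filterUnderscores($input)), "\n";
// prints: text _filtered_ text _text_ text

// Reversed order: the protection is gone before the filter runs.
echo filterUnderscores(stripProtection($input)), "\n";
// prints: text _filtered_ text _filtered_ text
```

Running the stages in the reversed order filters both occurrences, which is why the order of the two filters matters.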
I wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

To detect the leading backslash once I find the underscore, I've been
trying to use a 'negative lookbehind' in preg_replace. It seems a
perfect use for it: if we find an under but it was preceded by a
backslash, don't filter.

I can't use a positive filter because I have another filter that
strips the backslashes, leaving only the underscores. So the order in
which they run is important.

Unfortunately, I can't seem to get the lookbehind to work...the filter
runs whether or not there are backslashes. So either the lookbehind
is broken or I'm having another problem finding the right number of
backslashes. As I mentioned in my earlier post, passing
'\_sometext\_' via a form causes the string to become '\\_sometext\\_'
with all 4 backslashes being counted as separate literals--which
requires 8 backslashes in the filter! Quite unexpected, also
annoying.

Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


I'd been trying this syntax:

preg_replace( '/_(?<!\\\\)test_(?<!\\\\)/', ....)

based on my understanding of the example in Lerdorf & Tatroe. My
mental model was that it would find the underscore, step the counter
and find the lookbehind, then, to satisfy the lookbehind, rewind the
counter and see whether it could find a backslash. If it does, then
the test is false and it continues searching.

But that didn't work and nothing I could do would make it work.

Then I tried switching the order:

preg_replace( '/(?<!\\\\)_test(?<!\\\\)/', ....)

and that DOES seem to work, though it doesn't seem to make sense in
terms of how the construct is described (lookbehind, etc) and I don't
know whether it's working by accident or design.

So my model for this would be that the so-called 'lookbehind' is
actually buffering the n chars (in my case 1) before the counter so
that, when the interpreter finds something under the counter (in my
case the _ ), the 'lookbehind' buffer already has the extra
information needed to finish the evaluation--the counter is never
rewound.

Does anyone else have any experience that would confirm or refute
this? I really hate to rely on something that might only be working
by accident.

Margaret
Jul 17 '05 #3
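
The difference between the two orderings can be seen in a small test. A lookbehind is a zero-width check against the text immediately to the left of the current match position, so where it sits in the pattern decides which character it inspects. The sketch below uses shortened, illustrative patterns and sample strings rather than the exact ones from the post.

```php
<?php
$plain   = 'a _test_ b';
$escaped = 'a \_test\_ b';

// Assertion placed AFTER the underscore: the character just before the
// current position is the underscore itself, so the check always passes.
$after  = '/_(?<!\\\\)test/';

// Assertion placed BEFORE the underscore: the character just before the
// current position is whatever precedes the underscore.
$before = '/(?<!\\\\)_test/';

foreach ([$after, $before] as $rx) {
    echo $rx, ': plain=', preg_match($rx, $plain),
         ' escaped=', preg_match($rx, $escaped), "\n";
}
// prints: /_(?<!\\)test/: plain=1 escaped=1
// prints: /(?<!\\)_test/: plain=1 escaped=0

// The assertion consumes nothing: the match starts at the underscore.
preg_match($before, $plain, $m, PREG_OFFSET_CAPTURE);
echo $m[0][0], ' at offset ', $m[0][1], "\n";
// prints: _test at offset 2
```

With the assertion after the underscore the filter fires whether or not a backslash is present, which matches the behaviour reported for the first attempt.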
Pedro Graca wrote:
Margaret MacDonald wrote:
I'm trying to write a filter that will ignore text of the form
'\_foo\_' while filtering text of the form '_foo_'. In other words,
a backslash is meant to protect against the operation of this
particular filter.

...
Anyhow, I've tried 4 and I've tried 8 and neither works, which is why
I wonder whether it's broken.


This

========
<?php
$input = 'text _text_ text \_text\_ text';

$rx = '@(?<!\\\\)_(.*)_@U'; // ungreedy matching
//      ^^^^^^^^^ -- look-behind assertion

$output = preg_replace($rx, '_filtered_', $input);
echo $input, ' ==> ', $output, "\n";
?>
--------

works for me. The output is:

========
text _text_ text \_text\_ text ==> text _filtered_ text \_text\_ text
--------


ha! Thanks, Pedro. That seems to confirm my suspicions.

Margaret
Jul 17 '05 #4
Margaret MacDonald wrote:
That seems to confirm my suspicions.


No need to be suspicious about it :)

It's all documented in the manual:
http://www.php.net/manual/en/pcre.pa...nce.assertions
Jul 17 '05 #5
