Negative look-behind

Hello,

I am a newbie to Python and need some help.

I am looking at doing a batch search/replace over some of my source
code. The goal is to find all literal strings and wrap them in a
macro, say MC. For example, var = "somestring" would become var =
MC("somestring"). Literal strings can contain escaped " and \.

But there are 2 cases where this replacement should not happen:
1. literal strings which have already been wrapped, like
MC("somestring")
2. directives like #include "header.h" and #extern "C".

I tried to use a negative look-behind assertion for this purpose. The
expression I use for matching a literal string is
"((\\")|[^"(\\")])+". This works fine. But as I start prepending
look-behind patterns, things go wrong. The question I have is whether
the pattern in the negative look-behind part can contain alternation. In
other words, can I make up a regexp which says "match this pattern x
only if it is not preceded by any one of pattern a, pattern b and
pattern c"?

I tried the following expression to take the two constraints mentioned
above into account:

(?<![(#include )(#extern )(MC\()])"((\\")|[^"(\\")])+"

Can someone point out the mistakes in this?

Thanks,
Bhargava
Jul 18 '05 #1
Hi.

It would have been nice if you simplified your example. Since you said
that your base pattern matched properly (for example) you could have let
that be a literal. But no matter.

I think that your problem is that you're trying to use grouping in a
character class (set). [(1 )(2 )] matches any single one of '1', '2',
' ', '(' or ')'. My proof:

>>> re.sub('[(1 )(2 )]','a','1 2 ( ) ')
'aaaaaaaa'

So you should just need to ditch the '[' and ']'.
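
To make the character-class point concrete for the look-behind case, here is a minimal sketch using Python's standard re module. It is not the original poster's code: the sample inputs are invented, and the simplified string pattern STRING is an assumption standing in for the one in the question. It shows three things: a class inside a look-behind still tests only one character, Python rejects look-behind alternatives of different widths, and chaining one fixed-width look-behind per prefix compiles but has a catch.

```python
import re

# 1. A character class inside a look-behind still stands for ONE character.
#    This only checks that the character just before the quote is not one of
#    the characters listed in the class: parentheses, '#', space, 'M', 'C'
#    and the letters of "include" and "extern".
class_based = re.compile(r'(?<![(#include )(#extern )(MC\()])"')
print(class_based.search('x = "abc"'))   # None: the space before the quote is in the set

# 2. Alternation is allowed in a look-behind, but Python's re requires all
#    alternatives to have the same width, so this pattern fails to compile.
try:
    re.compile(r'(?<!#include |MC\()"')
except re.error as exc:
    print("rejected:", exc)

# 3. Chaining one fixed-width look-behind per forbidden prefix does compile.
#    (?<!extern ) also rejects "#extern " because it only looks at the
#    seven characters directly before the quote.
STRING = r'"(?:\\.|[^"\\])*"'            # assumed, simplified literal pattern
chained = re.compile(r'(?<!#include )(?<!extern )(?<!MC\()' + STRING)
print(chained.findall('var = "somestring";'))   # ['"somestring"']
print(chained.findall('#include "header.h"'))   # []

# Caveat: a rejected literal is not consumed, so the scan resumes at its
# CLOSING quote and can pair it with the next opening quote.
print(chained.findall('x = MC("a") + "b";'))    # ['") + "']
```

The last line is the reason a pure look-behind substitution can misfire on lines that mix wrapped and unwrapped literals.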

I think what you meant by the set was question marks, i.e.:
(#include )?(#extern )?(MC\()?
so that each prefix is optional: any of them may occur, but none has to.
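
One caveat with optional groups: Python's re only accepts fixed-width patterns inside a look-behind, so they cannot simply be dropped into (?<!...). A possible alternative, sketched here, keeps the optional prefix as an ordinary capturing group and lets a replacement callback decide whether to wrap. The names LITERAL, wrap and wrap_literals are made up for this illustration, the string-literal pattern is a simplified stand-in rather than the one from the original post, and the prefix list (MC(, #include, extern) is an assumption based on the two exclusions in the question.

```python
import re

# Group 1: an optional prefix marking a literal that must be left alone.
# Group 2: a double-quoted literal; \\. consumes any backslash escape,
#          so escaped quotes and backslashes stay inside the literal.
LITERAL = re.compile(r'(MC\(\s*|#\s*include\s+|extern\s+)?("(?:\\.|[^"\\])*")')

def wrap(match):
    """Return the replacement text for one matched literal."""
    prefix, literal = match.group(1), match.group(2)
    if prefix:
        # Already wrapped, or part of a directive: put the text back unchanged.
        return match.group(0)
    return 'MC(' + literal + ')'

def wrap_literals(source):
    """Wrap every unprotected string literal in MC(...)."""
    return LITERAL.sub(wrap, source)

print(wrap_literals('var = "somestring";'))   # var = MC("somestring");
print(wrap_literals('#include "header.h"'))   # #include "header.h"
print(wrap_literals('x = MC("a") + "b";'))    # x = MC("a") + MC("b");
```

Because every literal is consumed whether or not it gets wrapped, the scan never restarts at a closing quote. The sketch does not try to handle quotes inside comments or character constants.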

This is not a Python-specific question; it is just plain regular
expressions. You may wish to consult a good reference site such as
http://www.regular-expressions.info/
or the O'Reilly book http://www.oreilly.com/catalog/regex/ in the future.

Josh Gilbert.
Jul 18 '05 #2
Please check out the latest beta release of pyparsing, at
http://pyparsing.sourceforge.net . Your post inspired me to add the
transformString() method to pyparsing; look at the included scanExamples.py
program for some search-and-replace examples similar to the ones you give in
your post.

Sincerely,
-- Paul McGuire
Jul 18 '05 #3
Hi,

I downloaded version 1.2beta3 from SourceForge, but could not find the
scanExamples.py program. I will go through the documentation/examples
provided and try.

Thanks,
Bhargava
Jul 18 '05 #4
Well, I think I messed up the 'setup.py sdist' step. Here is
scanExamples.py - it works through some simple scan/transform passes on some
hokey sample C code.

-- Paul

-------------------------------------------
#
# scanExamples.py
#
# Illustration of using pyparsing's scanString and transformString methods
#
# Copyright (c) 2004, Paul McGuire
#
from pyparsing import (Word, alphas, alphanums, Literal, restOfLine,
                       OneOrMore, Empty)

# simulate some C++ code
testData = """
#define MAX_LOCS=100
#define USERNAME = "floyd"
#define PASSWORD = "swordfish"

a = MAX_LOCS;

A::assignA( a );
A2::A1::printA( a );

CORBA::initORB("xyzzy", USERNAME, PASSWORD );

"""

#################
print("Example of an extractor")
print("----------------------")

# simple grammar to match #define's
ident = Word(alphas, alphanums+"_")
macroDef = (Literal("#define") + ident.setResultsName("name") + "=" +
            restOfLine.setResultsName("value"))
for t,s,e in macroDef.scanString( testData ):
    print(t.name, ":", t.value)

# or a quick way to make a dictionary of the names and values (need to
# suppress output of all tokens, other than the name and the value)
macroDef = (Literal("#define").suppress() + ident +
            Literal("=").suppress() + Empty() + restOfLine)
macros = dict([t for t,s,e in macroDef.scanString(testData)])
print("macros =", macros)
print()

#################
print("Examples of a transformer")
print("----------------------")

# convert C++ namespaces to mangled C-compatible names
scopedIdent = ident + OneOrMore( Literal("::").suppress() + ident )
scopedIdent.setParseAction(lambda s,l,t: "_".join(t))

print("(replace namespace-scoped names with C-compatible names)")
print(scopedIdent.transformString( testData ))

# or a crude pre-processor (use parse actions to replace matching text)
def substituteMacro(s,l,t):
    # returning None leaves identifiers that are not macros unchanged
    if t[0] in macros:
        return macros[t[0]]
ident.setParseAction( substituteMacro )
ident.ignore(macroDef)

print("(simulate #define pre-processor)")
print(ident.transformString( testData ))

#################
print("Example of a stripper")
print("----------------------")

from pyparsing import dblQuotedString, LineStart

# remove all string macro definitions (after extracting to a string resource
# table?)
ident.setParseAction( None )
stringMacroDef = (Literal("#define") + ident + "=" + dblQuotedString +
                  LineStart())
stringMacroDef.setParseAction( lambda s,l,t: [] )

print(stringMacroDef.transformString( testData ))
Jul 18 '05 #5
