using re module to find " but not " alone ... is this a BUG in re?

Hi,

I want to replace all occurrences of " by \" in a string.

But I want to leave all occurrences of \" as they are.

The following should happen:

this I want " while I dont want this \"

should be transformed to:

this I want \" while I dont want this \"

and NOT:

this I want \" while I dont want this \\"

I even tried the (?<=...) construction, but here I get an unbalanced parenthesis
error.

It seems that re is not able to do the job due to parsing/compiling problems
for this sort of string.
Do you have any idea?

Anton
Example: --------------------

import re

re.findall("[^\\]\"","this I want \" while I dont want this \\\" ")

Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "C:\Python25\lib\re.py", line 175, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Python25\lib\re.py", line 241, in _compile
    raise error, v # invalid expression
error: unexpected end of regular expression

Jun 27 '08 #1
On Jun 12, 7:11 pm, anton <anto...@gmx.de> wrote:
> Hi,
>
> I want to replace all occurrences of " by \" in a string.
>
> But I want to leave all occurrences of \" as they are.
>
> The following should happen:
>
> this I want " while I dont want this \"
>
> should be transformed to:
>
> this I want \" while I dont want this \"
>
> and NOT:
>
> this I want \" while I dont want this \\"
>
> I even tried the (?<=...) construction, but here I get an unbalanced parenthesis
> error.

Sounds like a deficit of backslashes causing re to regard \) as plain
text and not the magic closing parenthesis in (?<=...) -- and don't
you want (?<!...) ?

> It seems that re is not able to do the job due to parsing/compiling problems
> for this sort of string.

Nothing is ever as it seems.

> Do you have any idea?

For a start, *ALWAYS* use a raw string for an re pattern -- halves the
backslash pollution!

> re.findall("[^\\]\"","this I want \" while I dont want this \\\" ")

and if you have " in the pattern, use '...' to enclose the pattern so
that you don't have to use \"

> Traceback (most recent call last):
>   File "<interactive input>", line 1, in <module>
>   File "C:\Python25\lib\re.py", line 175, in findall
>     return _compile(pattern, flags).findall(string)
>   File "C:\Python25\lib\re.py", line 241, in _compile
>     raise error, v # invalid expression
> error: unexpected end of regular expression

As expected.
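
Why the error is expected can be checked with a short sketch (Python 3 syntax; the helper name `compiles` is made up for illustration). Without a raw string, the pattern that reaches the regex engine is `[^\]"`: the single backslash escapes the `]`, so the character class is never closed.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

bad = "[^\\]\""    # non-raw literal from the original post
good = r'[^\\]"'   # raw string in single quotes: no \" needed

print(bad)               # the text that re actually receives
print(compiles(bad))
print(good)
print(compiles(good))
print(re.findall(good, r'this I want " while I dont want this \" '))
```

Note that `[^\\]"` also consumes the character in front of the quote and cannot match a quote at the very start of the string, which is one reason a lookbehind is the better fit here.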

What you want is:

>>> import re
>>> text = r'frob this " avoid this \", OK?'
>>> text
'frob this " avoid this \\", OK?'
>>> re.sub(r'(?<!\\)"', r'\"', text)
'frob this \\" avoid this \\", OK?'
>>>
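
To make the difference between the two lookbehind forms concrete, here is a minimal sketch in Python 3 syntax. The replacement is written as r'\\"' (an explicitly escaped backslash in the replacement template); that spelling is a choice made for this sketch, not what the post uses.

```python
import re

text = r'frob this " avoid this \", OK?'

# (?<=\\)" : positive lookbehind, matches only quotes that DO follow a backslash
already_escaped = re.findall(r'(?<=\\)"', text)

# (?<!\\)" : negative lookbehind, matches only quotes NOT preceded by a backslash
fixed = re.sub(r'(?<!\\)"', r'\\"', text)

print(len(already_escaped))
print(fixed)
```

The positive form finds exactly the quotes that should be left alone, so the negative form is the one that fits the task.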
HTH,
John
Jun 27 '08 #2
John Machin <sj******@lexicon.net> wrote:
> What you want is:
>
> >>> import re
> >>> text = r'frob this " avoid this \", OK?'
> >>> text
> 'frob this " avoid this \\", OK?'
> >>> re.sub(r'(?<!\\)"', r'\"', text)
> 'frob this \\" avoid this \\", OK?'
> >>>

Or you can do it without using regular expressions at all. Just replace
them all and then fix up the result:

>>> text = r'frob this " avoid this \", OK?'
>>> text.replace('"', r'\"').replace(r'\\"', r'\"')
'frob this \\" avoid this \\", OK?'
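
As a rough cross-check, the two approaches can be wrapped in functions and compared on a few inputs. This is only a sketch in Python 3 syntax; the function names and the third sample string are invented for the comparison and do not come from the thread.

```python
import re

def escape_with_replace(s):
    # Escape every quote, then undo the doubled backslash on already-escaped ones.
    return s.replace('"', '\\"').replace('\\\\"', '\\"')

def escape_with_lookbehind(s):
    # Escape only quotes that are not directly preceded by a backslash.
    return re.sub(r'(?<!\\)"', r'\\"', s)

samples = [
    r'frob this " avoid this \", OK?',
    r'this I want " while I dont want this \"',
    r'"starts and ends with a quote"',
]
for s in samples:
    assert escape_with_replace(s) == escape_with_lookbehind(s)
    print(escape_with_replace(s))
```

Agreement on these samples is not a proof that the two always agree; it only shows they match on the cases discussed here.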
--
Duncan Booth http://kupuguy.blogspot.com
Jun 27 '08 #3
anton wrote:
> I want to replace all occurrences of " by \" in a string.
>
> But I want to leave all occurrences of \" as they are.
>
> The following should happen:
>
> this I want " while I dont want this \"
>
> should be transformed to:
>
> this I want \" while I dont want this \"
>
> and NOT:
>
> this I want \" while I dont want this \\"
>
> I even tried the (?<=...) construction, but here I get an unbalanced
> parenthesis error.
>
> It seems that re is not able to do the job due to parsing/compiling
> problems for this sort of string.
> Do you have any idea?

The problem is underspecified. Should r'\\"' become r'\\\"' or remain
unchanged? If the backslash is supposed to escape the following character,
including another backslash -- that can't be done with regular expressions
alone:

# John's proposal:
>>> print(re.sub(r'(?<!\\)"', r'\"', 'no " one \\", two \\\\"'))
no \" one \", two \\"

One possible fix:

>>> parts = re.compile("(\\\\.)").split('no " one \\", two \\\\"')
>>> parts[::2] = [p.replace('"', '\\"') for p in parts[::2]]
>>> print("".join(parts))
no \" one \", two \\\"
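
For what it's worth, a single regular expression also appears to cover the escaped-backslash reading by counting backslashes in pairs. The following is a sketch under that assumption (a backslash always escapes the next character), in Python 3 syntax; the function name is hypothetical and it has only been checked against the examples in this thread.

```python
import re

# A quote is unescaped when it follows an even number (possibly zero) of
# backslashes; (?<!\\) ensures the whole run of backslashes is counted.
_UNESCAPED_QUOTE = re.compile(r'(?<!\\)((?:\\\\)*)"')

def escape_unescaped_quotes(s):
    # Keep the backslash pairs (group 1) and put one backslash before the quote.
    return _UNESCAPED_QUOTE.sub(r'\1\\"', s)

print(escape_unescaped_quotes('no " one \\", two \\\\"'))
```

On the sample string this gives the same result as the split-based fix above.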

Peter

Jun 27 '08 #4
John Machin <sjmachin <at> lexicon.net> writes:
> On Jun 12, 7:11 pm, anton <anto...@gmx.de> wrote:
> > Hi,
> >
> > I want to replace all occurrences of " by \" in a string.
> >
> > But I want to leave all occurrences of \" as they are.
> >
> > The following should happen:
> >
> > this I want " while I dont want this \"

.... cut text off

> What you want is:
>
> >>> import re
> >>> text = r'frob this " avoid this \", OK?'
> >>> text
> 'frob this " avoid this \\", OK?'
> >>> re.sub(r'(?<!\\)"', r'\"', text)
> 'frob this \\" avoid this \\", OK?'
>
> HTH,
> John
> --
> http://mail.python.org/mailman/listinfo/python-list


First.. thanks John.

The whole problem is discussed in

http://docs.python.org/dev/howto/reg...ckslash-plague

in the section "The Backslash Plague"

Unfortunately this is *NOT* mentioned in the standard
python documentation of the re module.

Another thing which will always remain strange to me is that,
even though the Python documentation on raw strings

http://docs.python.org/ref/strings.html

says:
"Specifically, a raw string cannot end in a single backslash"

s=r"\\" # works fine
s=r"\" # does not work (as stated)

But both END IN A SINGLE BACKSLASH!

The main thing which is hard to understand is:

If a raw string is a string which ignores backslashes,
then it should ignore them in all circumstances,

or where could the problem be here (python parser somewhere??).

Bye

Anton
Jun 27 '08 #5
On Jun 13, 6:23 pm, anton <anto...@gmx.de> wrote:
> John Machin <sjmachin <at> lexicon.net> writes:
> > On Jun 12, 7:11 pm, anton <anto...@gmx.de> wrote:
> > > Hi,
> > > I want to replace all occurrences of " by \" in a string.
> > > But I want to leave all occurrences of \" as they are.
> > > The following should happen:
> > > this I want " while I dont want this \"
>
> ... cut text off
>
> > What you want is:
> > >>> import re
> > >>> text = r'frob this " avoid this \", OK?'
> > >>> text
> > 'frob this " avoid this \\", OK?'
> > >>> re.sub(r'(?<!\\)"', r'\"', text)
> > 'frob this \\" avoid this \\", OK?'
> > HTH,
> > John
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> First.. thanks John.
>
> The whole problem is discussed in
>
> http://docs.python.org/dev/howto/reg...ckslash-plague
>
> in the section "The Backslash Plague"
>
> Unfortunately this is *NOT* mentioned in the standard
> python documentation of the re module.

Yes, and there's more to driving a car in heavy traffic than you will
find in the manufacturer's manual.

> Another thing which will always remain strange to me is that,
> even though the Python documentation on raw strings
>
> http://docs.python.org/ref/strings.html
>
> says:
> "Specifically, a raw string cannot end in a single backslash"
>
> s=r"\\" # works fine
> s=r"\" # does not work (as stated)
>
> But both END IN A SINGLE BACKSLASH!

Apply the interpretation that the first case ends in a double
backslash, and move on.

> The main thing which is hard to understand is:
>
> If a raw string is a string which ignores backslashes,
> then it should ignore them in all circumstances,

Nobody defines a raw string to be a "string that ignores backslashes",
so your premise is invalid.

> or where could the problem be here (python parser somewhere??).

Why r"\" is not a valid string token has been done to death IIRC at
least twice in this newsgroup ...
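
For readers who missed those earlier threads, a small sketch of what the tokenizer does with these literals may help (Python 3 syntax; the helper `is_valid_literal` is made up for illustration). In a raw string a backslash still prevents a following quote from ending the literal; the backslash is simply kept in the result instead of being dropped.

```python
s = r"\\"   # two characters: backslash, backslash
t = r"\""   # two characters: backslash, double quote
print(len(s), len(t))

def is_valid_literal(source):
    """Return True if the source text is a complete Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False

backslash = chr(92)   # a lone backslash, built without any literal
print(is_valid_literal('r"' + backslash + '"'))       # the text r"\"
print(is_valid_literal('r"' + backslash * 2 + '"'))   # the text r"\\"
```

So in r"\" the closing quote is taken as part of the string and the literal never ends. When a string has to end in exactly one backslash, a non-raw '\\' or chr(92) is the usual workaround.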

Cheers,
John
Jun 27 '08 #6
On Jun 12, 4:11 am, anton <anto...@gmx.de> wrote:
> Hi,
>
> I want to replace all occurrences of " by \" in a string.
>
> But I want to leave all occurrences of \" as they are.
>
> The following should happen:
>
>   this I want " while I dont want this \"
>
> should be transformed to:
>
>   this I want \" while I dont want this \"
>
> and NOT:
>
>   this I want \" while I dont want this \\"

A pyparsing version is not as terse as an re, and certainly not as
fast, but it is easy enough to read. Here is my first brute-force
approach to your problem:

from pyparsing import Literal, replaceWith

escQuote = Literal(r'\"')
unescQuote = Literal(r'"')
unescQuote.setParseAction(replaceWith(r'\"'))

test1 = r'this I want " while I dont want this \"'
test2 = r'frob this " avoid this \", OK?'

for test in (test1, test2):
    print((escQuote | unescQuote).transformString(test))

And it prints out the desired:

this I want \" while I dont want this \"
frob this \" avoid this \", OK?

This works by defining both of the patterns escQuote and unescQuote,
and only defines a transforming parse action for the unescQuote. By
listing escQuote first in the list of patterns to match, properly
escaped quotes are skipped over.

Then I looked at your problem slightly differently - why not find both
'\"' and '"', and replace either one with '\"'. In some cases, I'm
"replacing" '\"' with '\"', but so what? Here is the simplified
transformer:

from pyparsing import Optional, replaceWith

# '\\' is a single backslash; r'\\' would be two and would not match '\"'
quotes = Optional('\\') + '"'
quotes.setParseAction(replaceWith(r'\"'))
for test in (test1, test2):
    print(quotes.transformString(test))
Again, this prints out the desired output.

Now let's retrofit this altered logic back onto John Machin's
solution:

import re
for test in (test1, test2):
    print(re.sub(r'\\?"', r'\"', test))
Pretty short and sweet, and pretty readable for an re.
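
A self-contained variant of the same idea, as a sketch in Python 3 syntax (the function name `normalize_quotes` is invented here). Because escaped and unescaped quotes are both rewritten to the same thing, running it a second time should change nothing, which the loop checks.

```python
import re

def normalize_quotes(s):
    # Match a quote together with an optional preceding backslash and
    # rewrite either form as backslash + quote.
    return re.sub(r'\\?"', r'\\"', s)

test1 = r'this I want " while I dont want this \"'
test2 = r'frob this " avoid this \", OK?'
for test in (test1, test2):
    once = normalize_quotes(test)
    assert normalize_quotes(once) == once   # a second pass is a no-op
    print(once)
```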

To address Peter Otten's question about what to do with an escaped
backslash, I can't compose this with an re, but I can by adjusting the
first pyparsing version to include an escaped backslash as a "match
but don't do anything with it" expression, just like we did with
escQuote:

from pyparsing import Literal, replaceWith

escQuote = Literal(r'\"')
unescQuote = Literal(r'"')
unescQuote.setParseAction(replaceWith(r'\"'))
backslash = chr(92)
escBackslash = Literal(backslash+backslash)

test3 = r'no " one \", two \\"'
for test in (test1, test2, test3):
    print((escBackslash | escQuote |
           unescQuote).transformString(test))

Prints:
this I want \" while I dont want this \"
frob this \" avoid this \", OK?
no \" one \", two \\\"

At first I thought the last transform was an error, but on closer
inspection, I see that the input line ends with an escaped backslash,
followed by a lone '"', which must be replaced with '\"'. So in the
transformed version we see '\\\"', the original escaped backslash,
followed by the replacement '\"' string.
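
One possible way to carry the same "match but don't do anything with it" ordering back into re is re.sub with a function as the replacement. This is only a sketch in Python 3 syntax, and since it relies on a callback it is arguably more than "just an re"; the names `_TOKEN`, `escape_quotes` and `fix` are made up for illustration.

```python
import re

# Alternatives are tried left to right, mirroring
# escBackslash | escQuote | unescQuote in the pyparsing version.
_TOKEN = re.compile(r'(\\\\)|(\\")|(")')

def escape_quotes(s):
    def fix(match):
        # Only the third alternative (a bare quote) is rewritten;
        # escaped backslashes and escaped quotes pass through unchanged.
        return '\\"' if match.group(3) else match.group(0)
    return _TOKEN.sub(fix, s)

test3 = r'no " one \", two \\"'
print(escape_quotes(test3))
```

On test3 this gives the same result as the pyparsing version above.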

Cheers,
-- Paul
Jun 27 '08 #7
