'\\' in regex affects the following parenthesis?

Could someone tell me why:
>>> import re
>>> p = re.compile('\\.*\\(.*)')
Fails with message:

Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    re.compile('\\dir\\(file)')
  File "C:\Python25\lib\re.py", line 180, in compile
    return _compile(pattern, flags)
  File "C:\Python25\lib\re.py", line 233, in _compile
    raise error, v # invalid expression
error: unbalanced parenthesis

I thought '\\' should just be interpreted as a single '\' and not
affect anything afterwards...

The script 'redemo.py' shipped with Python by default is just fine
about this regex however.

Apr 22 '07 #1
On Apr 21, 6:56 pm, vox...@gmail.com wrote:
> p = re.compile('\\.*\\(.*)')
>
> I thought '\\' should just be interpreted as a single '\' and not
> affect anything afterwards...

You are getting overlap between the Python string literal \\ escaping
and re's \\ escaping. In a Python string literal '\\' gets collapsed
down to '\', so to get your desired result, you would need to double-
double every '\', as in:

p = re.compile('\\\\.*\\\\(.*)')

Ugly, no? Fortunately, Python has a special form for string literals,
called "raw", which suppresses Python's processing of \'s for escaping;
I think this was done expressly to help simplify entering re
strings. To use raw format for a string literal, just precede the
opening quotation mark with an r. Here is your original string, using
a raw literal:

p = re.compile(r'\\.*\\(.*)')

This will compile ok.
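A quick way to convince yourself that the two spellings are the same pattern is to compare what re actually receives. This is a small sketch in Python 3 syntax (the thread used 2.5, but the string and re behaviour shown here is the same); the variable names are just for illustration.

```python
import re

# The regex needs \\ (an escaped backslash) twice. In a normal literal each
# of those regex backslashes must itself be doubled; a raw literal avoids that.
doubled = re.compile('\\\\.*\\\\(.*)')
raw = re.compile(r'\\.*\\(.*)')

print(doubled.pattern)                 # \\.*\\(.*)
print(doubled.pattern == raw.pattern)  # True
```

Both calls hand re.compile the identical ten characters, so the choice is purely about readability of the source.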

(Sometimes these literals are referred to as "raw strings". I think
this is confusing, because new users assume this is a special string
type, different from str. It creates the EXACT SAME type of
str; the r just tells the compiler/interpreter to handle the quoted
literal a little differently. So I prefer to call them "raw
literals".)
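To check that a raw literal really produces a plain str, here is a short Python 3 sketch (the names are illustrative only):

```python
# A raw literal is still an ordinary str; the r prefix only changes how the
# source text between the quotes is read.
raw = r'\d+'
plain = '\\d+'

print(type(raw) is str)  # True
print(raw == plain)      # True
print(len(raw))          # 3: a backslash, 'd' and '+'
```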

-- Paul

Apr 22 '07 #2
On Apr 22, 9:56 am, vox...@gmail.com wrote:
> Could someone tell me why:
> >>> import re
> >>> p = re.compile('\\.*\\(.*)')
Short answer: *ALWAYS* use raw strings for regexes in Python source
files.

Long answer:

'\\.*\\(.*)' is equivalent to
r'\.*\(.*)'

So what re.compile is seeing is:

\. -- a literal dot or period or full stop (not a metacharacter)
* -- meaning 0 or more occurrences of the dot
\( -- a literal left parenthesis
. -- dot metacharacter meaning any character bar a newline
* -- meaning 0 or more occurrences of almost anything
) -- a right parenthesis grouping metacharacter; a bit lonely hence
the exception.
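The same walk-through in code: this Python 3 sketch prints the text the interpreter passes to re.compile, and uses a small helper, compiles() (a name made up for this example, not part of re), to show which version re accepts.

```python
import re

def compiles(p):
    """Return True if re.compile accepts the pattern text p (hypothetical helper)."""
    try:
        re.compile(p)
    except re.error:
        return False
    return True

# The characters re.compile actually receives from the original literal.
original = '\\.*\\(.*)'
print(original)            # \.*\(.*)
print(compiles(original))  # False: \( is literal, so ) closes no group

# With a raw literal the backslashes survive and the group is balanced.
print(compiles(r'\\.*\\(.*)'))  # True
```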

What you probably want is:

\\ -- literal backslash
.* -- any stuff
\\ -- literal backslash
(.*) -- grouped (any stuff)
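That breakdown maps almost line for line onto a pattern written with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. A sketch in Python 3; the path being searched is a made-up example, written as a raw literal so that \d and \f are not treated as escapes.

```python
import re

# The intended pattern, annotated piece by piece. In VERBOSE mode the
# whitespace is ignored and everything after # on a line is a comment.
path_tail = re.compile(r'''
    \\      # literal backslash
    .*      # any stuff
    \\      # literal backslash
    (.*)    # grouped: any stuff
''', re.VERBOSE)

# Hypothetical Windows-style path; the greedy .* runs to the LAST backslash.
m = path_tail.search(r'C:\dir\sub\file.txt')
print(m.group(1))  # file.txt
```

Because the first .* is greedy, the group captures only what follows the final backslash; a non-greedy .*? there would split at the second backslash instead.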

> I thought '\\' should just be interpreted as a single '\' and not
> affect anything afterwards...
The second and third paragraphs of the re docs (http://docs.python.org/lib/module-re.html) cover this:
"""
Regular expressions use the backslash character ("\") to indicate
special forms or to allow special characters to be used without
invoking their special meaning. This collides with Python's usage of
the same character for the same purpose in string literals; for
example, to match a literal backslash, one might have to write '\\\\'
as the pattern string, because the regular expression must be "\\",
and each backslash must be expressed as "\\" inside a regular Python
string literal.

The solution is to use Python's raw string notation for regular
expression patterns; backslashes are not handled in any special way in
a string literal prefixed with "r". So r"\n" is a two-character string
containing "\" and "n", while "\n" is a one-character string
containing a newline. Usually patterns will be expressed in Python
code using this raw string notation.
"""

Recommended reading: http://www.amk.ca/python/howto/regex...00000000000000
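The two examples in that doc passage can be checked directly. A small Python 3 sketch; the sample text is an assumption, chosen only because it contains exactly one backslash.

```python
import re

# r"\n" keeps both characters; "\n" is a single newline character.
print(len(r"\n"), len("\n"))  # 2 1

text = 'a\\b'  # three characters: 'a', one backslash, 'b'
normal = re.search('\\\\', text)  # regex \\ spelled as a normal literal
raw = re.search(r'\\', text)      # the same regex as a raw literal
print(normal.start(), raw.start())  # 1 1
```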
> The script 'redemo.py' shipped with Python by default is just fine
> about this regex however.
That's because you are typing the regex into a Tkinter app. The same
would be true if you were reading the regex from (say) a config file or
typing it at a raw_input prompt. The common factor is that the text is
not passed through an extra level of backslash processing.
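One way to see the "no extra level of processing" point is to read the pattern the way a config file would deliver it. This Python 3 sketch uses io.StringIO as a stand-in for a real file (an assumption for the example; reading an actual text file behaves the same), and the r'' literal only serves to put the exact characters \\.*\\(.*) into that fake file.

```python
import io
import re

# Hypothetical config file whose contents are exactly \\.*\\(.*) plus a newline.
config_file = io.StringIO(r'\\.*\\(.*)' + '\n')

# Reading it back performs no escape processing: the backslashes arrive as-is.
line = config_file.readline().rstrip('\n')
print(len(line))  # 10

pattern = re.compile(line)
print(pattern.search(r'C:\dir\file').group(1))  # file
```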

HTH,
John

Apr 22 '07 #3
