Help needed: cryptic perl regular expression in python syntax

Hi there,

I have a Perl script that uses a dynamically
constructed regular expression in this way:

------perl code starts ----
$result = "";
$key = 'AAA\?01';
$key = quotemeta $key;
$line = ' s^\?AAA\?01^BBB^g; #Comment ';
if ($line =~ /(^\s*)(s|tr)(.)(\\?\??$key\??)\3(.*?)\3(.*)/) {
    $result = $5;
}

# $result should be "BBB"
# \3 gets the same value as returned by (.),
# which in this example is ^. So we are searching for the
# parameter delimited by the first two ^ signs
# and returning the one delimited by the second
# and third ^ signs. Note that using \3 in the regular
# expression allows delimiters other than the ^ sign.

------perl code stops ----

How can I construct an equivalent Python regular expression?

I have tested with a constant regular expression like this:
>>> line = ' s^\\?AAA\\?01^BBB^g; #Comment '
>>> r1 = "(^\s*)(s|tr)(.)(\\\\\?\\\??AAA\\\\\?01)"
>>> re.compile(r1).findall(line)
[(' ', 's', '^', '\\?AAA\\?01')]

Which is fine, but is there a way to join 3 raw strings
together into another raw string? Like:

r1 = r'''(^\s*)(s|tr)(.)(\\?\??'''
r2 = r'''\\?\??)\3(.*?)\3(.*)'''
p1 = r1 + key + r2 # p1 should remain a raw string too

-pekka-
Jul 18 '05 #1
>>>>> "pekka" == pekka niiranen <pe************ @wlanmail.com> writes:

pekka> Which is fine, but is there a way to join 3 raw strings
pekka> together into another raw strings? like:

pekka> r1 = r'''(^\s*)(s|tr)(.)(\\?\??'''
pekka> r2 = r'''\\?\??)\3(.*?)\3(.*)'''
pekka> p1 = r1 + key + r2 # p1 should remain raw string too

The term "raw string" only has significance with string literals -
every string object is a "raw string". Backslashes are only
interpreted when converting string literals to in-memory string
objects.
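
A minimal sketch of this point in Python 3 syntax (the variable names are illustrative and not from the thread): the raw and non-raw literals below spell the same text, and concatenating them simply gives another ordinary string.

```python
# A raw literal and a normal literal that spell the same seven characters.
raw_literal = r"\3(.*?)"      # backslash kept as written
plain_literal = "\\3(.*?)"    # backslash doubled in a normal literal

same_text = raw_literal == plain_literal
same_type = type(raw_literal) is type(plain_literal) is str

# Concatenation yields one more ordinary str; nothing is "raw" at runtime.
joined = r"(.)" + plain_literal + r"\3"

print(same_text, same_type)
print(joined)
```

So there is nothing to preserve when joining: the r prefix only affects how the source text of the literal is read.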

--
Ville Vainio http://tinyurl.com/2prnb
Jul 18 '05 #2
On 2004-10-19, pekka niiranen <pe************ @wlanmail.com> wrote:
Which is fine, but is there a way to join 3 raw strings
together into another raw strings? like:

r1 = r'''(^\s*)(s|tr)(.)(\\?\??'''
r2 = r'''\\?\??)\3(.*?)\3(.*)'''
p1 = r1 + key + r2 # p1 should remain raw string too


If I understand correctly, there are no raw strings, just raw string
literals. re.compile just takes a normal string.

Raw string literals just make it easier to write the strings that are
typically used for regular expressions, but the strings themselves
are just ordinary strings.

>>> s1="\\b"
>>> s2=r"\b"
>>> s1==s2
1
>>> s1
'\\b'
>>> s2
'\\b'
>>> print s1
\b
>>> print s2
\b


--
Antoon Pardon
Jul 18 '05 #3
Thanks,

I managed to solve my problem with code like this:
>>> line = ' s^\\?AAA\\?01^BBB^g; #Comment '
>>> r1 = '(^\\s*)(s|tr)(.)(\\\\\\?\\\\??'
>>> key = "AAA\?01"
>>> r2 = '\\\\??)\\3(.*?)\\3(.*)'
>>> r = r1 + re.escape(key) + r2
>>> re.compile(r).findall(line)
[(' ', 's', '^', '\\?AAA\\?01', 'BBB', 'g; #Comment ')]

but what an ugly piece of code...

I was hoping to avoid the excess backslashes by using re.escape(),
but to no avail, since the group reference '\3' gets misquoted (among other things):
>>> r2 = "\??)\3(.*?)\3(.*)/)"
>>> re.escape(r2)
'\\\\\\?\\?\\)\\\x03\\(\\.\\*\\?\\)\\\x03\\(\\.\\*\\)\\/\\)'
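
A small sketch of why this happens, in Python 3 syntax and with illustrative variable names: re.escape() is meant for text that should be matched literally, so it suits the key but not the parts that contain pattern syntax. Note that in the session above the non-raw "\3" had already become the character \x03 before re.escape() saw it; the raw literal below isolates the escaping effect alone.

```python
import re

key = r"AAA\?01"                # literal text to find, from the thread
tail = r"\??)\3(.*?)\3(.*)"     # pattern syntax, not literal text

escaped_key = re.escape(key)    # backslash and ? now match themselves
escaped_tail = re.escape(tail)  # every metacharacter is neutralised

# The escaped key matches the original key text exactly.
key_matches = re.fullmatch(escaped_key, key) is not None

# The escaped tail compiles, but it has no groups left at all,
# so the backreference and both captures are gone.
tail_groups = re.compile(escaped_tail).groups

print(escaped_key)
print(key_matches, tail_groups)
```

In other words, escape only the inserted key and write the surrounding pattern pieces by hand.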
-pekka-

Jul 18 '05 #4
"Steven Bethard" <st************ @gmail.com> wrote in message
news:ma******** *************** *************** @python.org...
Could you do something like:
>>> line = ' s^\\?AAA\\?01^BBB^g; #Comment '
>>> expr = r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)'
>>> matcher = re.compile(expr % re.escape("AAA\?01"))
>>> matcher.findall(line)
[(' ', 's', '^', '\\?AAA\\?01', 'BBB', 'g; #Comment ')]

Basically, I still use the r'' string so that I don't have to write so
many backslashes, but then I use a %s to insert the "AAA\?01" into the
middle of the expression. Looks at least a little cleaner to me.

Steve
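
For reuse with more than one key, the same %s idea could be wrapped in a small function. This is only a sketch in Python 3 syntax: make_matcher and EXPR are hypothetical names, and the second test line with '/' as the separator is made up for illustration.

```python
import re

# Template from the post above; %s marks where the escaped key goes.
EXPR = r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)'

def make_matcher(key):
    """Compile the template for one literal key (hypothetical helper)."""
    return re.compile(EXPR % re.escape(key))

line = ' s^\\?AAA\\?01^BBB^g; #Comment '
match = make_matcher(r'AAA\?01').search(line)
replacement = match.group(5) if match else None
print(replacement)

# The backreference \3 lets a different separator work unchanged.
other = make_matcher('X').search('tr/\\?X/Y/;')
other_replacement = other.group(5) if other else None
print(other_replacement)
```

One caveat of %-formatting: a literal % in the template itself would have to be written as %%.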


Here's a more verbose version of Steve Bethard's suggestion. By building
up the regexp from individual parts, it is possible to give each part some
semi-meaningful name, or to attach comments to individual pieces. It also
makes it easier to maintain later. What if you had to support an additional
command besides s and tr, like 'rep'? Just change replaceCmd to read
replaceCmd = r'(s|tr|rep)'. What if you needed to support leading tabs
in addition to leading spaces? Change leadingWhite as needed. For
that matter, just giving the finished regexp the name 'replaceCmdExpr'
gives the reader more of a clue as to what the regexp's purpose is,
as the original code did with extra comments.

I find nearly *all* regexps to be cryptic, and when I need them, I
usually assemble them in some fashion such as this. David Mertz
proposes a similar style in his very good book, "Text Processing
in Python."

(Some quibble with the practice of aligning '=' signs, but I find it to be a
helpful guide to the eye when declaring a set of related strings such as
these, assuming of course that one edits using a fixed-width font.)

So why does the key get prepended with the backslashes and
question marks?

-- Paul
(I'll bet you thought I'd post a pyparsing version. :) Well, in a
certain way, I did.)
import re

line = ' s^\\?AAA\\?01^BBB^g; #Comment '

# the OP's working version, rewritten with raw string literals
r1 = r'(^\s*)(s|tr)(.)(\\\?\\??'
key = r"AAA\?01"
r2 = r'\\??)\3(.*?)\3(.*)'
r = r1 + re.escape(key) + r2
print(re.compile(r).findall(line))

# desired regexp, from Steve Bethard's post
# r'(^\s*)(s|tr)(.)(\\\?%s)\3(.*?)\3(.*)'

# build up regexp by parts
key            = r'AAA\?01'
leadingWhite   = r'(^\s*)'
replaceCmd     = r'(s|tr)'
sepChar        = r'(.)'
# prepend \'s and ?'s, only the OP knows why...
findString     = r'(\\\?\\??%s)' % re.escape(key)
# sepCharRef references the char read by sepChar,
# to support separators other than '^'
sepCharRef     = r'\3'
replString     = r'(.*?)'
restOfLine     = r'(.*)'
replaceCmdExpr = leadingWhite + replaceCmd + \
                 sepChar + findString + sepCharRef + \
                 replString + sepCharRef + restOfLine

matcher = re.compile(replaceCmdExpr)
print(matcher.findall(line))
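
The maintenance scenario described above (adding a command such as 'rep') could be sketched as follows, in Python 3 syntax. The function build_replace_cmd_expr, its commands parameter, the group names, and the 'rep#...' test line are all assumptions made for illustration. The sketch also swaps the numbered reference \3 for a named group with (?P=sep), so that adding or removing groups later cannot silently change which group the reference points at.

```python
import re

def build_replace_cmd_expr(key, commands=("s", "tr")):
    """Assemble the pattern from named parts (hypothetical helper)."""
    leading_white = r'(?P<indent>^\s*)'
    # Escape each command name so it is matched literally.
    replace_cmd = r'(?P<cmd>%s)' % '|'.join(re.escape(c) for c in commands)
    sep_char = r'(?P<sep>.)'
    # Same backslash/question-mark prefix as findString in the post above.
    find_string = r'(?P<find>\\\?\\??%s)' % re.escape(key)
    # Named backreference: must repeat whatever 'sep' matched.
    sep_char_ref = r'(?P=sep)'
    repl_string = r'(?P<repl>.*?)'
    rest_of_line = r'(?P<rest>.*)'
    return (leading_white + replace_cmd + sep_char + find_string +
            sep_char_ref + repl_string + sep_char_ref + rest_of_line)

line = ' s^\\?AAA\\?01^BBB^g; #Comment '
match = re.search(build_replace_cmd_expr(r'AAA\?01'), line)
print(match.group('repl'))

# Supporting an extra command only changes the argument.
rep_expr = build_replace_cmd_expr('K', commands=("s", "tr", "rep"))
rep_match = re.search(rep_expr, 'rep#\\?K#new#x')
print(rep_match.group('cmd'), rep_match.group('repl'))
```

Results are then read by name, for example match.group('repl'), instead of by position in the findall() tuple.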

Jul 18 '05 #5
