splitting perl-style find/replace regexp using python

Hi all

I have a file with a bunch of perl regular expressions like so:

/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic

These are all find/replace expressions delimited as '/search/replace/
# comment' where 'search' is the regular expression we're searching
for and 'replace' is the replacement expression.

Is there an easy and general way that I can split these perl-style
find-and-replace expressions into something I can use with Python, eg
re.sub('search','replace',str) ?

I thought it would generally be good enough to split on '/', but as you
can see, the <\/b> messes that up. I really don't want to learn Perl
here :-)

Cheers
JP

Mar 1 '07 #1
This could be more general: in principle a Perl regex could end with a
"\" (e.g. "\\/"), but I'm guessing that won't happen here.

>>> # perlish is a list holding the three rule strings, comments stripped
>>> for p in perlish:
...     print(p)
...
/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>>> import re
>>> splitter = re.compile(r'[^\\]/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']

(I'm hoping this doesn't wrap!)

James
Mar 1 '07 #2
How about matching all escaped chars and '/', and then throwing away the
former:

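import re

def split(s):
    # Find every escaped character (group 1) and every bare slash (group 2).
    breaks = re.compile(r"(\\.)|(/)").finditer(s)
    # Keep only the bare slashes; this expects exactly three of them.
    left, mid, right = [b.start() for b in breaks if b.group(2)]
    return s[left+1:mid], s[mid+1:right]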

Peter
Mar 1 '07 #3
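
A minimal sketch of how the function above might be used on one of the original lines. The name split_rule and the [:3] slice are additions for illustration, not part of the post: the slice is there on the assumption that a trailing comment could itself contain a slash, which would otherwise break the three-way unpacking.

```python
import re

# Matches either an escaped character (group 1) or a bare slash (group 2).
BREAKS = re.compile(r"(\\.)|(/)")

def split_rule(line):
    """Return (search, replace) from a '/search/replace/ # comment' line."""
    # Keep only the positions of bare slashes; escaped ones land in group 1.
    slashes = [m.start() for m in BREAKS.finditer(line) if m.group(2)]
    left, mid, right = slashes[:3]  # ignore any slash inside the comment
    return line[left + 1:mid], line[mid + 1:right]

rule = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = split_rule(rule)
print(search)   # (^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)
print(replace)  # $1''<b>$2<\/b>''$3
```

Because the escaped-character alternative is tried first, the slash in <\/b> is consumed as part of an escape and never shows up as a break.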
I realized that version threw away the character before each slash (the
closing parenthesis of the search pattern and the 3 of $3). This is the
correct version:

>>> splitter = re.compile(r'(?<!\\)/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

James
Mar 1 '07 #4
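
The difference between the two splitters can be seen on a short rule. This is only an illustrative sketch; the rule string is made up for the example. The character class [^\\] consumes the character in front of the slash, while the negative lookbehind (?<!\\) checks it without consuming it.

```python
import re

# Made-up rule for illustration: search is a(b)\/c, replace is x$1.
rule = r"/a(b)\/c/x$1/"

consuming = re.compile(r"[^\\]/")      # matches the slash AND the char before it
lookbehind = re.compile(r"(?<!\\)/")   # matches only the slash

print(consuming.split(rule))    # ['/a(b)\\/', 'x$', '']
print(lookbehind.split(rule))   # ['', 'a(b)\\/c', 'x$1', '']

# With the lookbehind version the two middle fields are the useful ones.
_, search, replace, _ = lookbehind.split(rule)
```

The four-way unpacking assumes the rule has no trailing comment containing a slash.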
There is another problem with escaped backslashes:
>>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
['', 'abc\\\\/def', '']

Peter
Mar 1 '07 #5
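
The lookbehind fails here because it only looks at one character: it cannot tell that the backslash before the slash is itself escaped. One way around that, sketched below, is to scan the string and treat a backslash plus the following character as a single unit. This is a plain character scanner, not the regex approach from the posts, and split_unescaped is a hypothetical name.

```python
def split_unescaped(s, delim="/"):
    """Split s on delim, treating backslash + any char as one escaped unit."""
    fields, current, i = [], [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            current.append(s[i:i + 2])   # keep the escape pair intact
            i += 2
        elif s[i] == delim:
            fields.append("".join(current))  # a bare delimiter ends the field
            current = []
            i += 1
        else:
            current.append(s[i])
            i += 1
    fields.append("".join(current))
    return fields

print(split_unescaped(r"/abc\\/def/"))   # ['', 'abc\\\\', 'def', '']
print(split_unescaped(r"/a\/b/c/"))      # ['', 'a\\/b', 'c', '']
```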
Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.

James
Mar 1 '07 #6
James Stroud wrote:
Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.
Sorry for not tracking the context.

Peter

Mar 1 '07 #7
John Pye wrote:
Is there an easy and general way that I can split these perl-style
find-and-replace expressions into something I can use with Python, eg
re.sub('search','replace',str) ?
Another candidate:
>>> re.compile(r"(?:/((?:\\.|[^/])*))").findall(r"/abc\\/def\/ghi//jkl")
['abc\\\\', 'def\\/ghi', '', 'jkl']

Peter
Mar 1 '07 #8
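
To get from the split fields to an actual re.sub call, the replacement also needs translating, since Python writes group references as \1 or \g<1> rather than $1. The following is a rough sketch under stated assumptions: parse_rule and to_python_repl are hypothetical helper names, and the translation only handles $N references and the \/ escape, which is all the three rules in the question use.

```python
import re

# Pattern from the post above: one field per '/', built from escaped
# characters (\x) or anything that is not a slash.
FIELD = re.compile(r"/((?:\\.|[^/])*)")

def parse_rule(line):
    """Return (search, replace); a third field (the comment) is ignored."""
    fields = FIELD.findall(line)
    return fields[0], fields[1]

def to_python_repl(perl_repl):
    """Hypothetical helper: turn $1 into \\g<1> and \\/ into /."""
    repl = perl_repl.replace("\\/", "/")
    return re.sub(r"\$(\d+)", lambda m: "\\g<" + m.group(1) + ">", repl)

line = r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold"
search, perl_repl = parse_rule(line)
result = re.sub(search, to_python_repl(perl_repl), "say *hi there* now")
print(result)  # say '''hi there''' now
```

Note that re.sub replaces every match by default; pass count=1 if only the first match should be replaced.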
Thanks all for your suggestions on this. The 'splitter' idea was
particularly good, not something I'd thought of. Sorry for my late
reply.

Mar 22 '07 #9
