splitting perl-style find/replace regexp using python

Hi all

I have a file with a bunch of perl regular expressions like so:

/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic

These are all find/replace expressions delimited as '/search/replace/
# comment' where 'search' is the regular expression we're searching
for and 'replace' is the replacement expression.

Is there an easy and general way that I can split these Perl-style
find-and-replace expressions into something I can use with Python, e.g.
re.sub('search', 'replace', str)?

I thought generally it would be good enough to split on '/', but as you
can see the <\/b> messes that up. I really don't want to learn Perl
here :-)

Cheers
JP

Mar 1 '07 #1
8 Replies


This could be more general: in principle a Perl regex could end with a
"\", e.g. "\\/", but I'm guessing that won't happen here.

>>> for p in perlish:
...     print(p)
...
/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>>> import re
>>> splitter = re.compile(r'[^\\]/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']

(I'm hoping this doesn't wrap!)

James
Mar 1 '07 #2

How about matching all escaped chars and '/', and then throwing away the
former:

import re

def split(s):
    # Match escaped chars (group 1) or a bare '/' (group 2); keep only the bare slashes.
    breaks = re.compile(r"(\\.)|(/)").finditer(s)
    left, mid, right = [b.start() for b in breaks if b.group(2)]
    return s[left+1:mid], s[mid+1:right]

Peter
Mar 1 '07 #3
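
As a rough illustration of how the approach in the post above might be applied to one line of the original file, trailing comment included, here is a minimal sketch. The name `split_perl_subst` and the choice to ignore everything after the third unescaped slash are assumptions made for this example, not part of the original post.

```python
import re

# An escaped character (backslash + any char) or a bare slash.
# Group 2 is set only for slashes that act as delimiters.
TOKEN = re.compile(r"(\\.)|(/)")

def split_perl_subst(line):
    """Return (search, replace) from a '/search/replace/ # comment' line."""
    slashes = [m.start() for m in TOKEN.finditer(line) if m.group(2)]
    if len(slashes) < 3:
        raise ValueError("expected three unescaped '/' delimiters")
    # Only the first three delimiters matter; anything later is comment text.
    left, mid, right = slashes[:3]
    return line[left + 1:mid], line[mid + 1:right]

line = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = split_perl_subst(line)
print(search)
print(replace)
```

Slicing to the first three delimiters means a comment that happens to contain a '/' does not break the unpacking, which the three-way unpacking in the post would otherwise reject.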

I realized that threw away the closing parentheses. This is the correct
version:

>>> splitter = re.compile(r'(?<!\\)/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

James
Mar 1 '07 #4
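
The difference between the two splitters in this thread can be seen on a short input. The sketch below uses a made-up `sample` string (an assumption for illustration) to compare the character-class version, which consumes the character before each slash, with the negative-lookbehind version, which only inspects it.

```python
import re

sample = r"/(a|b)/$1<\/b>/"

# A character class consumes the character before the slash,
# so that character is lost from the split result.
consuming = re.compile(r"[^\\]/")
# A negative lookbehind only checks the preceding character
# without making it part of the match.
lookbehind = re.compile(r"(?<!\\)/")

print(consuming.split(sample))   # pieces lose ')' and '>'
print(lookbehind.split(sample))  # pieces are intact, with empty strings at both ends
```

The leading slash is also handled differently: the character class needs a preceding character to match, so the first slash stays in the output, while the lookbehind succeeds at the start of the string.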

There is another problem with escaped backslashes:
>>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
['', 'abc\\\\/def', '']

Peter
Mar 1 '07 #5
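
One way to cope with the escaped-backslash case is to walk the string and treat each backslash together with the character it escapes, so that a slash counts as a delimiter only when it is not the second half of such a pair. This is a sketch under that assumption; `split_unescaped` is a hypothetical helper, not something proposed in the thread.

```python
def split_unescaped(text, delim="/"):
    """Split text on delim unless the delimiter is escaped by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the backslash and the character it escapes together.
            current.append(text[i:i + 2])
            i += 2
        elif ch == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

print(split_unescaped(r"/abc\\/def/"))  # the slash after an escaped backslash is a delimiter
print(split_unescaped(r"/abc\/def/"))   # an escaped slash is not
```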

Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.

James
Mar 1 '07 #6

James Stroud wrote:
Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.
Sorry for not tracking the context.

Peter

Mar 1 '07 #7

John Pye wrote:
Is there an easy and general way that I can split these perl-style
find-and-replace expressions into something I can use with Python, eg
re.sub('search','replace',str) ?
Another candidate:
>>> re.compile(r"(?:/((?:\\.|[^/])*))").findall(r"/abc\\/def\/ghi//jkl")
['abc\\\\', 'def\\/ghi', '', 'jkl']

Peter
Mar 1 '07 #8
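
To get from the split pieces to a working re.sub call, the Perl replacement text still needs translating, since Python does not understand $1. The sketch below combines the findall pattern from the post above with two assumed translation steps: unescaping `\/` and rewriting `$N` as `\g<N>`. The function name `perl_to_python` is hypothetical, and other Perl features (modifiers, `$&`, and so on) are deliberately not handled.

```python
import re

# Pattern from the post above: each match is a '/' followed by a run
# of escaped characters or non-slash characters.
FIELDS = re.compile(r"(?:/((?:\\.|[^/])*))")

def perl_to_python(line):
    """Return (search, replace) usable with re.sub for a simple rule line."""
    # The third field, if any, is the trailing comment and is ignored.
    search, replace = FIELDS.findall(line)[:2]
    # Drop the escaping that was only needed for Perl's '/' delimiter.
    replace = replace.replace(r"\/", "/")
    # Rewrite Perl's $1, $2, ... as Python's \g<1>, \g<2>, ...
    replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return search, replace

rule = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = perl_to_python(rule)
print(re.sub(search, replace, "a __strong__ word"))
```

The search half is passed to Python unchanged here, on the assumption that the rules only use syntax both regex flavours share, as the three examples in the question do.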

Thanks all for your suggestions on this. The 'splitter' idea was
particularly good, not something I'd thought of. Sorry for my late
reply.

Mar 22 '07 #9
