splitting perl-style find/replace regexp using python

Hi all

I have a file with a bunch of perl regular expressions like so:

/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic

These are all find/replace expressions delimited as
'/search/replace/ # comment', where 'search' is the regular expression
we're searching for and 'replace' is the replacement expression.

Is there an easy and general way that I can split these perl-style
find-and-replace expressions into something I can use with Python, e.g.
re.sub('search', 'replace', str)?

I thought it would generally be good enough to split on '/', but as you
can see the <\/b> messes that up. I really don't want to learn perl
here :-)

Cheers
JP

Mar 1 '07 #1
This could be more general. In principle a perl regex could end with a
"\", e.g. "\\/", but I'm guessing that won't happen here.

>>> for p in perlish:
...     print(p)
...
/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>>> import re
>>> splitter = re.compile(r'[^\\]/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1'''$2'''$", '']
['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''<b>$2<\\/b>''$", '']
['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', "$1''$2''$", '']

(I'm hoping this doesn't wrap!)

James
Mar 1 '07 #2
How about matching all escaped chars and '/', and then throwing away the
former:

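import re

def split(s):
    # Match either an escaped character (group 1) or a bare '/' (group 2).
    breaks = re.compile(r"(\\.)|(/)").finditer(s)
    # Keep only the unescaped slashes; exactly three are expected.
    left, mid, right = [b.start() for b in breaks if b.group(2)]
    return s[left+1:mid], s[mid+1:right]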

Peter
Mar 1 '07 #3
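A minimal sketch of how the function above might be applied to a whole line from the file, including the trailing '# comment'. The name split_perl_sub and the ValueError check are illustrative assumptions, not part of the original post.

```python
import re

# Matches an escaped character (group 1) or a bare, unescaped '/' (group 2).
_TOKEN = re.compile(r"(\\.)|(/)")

def split_perl_sub(line):
    """Return (search, replace) from a '/search/replace/ # comment' line."""
    slashes = [m.start() for m in _TOKEN.finditer(line) if m.group(2)]
    if len(slashes) < 3:
        raise ValueError("expected three unescaped '/' delimiters: %r" % line)
    # Only the first three delimiters matter; anything after is the comment.
    left, mid, right = slashes[:3]
    return line[left + 1:mid], line[mid + 1:right]

line = r"/(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold"
search, replace = split_perl_sub(line)
print(search)
print(replace)
```

Taking only the first three unescaped slashes means a '/' inside the comment does not break the unpacking.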
I realized that threw away the closing parentheses. This is the correct
version:

>>> splitter = re.compile(r'(?<!\\)/')
>>> for p in perlish:
...     print(splitter.split(p))
...
['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1'''$2'''$3", '']
['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''<b>$2<\\/b>''$3", '']
['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', "$1''$2''$3", '']

James
Mar 1 '07 #4
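The lookbehind splitter can also cope with the trailing comment if the number of splits is capped. This is only a sketch: the maxsplit argument and the variable names are my own choice, and it still assumes the search expression does not end in an escaped backslash.

```python
import re

# Split on '/' only when it is not immediately preceded by a backslash.
splitter = re.compile(r"(?<!\\)/")

line = r"/(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic"

# maxsplit=3 leaves any '/' inside the trailing comment untouched.
_, search, replace, comment = splitter.split(line, 3)
print(search)
print(replace)
print(comment.strip())
```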
There is another problem with escaped backslashes:
>>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
['', 'abc\\\\/def', '']

Peter
Mar 1 '07 #5
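One way around the escaped-backslash problem is to walk the string and consume each backslash together with the character it escapes, so that "\\" followed by "/" leaves the slash as a real delimiter. The function below is a hedged sketch with a made-up name (split_unescaped), not something posted in the thread.

```python
def split_unescaped(s, delim="/"):
    """Split s on delim, treating a backslash as escaping the next character."""
    parts, current, i = [], [], 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and i + 1 < len(s):
            # Keep the escape sequence intact and skip both characters,
            # so an escaped backslash cannot "protect" a following '/'.
            current.append(s[i:i + 2])
            i += 2
        elif ch == delim:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

print(split_unescaped(r"/abc\\/def/"))  # ['', 'abc\\\\', 'def', '']
```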
Yes, this would be a case of the expression (left side) ending with a
"\" as I mentioned above.

James
Mar 1 '07 #6
James Stroud wrote:
> Yes, this would be a case of the expression (left side) ending with a
> "\" as I mentioned above.
Sorry for not tracking the context.

Peter

Mar 1 '07 #7
John Pye wrote:
> Is there an easy and general way that I can split these perl-style
> find-and-replace expressions into something I can use with Python, e.g.
> re.sub('search', 'replace', str)?
Another candidate:
>>> re.compile(r"(?:/((?:\\.|[^/])*))").findall(r"/abc\\/def\/ghi//jkl")
['abc\\\\', 'def\\/ghi', '', 'jkl']

Peter
Mar 1 '07 #8
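Putting the pieces together, here is a rough sketch of going from one line of the file to a re.sub call. It reuses the findall pattern above; the helper name perl_to_python is hypothetical, and the translation rests on two assumptions marked in the comments: that "\/" in the replacement is only there to protect the delimiter, and that the replacement contains nothing Perl-specific beyond $1, $2, ... group references (and no other backslashes).

```python
import re

# Each field is a '/' followed by escaped characters or non-slash characters.
FIELD = re.compile(r"/((?:\\.|[^/])*)")

def perl_to_python(line):
    """Turn '/search/replace/ # comment' into (compiled_regex, replacement)."""
    fields = FIELD.findall(line)
    if len(fields) < 3:
        raise ValueError("not a /search/replace/ expression: %r" % line)
    search, replace = fields[0], fields[1]
    # Assumption: '\/' only protects the delimiter, so it becomes a plain '/'.
    replace = replace.replace(r"\/", "/")
    # Assumption: only $1, $2, ... appear; Python spells these \g<1>, \g<2>, ...
    replace = re.sub(r"\$(\d+)", lambda m: r"\g<%s>" % m.group(1), replace)
    return re.compile(search), replace

line = r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold"
regex, repl = perl_to_python(line)
print(regex.sub(repl, "this is *bold* text"))  # this is '''bold''' text
```

Note that re.sub replaces every match by default; pass count=1 if only the first match should be replaced.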
Thanks all for your suggestions on this. The 'splitter' idea was
particularly good, not something I'd thought of. Sorry for my late
reply.

Mar 22 '07 #9
