RE Engine error with sub()

Hi,

I have the following code:

from __future__ import nested_scopes
import re
from UserDict import UserDict

class Replacer(UserDict):
    """
    An all-in-one multiple string substitution class. This class was
    contributed by Xavier Defrang to the ASPN Python Cookbook
    (http://aspn.activestate.com/ASPN/Coo...n/Recipe/81330)
    and al***@sourceforge.net.

    Copyright: The methods _make_regex(), __call__() and substitute()
    were the work of Xavier Defrang, __init__() was the work of
    al***@sourceforge.net, all others were the work of Maurice Ling"""

    def __init__(self, dict = None, file = None):
        """Constructor. It calls for the compilation of regular
        expressions from either a dictionary object or a replacement
        rule file.

        @param dict: dictionary object containing replacement rules
        with the string to be replaced as keys.
        @param file: file name of replacement rule file
        """
        self.re = None
        self.regex = None
        if file == None:
            UserDict.__init__(self, dict)
            self._make_regex()
        else:
            UserDict.__init__(self, self.readDictionaryFile(file))
            self._make_regex()

    def cleanDictionaryFile(self, file):
        """
        Method to clean up the replacement rule dictionary file and
        write the cleaned file as the same name as the original file."""
        import os
        dict = self.readDictionaryFile(file)
        f = open(file, 'w')
        for key in dict.keys():
            f.write(str(key) + '=' + str(dict[key]) + os.linesep)
        f.close()

    def readDictionaryFile(self, file):
        """
        Method to parse a replacement rule file (file) into a
        dictionary for regular expression processing. Each rule in the
        rule file is in the form:
        <string to be replaced>=<string to replace with>
        """
        import string
        import os
        f = open(file, 'r')
        data = f.readlines()
        f.close()
        dict = {}
        for rule in data:
            rule = rule.split('=')
            if rule[1][-1] == os.linesep: rule[1] = rule[1][:-1]
            dict[str(rule[0])] = str(rule[1])
        print '%s replacement rule(s) read from %s' % (
            str(len(dict.keys())), str(file))
        return dict

    def _make_regex(self):
        """ Build a regular expression object based on the keys of the
        current dictionary """
        self.re = "(%s)" % "|".join(map(re.escape, self.keys()))
        self.regex = re.compile(self.re)

    def __call__(self, mo):
        """ This handler will be invoked for each regex match """
        # Count substitutions
        self.count += 1
        # Look-up string
        return self[mo.string[mo.start():mo.end()]]

    def substitute(self, text):
        """ Translate text, returns the modified text. """
        # Reset substitution counter
        self.count = 0
        # Process text
        #return self._make_regex().sub(self, text)
        return self.regex.sub(self, text)

    def rmBracketDuplicate(self, text):
        """Removes the bracketed text in occurrences of
        '<text-x> (<text-x>)'"""
        regex = re.compile(r'(\w+)\s*(\(\1\))')
        return regex.sub(r'\1', text)

    def substituteMultiple(self, text):
        """Similar to substitute() method except that this method loops
        round the same text multiple times until no more substitutions
        can be made or when it had looped 10 times. This is to pre-empt
        cases of recursive abbreviations."""
        count = 1 # to get into the loop
        run = 0 # counter for number of runs thru the text
        while count > 0 and run < 10:
            count = 0
            text = self.rmBracketDuplicate(self.substitute(text))
            count = count + self.count
            run = run + 1
            print "Pass %d: Changed %d things(s)" % (run, count)
        return text


Normally I will use the following to instantiate my module:

replace = Replacer('', 'rule.mdf')

rule.mdf is in the format of "<string to be replaced>=<string to replace
with>\n"

I then call replace.substituteMultiple('<my text>') to carry out multiple
replacements.
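
The rule-file format described above can be parsed a little more defensively than readDictionaryFile() does. The following is only a minimal sketch, written for current Python 3 rather than the Python 2 of the thread; the function name parse_rules is hypothetical, and skipping blank or malformed lines is an assumption about the desired behaviour, not something the post states.

```python
def parse_rules(lines):
    """Parse '<string to be replaced>=<string to replace with>' lines.

    Assumptions (not from the original post): lines without '=' and
    empty lines are skipped, and only the first '=' separates key
    from value, so the replacement text may itself contain '='.
    """
    rules = {}
    for line in lines:
        # Strip the line terminator regardless of platform convention.
        line = line.rstrip("\r\n")
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        rules[key] = value
    return rules
```

With a file on disk this could be called as parse_rules(open('rule.mdf')), since a file object yields its lines one at a time.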

It all works well for rule counts up to 800+, but when my replacement
rules swell to 1800+, I get a runtime error that says
"Internal error in regular expression engine", traceable to "return
self.regex.sub(self, text)" in the substitute() method.
Any ideas or workarounds?

Thanks in advance.

Cheers,
Maurice
Jul 19 '05 #1

Instead of using regular expressions, you could perhaps
use a multiple keyword matcher, and then for each match,
replace it with the correct string.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

contains the Aho-Corasick algorithm written in C with
a Python extension.
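
To make the suggestion concrete, here is a small pure-Python sketch of the Aho-Corasick idea. It does not use the linked C extension or its API; the names build_automaton and replace_keywords are hypothetical, and the code targets current Python 3. Where several keywords overlap it keeps the leftmost match and, among matches starting at the same position, the longest one. That tie-break is an assumption and differs from a regex alternation, which takes the first listed alternative that matches.

```python
from collections import deque


def build_automaton(keywords):
    """Build goto/fail/output tables for a set of non-empty keywords."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> fallback state on mismatch
    out = [[]]    # out[state] -> keywords ending at this state
    for word in keywords:
        if not word:
            continue  # empty keys cannot be matched meaningfully
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = nxt
            state = nxt
        out[state].append(word)
    # Breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def replace_keywords(text, rules):
    """Replace every key of `rules` found in `text` in a single scan."""
    goto, fail, out = build_automaton(rules)
    matches = []
    state = 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    # Leftmost first; longest first among equal start positions.
    matches.sort(key=lambda m: (m[0], -len(m[1])))
    pieces, pos = [], 0
    for start, word in matches:
        if start < pos:
            continue  # overlaps a match that was already replaced
        pieces.append(text[pos:start])
        pieces.append(rules[word])
        pos = start + len(word)
    pieces.append(text[pos:])
    return "".join(pieces)
```

Because no regular expression is compiled, the number of rules is bounded only by memory, which sidesteps the engine limit discussed in this thread.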
Jul 19 '05 #2
Maurice LING wrote:
Hi,

I have the following codes:

from __future__ import nested_scopes
[...]
Are you still using Python 2.1?

In every later version you don't need the
"from __future__ import nested_scopes" line.

So, if you are using Python 2.1 I strongly recommend
upgrading to Python 2.4.1.
[...]
It all works well for rule count up to 800+ but when my replacement
rules swells up to 1800+, it gives me a runtime error that says
"Internal error in regular expression engine"... traceable to "return
self.regex.sub(self, text)" in substitute() method.
[...]


I didn't read your code, but this sounds like a problem with
the regular expression engine being recursive in Python versions < 2.4.
Try again using Python 2.4 or later (e.g. Python 2.4.1). The new regular
expression engine is no longer recursive.

Bye,
Dennis
Jul 19 '05 #3
The "Internal error in regular expression engine" occurs also
in Python 2.4.0 when creating a regular expression containing
more than 9999 or's ("|").
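
One way to probe where a given interpreter gives up is to build alternations of increasing size and see whether compiling and substituting succeeds. This is only a sketch for current Python 3; the function name alternation_works and the generated keys are hypothetical, and the set of exceptions caught is a guess at what an engine failure might raise. It makes no claim about which sizes fail on which Python version.

```python
import re


def alternation_works(n_alternatives):
    """Return True if an n-way escaped alternation compiles and runs."""
    keys = ["key%d" % i for i in range(n_alternatives)]
    # Same construction as _make_regex() in the original post.
    pattern = "(%s)" % "|".join(map(re.escape, keys))
    try:
        regex = re.compile(pattern)
        regex.sub("-", "key0 and key%d" % (n_alternatives - 1))
    except (re.error, RuntimeError, OverflowError, MemoryError) as exc:
        print("failed with %d alternatives: %r" % (n_alternatives, exc))
        return False
    return True
```

Calling it for a few sizes, for example 800, 1800 and 10000, shows whether the interpreter at hand has a limit in that range.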

Jul 19 '05 #4
Hi Dennis,

Dennis Benzinger wrote:
Maurice LING wrote:
Hi,

I have the following codes:

from __future__ import nested_scopes

> [...]


Are you still using Python 2.1?

In every later version you don't need the
"from __future__ import nested_scopes" line.

So, if you are using Python 2.1 I strongly recommend
upgrading to Python 2.4.1.


I am using Python 2.3.5, installed using Fink. That is the latest
version Fink has to offer.
[...]
It all works well for rule count up to 800+ but when my replacement
rules swells up to 1800+, it gives me a runtime error that says
"Internal error in regular expression engine"... traceable to "return
self.regex.sub(self, text)" in substitute() method.
[...]

I didn't read your code, but this sounds like a problem with
the regular expression engine being recursive in Python versions < 2.4.
Try again using Python 2.4 or later (e.g. Python 2.4.1). The new regular
expression engine is no longer recursive.


Apparently this problem has been reported as Bugs item #857676
(http://mail.python.org/pipermail/pyt...er/021473.html)
and at (http://www.lehuen.com/nicolas/index.php/Pytst/2005/04). Bugs item
#857676 is consistent with my problem: it works with smaller lists
(~1000) but not with much bigger ones. The problem seems to lie in the
fact that the SRE engine works on 16-bit opcodes...

Cheers
Maurice
Jul 19 '05 #5
Hi all,

I think I might have a workaround for this problem but have no idea how
to implement it. I hope that someone can kindly help me out, because I
do not quite understand the mechanics of the _make_regex() method in the
original code...

My idea is, instead of having one UserDict, to have a list of UserDicts. A
large, unprocessable replacement rule set is split into multiple
smaller files, each file is read into its own UserDict, and each UserDict
is compiled into an RE matcher. Matching then iterates over the list of REs.

In short, the current procedure is
1 dictionary, 1 RE, 1 RE matcher... to match inputs
My workaround is to change it to
list of dictionaries, list of REs, list of RE matchers... iterative
matching of inputs.

Can someone kindly help me out here?

Thanks in advance.

Cheers,
Maurice
Jul 19 '05 #6
Solved it. Instead of modifying the Replacer class, I've made another class
which creates a list of Replacer objects from a list of substitution
rule files. It then iterates through the list of Replacer objects and
calls each one's substitute() method. It seems to work.
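
The wrapper class itself was not posted, so the following is only a guess at its shape, written for current Python 3. ChunkReplacer is a simplified stand-in for the thread's Replacer, and MultiReplacer, chunk_size and the in-memory splitting of one dictionary (instead of several rule files) are all assumptions made for illustration.

```python
import re


class ChunkReplacer:
    """Simplified stand-in for Replacer: one rule set, one compiled regex."""

    def __init__(self, rules):
        self.rules = dict(rules)
        self.regex = re.compile("|".join(map(re.escape, self.rules)))
        self.count = 0

    def substitute(self, text):
        self.count = 0

        def lookup(match):
            # Called once per match; count it and return the replacement.
            self.count += 1
            return self.rules[match.group(0)]

        return self.regex.sub(lookup, text)


class MultiReplacer:
    """Applies several small replacers in turn instead of one huge regex."""

    def __init__(self, rules, chunk_size=500):
        items = list(rules.items())
        self.replacers = [
            ChunkReplacer(items[i:i + chunk_size])
            for i in range(0, len(items), chunk_size)
        ]
        self.count = 0

    def substitute(self, text):
        self.count = 0
        for replacer in self.replacers:
            text = replacer.substitute(text)
            self.count += replacer.count
        return text
```

One consequence of this design is that the chunks run one after another, so text produced by an earlier chunk can be matched again by a later one. With a single regex every replacement happens in one pass, so the two approaches can give different results when rules feed into each other.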

Thanks for all your advice.

Cheers
Maurice
Jul 19 '05 #7
