RE Engine error with sub()

Maurice LING

Hi,

I have the following code:

from __future__ import nested_scopes
import re
from UserDict import UserDict

class Replacer(UserDict):
    """
    An all-in-one multiple string substitution class. This class was
    contributed by Xavier Defrang to the ASPN Python Cookbook
    (http://aspn.activestate.com/ASPN/Coo...n/Recipe/81330)
    and al***@sourceforge.net.

    Copyright: The methods _make_regex(), __call__() and substitute()
    were the work of Xavier Defrang, __init__() was the work of
    al***@sourceforge.net, all others were the work of Maurice Ling"""

    def __init__(self, dict = None, file = None):
        """Constructor. It calls for the compilation of regular
        expressions from either a dictionary object or a replacement
        rule file.

        @param dict: dictionary object containing replacement rules
        with the string to be replaced as keys.
        @param file: file name of replacement rule file
        """
        self.re = None
        self.regex = None
        if file == None:
            UserDict.__init__(self, dict)
            self._make_regex()
        else:
            UserDict.__init__(self, self.readDictionaryFile(file))
            self._make_regex()

    def cleanDictionaryFile(self, file):
        """
        Method to clean up the replacement rule dictionary file and
        write the cleaned file as the same name as the original file."""
        import os
        dict = self.readDictionaryFile(file)
        f = open(file, 'w')
        for key in dict.keys():
            f.write(str(key) + '=' + str(dict[key]) + os.linesep)
        f.close()

    def readDictionaryFile(self, file):
        """
        Method to parse a replacement rule file (file) into a
        dictionary for regular expression processing. Each rule in the
        rule file is in the form:
        <string to be replaced>=<string to replace with>
        """
        import string
        import os
        f = open(file, 'r')
        data = f.readlines()
        f.close()
        dict = {}
        for rule in data:
            rule = rule.split('=')
            if rule[1][-1] == os.linesep: rule[1] = rule[1][:-1]
            dict[str(rule[0])] = str(rule[1])
        print '%s replacement rule(s) read from %s' % (
            str(len(dict.keys())), str(file))
        return dict

    def _make_regex(self):
        """ Build a regular expression object based on the keys of the
        current dictionary """
        self.re = "(%s)" % "|".join(map(re.escape, self.keys()))
        self.regex = re.compile(self.re)

    def __call__(self, mo):
        """ This handler will be invoked for each regex match """
        # Count substitutions
        self.count += 1
        # Look-up string
        return self[mo.string[mo.start():mo.end()]]

    def substitute(self, text):
        """ Translate text, returns the modified text. """
        # Reset substitution counter
        self.count = 0
        # Process text
        #return self._make_regex().sub(self, text)
        return self.regex.sub(self, text)

    def rmBracketDuplicate(self, text):
        """Removes the bracketed text in occurrences of '<text-x> (<text-x>)'"""
        regex = re.compile(r'(\w+)\s*(\(\1\))')
        return regex.sub(r'\1', text)

    def substituteMultiple(self, text):
        """Similar to substitute() method except that this method loops
        round the same text multiple times until no more substitutions
        can be made or when it had looped 10 times. This is to pre-empt
        cases of recursive abbreviations."""
        count = 1 # to get into the loop
        run = 0 # counter for number of runs thru the text
        while count > 0 and run < 10:
            count = 0
            text = self.rmBracketDuplicate(self.substitute(text))
            count = count + self.count
            run = run + 1
            print "Pass %d: Changed %d thing(s)" % (run, count)
        return text

Normally I use the following to instantiate my module:

replace = Replacer('', 'rule.mdf')

rule.mdf is in the format of "<string to be replaced>=<string to replace
with>\n"

Then I call replace.substituteMultiple('<my text>') to carry out multiple
replacements.

It all works well for rule counts up to 800+, but when my replacement
rules swell to 1800+, it gives me a runtime error that says
"Internal error in regular expression engine", traceable to "return
self.regex.sub(self, text)" in the substitute() method.
Any ideas or workarounds?

Thanks in advance.

Cheers,
Maurice

Jul 19 '05 #1

André Søreng

Instead of using regular expressions, you could perhaps
use a multiple keyword matcher, and then for each match,
replace it with the correct string.

http://hkn.eecs.berkeley.edu/~dyoo/python/ahocorasick/

contains the Aho-Corasick algorithm written in C with
a Python extension.
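
To illustrate the idea of matching many keywords without a regular expression, here is a minimal sketch. It is not the Aho-Corasick algorithm and it does not use the extension linked above; it is a simpler trie scan built from nested dictionaries, written for a current Python 3, and the names build_trie and replace_keywords are made up for this example.

```python
def build_trie(rules):
    """Build a nested-dict trie from a {keyword: replacement} mapping."""
    root = {}
    for keyword, replacement in rules.items():
        node = root
        for ch in keyword:
            node = node.setdefault(ch, {})
        node[None] = replacement  # the None key marks the end of a keyword
    return root


def replace_keywords(text, trie):
    """Scan text left to right, replacing the longest keyword at each position."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        node = trie
        j = i
        best_end = None
        best_repl = None
        # Walk the trie as far as the text allows, remembering the
        # longest complete keyword seen so far.
        while j < n and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_end, best_repl = j, node[None]
        if best_end is None:
            out.append(text[i])  # no keyword starts here, copy one character
            i += 1
        else:
            out.append(best_repl)
            i = best_end
    return "".join(out)


trie = build_trie({"cat": "feline", "catalog": "list", "dog": "canine"})
print(replace_keywords("catalog of cat and dog", trie))
# prints: list of feline and canine
```

This avoids any limit on the size of a compiled pattern, at the cost of a worst case proportional to text length times the longest keyword. A real Aho-Corasick matcher removes that extra factor.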

Jul 19 '05 #2

Dennis Benzinger

Maurice LING wrote:

> Hi,
>
> I have the following codes:
>
> from __future__ import nested_scopes
> [...]

Are you still using Python 2.1?

In every later version you don't need the
"from __future__ import nested_scopes" line.

So, if you are using Python 2.1 I strongly recommend
upgrading to Python 2.4.1.

> [...]
> It all works well for rule count up to 800+ but when my replacement
> rules swells up to 1800+, it gives me a runtime error that says
> "Internal error in regular expression engine"... traceable to "return
> self.regex.sub(self, text)" in substitute() method.
> [...]

I didn't read your code, but this sounds like you have a problem with
the regular expression engine being recursive in Python versions < 2.4.
Try again using Python 2.4 or later (e.g. Python 2.4.1). The new regular
expression engine is not recursive anymore.

Bye,
Dennis

Jul 19 '05 #3

André Søreng

The "Internal error in regular expression engine" occurs also
in Python 2.4.0 when creating a regular expression containing
more than 9999 or's ("|").

Jul 19 '05 #4

Maurice LING

Hi Dennis,

Dennis Benzinger wrote:

> Maurice LING wrote:
>> Hi,
>>
>> I have the following codes:
>>
>> from __future__ import nested_scopes
>> [...]
>
> Are you still using Python 2.1?
>
> In every later version you don't need the
> "from __future__ import nested_scopes" line.
>
> So, if you are using Python 2.1 I strongly recommend
> upgrading to Python 2.4.1.

I am using Python 2.3.5, installed using Fink. That is the latest
version Fink has to offer.

>> [...]
>> It all works well for rule count up to 800+ but when my replacement
>> rules swells up to 1800+, it gives me a runtime error that says
>> "Internal error in regular expression engine"... traceable to "return
>> self.regex.sub(self, text)" in substitute() method.
>> [...]
>
> I didn't read your code, but this sounds like you have a problem with
> the regular expression engine being recursive in Python versions < 2.4.
> Try again using Python 2.4 or later (i.e. Python 2.4.1). The new regular
> expression engine is not recursive anymore.

Apparently this problem has been reported as bug item #857676
(http://mail.python.org/pipermail/pyt...er/021473.html)
and discussed at (http://www.lehuen.com/nicolas/index.php/Pytst/2005/04).
Bug item #857676 is consistent with my problem: it works with smaller
lists (~1000) but not with lists much bigger than that. The problem seems
to lie in the fact that the SRE engine works on a 16-bit opcode...

Cheers
Maurice

Jul 19 '05 #5

Maurice LING

Hi all,

I think I might have a workaround for this problem but have no idea how
to work it through. I hope that someone can kindly help me out because I
do not quite understand the mechanics of the _make_regex() method in the
original code...
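
For anyone else puzzling over _make_regex(), here is a small sketch of what that one line does, using a plain dict and a made-up two-rule table instead of the Replacer class. It is written for a current Python 3, and the names rules and lookup are only for illustration.

```python
import re

# Two made-up rules whose keys contain regex metacharacters.
rules = {"a.b": "X", "c|d": "Y"}

# re.escape() makes each key match literally, so "." and "|" inside a key
# lose their special meaning. "|".join() then glues the escaped keys into
# a single alternation: match key 1 OR key 2 OR ...
pattern = "(%s)" % "|".join(map(re.escape, rules.keys()))
regex = re.compile(pattern)


def lookup(match):
    # sub() calls this once per match; the matched text is a dictionary key.
    return rules[match.group(0)]


result = regex.sub(lookup, "a.b and c|d but not axb")
print(result)
# prints: X and Y but not axb
```

In the Replacer class the role of lookup is played by __call__(), which is why the object itself can be passed to sub(). With 1800+ keys the alternation simply becomes very long, which is where the engine gives up.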

My idea is, instead of having one UserDict, to have a list of UserDicts. So
a large unprocessable replacement rule set is split into multiple
smaller files, with each file read into a UserDict and made into an
RE matcher. Then iterative matching is done using the list of REs.

In short, the current procedure is
1 dictionary, 1 RE, 1 RE matcher... to match inputs
My workaround is to change it to
list of dictionaries, list of REs, list of RE matchers... iterative
matching of inputs.

Can someone kindly help me out here?

Thanks in advance.

Cheers,
Maurice

Jul 19 '05 #6

Maurice LING

Solved it. Instead of modifying the Replacer class, I've made another class
which instantiates a list of Replacer objects from a list of substitution
rule files. It then iterates through the list of Replacer objects and
calls each one's substitute() method. It seems to work.
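
The class itself was not posted, so the following is only a rough sketch of the same idea, not the actual code. It assumes the rules are already in one dictionary and splits the keys into chunks in memory rather than into separate files. It is written for a current Python 3; the name ChunkedReplacer and the chunk size of 500 are arbitrary choices for illustration (the thread only reports that 800+ rules worked and 1800+ failed on Python 2.3.5).

```python
import re


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class ChunkedReplacer:
    """Hypothetical wrapper: one compiled regex per chunk of rules."""

    def __init__(self, rules, chunk_size=500):
        self.rules = dict(rules)
        # Longest keys first, so a longer key is tried before its prefixes.
        keys = sorted(self.rules, key=len, reverse=True)
        self.regexes = [
            re.compile("|".join(map(re.escape, chunk)))
            for chunk in chunked(keys, chunk_size)
        ]
        self.count = 0

    def _lookup(self, match):
        # Called once per match: count it and look up the replacement.
        self.count += 1
        return self.rules[match.group(0)]

    def substitute(self, text):
        """Apply every chunk's regex in turn; returns the modified text."""
        self.count = 0
        for regex in self.regexes:
            text = regex.sub(self._lookup, text)
        return text


# 2000 made-up rules, more than the single-regex version could handle.
rules = {"k%d" % i: "v%d" % i for i in range(2000)}
replacer = ChunkedReplacer(rules)
print(replacer.substitute("k1999 k5 k12"))
```

One thing to watch with this approach: the chunks are applied one after another, so a later chunk can match text that an earlier chunk has just inserted. A single big regex would never do that within one pass, so the results can differ if replacement strings contain other keys.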

Thanks for all your advice.

Cheers
Maurice

Jul 19 '05 #7
