88k regex = RuntimeError

I need to find a bunch of C function declarations by searching
thousands of source or html files for thousands of known function
names. My initial simple approach was to do this:

rxAllSupported = re.compile(r"\b(" + "|".join(gAllSupported) + r")\b")
# giving a regex of \b(AAFoo|ABFoo| (uh... 88kb more...) |zFoo)\b

for root, dirs, files in os.walk( ... ):
    ....
    for fileName in files:
        ....
        filePath = os.path.join(root, fileName)
        file = open(filePath, "r")
        contents = file.read()
        ....
        result = re.search(rxAllSupported, contents)

but this happens:

    result = re.search(rxAllSupported, contents)
  File "C:\Python24\Lib\sre.py", line 134, in search
    return _compile(pattern, flags).search(string)
RuntimeError: internal error in regular expression engine

I assume it's hitting some limit, but don't know where the limit is to
remove it. I tried stepping into it repeatedly with Komodo, but didn't
see the problem.

Suggestions?

Feb 14 '06 #1
> I assume it's hitting some limit, but don't know where the limit is to
> remove it. I tried stepping into it repeatedly with Komodo, but didn't
> see the problem.

That's because it is buried in the C library that is the actual
implementation. There was a discussion about this a few weeks ago,
and AFAIK there isn't much you can do about that.

> Suggestions?


Yes. Don't do it :) After all, what you do is nothing but a simple
word-search. If I had that problem, my naive approach would be to simply
tokenize the sources and look for the words in them being part of your
function-name-set. A bit of statekeeping to keep track of the position, and
you're done. Check out pyparsing; it might help you with the tokenization.
I admit that the apparent ease of the regular expression would have lured me
into the same trap.
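
A minimal sketch of the tokenizing approach described above, in current Python 3 syntax. It uses the standard re module rather than pyparsing to split the text into identifier-like tokens. The helper name find_known_names and the sample data are made up for illustration.

```python
import re

# Identifier-like tokens: a letter or underscore followed by word characters.
TOKEN = re.compile(r"[A-Za-z_]\w*")

def find_known_names(text, known_names):
    """Yield (line_number, column, name) for each token found in known_names."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            if match.group(0) in known_names:
                # Columns are reported 1-based, like most editors do.
                yield line_number, match.start() + 1, match.group(0)

known = {"AAFoo", "zFoo"}
sample = "int AAFoo(void);\nstatic int helper(void);\n  char *zFoo(int n);\n"
hits = list(find_known_names(sample, known))
for hit in hits:
    print(hit)
```

Because each token is checked with a set lookup, the cost per token stays roughly constant no matter how many thousands of names are in the set.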

Diez
Feb 14 '06 #2
Why don't you create a regex that finds all C function declarations
for you (and returns the function names), apply re.findall() to all
files with that regex, and then check those function names against
the set of allSupported?

You might even be able to find a regex for C function declarations on
the web.

Your gAllSupported can be a set(); you can then create the intersection
between gAllSupported and the function-names found by your regex.
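
A rough sketch of this idea in Python 3. The pattern below is a deliberately simple heuristic, not a C parser: it treats any identifier directly followed by "(" as a candidate, so it also picks up calls, macros, and keywords such as "if". The names CANDIDATE and supported and the sample source are assumptions for illustration.

```python
import re

# Heuristic: an identifier followed by optional whitespace and "(".
CANDIDATE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")

supported = {"AAFoo", "ABFoo", "zFoo"}

source = """
int AAFoo(int x);
void unrelated(void);
char *zFoo (const char *s);
"""

# Collect every candidate name once, then intersect with the known set.
candidates = set(CANDIDATE.findall(source))
found = supported & candidates
print(sorted(found))
```

The false positives from the loose pattern do no harm here, because the intersection discards every candidate that is not in the supported set.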

Cheers,

--Tim

Feb 14 '06 #3
One workaround may be as easy as

import re

wanted = set(["foo", "bar", "baz"])
file_content = "foo bar-baz ignored foo()"

r = re.compile(r"\w+")
found = [name for name in r.findall(file_content) if name in wanted]

print found
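
Applied to a whole directory tree, the same workaround might look like the following sketch in current Python 3 syntax. The function name scan_tree and the temporary demo files are assumptions for illustration. The original post elides the os.walk() argument, so a throwaway directory stands in for it here.

```python
import os
import re
import tempfile

WORD = re.compile(r"\w+")

def scan_tree(top, wanted):
    """Map each file path under top to the set of wanted names it contains."""
    results = {}
    for root, dirs, files in os.walk(top):
        for file_name in files:
            file_path = os.path.join(root, file_name)
            # errors="replace" keeps the scan going on files with odd encodings.
            with open(file_path, "r", errors="replace") as handle:
                contents = handle.read()
            hits = wanted.intersection(WORD.findall(contents))
            if hits:
                results[file_path] = hits
    return results

# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, "a.c"), "w") as handle:
        handle.write("int foo(void); int other(void);\n")
    with open(os.path.join(top, "b.html"), "w") as handle:
        handle.write("<p>nothing relevant</p>\n")
    report = scan_tree(top, {"foo", "bar", "baz"})
    demo = {os.path.basename(path): names for path, names in report.items()}
print(demo)
```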

Peter

Feb 14 '06 #4
jodawi wrote:
> I need to find a bunch of C function declarations by searching
> thousands of source or html files for thousands of known function
> names. My initial simple approach was to do this:
>
> rxAllSupported = re.compile(r"\b(" + "|".join(gAllSupported) + r")\b")
> # giving a regex of \b(AAFoo|ABFoo| (uh... 88kb more...) |zFoo)\b

Maybe you can be more clever about the regex? If the names above are
representative then something like r'\b(\w{1,2})Foo\b' might work.
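
To illustrate the suggestion in Python 3: if every wanted name really shares a suffix such as Foo, a short pattern can stand in for the giant alternation. The names below come from the regex comment in the original post. Checking each hit against the known set afterwards is an added assumption, since the compact pattern also accepts names that were never in the list.

```python
import re

compact = re.compile(r"\b(\w{1,2})Foo\b")
known = {"AAFoo", "ABFoo", "zFoo"}

text = "AAFoo(); ABFoo(); zFoo(); QQFoo(); LongerFoo();"

# Everything the compact pattern accepts, in order of appearance.
all_hits = [m.group(0) for m in compact.finditer(text)]
# Only the hits that are actually in the known list.
confirmed = [name for name in all_hits if name in known]
print(all_hits)
print(confirmed)
```

Here QQFoo matches the pattern but is filtered out by the set check, while LongerFoo does not match at all because its prefix is longer than two characters.
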
Feb 14 '06 #5
This is basically the same idea as what I tried to describe in my
previous post but without any samples.
I wonder if it's more efficient to create a new list using a
list-comprehension, and checking each entry against the 'wanted' set,
or to create a new set which is the intersection of set 'wanted' and
the iterable of all matches...

Your sample code would then look like this:
import re
r = re.compile(r"\w+")
file_content = "foo bar-baz ignored foo()"
wanted = set(["foo", "bar", "baz"])
found = wanted.intersection(name for name in r.findall(file_content))
print found
set(['baz', 'foo', 'bar'])


Anyone who has an idea what is faster? (This dataset is so limited that
it doesn't make sense to do any performance-tests with it)
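
One way to get a rough answer is to time both variants with the standard timeit module on a larger input. This is only a sketch in Python 3: the repeated sample string is an artificial stand-in for real source files, and the timings will vary by machine and Python version.

```python
import re
import timeit

r = re.compile(r"\w+")
wanted = {"foo", "bar", "baz"}
# Repeat the sample text to get an input large enough to time.
file_content = "foo bar-baz ignored foo() " * 10000

def with_list():
    # List comprehension: keeps duplicates and order of appearance.
    return [name for name in r.findall(file_content) if name in wanted]

def with_set():
    # Set intersection: each wanted name appears at most once.
    return wanted.intersection(r.findall(file_content))

# Both variants must agree on which names were seen.
assert set(with_list()) == with_set()

for func in (with_list, with_set):
    seconds = timeit.timeit(func, number=10)
    print(func.__name__, round(seconds, 3))
```

Both variants call r.findall() on the full text, so the regex scan is a shared cost and any difference comes only from how the matches are filtered.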

Cheers,

--Tim

Feb 14 '06 #6
Tim N. van der Leeuw wrote:
> This is basically the same idea as what I tried to describe in my
> previous post but without any samples.
> I wonder if it's more efficient to create a new list using a
> list-comprehension, and checking each entry against the 'wanted' set,
> or to create a new set which is the intersection of set 'wanted' and
> the iterable of all matches...
>
> Your sample code would then look like this:
> import re
> r = re.compile(r"\w+")
> file_content = "foo bar-baz ignored foo()"
> wanted = set(["foo", "bar", "baz"])
> found = wanted.intersection(name for name in r.findall(file_content))

Just

found = wanted.intersection(r.findall(file_content))

> print found
> set(['baz', 'foo', 'bar'])
>
> Anyone who has an idea what is faster? (This dataset is so limited that
> it doesn't make sense to do any performance-tests with it)


I guess that your approach would be a bit faster though most of the time
will be spent on IO anyway. The result would be slightly different, and
again yours (without duplicates) seems more useful.

However, I'm not sure whether the OP would rather stop at the first match or
need a match object and not just the text. In that case:

matches = (m for m in r.finditer(file_content) if m.group(0) in wanted)
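
A short sketch, in Python 3, of how such a generator can be used to stop at the first match and still get a match object with position information. The sample text is made up for illustration.

```python
import re

r = re.compile(r"\w+")
wanted = {"foo", "bar", "baz"}
file_content = "ignored foo bar-baz foo()"

# Lazy generator: nothing is scanned until a match is requested.
matches = (m for m in r.finditer(file_content) if m.group(0) in wanted)

# next() pulls only the first hit; the default None covers the no-match case.
first = next(matches, None)
if first is not None:
    print(first.group(0), first.start(), first.end())
```

Since the generator is lazy, the rest of the text is not scanned unless further matches are requested.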

Peter
Feb 14 '06 #7
